Re: [HACKERS] Adding hook in BufferSync for backup purposes

2017-08-07 Thread Andrey Borodin
Alvaro, Tom, thank you for your valuable comments.


> Alvaro:
> I remember discussing the topic of differential base-backups with
> somebody (probably Marco and Gabriele).  The idea we had was to have a
> new relation fork which stores an LSN for each group of pages,
> indicating the LSN of the newest change to those pages.  The backup tool
> "scans" the whole LSN fork, and grabs images of all pages that have LSNs
> newer than the one used for the previous base backup.
Thanks for the pointer, I've found the discussions and now I'm in the process of
extracting the knowledge from there.

> I think it should be at the point where the buffer is
> modified (i.e. when WAL is written) rather than when it's checkpointed
> out.
WAL is flushed before the actual pages are written to disk (sent to the kernel). I'd
like to notify extensions right after we are certain the pages were flushed.
But you are right, BufferSync is not a good place for this:
1. It lacks LSNs
2. It's not the only place where flushing happens: the bgwriter goes through the nearby
function FlushBuffer(), and many AMs write directly to smgr (for example when a metapage
is created)

BufferSync() seemed like such a comfortable and efficient place for flushing info on
dirty pages, already sorted and grouped by tablespace, but it is absolutely
incorrect to do it there. I'll look for a better place.
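
For discussion's sake, here is a minimal self-contained C sketch of the kind of
notification hook I have in mind: a callback invoked only after a page is known to
have been durably written, carrying the page's identity and LSN. All of the names
here (page_flushed_hook and friends) are hypothetical placeholders, not existing
PostgreSQL APIs, and the stand-in typedefs are only there to keep the sketch
compilable:

#include <stdint.h>
#include <stdio.h>

/* Stand-ins for PostgreSQL types; purely illustrative. */
typedef uint64_t XLogRecPtr;
typedef uint32_t BlockNumber;
typedef uint32_t Oid;

/* Hypothetical hook type: called once a page is known to be flushed to disk. */
typedef void (*page_flushed_hook_type) (Oid tablespace, Oid relation,
                                        BlockNumber blkno, XLogRecPtr page_lsn);

static page_flushed_hook_type page_flushed_hook = NULL;

/* Core code would call this after the write + fsync path succeeds. */
static void
notify_page_flushed(Oid ts, Oid rel, BlockNumber blkno, XLogRecPtr lsn)
{
    if (page_flushed_hook != NULL)
        page_flushed_hook(ts, rel, blkno, lsn);
}

/* An extension (e.g. a backup agent) installs its callback to record changed pages. */
static void
backup_agent_callback(Oid ts, Oid rel, BlockNumber blkno, XLogRecPtr lsn)
{
    printf("page %u/%u/%u flushed, LSN %llu\n",
           (unsigned) ts, (unsigned) rel, (unsigned) blkno,
           (unsigned long long) lsn);
}

int
main(void)
{
    page_flushed_hook = backup_agent_callback;
    notify_page_flushed(1663, 16384, 42, 0x1A2B3C4DULL);
    return 0;
}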

> 
> On 7 Aug 2017, at 18:37, Tom Lane  wrote:
> 
> Yeah.  Keep in mind that if the extension does anything at all that could
> possibly throw an error, and if that error condition persists across
> multiple tries, you will have broken the database completely: it will
> be impossible to complete a checkpoint, and your WAL segment pool will
> grow until it exhausts disk.  So the idea of doing something that involves
> unspecified extension behavior, especially possible interaction with
> an external backup agent, right there is pretty terrifying.
I think the API for extensions should tend to protect the developer from breaking
everything, but may allow it with cautionary warnings in the docs and comments.
Please let me know if this assumption is incorrect.

> 
> Other problems with the proposed patch: it misses coverage of
> BgBufferSync, and I don't like exposing an ad-hoc structure like
> CkptTsStatus as part of an extension API.  The algorithm used by
> BufferSync to schedule buffer writes has changed multiple times
> before and doubtless will again; if we're going to have a hook
> here it should depend as little as possible on those details.
OK, now I see that «buf_internals.h» has the word "internals" in its name for a reason.
Thanks for pointing that out; I didn't know about the changes in these algorithms.

Best regards, Andrey Borodin, Yandex.





Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-07 Thread Masahiko Sawada
On Thu, Aug 3, 2017 at 11:31 PM, Tom Lane  wrote:
> Masahiko Sawada  writes:
>> If we want to create other tables and load data to them as we want we
>> can do that using psql -f. So an alternative ways is having a flexible
>> style option for example --custom-initialize = { [load, create_pkey,
>> create_fkey, vacuum], ... }. That would solve this in a better way.
>
> FWIW, I like that significantly better than your original proposal.
> It'd allow people to execute parts of pgbench's standard initialization
> sequence and then do other things in between (in psql).  Realistically,
> that's probably about as much win as we need here --- if you're veering
> far enough away from the standard scenario that that doesn't do it for
> you, you might as well just write an all-custom setup script in psql.
>

The attached patch introduces a --custom-initialize option that allows us to
specify the initialization steps and their order. For example, if you
want to skip building primary keys you can specify
--custom-initialize="create_table, load_data, vacuum". Since each
custom initialization command is invoked in the specified order, we can,
for example, also create primary keys *before* loading data. The
data generation is done on the client side, so progress information for
initialization is still supported.
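
To illustrate the mechanism (this is a standalone sketch, not the attached patch;
the step names and helper functions are only examples), the option value is split
on commas and each recognized step runs in exactly the order the user wrote it:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Illustrative step implementations; the real ones live in pgbench.c. */
static void create_table(void) { puts("creating tables"); }
static void load_data(void)    { puts("loading data"); }
static void create_pkey(void)  { puts("creating primary keys"); }
static void vacuum_all(void)   { puts("vacuuming"); }

struct init_step { const char *name; void (*run)(void); };

static const struct init_step steps[] = {
    {"create_table", create_table},
    {"load_data", load_data},
    {"create_pkey", create_pkey},
    {"vacuum", vacuum_all},
};

/* Run the comma-separated steps in the order given on the command line. */
static int
run_custom_initialize(const char *spec)
{
    char    buf[256];
    char   *tok;

    snprintf(buf, sizeof(buf), "%s", spec);
    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
    {
        size_t  i;
        size_t  n = sizeof(steps) / sizeof(steps[0]);

        while (isspace((unsigned char) *tok))
            tok++;              /* trim leading whitespace */
        for (i = 0; i < n; i++)
        {
            if (strcmp(tok, steps[i].name) == 0)
            {
                steps[i].run();
                break;
            }
        }
        if (i == n)
        {
            fprintf(stderr, "unknown initialization step \"%s\"\n", tok);
            return 1;
        }
    }
    return 0;
}

int
main(void)
{
    /* e.g. skip primary keys entirely, as in the example above */
    return run_custom_initialize("create_table, load_data, vacuum");
}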

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization.patch
Description: Binary data



Re: [HACKERS] expanding inheritance in partition bound order

2017-08-07 Thread Amit Langote
On 2017/08/08 4:34, Robert Haas wrote:
> On Mon, Aug 7, 2017 at 2:54 AM, Amit Langote
>  wrote:
>> As long as find_all_inheritors() is a place only to determine the order in
>> which partitions will be locked, it's fine.  My concern is about the time
>> of actual locking, which in the current planner implementation happens so soon
>> that we end up needlessly locking all the partitions.
> 
> I don't think avoiding that problem is going to be easy.  We need a
> bunch of per-relation information, like the size of each relation, and
> what indexes it has, and how big they are, and the statistics for each
> one.  It was at one point proposed by someone that every partition
> should be required to have the same indexes, but (1) we didn't
> implement it like that and (2) if we had done that it wouldn't solve
> this problem anyway because the sizes are still going to vary.

Sorry, I didn't mean to say we shouldn't lock and open partitions at all.
We do need their relation descriptors for planning and there is no doubt
about that.  I was just saying that we should do that only for the
partitions that are not pruned.  But, as you say, I can see that the
planner changes required to be able to do that might be hard.

>> The locking-partitions-too-soon issue, I think, is an important one and
>> ISTM, we'd want to lock the partitions after we've determined the specific
>> ones a query needs to scan using the information returned by
>> RelationGetPartitionDispatchInfo.  That means the latter had better have locked
>> the relations whose cached partition descriptors will be used to determine
>> the result that it produces.  One way to do that might be to lock all the
>> tables in the list returned by find_all_inheritors that are partitioned
>> tables before calling RelationGetPartitionDispatchInfo.  It seems the
>> approach you've outlined below will let us do that.
> 
> Yeah, I think so.  I think we could possibly open and lock partitioned
> children only, then prune away leaf partitions that we can determine
> aren't needed, then open and lock the leaf partitions that are needed.

Yes.

>> BTW, IIUC, there will be two lists of OIDs we'll  have: one in the
>> find_all_inheritors order, say, L1 and the other determined by using
>> partitioning-specific information for the given query, say L2.
>>
>> To lock, we iterate L1 and if a given member is in L2, we lock it.  It
>> might be possible to make it as cheap as O(nlogn).
> 
> Commonly, we'll prune no partitions or all but one; and we should be
> able to make those cases very fast.

Agreed.
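
To make the claimed cost concrete, here is a small standalone sketch (assumed
names, not planner code): sort the OIDs of the partitions that survive pruning
once, then walk the find_all_inheritors ordering and lock only the members found
by binary search, which is O(n log n) overall:

#include <stdio.h>
#include <stdlib.h>

typedef unsigned int Oid;

static int
oid_cmp(const void *a, const void *b)
{
    Oid     x = *(const Oid *) a;
    Oid     y = *(const Oid *) b;

    return (x > y) - (x < y);
}

/* Placeholder for LockRelationOid(); just prints in this sketch. */
static void
lock_relation(Oid relid)
{
    printf("locking %u\n", relid);
}

/*
 * l1: OIDs in find_all_inheritors order (the canonical locking order).
 * l2: OIDs the query actually needs to scan, in arbitrary order.
 */
static void
lock_surviving_partitions(const Oid *l1, int n1, Oid *l2, int n2)
{
    int     i;

    qsort(l2, n2, sizeof(Oid), oid_cmp);            /* O(n2 log n2) */
    for (i = 0; i < n1; i++)                        /* O(n1 log n2) */
    {
        if (bsearch(&l1[i], l2, n2, sizeof(Oid), oid_cmp) != NULL)
            lock_relation(l1[i]);
    }
}

int
main(void)
{
    Oid     l1[] = {1000, 1001, 1002, 1003, 1004};  /* inheritance order */
    Oid     l2[] = {1003, 1001};                    /* survived pruning */

    lock_surviving_partitions(l1, 5, l2, 2);
    return 0;
}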

>> Maybe, we can make the initial patch use syscache to get the relkind for a
>> given child.  If the syscache bloat is unbearable, we go with the
>> denormalization approach.
> 
> Yeah.  Maybe if you write that patch, you can also test it to see how
> bad the bloat is.

I will try and see, but maybe the syscache solution doesn't get us past
the proof-of-concept stage.

Thanks,
Amit





Re: [HACKERS] Pluggable storage

2017-08-07 Thread Amit Kapila
On Tue, Jun 13, 2017 at 7:20 AM, Haribabu Kommi
 wrote:
>
>
> On Fri, Oct 14, 2016 at 7:26 AM, Alvaro Herrera 
> wrote:
>>
>> I have sent the partial patch I have to Hari Babu Kommi.  We expect that
>> he will be able to further this goal some more.
>
>
> Thanks Alvaro for sharing your development patch.
>
> Most of the patch design is the same as described by Alvaro in the first mail
> [1].
> I will detail the modifications, pending items and open items (needing
> discussion)
> to implement proper pluggable storage.
>
> Here I have attached WIP patches to support pluggable storage. The patches in
> the series may not work individually; many things are still under development.
> These patches are just to share the approach of the current development.
>

+typedef struct StorageAmRoutine
+{

In this structure, you have already covered most of the APIs that a
new storage module needs to provide, but I think there could be more.
One such API could be heap_hot_search.  This seems specific to the current
heap, where we have the provision of HOT.  I think we can provide a new
API tuple_search or something like that.
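
For readers skimming the thread, the structure in question is a table of function
pointers that each storage module fills in. The following is only a hypothetical,
compilable sketch of that shape, including the suggested generic tuple_search
callback; the field names and stand-in types are illustrative and not the actual
StorageAmRoutine from the WIP patches:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for core types. */
typedef struct RelationData *Relation;
typedef struct { uint32_t blkno; uint16_t offset; } ItemPointerData;
typedef struct StorageTuple StorageTuple;
typedef struct SnapshotData *Snapshot;

/*
 * A storage access method hands the core a table of callbacks.  A generic
 * tuple_search would replace the heap-specific heap_hot_search(), letting a
 * non-heap AM resolve an item pointer to a visible tuple in whatever way
 * suits its on-disk format.
 */
typedef struct StorageAmRoutine
{
    StorageTuple *(*tuple_fetch) (Relation rel, ItemPointerData *tid,
                                  Snapshot snapshot);
    bool          (*tuple_search) (Relation rel, ItemPointerData *tid,
                                   Snapshot snapshot, bool *all_dead);
    void          (*tuple_insert) (Relation rel, StorageTuple *tuple);
    void          (*tuple_delete) (Relation rel, ItemPointerData *tid);
} StorageAmRoutine;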


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Notice message of ALTER SUBSCRIPTION ... REFRESH PUBLICATION

2017-08-07 Thread Yugo Nagata
On Mon, 7 Aug 2017 09:46:56 -0400
Peter Eisentraut  wrote:

> On 7/27/17 20:51, Yugo Nagata wrote:
> > When we run ALTER SUBSCRIPTION ... REFRESH PUBLICATION and there is
> > an unknown table locally, it says;
> > 
> >  NOTICE:  added subscription for table public.tbl
> > 
> > I feel this is a bit misleading because it says a subscription is added,
> > but actually no new subscription object is created. Of course, I can
> > understand the intention of this message, but I feel that
> > it is better to use another expression. For example, how about saying
> > 
> >  NOTICE:  table public.tbl is added to subscription sub1
> > 
> > as proposed in the attached patch. This also fixes the message
> > when a table is removed from a subscription.
> 
> Fixed, thanks.  (Note that per another discussion, these are now DEBUG
> messages.)

Thanks, too.

> 
> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Yugo Nagata 




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-07 Thread Amit Langote
On 2017/08/07 14:37, Amit Khandekar wrote:
> On 4 August 2017 at 22:55, Robert Haas  wrote:
>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>> is a very good idea.
> 
> True. I also think RelationGetPartitionDispatchInfo() should be
> called while preparing the ModifyTable plan; the PartitionDispatch
> data structure returned by RelationGetPartitionDispatchInfo() should
> be stored in that plan, and then the execution-time fields in
> PartitionDispatch would be populated in ExecInitModifyTable().

I'm not sure if we could ever store the PartitionDispatch structure itself
in the plan.

The planner would build and use it to put the leaf partition sub-plans in the
canonical order in the resulting plan (Append, ModifyTable, etc.).
The executor will have to rebuild the PartitionDispatch info if and when
it needs it (for example, in ExecSetupPartitionTupleRouting for
insert or update tuple routing).

The refactoring patch that I've proposed (0002) makes the PartitionDispatch
structure itself contain a lot less information/state than it currently
does.  So RelationGetPartitionDispatchInfo's job per the revised patch is
to reveal the partition tree structure and the information of each
partitioned table that the tree contains.  The original design, whereby it
builds and puts into PartitionDispatchData things like the partition key
execution state (ExprState), a TupleTableSlot, and TupleConversionMaps, seems
wrong to me in retrospect, and we should somehow revise it.  Those things I
mentioned are only needed for tuple routing, so they should be built and
managed by the executor, not partition.c.  Any feedback on the proposed
patch is welcome. :)
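
To sketch the split in code (hypothetical shapes only, compilable but not taken
from the patch): the partition-tree information that partition.c hands out would
be purely structural, and everything execution-related would live in a separate,
executor-owned struct:

#include <stdbool.h>

typedef unsigned int Oid;
typedef struct List List;          /* stand-in for PostgreSQL's List */

/*
 * What RelationGetPartitionDispatchInfo would return under the proposed
 * split: just the shape of the partition tree, nothing execution-related.
 */
typedef struct PartitionTreeNodeData
{
    Oid         relid;             /* partitioned table's OID */
    int         nparts;            /* number of direct partitions */
    Oid        *partoids;          /* their OIDs, in bound order */
    bool       *is_partitioned;    /* which of them are partitioned themselves */
} PartitionTreeNodeData;

/*
 * Execution-time state (partition key expression state, a slot, tuple
 * conversion maps, ...) would be built and owned by the executor, e.g. in
 * ExecSetupPartitionTupleRouting, rather than by partition.c.
 */
typedef struct PartitionRoutingState
{
    PartitionTreeNodeData *tree;
    void       *key_exprstate;     /* e.g. an ExprState */
    void       *tuple_slot;        /* e.g. a TupleTableSlot */
    void      **tuple_maps;        /* e.g. TupleConversionMaps per partition */
    List       *leaf_part_oids;    /* leaf partitions in canonical order */
} PartitionRoutingState;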

Thanks,
Amit





Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-07 Thread Noah Misch
On Sun, Aug 06, 2017 at 08:50:37AM -0700, Noah Misch wrote:
> On Sun, Aug 06, 2017 at 11:17:57AM -0400, Tom Lane wrote:
> > Noah Misch  writes:
> > > I've added this as an open item.  Confirmed in this setup:
> > 
> > > -- Client
> > > Windows Server 2016
> > > postgresql-10.0-beta2-windows-x64-binaries.zip from EnterpriseDB
> > 
> > I wonder whether the other complainants were using EDB's build,
> > and if not, just what were they using.  The indirect question is:
> > what version of OpenSSL is the Windows build using?
> 
> Those binaries I used have OpenSSL 1.0.2l.
> 
> > > I don't, however, see a smoking gun among commits.  Would you bisect the
> > > commits since 9.6 and see which one broke things?
> > 
> > Gut instinct says that the reason this case fails when other tools
> > can connect successfully is that libpqwalreceiver is the only tool
> > that uses PQconnectStart/PQconnectPoll rather than a plain
> > PQconnectdb, and that there is some behavioral difference between
> > connectDBComplete's wait loop and libpqrcv_connect's wait loop that
> 
> That would fit.  Until v10 (commit 1e8a850), PQconnectStart() had no in-tree
> callers outside of libpq itself.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-08-07 Thread Noah Misch
On Mon, Aug 07, 2017 at 05:29:34PM +1200, Thomas Munro wrote:
> On Thu, Aug 3, 2017 at 3:03 AM, Robert Haas  wrote:
> > On Fri, Jul 21, 2017 at 1:31 AM, Thomas Munro
> >  wrote:
> >> Thanks Neha.  It's be best to post the back trace and if possible
> >> print oldestXact and ShmemVariableCache->oldestXid from the stack
> >> frame for TruncateCLOG.
> >>
> >> The failing assertion in TruncateCLOG() has a comment that says
> >> "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
> >> calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
> >> *after* it calls TruncateCLOG().  What am I missing here?
> >
> > This problem was introduced by commit
> > ea42cc18c35381f639d45628d792e790ff39e271, so this should be added to
> > the PostgreSQL 10 open items list. That commit intended to introduce a
> > distinction between (1) the oldest XID that can be safely examined and
> > (2) the oldest XID that can't yet be safely reused.  These are the
> > same except when we're in the middle of truncating CLOG: (1) advances
> > before the truncation, and (2) advances afterwards. That's why
> > AdvanceOldestClogXid() happens before truncation proper and
> > SetTransactionIdLimit() happens afterwards, and changing the order
> > would, I think, be quite wrong.
> 
> Added to open items.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-07 Thread Peter Geoghegan
On Mon, Aug 7, 2017 at 3:23 PM, Tom Lane  wrote:
> The thing that I'm particularly thinking about is that if someone wants
> an ICU variant collation that we didn't make initdb provide, they'll do
> a CREATE COLLATION and go use it.  At update time, pg_dump or pg_upgrade
> will export/import that via CREATE COLLATION, and the only way it fails
> is if ICU rejects the collation name as garbage.  (Which, as we already
> established upthread, it's quite unlikely to do.)

Actually, it's *impossible* for ICU to fail to accept any string as a
valid locale within CREATE COLLATION, because CollationCreate() simply
doesn't sanitize ICU names. It doesn't do something like call
get_icu_language_tag(), unlike initdb (within
pg_import_system_collations()).

If I add such a test to CollationCreate(), it does a reasonable job of
sanitizing, while preserving the spirit of the BCP 47 language tag
format by not assuming that the user didn't specify a brand new locale
that it hasn't heard of. All of these are accepted with unmodified
master:

postgres=# CREATE COLLATION test1 (provider = icu, locale = 'en-x-icu');
CREATE COLLATION
postgres=# CREATE COLLATION test2 (provider = icu, locale = 'foo bar baz');
ERROR:  XX000: could not convert locale name "foo bar baz" to language
tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION:  get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test3 (provider = icu, locale = 'en-gb-icu');
ERROR:  XX000: could not convert locale name "en-gb-icu" to language
tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION:  get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test4 (provider = icu, locale = 'not-a-country');
CREATE COLLATION

If it's mandatory for get_icu_language_tag() to not throw an error
during initdb import when passed strings like these (that are
generated mechanically), why should we not do the same with CREATE
COLLATION? While the choice to preserve BCP 47's tolerance of missing
collations is debatable, not doing at least this much up-front is a
bug IMV.
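
For what it's worth, the check boils down to asking ICU to canonicalize the name
into a BCP 47 language tag with strict checking and treating failure as a bad
locale. Below is a minimal standalone sketch against the public ICU C API only
(it is not the PostgreSQL code, and which strings pass depends on the ICU
version); build with something like "cc check_locale.c -licuuc":

#include <stdio.h>
#include <unicode/uloc.h>

/* Return 1 if ICU can strictly convert the locale name to a language tag. */
static int
locale_is_acceptable(const char *locale)
{
    char        tag[ULOC_FULLNAME_CAPACITY];
    UErrorCode  status = U_ZERO_ERROR;

    uloc_toLanguageTag(locale, tag, sizeof(tag), 1 /* strict */, &status);
    if (U_FAILURE(status))
    {
        fprintf(stderr, "could not convert locale name \"%s\" to language tag: %s\n",
                locale, u_errorName(status));
        return 0;
    }
    printf("\"%s\" -> \"%s\"\n", locale, tag);
    return 1;
}

int
main(void)
{
    /* locale strings taken from the session above; results depend on ICU version */
    locale_is_acceptable("en-x-icu");
    locale_is_acceptable("foo bar baz");
    locale_is_acceptable("en-gb-icu");
    locale_is_acceptable("not-a-country");
    return 0;
}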

-- 
Peter Geoghegan




Re: [HACKERS] Subscription code improvements

2017-08-07 Thread Masahiko Sawada
On Mon, Aug 7, 2017 at 10:22 PM, Peter Eisentraut
 wrote:
> On 8/7/17 00:27, Masahiko Sawada wrote:
>>>> However, even with this patch there is still an issue that NOTICE
>>>> messages "removed subscription for table public.t1" can appear
>>>> even if we roll back the ALTER SUBSCRIPTION REFRESH command, as I mentioned
>>>> on an earlier thread. Since I think this behaviour will confuse users who
>>>> check the server logs I'd like to deal with it, though I don't have a good
>>>> idea how.
>>>
>>> Maybe we can just remove those messages?
>>>
>>> We don't get messages when we create a subscription about which tables
>>> are part of it.  So why do we need such messages when we refresh a
>>> subscription?
>>
>> I think that the messages are useful when we add/remove tables to/from
>> the publication and then refresh the subscription, so we might want to
>> change them to DEBUG rather than eliminating them.
>
> Good idea.  Done that way.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] More race conditions in logical replication

2017-08-07 Thread Robert Haas
On Mon, Aug 7, 2017 at 8:14 PM, Alvaro Herrera  wrote:
> BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait
> event here (and in the replication slot case) is bogus.  We probably
> need something new here.

Yeah, if you're adding a new wait point, you should add and document a new
constant in the appropriate section, probably something under
PG_WAIT_IPC in this case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] More race conditions in logical replication

2017-08-07 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Alvaro Herrera wrote:

> > I just noticed that Jacana failed again in the subscription test, and it
> > looks like it's related to this.  I'll take a look tomorrow if no one
> > beats me to it.
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2017-07-26%2014%3A39%3A54
> 
> Ahh, I misread the message.  This one is actually about the replication
> *origin* being still active, not the replication *slot*.  I suppose
> origins need the same treatment as we just did for slots.  Anybody wants
> to volunteer a patch?

So here's a patch.  I was able to reproduce the issue locally by adding
a couple of sleeps, just like Tom did in the case of replication slots.
With this patch it seems the problem is solved.  The mechanism is the
same as Petr used to fix replication origins -- if an origin is busy,
sleep on the CV until someone else signals us; and each time the
acquirer PID changes, broadcast a signal.  Like before, this is likely
to be a problem in older branches too, but we have no CVs to backpatch
this.

BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait
event here (and in the replication slot case) is bogus.  We probably
need something new here.

I found four instances of this problem in buildfarm,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2017-07-26%2014%3A39%3A54
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2017-07-28%2021%3A00%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2017-07-31%2007%3A03%3A20
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar=2017-08-04%2004%3A34%3A04

Jacana only stopped having it because it broke for unrelated reasons.  I
bet we'll see another failure soon ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 3593712791..ae40f7164d 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -939,7 +939,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
snprintf(originname, sizeof(originname), "pg_%u", subid);
originid = replorigin_by_name(originname, true);
if (originid != InvalidRepOriginId)
-   replorigin_drop(originid);
+   replorigin_drop(originid, false);
 
/*
 * If there is no slot associated with the subscription, we can finish
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 1c665312a4..4f32e7861c 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -79,15 +79,15 @@
 #include "access/xact.h"
 
 #include "catalog/indexing.h"
-
 #include "nodes/execnodes.h"
 
 #include "replication/origin.h"
 #include "replication/logical.h"
-
+#include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
+#include "storage/condition_variable.h"
 #include "storage/copydir.h"
 
 #include "utils/builtins.h"
@@ -125,6 +125,11 @@ typedef struct ReplicationState
int acquired_by;
 
/*
+* Condition variable that's signalled when acquired_by changes.
+*/
+   ConditionVariable origin_cv;
+
+   /*
 * Lock protecting remote_lsn and local_lsn.
 */
LWLock  lock;
@@ -324,9 +329,9 @@ replorigin_create(char *roname)
  * Needs to be called in a transaction.
  */
 void
-replorigin_drop(RepOriginId roident)
+replorigin_drop(RepOriginId roident, bool nowait)
 {
-   HeapTuple   tuple = NULL;
+   HeapTuple   tuple;
Relationrel;
int i;
 
@@ -334,6 +339,8 @@ replorigin_drop(RepOriginId roident)
 
rel = heap_open(ReplicationOriginRelationId, ExclusiveLock);
 
+restart:
+   tuple = NULL;
/* cleanup the slot state info */
LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);
 
@@ -346,11 +353,21 @@ replorigin_drop(RepOriginId roident)
{
if (state->acquired_by != 0)
{
-   ereport(ERROR,
-   (errcode(ERRCODE_OBJECT_IN_USE),
-errmsg("could not drop 
replication origin with OID %d, in use by PID %d",
-   state->roident,
-   
state->acquired_by)));
+   ConditionVariable  *cv;
+
+   if (nowait)
+   ereport(ERROR,
+   
(errcode(ERRCODE_OBJECT_IN_USE),
+errmsg("could not drop 

Re: [HACKERS] possible encoding issues with libxml2 functions

2017-08-07 Thread Noah Misch
On Wed, Apr 05, 2017 at 10:53:39PM +0200, Pavel Stehule wrote:
> 2017-03-17 4:23 GMT+01:00 Noah Misch :
> > On Sun, Mar 12, 2017 at 10:26:33PM +0100, Pavel Stehule wrote:
> > > 2017-03-12 21:57 GMT+01:00 Noah Misch :
> > > > On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> > > > > 2017-03-12 0:56 GMT+01:00 Noah Misch :
> > > > Please add a test case.
> 
> I am sending a test case.
> 
> I am afraid this cannot be part of the regression tests due to a strong
> dependency on the win1250 locale.

Right.  Based on that, I've distilled this portable test case:

-- Round-trip non-ASCII data through xpath().
DO $$
DECLARE
xml_declaration text := '';
degree_symbol text;
res xml[];
BEGIN
-- Per the documentation, xpath() doesn't work on non-ASCII data when
-- the server encoding is not UTF8.  The EXCEPTION block below,
-- currently dead code, will be relevant if we remove this limitation.
IF current_setting('server_encoding') <> 'UTF8' THEN
RAISE LOG 'skip: encoding % unsupported for xml',
current_setting('server_encoding');
RETURN;
END IF;

degree_symbol := convert_from('\xc2b0', 'UTF8');
res := xpath('text()', (xml_declaration ||
'' || degree_symbol || '')::xml);
IF degree_symbol <> res[1]::text THEN
RAISE 'expected % (%), got % (%)',
degree_symbol, convert_to(degree_symbol, 'UTF8'),
res[1], convert_to(res[1]::text, 'UTF8');
END IF;
EXCEPTION
-- character with byte sequence 0xc2 0xb0 in encoding "UTF8" has no 
equivalent in encoding "LATIN8"
WHEN untranslatable_character THEN RAISE LOG 'skip: %', SQLERRM;
-- default conversion function for encoding "UTF8" to "MULE_INTERNAL" 
does not exist
WHEN undefined_function THEN RAISE LOG 'skip: %', SQLERRM;
END
$$;

> > > > Why not use xml_parse() instead of calling xmlCtxtReadMemory()
> > > > directly?  The answer is probably in the archives, because someone
> > > > understood the problem enough to document "Some XML-related functions
> > > > may not work at all on non-ASCII data when the server encoding is not
> > > > UTF-8. This is known to be an issue for xpath() in particular."
> > >
> > > Probably there are two possible issues
> >
> > Would you research in the archives to confirm?
> >
> > > 1. what I touched - recv function does encoding to database encoding -
> > > but document encoding is not updated.
> >
> > Using xml_parse() would fix that, right?
> >
> 
> It should help, because it cuts the header - but it does a little more
> work than we need - it also checks whether the xml is a doc or not.

That's true.  xpath() currently works on both DOCUMENT and CONTENT xml values.
If xpath() used xml_parse(), this would stop working:

  SELECT xpath('text()', XMLPARSE (DOCUMENT 'bar'));

> One possible fix - a similar technique is used several times in the xml.c code
> .. xmlroot

> +   /* remove header */
> +   parse_xml_decl(string, _len, NULL, NULL, NULL);

> -   doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
> +   doc = xmlCtxtReadMemory(ctxt, (char *) string + header_len, len -

> another possibility is using xml_out_internal - that is used in the XMLTABLE
> function - which was the initial fix.
> 
> xml_out_internal uses parse_xml_decl too, but it is a little more expensive
> due to calling print_xml_decl, which has no effect in this case (where only
> the encoding is interesting) - but it can make sense when the server encoding
> is not UTF8, because in that case xmlstr is not encoded in UTF8 - so now I
> think the correct solution should be using xml_out_internal, because it
> appends a header with the server encoding to the XML doc

As you may have been implying here, your patch never adds an xml declaration
that bears an encoding.  (That is because it passes targetencoding=0 to
xml_out_internal().)  If we were going to fix xpath() to support non-ASCII
characters in non-UTF8 databases, we would not do that by adding xml
declarations.  We would do our own conversion to UTF8, like xml_parse() does.
In that light, I like this parse_xml_decl() approach better.  Would you update
your patch to use it and to add the test case I give above?
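
As a rough standalone illustration of the header-stripping idea (this does not use
PostgreSQL's parse_xml_decl; it is a crude approximation against plain libxml2, and
only meant to show the shape of the fix): skip a leading XML declaration, if any,
so that libxml2 never sees a stale encoding attribute, and parse the remainder as
already-converted UTF-8:

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

/* Crudely skip a leading "<?xml ... ?>" declaration; parse_xml_decl does this
 * properly in PostgreSQL.  Returns the offset of the first byte after it. */
static size_t
skip_xml_decl(const char *str)
{
    if (strncmp(str, "<?xml", 5) == 0)
    {
        const char *end = strstr(str, "?>");

        if (end != NULL)
            return (size_t) (end - str) + 2;
    }
    return 0;
}

int
main(void)
{
    const char *xml = "<?xml version=\"1.0\" encoding=\"WINDOWS-1250\"?><a>x</a>";
    size_t      off = skip_xml_decl(xml);
    xmlDocPtr   doc;

    /* The buffer is assumed to already be UTF-8 here, whatever the declaration said. */
    doc = xmlReadMemory(xml + off, (int) strlen(xml + off), NULL, "UTF-8", 0);
    if (doc == NULL)
    {
        fprintf(stderr, "parse failed\n");
        return 1;
    }
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}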


This change can make things worse in a non-UTF8 database.  The following
happens to work today in a LATIN1 database, but it will cease to work:

SELECT xpath('string-length()', ('' 
||
'' || chr(176) || '')::xml);

However, extracting actual text already gives wrong answers, because we treat
the UTF8 response from libxml2 as though it were LATIN1:

SELECT xpath('string()', ('' ||
'' || chr(176) || '')::xml);

I plan to back-patch this.  The documentation says "Some XML-related functions
may not work at all on non-ASCII data when the server encoding is not
UTF-8. This is known to be an 

Re: [HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-07 Thread Tom Lane
Peter Eisentraut  writes:
> On 8/6/17 20:07, Peter Geoghegan wrote:
>> I've looked into this. I'll give an example of what keyword variants
>> there are for Greek, and then discuss what I think each is.

> I'm not sure why we want to get into editorializing this.  We query ICU
> for the names of distinct collations and use that.  It's more than most
> people need, sure, but it doesn't cost us anything.

Yes, *it does*.  The cost will be borne by users who get screwed at update
time, not by developers, but that doesn't make it insignificant.

> The alternatives are hand-maintaining a list of collations, or
> installing no collations by default.  Both of those are arguably worse
> for users or for future code maintenance or both.

I'm not (yet) convinced that we need a hand-maintained whitelist.  But
I am wondering why we're expending extra code to import keyword variants.
Who is that catering to, really?

The thing that I'm particularly thinking about is that if someone wants
an ICU variant collation that we didn't make initdb provide, they'll do
a CREATE COLLATION and go use it.  At update time, pg_dump or pg_upgrade
will export/import that via CREATE COLLATION, and the only way it fails
is if ICU rejects the collation name as garbage.  (Which, as we already
established upthread, it's quite unlikely to do.)  On the other hand,
if someone relies on an ICU variant collation that initdb did import,
and then in the next release that collation doesn't get imported because
ICU changed their minds on what to advertise, the update situation is not
pretty at all.  Certainly it won't get handled transparently.  This line
of thinking leads me to believe that we ought to be pretty conservative
about what we import during initdb.

regards, tom lane




Re: [HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-07 Thread Peter Geoghegan
On Mon, Aug 7, 2017 at 2:50 PM, Peter Eisentraut
 wrote:
> On 8/6/17 20:07, Peter Geoghegan wrote:
>> I've looked into this. I'll give an example of what keyword variants
>> there are for Greek, and then discuss what I think each is.
>
> I'm not sure why we want to get into editorializing this.  We query ICU
> for the names of distinct collations and use that.

We ask ucol_getKeywordValuesForLocale() to get only "commonly used
[variant] values with the given locale" within
pg_import_system_collations(). So the editorializing has already
begun.
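
As a concrete illustration of that call (standalone, public ICU C API only, not
the pg_import_system_collations code), this lists the keyword values ICU reports
as commonly used for one locale; build with something like
"cc list_variants.c -licui18n -licuuc":

#include <stdio.h>
#include <unicode/ucol.h>
#include <unicode/uenum.h>

/* Print the collation keyword values ICU considers commonly used for a locale. */
static void
list_common_variants(const char *locale)
{
    UErrorCode      status = U_ZERO_ERROR;
    UEnumeration   *vals;
    const char     *val;

    vals = ucol_getKeywordValuesForLocale("collation", locale,
                                          1 /* commonlyUsed */, &status);
    if (U_FAILURE(status))
    {
        fprintf(stderr, "%s: %s\n", locale, u_errorName(status));
        return;
    }
    printf("%s:", locale);
    while ((val = uenum_next(vals, NULL, &status)) != NULL && U_SUCCESS(status))
        printf(" %s", val);
    printf("\n");
    uenum_close(vals);
}

int
main(void)
{
    list_common_variants("el");     /* Greek, as in the example discussed */
    return 0;
}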

> It's more than most
> people need, sure, but it doesn't cost us anything.

It's also *less* than what other users need. I disagree on the cost of
redundancy among entries after initdb. It's just confusing to users,
and seems avoidable without adding special case logic. What's the
difference between el-u-co-standard-x-icu and el-x-icu?

> The alternatives
> are hand-maintaining a list of collations, or installing no collations
> by default.

A better alternative would be to actively take an interest in what
collations are created, by further refining the rules by which they
are created. We have a stable API, described by various standards,
that we can work with for this. This doesn't have to be a
maintainability burden. We can provide general guidance about how to
add stuff back within documentation.

I do think that we should actually list all the collations that are
available by default on some representative ICU version, once that
list is tightened up, just as other database systems list them. That
necessitates a little weasel wording that notes that later ICU
versions might add more, but that's not a problem IMV. I don't think
that CLDR will ever omit anything previously available, at least
within a reasonable timeframe [1].

[1] http://cldr.unicode.org/index/process/cldr-data-retention-policy
-- 
Peter Geoghegan




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Tom Lane
Andres Freund  writes:
> On 2017-08-07 17:30:13 -0400, Tom Lane wrote:
>> Meh.  The lack of field complaints about this doesn't indicate to me that
>> we have a huge problem, and in any case, just increasing NUM_RESERVED_FDS
>> would do nothing for the system-wide limits.

> Howso? Via count_usable_fds() we test for max_files_per_process /
> RLIMIT_NOFILE fds, and *then* subtract NUM_RESERVED_FDS.

The limit I'm worried about is the kernel's overall FD table size limit
(ENFILE failures), not the per-process limit.  PG has a well-known
propensity for eating the entire kernel table under heavy load.  We
wouldn't ever have bothered with those retry loops otherwise.

regards, tom lane




Re: [HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-07 Thread Peter Eisentraut
On 8/6/17 20:07, Peter Geoghegan wrote:
> I've looked into this. I'll give an example of what keyword variants
> there are for Greek, and then discuss what I think each is.

I'm not sure why we want to get into editorializing this.  We query ICU
for the names of distinct collations and use that.  It's more than most
people need, sure, but it doesn't cost us anything.  The alternatives
are hand-maintaining a list of collations, or installing no collations
by default.  Both of those are arguably worse for users or for future
code maintenance or both.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Andres Freund
Hi,

On 2017-08-07 17:30:13 -0400, Tom Lane wrote:
> Meh.  The lack of field complaints about this doesn't indicate to me that
> we have a huge problem, and in any case, just increasing NUM_RESERVED_FDS
> would do nothing for the system-wide limits.

Howso? Via count_usable_fds() we test for max_files_per_process /
RLIMIT_NOFILE fds, and *then* subtract NUM_RESERVED_FDS.

- Andres




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Tom Lane
Andres Freund  writes:
> On 2017-08-07 17:05:06 -0400, Tom Lane wrote:
>> Probably the best we can hope for there is to have fd.c provide a function
>> "close an FD please", which postgres_fdw could call if libpq fails because
>> of ENFILE/EMFILE, and then retry.

> Unless that takes up a slot in fd.c while in use, that'll still leave us
> open to failures to open files in some critical parts, unless I miss
> something.

Well, there's always a race condition there, in that someone else can
eat the kernel FD as soon as you free it.  That's why we do this in a
retry loop.

> And then we'd have to teach similar things to PLs etc.  I agree that
> having some more slop isn't a proper solution, but only having ~30 fds
> as slop on the most common systems seems mightily small.

Meh.  The lack of field complaints about this doesn't indicate to me that
we have a huge problem, and in any case, just increasing NUM_RESERVED_FDS
would do nothing for the system-wide limits.

regards, tom lane




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Peter Geoghegan
On Mon, Aug 7, 2017 at 1:40 PM, Andres Freund  wrote:
> Given how close max_files_per_process is to the default linux limit of
> 1024 fds, I wonder if we shouldn't increase NUM_RESERVED_FDS by quite a
> bit?

Personally, any time I've seen a problem with this it was because an
extension leaked FDs, which is always going to fail in the end. The
extension leaked FDs because it didn't fully buy into using Postgres
resource managers, perhaps only in a subtle way. I find it hard to
imagine an extension author explicitly relying on any particular
amount of slop for FDs.

Is this specifically about postgres_fdw, or is there some other
specific problem you have in mind, that this would help solve?

-- 
Peter Geoghegan




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Andres Freund
On 2017-08-07 17:05:06 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-08-07 16:52:42 -0400, Tom Lane wrote:
> >> No, I don't think so.  If you're depending on the NUM_RESERVED_FDS
> >> headroom for anything meaningful, *you're doing it wrong*.  You should be
> >> getting an FD via fd.c, so that there is an opportunity to free up an FD
> >> (by closing a VFD) if you're up against system limits.  Relying on
> >> NUM_RESERVED_FDS headroom can only protect against EMFILE not ENFILE.
> 
> > How would this work for libpq based stuff like postgres fdw? Or some
> > random PL doing something with files? There's very little headroom here.
> 
> Probably the best we can hope for there is to have fd.c provide a function
> "close an FD please", which postgres_fdw could call if libpq fails because
> of ENFILE/EMFILE, and then retry.

Unless that takes up a slot in fd.c while in use, that'll still leave us
open to failures to open files in some critical parts, unless I miss
something.

And then we'd have to teach similar things to PLs etc.  I agree that
having some more slop isn't a proper solution, but only having ~30 fds
as slop on the most common systems seems mightily small.


> (Though I'm unsure how reliably postgres_fdw can detect that failure
> reason right now --- I don't know that we preserve errno on the way
> out of PQconnect.)

Yea, probably not really...


Regards,

Andres




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Tom Lane
Andres Freund  writes:
> On 2017-08-07 16:52:42 -0400, Tom Lane wrote:
>> No, I don't think so.  If you're depending on the NUM_RESERVED_FDS
>> headroom for anything meaningful, *you're doing it wrong*.  You should be
>> getting an FD via fd.c, so that there is an opportunity to free up an FD
>> (by closing a VFD) if you're up against system limits.  Relying on
>> NUM_RESERVED_FDS headroom can only protect against EMFILE not ENFILE.

> How would this work for libpq based stuff like postgres fdw? Or some
> random PL doing something with files? There's very little headroom here.

Probably the best we can hope for there is to have fd.c provide a function
"close an FD please", which postgres_fdw could call if libpq fails because
of ENFILE/EMFILE, and then retry.  (Though I'm unsure how reliably
postgres_fdw can detect that failure reason right now --- I don't know
that we preserve errno on the way out of PQconnect.)
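
A rough sketch of the shape being suggested, with the caveats just mentioned: the
fd.c entry point below (ReleaseOneVirtualFD) is hypothetical, no such function
exists today, and detecting EMFILE/ENFILE via errno after PQconnectdb() is exactly
the unreliable part:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <libpq-fe.h>

/* Hypothetical fd.c entry point: close one cached VFD to free a kernel FD.
 * This stub always fails; the real thing would live in fd.c. */
static bool
ReleaseOneVirtualFD(void)
{
    return false;
}

static PGconn *
connect_with_fd_retry(const char *conninfo)
{
    for (;;)
    {
        PGconn *conn;

        errno = 0;
        conn = PQconnectdb(conninfo);
        if (conn != NULL && PQstatus(conn) == CONNECTION_OK)
            return conn;

        /* Out of file descriptors?  Free one of our own and retry. */
        if ((errno == EMFILE || errno == ENFILE) && ReleaseOneVirtualFD())
        {
            PQfinish(conn);
            continue;
        }

        return conn;        /* some other failure: let the caller report it */
    }
}

int
main(void)
{
    PGconn *conn = connect_with_fd_retry("dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
    PQfinish(conn);
    return 0;
}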

regards, tom lane




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Andres Freund
On 2017-08-07 16:52:42 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > These days there's a number of other consumers of
> > fds. E.g. postgres_fdw, epoll, ...  All these aren't accounted for by
> > fd.c.
> 
> > Given how close max_files_per_process is to the default linux limit of
> > 1024 fds, I wonder if we shouldn't increase NUM_RESERVED_FDS by quite a
> > bit?
> 
> No, I don't think so.  If you're depending on the NUM_RESERVED_FDS
> headroom for anything meaningful, *you're doing it wrong*.  You should be
> getting an FD via fd.c, so that there is an opportunity to free up an FD
> (by closing a VFD) if you're up against system limits.  Relying on
> NUM_RESERVED_FDS headroom can only protect against EMFILE not ENFILE.

How would this work for libpq based stuff like postgres fdw? Or some
random PL doing something with files? There's very little headroom here.


> What this means is that the epoll stuff needs to be tied into fd.c more
> than it is now, but that's likely a good thing anyway; it would for
> example provide a more robust way of ensuring we don't leak epoll FDs at
> transaction abort.

Not arguing against that.


Greetings,

Andres Freund




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Tom Lane
Andres Freund  writes:
> These days there's a number of other consumers of
> fds. E.g. postgres_fdw, epoll, ...  All these aren't accounted for by
> fd.c.

> Given how close max_files_per_process is to the default linux limit of
> 1024 fds, I wonder if we shouldn't increase NUM_RESERVED_FDS by quite a
> bit?

No, I don't think so.  If you're depending on the NUM_RESERVED_FDS
headroom for anything meaningful, *you're doing it wrong*.  You should be
getting an FD via fd.c, so that there is an opportunity to free up an FD
(by closing a VFD) if you're up against system limits.  Relying on
NUM_RESERVED_FDS headroom can only protect against EMFILE not ENFILE.

What this means is that the epoll stuff needs to be tied into fd.c more
than it is now, but that's likely a good thing anyway; it would for
example provide a more robust way of ensuring we don't leak epoll FDs at
transaction abort.

regards, tom lane




[HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Andres Freund
Hi,

currently the default max_files_per_process is 1000. fd.c uses close to
that many (- NUM_RESERVED_FDS/10). count_usable_fds() makes sure that at
server start there's at most that many fds available, but that doesn't
mean that much for runtime.

These days there's a number of other consumers of
fds. E.g. postgres_fdw, epoll, ...  All these aren't accounted for by
fd.c.

Given how close max_files_per_process is to the default linux limit of
1024 fds, I wonder if we shouldn't increase NUM_RESERVED_FDS by quite a
bit?

Greetings,

Andres Freund




[HACKERS] Re: [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Andrew Dunstan


On 08/07/2017 04:20 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 08/07/2017 04:07 PM, Tom Lane wrote:
>>> Sorry, I was imprecise.  What I'm suggesting is that you drop the
>>> runtime PATH-foolery and instead put this in configure's environment:
>>>
>>> PROVE=$perlpathdir/prove
>>>
>>> Otherwise you're basically lying to configure about what you're going
>>> to use, and that's always going to break eventually.
>> Hmm, you're saying this should work now? OK, I'll try it when I get a
>> minute to spare.
> I'm pretty sure it's always worked, at least in the sense that you could
> override what configure would put into Makefile.global that way.  I'm not
> quite clear on whether providing an exact path to "prove" there is enough
> to fix your problem.  If we have any places where we need to invoke the
> corresponding version of "perl", then we have more things to fix.


I'm pretty sure that's all we need. But we can find out ;-)


>
>>> Hm, yeah, the IPC::Run test would need to deal with this as well.
>>> A PROVE_PERL environment variable is one way.  Or maybe simpler,
>>> just skip the probe for IPC::Run if PROVE has been specified
>>> externally; assume the user knows what he's doing in that case.
>> WFM
> OK, I'll go make that happen.
>
>   



OK, thanks.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/07/2017 04:07 PM, Tom Lane wrote:
>> Sorry, I was imprecise.  What I'm suggesting is that you drop the
>> runtime PATH-foolery and instead put this in configure's environment:
>> 
>> PROVE=$perlpathdir/prove
>> 
>> Otherwise you're basically lying to configure about what you're going
>> to use, and that's always going to break eventually.

> Hmm, you're saying this should work now? OK, I'll try it when I get a
> minute to spare.

I'm pretty sure it's always worked, at least in the sense that you could
override what configure would put into Makefile.global that way.  I'm not
quite clear on whether providing an exact path to "prove" there is enough
to fix your problem.  If we have any places where we need to invoke the
corresponding version of "perl", then we have more things to fix.

>> Hm, yeah, the IPC::Run test would need to deal with this as well.
>> A PROVE_PERL environment variable is one way.  Or maybe simpler,
>> just skip the probe for IPC::Run if PROVE has been specified
>> externally; assume the user knows what he's doing in that case.

> WFM

OK, I'll go make that happen.

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Andrew Dunstan


On 08/07/2017 04:07 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 08/07/2017 03:36 PM, Tom Lane wrote:
>>> My goodness, that's ugly.  Is it really better than injecting
>>> "PROVE=prove"?  (I'd suggest saying that to configure, not make,
>>> so that the configure log bears some resemblance to what you
>>> want done.)
>> This is what we had to do BEFORE the change in this commit. Now it's no
>> longer sufficient.
> Sorry, I was imprecise.  What I'm suggesting is that you drop the
> runtime PATH-foolery and instead put this in configure's environment:
>
> PROVE=$perlpathdir/prove
>
> Otherwise you're basically lying to configure about what you're going
> to use, and that's always going to break eventually.



Hmm, you're saying this should work now? OK, I'll try it when I get a
minute to spare.


>
>> It would certainly be better if we could tell configure a path to prove
>> and a path to the perl we need to test IPC::Run against.
> Hm, yeah, the IPC::Run test would need to deal with this as well.
> A PROVE_PERL environment variable is one way.  Or maybe simpler,
> just skip the probe for IPC::Run if PROVE has been specified
> externally; assume the user knows what he's doing in that case.


WFM

> Are there any other gotchas in the build sequence?
>


Not that I can think of.

> Do we have/need any explicit references to the test version of "perl",
> or is "prove" a sufficient API?
>
>   


Probably sufficient.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/07/2017 03:36 PM, Tom Lane wrote:
>> My goodness, that's ugly.  Is it really better than injecting
>> "PROVE=prove"?  (I'd suggest saying that to configure, not make,
>> so that the configure log bears some resemblance to what you
>> want done.)

> This is what we had to do BEFORE the change in this commit. Now it's no
> longer sufficient.

Sorry, I was imprecise.  What I'm suggesting is that you drop the
runtime PATH-foolery and instead put this in configure's environment:

PROVE=$perlpathdir/prove

Otherwise you're basically lying to configure about what you're going
to use, and that's always going to break eventually.

> It would certainly be better if we could tell configure a path to prove
> and a path to the perl we need to test IPC::Run against.

Hm, yeah, the IPC::Run test would need to deal with this as well.
A PROVE_PERL environment variable is one way.  Or maybe simpler,
just skip the probe for IPC::Run if PROVE has been specified
externally; assume the user knows what he's doing in that case.
Are there any other gotchas in the build sequence?

> The problem in all this is that we're assuming incorrectly that the perl
> we use to build against is the same as the perl we need to run the build

... I think you meant "TAP tests" here ? --- ^

> with. On Msys that's emphatically not true.

Do we have/need any explicit references to the test version of "perl",
or is "prove" a sufficient API?

regards, tom lane




Re: [HACKERS] Effect of dropping a partitioned table's column over time

2017-08-07 Thread Robert Haas
On Sun, Aug 6, 2017 at 11:38 PM, Craig Ringer  wrote:
> Can we instead create the new partitions with the same dropped columns?
>
> Ensure that every partition, parent and child, has the same column-set?

We could, but that has costs of its own.  It means that those columns
are stored in the tuple as nulls, which means that there will be a
null bitmap where you wouldn't otherwise have one, and every time you
deform a tuple you'll have to worry about those columns even though
they're never used for anything.

It's not clear whether this cost is more or less than the tuple
conversion cost.  It seems ugly to me, though, either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Andrew Dunstan


On 08/07/2017 03:36 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 08/07/2017 03:21 PM, Tom Lane wrote:
>>> I'm confused.  AFAIK, that commit did not change which "prove" would
>>> be used --- at least not unless you change PATH between configure and
>>> make.  It only changed how specifically that program would be named in
>>> Makefile.global.  Please clarify how that broke anything.
>> That's exactly what we do. See
>>  at
>> line 1649.
> My goodness, that's ugly.  Is it really better than injecting
> "PROVE=prove"?  (I'd suggest saying that to configure, not make,
> so that the configure log bears some resemblance to what you
> want done.)
>
>   



This is what we had to do BEFORE the change in this commit. Now it's no
longer sufficient.

It would certainly be better if we could tell configure a path to prove
and a path to the perl we need to test IPC::Run against.

e.g. PROVE=/usr/bin/prove PROVE_PERL=/usr/bin/perl configure ...

The problem in all this is that we're assuming incorrectly that the perl
we use to build against is the same as the perl we need to run the build
with. On Msys that's emphatically not true.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/07/2017 03:21 PM, Tom Lane wrote:
>> I'm confused.  AFAIK, that commit did not change which "prove" would
>> be used --- at least not unless you change PATH between configure and
>> make.  It only changed how specifically that program would be named in
>> Makefile.global.  Please clarify how that broke anything.

> That's exactly what we do. See
>  at
> line 1649.

My goodness, that's ugly.  Is it really better than injecting
"PROVE=prove"?  (I'd suggest saying that to configure, not make,
so that the configure log bears some resemblance to what you
want done.)

regards, tom lane




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-07 Thread Robert Haas
On Mon, Aug 7, 2017 at 2:54 AM, Amit Langote
 wrote:
> I think Amit Khandekar mentioned this on the UPDATE partition key thread [1].

Yes, similar discussion.

> As long as find_all_inheritors() is a place only to determine the order in
> which partitions will be locked, it's fine.  My concern is about the time
> of actual locking, which in the current planner implementation happens so soon
> that we end up needlessly locking all the partitions.

I don't think avoiding that problem is going to be easy.  We need a
bunch of per-relation information, like the size of each relation, and
what indexes it has, and how big they are, and the statistics for each
one.  It was at one point proposed by someone that every partition
should be required to have the same indexes, but (1) we didn't
implement it like that and (2) if we had done that it wouldn't solve
this problem anyway because the sizes are still going to vary.

Note that I'm not saying this isn't a good problem to solve, just that
it's likely to be a very hard problem to solve.

> The locking-partitions-too-soon issue, I think, is an important one and
> ISTM, we'd want to lock the partitions after we've determined the specific
> ones a query needs to scan using the information returned by
> RelationGetPartitionDispatchInfo.  That means the latter had better locked
> the relations whose cached partition descriptors will be used to determine
> the result that it produces.  One way to do that might be to lock all the
> tables in the list returned by find_all_inheritors that are partitioned
> tables before calling RelationGetPartitionDispatchInfo.  It seems what the
> approach you've outlined below will let us do that.

Yeah, I think so.  I think we could possibly open and lock partitioned
children only, then prune away leaf partitions that we can determine
aren't needed, then open and lock the leaf partitions that are needed.

> BTW, IIUC, there will be two lists of OIDs we'll  have: one in the
> find_all_inheritors order, say, L1 and the other determined by using
> partitioning-specific information for the given query, say L2.
>
> To lock, we iterate L1 and if a given member is in L2, we lock it.  It
> might be possible to make it as cheap as O(nlogn).

Commonly, we'll prune no partitions or all but one; and we should be
able to make those cases very fast.  Other cases can cost a little
more, but I'll certainly complain about anything more than O(n lg n).
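
For readers following along, here is a minimal sketch (not PostgreSQL code; the
types and the lock_relation() callback are stand-ins) of the approach discussed
above: sort the pruned set once, then walk the full find_all_inheritors list in
its original order and lock only the survivors, which keeps the whole thing at
O(n log n).

#include <stdlib.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid */

static int
oid_cmp(const void *a, const void *b)
{
    Oid oa = *(const Oid *) a;
    Oid ob = *(const Oid *) b;

    return (oa > ob) - (oa < ob);
}

/*
 * l1: every child OID, in the deadlock-safe locking order.
 * l2: the OIDs that survived pruning (any order); sorted in place here.
 */
static void
lock_surviving_children(const Oid *l1, int n1,
                        Oid *l2, int n2,
                        void (*lock_relation) (Oid relid))
{
    qsort(l2, n2, sizeof(Oid), oid_cmp);        /* O(n log n), done once */

    for (int i = 0; i < n1; i++)                /* preserve the l1 order */
    {
        if (bsearch(&l1[i], l2, n2, sizeof(Oid), oid_cmp) != NULL)
            lock_relation(l1[i]);               /* O(log n) membership probe */
    }
}

With no pruning, or pruning down to a single partition, the l2 set is trivial,
so the common cases stay cheap, as noted above.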

>> 3. While we're optimizing, in the first loop inside of
>> RelationGetPartitionDispatchInfo, don't call heap_open().  Instead,
>> use get_rel_relkind() to see whether we've got a partitioned table; if
>> so, open it.  If not, there's no need.
>
> That's what the proposed refactoring patch 0002 actually does.

Great.

> Maybe, we can make the initial patch use syscache to get the relkind for a
> given child.  If the syscache bloat is unbearable, we go with the
> denormalization approach.

Yeah.  Maybe if you write that patch, you can also test it to see how
bad the bloat is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Andrew Dunstan


On 08/07/2017 03:21 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 07/31/2017 01:02 PM, Tom Lane wrote:
>>> Record full paths of programs sought by "configure".
>> The problem with this commit, as jacana is demonstrating, is that on
>> Msys it finds the wrong prove. configure needs to run against the perl
>> we build against, i.e. a native Windows perl, but prove needs to run
>> with the perl from the MSys DTK that understands MSys virtualized
>> paths. I have a hack that will allow the buildfarm to overcome the
>> difficulty, (essentially it passes 'PROVE=prove' to make) but that's
>> fairly ugly and certainly non-intuitive for someone running an MSys
>> build and TAP tests without the buildfarm client.
> I'm confused.  AFAIK, that commit did not change which "prove" would
> be used --- at least not unless you change PATH between configure and
> make.  It only changed how specifically that program would be named in
> Makefile.global.  Please clarify how that broke anything.
>
>   




That's exactly what we do. See
 at
line 1649.

cheers

andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Tom Lane
Andrew Dunstan  writes:
> On 07/31/2017 01:02 PM, Tom Lane wrote:
>> Record full paths of programs sought by "configure".

> The problem with this commit, as jacana is demonstrating, is that on
> Msys it finds the wrong prove. configure needs to run against the perl
> we build against, i.e. a native Windows perl, but prove needs to run
> with the perl from the MSys DTK that understands MSys virtualized
> paths. I have a hack that will allow the buildfarm to overcome the
> difficulty, (essentially it passes 'PROVE=prove' to make) but that's
> fairly ugly and certainly non-intuitive for someone running an MSys
> build and TAP tests without the buildfarm client.

I'm confused.  AFAIK, that commit did not change which "prove" would
be used --- at least not unless you change PATH between configure and
make.  It only changed how specifically that program would be named in
Makefile.global.  Please clarify how that broke anything.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expanding inheritance in partition bound order

2017-08-07 Thread Robert Haas
On Mon, Aug 7, 2017 at 1:48 AM, Ashutosh Bapat
 wrote:
> with the way schema is designed. May be it's better to use your idea
> of using get_rel_relkind() or find a way to check that the flag is in
> sync with the relkind, like when building the relcache.

That's got the same problem as building a full relcache entry: cache
bloat.  It will have *less* cache bloat, but still some.  Maybe it's
little enough to be tolerable; not sure.  But we want this system to
scale to LOTS of partitions eventually, so building on a design that
we know has scaling problems seems a bit short-sighted.

> I noticed that find_all_inheritors() uses a hash table to eliminate
> duplicates arising out of multiple inheritance. Partition hierarchy is
> never going to have multiple inheritance, and doesn't need to
> eliminate duplicates and so doesn't need the hash table. It will be
> good, if we can eliminate that overhead. But that's separate task than
> what this thread is about.

I don't want to eliminate that overhead.  If the catalog is manually
modified or corrupted, the problem could still occur, and result in
backend crashes or, at best, incomprehensible errors.  The comments
allude to this problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xmltable SQL conformance

2017-08-07 Thread Pavel Stehule
2017-08-07 20:36 GMT+02:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 8/3/17 17:07, Pavel Stehule wrote:
> > I'm looking to update the SQL conformance list for the release.
> I'm
> > wondering whether the new xmltable feature fully completes the
> > following
> > items:
> >
> > X300XMLTable
> > X301XMLTable: derived column list option
> > X302XMLTable: ordinality column option
> > X303XMLTable: column default option
> > X304XMLTable: passing a context item
> >
> > It looks so to me, but maybe you know more.
> >
> >
> > I am not sure what X304 means? Other are not supported 100% .. are
> > related to unsupported XQuery lang.
> >
> >
> > Others than x300, 301, 302, 303 are unsupported.
>
> I think you mean supported here?  Because they quite clearly are.
>

I am sorry - it my bad English :( - Czechism


> > I don't understand to
> > X304 well - related syntax is ignored in Postgres if I understand.
>
> This means that you pass in the document to work on, which is supported.
>
> Thanks for your input.  I have updated the features list accordingly.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] xmltable SQL conformance

2017-08-07 Thread Peter Eisentraut
On 8/3/17 17:07, Pavel Stehule wrote:
> I'm looking to update the SQL conformance list for the release.  I'm
> wondering whether the new xmltable feature fully completes the
> following
> items:
> 
> X300XMLTable
> X301XMLTable: derived column list option
> X302XMLTable: ordinality column option
> X303XMLTable: column default option
> X304XMLTable: passing a context item
> 
> It looks so to me, but maybe you know more.
> 
> 
> I am not sure what X304 means? Other are not supported 100% .. are
> related to unsupported XQuery lang.
> 
> 
> Others than x300, 301, 302, 303 are unsupported.

I think you mean supported here?  Because they quite clearly are.

> I don't understand to
> X304 well - related syntax is ignored in Postgres if I understand.

This means that you pass in the document to work on, which is supported.

Thanks for your input.  I have updated the features list accordingly.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Record full paths of programs sought by "configure".

2017-08-07 Thread Andrew Dunstan


On 07/31/2017 01:02 PM, Tom Lane wrote:
> Record full paths of programs sought by "configure".
>
> Previously we had a mix of uses of AC_CHECK_PROG[S] and AC_PATH_PROG[S].
> The only difference between those macros is that the latter emits the
> full path to the program it finds, eg "/usr/bin/prove", whereas the
> former emits just "prove".  Let's standardize on always emitting the
> full path; this is better for documentation of the build, and it might
> prevent some types of failures if later build steps are done with
> a different PATH setting.
>
> I did not touch the AC_CHECK_PROG[S] calls in ax_pthread.m4 and
> ax_prog_perl_modules.m4.  There seems no need to make those diverge from
> upstream, since we do not record the programs sought by the former, while
> the latter's call to AC_CHECK_PROG(PERL,...) will never be reached.
>
> Discussion: https://postgr.es/m/25937.1501433...@sss.pgh.pa.us


The problem with this commit, as jacana is demonstrating, is that on Msys it 
finds the wrong prove. configure needs to run against the perl we build 
against, i.e. a native Windows perl, but prove needs to run with the perl from 
the MSys DTK that understands MSys virtualized paths. I have a hack that will 
allow the buildfarm to overcome the difficulty, (essentially it passes 
'PROVE=prove' to make) but that's fairly ugly and certainly non-intuitive for 
someone running an MSys build and TAP tests without the buildfarm client.

cheers

andrew 


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GSOC][weekly report 9] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-08-07 Thread Alvaro Herrera
Mengxing Liu wrote:
> In the last week, I focus on:
> 
> 
> 1) Continuing on optimization of skip list.

Let me state once again that I'm certainly not an
expert in the predicate locks module and that I hope Kevin will chime in
with more useful feedback than mine.

Some random thoughts:

* I wonder why did you settle on a skip list as the best optimization
  path for this.  Did you consider other data structures?  (We don't
  seem to have much that would be usable in shared memory, I fear.)

* I wonder if a completely different approach to the finished xact
  maintenance problem would be helpful.  For instance, in
  ReleasePredicateLocks we call ClearOldPredicateLocks()
  unconditionally, and there we grab the SerializableFinishedListLock
  lock unconditionally for the whole duration of that routine, but
  perhaps it would be better to skip the whole ClearOld stuff if the
  SerializableFinishedListLock is not available?  Find out some way to
  ensure that the cleanup is always called later on, but maybe skipping
  it once in a while improves overall performance.  (A rough sketch of
  this idea follows the list below.)

* the whole predicate.c stuff is written using SHM_QUEUE.  I suppose
  SHM_QUEUE works just fine, but predicate.c was being written at about
  the same time (or a bit earlier) than the newer ilist.c interface was
  being created, which I think had more optimization work thrown in.
  Maybe it would be good for predicate.c to ditch use of SHM_QUEUE and
  use ilist.c interfaces instead?  We could even think about being less
  strict about holding exclusive lock on SerializableFinished for the
  duration of ClearOldPredicateLocks, i.e. use only a share lock and
  only exchange for exclusive if a list modification is needed.
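
A rough sketch of the conditional-acquire idea from the second point above.
This is not the current predicate.c code; MaybeClearOldPredicateLocks() and the
backend-local cleanup_owed flag are invented here purely to illustrate "skip
once in a while, but never twice in a row", so the deferred work is guaranteed
to happen on the next call from the same backend.

#include "postgres.h"
#include "storage/lwlock.h"

static bool cleanup_owed = false;   /* backend-local; hypothetical */

static void
MaybeClearOldPredicateLocks(void)
{
    if (!cleanup_owed &&
        !LWLockConditionalAcquire(SerializableFinishedListLock, LW_EXCLUSIVE))
    {
        /* Lock is contended; skip this round and remember the debt. */
        cleanup_owed = true;
        return;
    }

    /* If we skipped last time, wait for the lock unconditionally now. */
    if (cleanup_owed)
        LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
    cleanup_owed = false;

    /* ... the existing cleanup of finished transactions would go here ... */

    LWLockRelease(SerializableFinishedListLock);
}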

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [GSOC][weekly report 9] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-08-07 Thread Mengxing Liu
In the last week, I focus on:


1) Continuing on optimization of skip list. Here is the new logic:
At first, I use an unordered linked list to do all inserting/searching 
operations.
When the number of conflicts is more than the threshold, transform the linked 
list into an ordered skip list.
Then all inserting/searching operations are done by the skip list.
In this way, for transactions with only a few conflicts, the overhead of
inserting is O(1).
The patch is attached.
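
For illustration only, a small self-contained sketch of the "plain list until
it grows, then promote to an ordered structure" idea.  A sorted array with
binary search stands in for the ordered skip list here just to keep the example
short, and the threshold is an arbitrary made-up value; the real patch differs.

#include <stdlib.h>

#define PROMOTE_THRESHOLD 16        /* arbitrary value for the sketch */

typedef struct ConflictSet
{
    int    *items;
    int     nitems;
    int     capacity;
    int     sorted;     /* 0 = unordered append-only, 1 = ordered */
} ConflictSet;

static int
int_cmp(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

static void
conflict_add(ConflictSet *set, int value)
{
    if (set->nitems == set->capacity)
    {
        set->capacity = set->capacity ? set->capacity * 2 : 8;
        set->items = realloc(set->items, set->capacity * sizeof(int));
        /* error handling omitted in this sketch */
    }

    if (!set->sorted)
    {
        set->items[set->nitems++] = value;  /* O(1) while the set is small */
        if (set->nitems > PROMOTE_THRESHOLD)
        {
            qsort(set->items, set->nitems, sizeof(int), int_cmp);
            set->sorted = 1;                /* one-time promotion */
        }
    }
    else
    {
        int pos = set->nitems;

        while (pos > 0 && set->items[pos - 1] > value)
        {
            set->items[pos] = set->items[pos - 1];  /* keep it ordered */
            pos--;
        }
        set->items[pos] = value;
        set->nitems++;
    }
}

static int
conflict_contains(const ConflictSet *set, int value)
{
    if (set->sorted)
        return bsearch(&value, set->items, set->nitems,
                       sizeof(int), int_cmp) != NULL;

    for (int i = 0; i < set->nitems; i++)
        if (set->items[i] == value)
            return 1;
    return 0;
}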


It helped a little, but just a little. In the end, the skip list has the same 
performance as the linked list.


2) Studying the performance of less contention on FinishedListLock. 
I found it can get benefit when the number of conflicts is medium. It is easy 
to understand the optimization 
has no influence when conflicts are too rare. But when there are too many 
conflicts, partition lock
will be the new bottleneck and the performance can't be improved. A chart is 
attached. This optimization
can achieve 14% speedup at most. 


Do you think this optimization can be accepted?


--

Sincerely


Mengxing Liu








skip-list-for-conflict-tracking.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] free space % calculation in pgstathashindex

2017-08-07 Thread Ashutosh Sharma
On Mon, Aug 7, 2017 at 7:19 PM, Amit Kapila  wrote:
> On Mon, Aug 7, 2017 at 6:07 PM, Ashutosh Sharma  wrote:
>> Hi,
>>
> ..
>> In step #1, assuming '*' as an arithmetic operator, the left operand
>> i.e. 'stats.unused_pages' is of type uint32 whereas the right operand
>> i.e. 'stats.space_per_page' is of type int32 and arithmetic
>> conversions of the ANSI C standard states: 'if either operand has type
>> unsigned int, the other operand is converted to unsigned int' which
>> means stats.space_per_page would be converted to uint32 and then the
>> multiplication would be performed. But, if the result of
>> multiplication is greater than the max value accepted by 32 bits
>> number, we would get an incorrect result. Hence, it is important to do
>> proper typecasting of operands before performing any arithmetic
>> operation. In this case, i would typecast both stats.unused_pages and
>> stats.space_per_page to uint64 and the
>> perform multiplication on them. Similar handling needs to be done in
>> step #2 as well.
>>
>
> I think even if you have typecasted the first variable, it would have
> served the purpose.
>

Yes, that's right, typecasting just the first variable also prevents the
overflow problem because, when the operands are of two different types,
the compiler promotes them to the larger or more precise type.
So whether or not we typecast the other operand explicitly, the compiler
will eventually do it. I have now typecast only the first value.
Attached is the updated patch.

>   /* Count unused pages as free space. */
>
> - stats.free_space += stats.unused_pages * stats.space_per_page;
> + stats.free_space += ((uint64) stats.unused_pages *
> + (uint64) stats.space_per_page);
>
> Why an extra parenthesis in above case whereas not in below case?  I
> think the code will look consistent if you follow the same coding
> practice.  I suggest don't use it unless you need it.

That is because in the first case there are multiple operators (*, +)
whereas in the second case there is just one (*). I had used the
parentheses just to make it explicit that '*' is performed before '+',
though they are not required since '*' has higher precedence than '+'.
I have removed the extra parentheses; attached is the new version of
the patch. Thanks.

>
>   /*
>   * Total space available for tuples excludes the metapage and the bitmap
>   * pages.
>   */
> - total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;
> + total_space = (uint64) (nblocks - (stats.bitmap_pages + 1)) *
> + (uint64) stats.space_per_page;
>

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 44e322d..9365ba7 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -687,13 +687,14 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	index_close(rel, AccessShareLock);
 
 	/* Count unused pages as free space. */
-	stats.free_space += stats.unused_pages * stats.space_per_page;
+	stats.free_space += (uint64) stats.unused_pages * stats.space_per_page;
 
 	/*
 	 * Total space available for tuples excludes the metapage and the bitmap
 	 * pages.
 	 */
-	total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;
+	total_space = (uint64) (nblocks - (stats.bitmap_pages + 1)) *
+		stats.space_per_page;
 
 	if (total_space == 0)
 		free_percent = 0.0;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

2017-08-07 Thread Peter Eisentraut
On 8/5/17 18:56, Noah Misch wrote:
>> [Action required within three days.  This is a generic notification.]

I'm awaiting further testing and discussion.  Probably nothing happening
for beta3.  Will report on Thursday.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-08-07 Thread Fabrízio de Royes Mello
On Mon, Aug 7, 2017 at 10:43 AM, Robins Tharakan  wrote:
>
> On 20 July 2017 at 05:14, Robins Tharakan  wrote:
>>
>> On 20 July 2017 at 05:08, Michael Paquier 
wrote:
>>>
>>> On Wed, Jul 19, 2017 at 8:59 PM,
>>> Fabrízio de Royes Mello
>>> > You should add the properly sgml docs for this pg_dumpall change also.
>>>
>>> Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
>>> extensions are in src/test/modules/test_pg_dump, but you just care
>>> about the former with this patch. And if you implement some new tests,
>>> look at the other tests and base your work on that.
>>
>>
>> Thanks Michael /
>> Fabrízio.
>>
>> Updated patch (attached) additionally adds SGML changes for pg_dumpall.
>> (I'll try to work on the tests, but sending this
>> nonetheless
>> ).
>>
>
> Attached is an updated patch (v4) with basic tests for pg_dump /
pg_dumpall.
> (Have zipped it since patch size jumped to ~40kb).
>

The patch applies cleanly to current master and all tests run without
failures.

I also test against all current supported versions (9.2 ... 9.6) and didn't
find any issue.

Changed status to "ready for committer".

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] free space % calculation in pgstathashindex

2017-08-07 Thread Amit Kapila
On Mon, Aug 7, 2017 at 6:07 PM, Ashutosh Sharma  wrote:
> Hi,
>
..
> In step #1, assuming '*' as an arithmetic operator, the left operand
> i.e. 'stats.unused_pages' is of type uint32 whereas the right operand
> i.e. 'stats.space_per_page' is of type int32 and arithmetic
> conversions of the ANSI C standard states: 'if either operand has type
> unsigned int, the other operand is converted to unsigned int' which
> means stats.space_per_page would be converted to uint32 and then the
> multiplication would be performed. But, if the result of
> multiplication is greater than the max value accepted by 32 bits
> number, we would get an incorrect result. Hence, it is important to do
> proper typecasting of operands before performing any arithmetic
> operation. In this case, i would typecast both stats.unused_pages and
> stats.space_per_page to uint64 and the
> perform multiplication on them. Similar handling needs to be done in
> step #2 as well.
>

I think even if you have typecasted the first variable, it would have
served the purpose.

  /* Count unused pages as free space. */

- stats.free_space += stats.unused_pages * stats.space_per_page;
+ stats.free_space += ((uint64) stats.unused_pages *
+ (uint64) stats.space_per_page);

Why an extra parenthesis in above case whereas not in below case?  I
think the code will look consistent if you follow the same coding
practice.  I suggest don't use it unless you need it.

  /*
  * Total space available for tuples excludes the metapage and the bitmap
  * pages.
  */
- total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;
+ total_space = (uint64) (nblocks - (stats.bitmap_pages + 1)) *
+ (uint64) stats.space_per_page;


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notice message of ALTER SUBSCRIPTION ... REFRESH PUBLICATION

2017-08-07 Thread Peter Eisentraut
On 7/27/17 20:51, Yugo Nagata wrote:
> When we run ALTER SUBSCRIPTION ... REFRESH PUBLICATION and there is 
> an unknown table at local, it says;
> 
>  NOTICE:  added subscription for table public.tbl
> 
> I feel this a bit misleading because it says a subscription is added
> but actually a new subscription object is not created. Of course, I can
> understand the intention of this message, but I feel that
> it is better to use other expression. For example, how about saying
> 
>  NOTICE:  table public.tbl is added to subscription sub1
> 
> as proposed in the attached patch. This also fixes the message
> when a table is removed from a subscription. 

Fixed, thanks.  (Note that per another discussion, these are now DEBUG
messages.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding hook in BufferSync for backup purposes

2017-08-07 Thread Tom Lane
Alvaro Herrera  writes:
> I suppose your hook idea lets you implement the LSN fork in an
> extension, rather than having it be part of core.  The idea of hooking
> onto BufferSync makes me uneasy, though -- like it's not the correct
> place to do it.

Yeah.  Keep in mind that if the extension does anything at all that could
possibly throw an error, and if that error condition persists across
multiple tries, you will have broken the database completely: it will
be impossible to complete a checkpoint, and your WAL segment pool will
grow until it exhausts disk.  So the idea of doing something that involves
unspecified extension behavior, especially possible interaction with
an external backup agent, right there is pretty terrifying.

Other problems with the proposed patch: it misses coverage of
BgBufferSync, and I don't like exposing an ad-hoc structure like
CkptTsStatus as part of an extension API.  The algorithm used by
BufferSync to schedule buffer writes has changed multiple times
before and doubtless will again; if we're going to have a hook
here it should depend as little as possible on those details.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Subscription code improvements

2017-08-07 Thread Peter Eisentraut
On 8/7/17 00:27, Masahiko Sawada wrote:
>>> However, even with this patch there is still an issue that NOTICE
>>> messages "removed subscription for table public.t1" can be appeared
>>> even if we rollback ALTER SUBSCRIPTION REFRESH command as I mentioned
>>> on earlier thread. Since I think this behaviour will confuse users who
>>> check server logs I'd like to deal with it, I don't have a good idea
>>> though.
>>
>> Maybe we can just remove those messages?
>>
>> We don't get messages when we create a subscription about which tables
>> are part of it.  So why do we need such messages when we refresh a
>> subscription?
> 
> I think that the messages is useful when we add/remove tables to/from
> the publication and then refresh the subscription, so we might want to
> change it to DEBUG rather than elimination.

Good idea.  Done that way.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Subscription code improvements

2017-08-07 Thread Peter Eisentraut
On 8/7/17 00:32, Masahiko Sawada wrote:
> On Sun, Aug 6, 2017 at 7:44 AM, Noah Misch  wrote:
>> On Wed, Aug 02, 2017 at 04:09:43PM -0400, Peter Eisentraut wrote:
>>> On 8/1/17 00:17, Noah Misch wrote:
 The above-described topic is currently a PostgreSQL 10 open item.  Peter,
 since you committed the patch believed to have created it, you own this 
 open
 item.  If some other commit is more relevant or if this does not belong as 
 a
 v10 open item, please let us know.  Otherwise, please observe the policy on
 open item ownership[1] and send a status update within three calendar days 
 of
 this message.  Include a date for your subsequent status update.  Testers 
 may
 discover new open items at any time, and I want to plan to get them all 
 fixed
 well in advance of shipping v10.  Consequently, I will appreciate your 
 efforts
 toward speedy resolution.  Thanks.
>>>
>>> I'm looking into this now and will report by Friday.
>>
>> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>> a status update within 24 hours, and include a date for your subsequent 
>> status
>> update.  Refer to the policy on open item ownership:
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> I think this open item has closed by the commit
> 7e174fa793a2df89fe03d002a5087ef67abcdde8 ?

Yes.  I have updated the wiki page now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2017-08-07 Thread Amit Kapila
On Tue, Aug 1, 2017 at 1:56 PM, Haribabu Kommi  wrote:
>
>
> On Sun, Jul 23, 2017 at 4:10 PM, Amit Kapila 
> wrote:
>>
>
>>
>> > 1. Design an API that returns values/nulls array and convert that into a
>> > HeapTuple whenever it is required in the upper layers. All the existing
>> > heap form/deform tuples are used for every tuple with some adjustments.
>> >
>>
>> So, this would have the additional cost of form/deform.  Also, how
>> would it have lesser changes as compare to what you have described
>> earlier?
>
>
> Yes, It have the additional cost of form/deform. It is the same approach
> that
> is described earlier. But I have an intention of modifying everywhere the
> HeapTuple is accessed. But with the other prototype changes of removing
> HeapTuple usage from triggers, I realized that it needs some clear design
> to proceed further, instead of combining those changes with pluggable
> Storage API.
>
> - heap_getnext function is kept as it as and it is used only for system
> table
>   access.
> - heap_getnext_slot function is introduced to return the slot whenever the
>   data is found, otherwise NULL, This function is used in all the places
> from
>   Executor and etc.
>
> - The TupleTableSlot structure is modified to contain a void* tuple instead
> of
> HeapTuple. And also it contains the storagehanlder functions.
> - heap_insert and etc function can take Slot as an argument and perform the
> insert operation.
>
> The cases where the TupleTableSlot is not possible to sent, form a HeapTuple
> from the data and sent it and also note down that it is a HeapTuple data,
> not
> the tuple from the storage.
>
..
>
>
>>
>> > 3. Design an API that returns StorageTuple(void *) but the necessary
>> > format information of that tuple can be get from the tupledesc. wherever
>> > the tuple is present, there exists a tupledesc in most of the cases. How
>> > about adding some kind of information in tupledesc to find out the tuple
>> > format and call the necessary functions
>> >
>
>
> heap_getnext function returns StorageTuple instead of HeapTuple. The tuple
> type information is available in the TupleDesc structure.
>
> All heap_insert and etc function accepts TupleTableSlot as input and perform
> the insert operation. This approach is almost same as first approach except
> the
> storage handler functions are stored in TupleDesc.
>

Why do we need to store handler function in TupleDesc?  As of now, the
above patch series has it available in RelationData and
TupleTableSlot, I am not sure if instead of that keeping it in
TupleDesc is a good idea.  Which all kind of places require TupleDesc
to contain handler?  If those are few places, can we think of passing
it as a parameter?
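
For readers following the thread, a rough sketch of the "slot carries an opaque
storage tuple plus its handler" shape being discussed.  None of these names come
from the actual patch series; they are invented here only to show where the
handler could live if it were kept out of TupleDesc.

#include <stdbool.h>

typedef struct SampleSlot SampleSlot;

/* Per-storage callbacks; purely illustrative names. */
typedef struct StorageHandler
{
    void    (*deform) (SampleSlot *slot);   /* fill values[] / isnull[] */
    void    (*release) (SampleSlot *slot);  /* free the opaque tuple */
} StorageHandler;

struct SampleSlot
{
    void                 *tuple;    /* opaque storage tuple, not a HeapTuple */
    const StorageHandler *handler;  /* knows how to interpret "tuple" */
    int                   natts;
    unsigned long        *values;   /* stand-in for a Datum array */
    bool                 *isnull;
};

/* Callers never peek at slot->tuple directly; they always go via the handler. */
static void
slot_deform_all(SampleSlot *slot)
{
    slot->handler->deform(slot);
}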


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-08-07 Thread Ashutosh Bapat
On Thu, Aug 3, 2017 at 9:38 PM, Ashutosh Bapat
 wrote:
> On Thu, Aug 3, 2017 at 2:10 AM, Robert Haas  wrote:
>> On Mon, Jul 31, 2017 at 7:59 AM, Ashutosh Bapat
>>  wrote:
>>> Adding AppendRelInfos to root->append_rel_list as and when they are
>>> created would keep parent AppendRelInfos before those of children. But
>>> that function throws away the AppendRelInfo it created when their are
>>> no real children i.e. in partitioned table's case when has no leaf
>>> partitions. So, we can't do that. Hence, I chose to change the API to
>>> return the list of AppendRelInfos when the given RTE has real
>>> children.
>>
>> So, IIUC, the case you're concerned about is when you have a hierarchy
>> of only partitioned tables, with no plain tables.  For example, if B
>> is a partitioned table and a partition of A, and that's all there is,
>> A will recurse to B and B will return NIL.
>>
>> Is it necessary to get rid of the extra AppendRelInfos, or are they
>> harmless like the duplicate RTE and PlanRowMark nodes?
>>
>
> Actually there are two sides to this:
>
> If there are no leaf partitions, without the patch two things happen
> 1. rte->inh is cleared and 2 no appinfo is added to the
> root->append_rel_list, even though harmless RTE and PlanRowMark nodes
> are created. The first avoids treating the relation as the inheritance
> parent and thus avoids creating any child relations and paths, saving
> a lot of work. Ultimately set_rel_size() marks such a relation as
> dummy
>  352 else if (rte->relkind == RELKIND_PARTITIONED_TABLE)
>  353 {
>  354 /*
>  355  * A partitioned table without leaf
> partitions is marked
>  356  * as a dummy rel.
>  357  */
>  358 set_dummy_rel_pathlist(rel);
>  359 }
>
> Since root->append_rel_list is traversed for every inheritance parent,
> not adding needless AppendRelInfos improves performance and saves
> memory, (FWIW or consider a case where there are thousands of
> partitioned partitions without any leaf partition.).

With some testing, I found that this was true once, but not after
declarative partition support. Please check [1].

>
> My initial thought was to keep both these properties intact. But then
> removing such AppendRelInfos would have a problem when such a table is
> on the inner side of the join as described in [1]. So I wrote the
> patch not to do either of those things when there are partitioned
> partitions without leaf partitions. So, it looks like you are correct,
> we could just go ahead and add those AppendRelInfos directly to
> root->append_rel_list.

Irrespective of [1], I have implemented your idea of not changing
signature of expand_inherited_rtentry() with following idea.

>>
>> If we do need to get rid of the extra AppendRelInfos, maybe a less
>> invasive solution would be to change the if (!need_append) case to do
>> root->append_rel_list = list_truncate(root->append_rel_list,
>> original_append_rel_length).

[1] 
https://www.postgresql.org/message-id/CAFjFpReWJr1yTkHU=oqimbmcycmosw3vpr39rbuq_ovwdfb...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] free space % calculation in pgstathashindex

2017-08-07 Thread Ashutosh Sharma
Hi,

While working on [1] (one of the bugs reported for hash index), I
noticed that the percentage of free space shown in the 'free_percent'
column of pgstathashindex() is incorrect for some of the boundary
cases. This is how the free space percentage is calculated by
pgstathashindex():

1)  /* Count unused pages as free space. */
stats.free_space += stats.unused_pages * stats.space_per_page;

2)  /* Total space available for tuples excludes the metapage and the
bitmap pages */
total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;

3)  Calculate the percentage of free space in hash index table.
free_percent = 100.0 * stats.free_space / total_space;

In step #1, assuming '*' as an arithmetic operator, the left operand
i.e. 'stats.unused_pages' is of type uint32 whereas the right operand
i.e. 'stats.space_per_page' is of type int32 and arithmetic
conversions of the ANSI C standard states: 'if either operand has type
unsigned int, the other operand is converted to unsigned int' which
means stats.space_per_page would be converted to uint32 and then the
multiplication would be performed. But, if the result of
multiplication is greater than the max value accepted by 32 bits
number, we would get an incorrect result. Hence, it is important to do
proper typecasting of operands before performing any arithmetic
operation. In this case, I would typecast both stats.unused_pages and
stats.space_per_page to uint64 and then
perform the multiplication on them. Similar handling needs to be done in
step #2 as well.
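
To see the failure mode concretely, here is a tiny standalone example (the page
counts are made-up illustrative values, not taken from the report): with both
operands 32 bits wide the product wraps around, while casting one operand to a
64-bit type first gives the expected result.

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint32_t unused_pages = 600000;     /* hypothetical page count */
    int32_t  space_per_page = 8152;     /* roughly one 8kB page's usable bytes */

    /* The multiplication happens in 32 bits and wraps around. */
    uint64_t wrong = unused_pages * space_per_page;

    /* Casting one operand first makes the multiplication 64-bit. */
    uint64_t right = (uint64_t) unused_pages * space_per_page;

    printf("wrong = %llu, right = %llu\n",
           (unsigned long long) wrong, (unsigned long long) right);
    return 0;
}

On a typical system this prints wrong = 596232704 rather than the expected
4891200000.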

Attached is the patch that fixes this.

[1] - 
https://www.postgresql.org/message-id/20170704105728.mwb72jebfmok2nm2%40zip.com.au
[2] - 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka9483.html
commit a34659959fe7385e68d196e57e90fe92b12764d4
Author: ashu 
Date:   Mon Aug 7 11:18:27 2017 +0530

Add proper typecasting to the operands when doing arithmatic operations.

Patch by Ashutosh Sharma.

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 44e322d..4e363cd 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -687,13 +687,15 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	index_close(rel, AccessShareLock);
 
 	/* Count unused pages as free space. */
-	stats.free_space += stats.unused_pages * stats.space_per_page;
+	stats.free_space += ((uint64) stats.unused_pages *
+		 (uint64) stats.space_per_page);
 
 	/*
 	 * Total space available for tuples excludes the metapage and the bitmap
 	 * pages.
 	 */
-	total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;
+	total_space = (uint64) (nblocks - (stats.bitmap_pages + 1)) *
+		(uint64) stats.space_per_page;
 
 	if (total_space == 0)
 		free_percent = 0.0;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-07 Thread Ashutosh Sharma
On Fri, Aug 4, 2017 at 7:14 PM, Amit Kapila  wrote:
> On Sun, Jul 30, 2017 at 2:07 PM, Ashutosh Sharma  
> wrote:
>> Hi,
>>
>> On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma  
>> wrote:
>>> While doing the code coverage testing of v7 patch shared with - [1], I
>>> found that there are few lines of code in _hash_next() that are
>>> redundant and needs to be removed. I basically came to know this while
>>> testing the scenario where a hash index scan starts when a split is in
>>> progress. I have removed those lines and attached is the newer version
>>> of patch.
>>>
>>
>> Please find the new version of patches for page at a time scan in hash
>> index rebased on top of latest commit in master branch. Also, i have
>> ran pgindent script with pg_bsd_indent version 2.0 on all the modified
>> files. Thanks.
>>
>
> Few comments:

Thanks for reviewing the patch.

> 1.
> +_hash_kill_items(IndexScanDesc scan, bool havePin)
>
> I think you can do without the second parameter.  Can't we detect
> inside _hash_kill_items whether the page is pinned or not as we do for
> btree?

Okay, done that way. Please check attached v10 patch.

>
> 2.
> + /*
> + * We remember prev and next block number along with current block
> + * number so that if fetching the tup- les using cursor we know
> + * the page from where to startwith. This is for the case where we
> + * have re- ached the end of bucket chain without finding any
> + * matching tuples.
>
> The spelling of tuples and reached contain some unwanted symbol.  Have
> space between "startwith" or just use "begin"

Corrected.

>
> 3.
> + if (!BlockNumberIsValid((opaque)->hasho_nextblkno))
> + {
> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
> + so->currPos.nextPage = InvalidBlockNumber;
> + }
>
> There is no need to use Parentheses around opaque.  I mean there is no
> problem with that, but it is redundant and makes code less readable.
> Also, there is similar usage at other places in the code, please
> change all another place as well.

Removed parenthesis around opaque.

 I think you can save the value of
> prevblkno in a local variable and use it in else part.

Okay, done that way.

>
> 4.
> + if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
> + _hash_checkqual(scan, itup))
> + {
> + /* tuple is qualified, so remember it */
> + _hash_saveitem(so, itemIndex, offnum, itup);
> + itemIndex++;
> + }
> + else
> +
> + /*
> + * No more matching tuples exist in this page. so, exit while
> + * loop.
> + */
> + break;
>
> It is better to have braces for the else part. It makes code look
> better.  The type of code exists few line down as well, change that as
> well.

Added braces in the else part.

>
> 5.
> + /*
> + * We save the LSN of the page as we read it, so that we know whether it
> + * safe to apply LP_DEAD hints to the page later.
> + */
>
> "whether it safe"/"whether it is safe"

Corrected.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 59e9a5f5afc31a3d14ae39bf5ae0cf21ee42f624 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Mon, 7 Aug 2017 16:06:52 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma 
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 153 +++-
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 446 +++
 src/backend/access/hash/hashutil.c   |  71 +-
 src/include/access/hash.h|  55 -
 6 files changed, 570 insertions(+), 190 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..eef7d66 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)
 
-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the 

Re: [HACKERS] pgsql 10: hash indexes testing

2017-08-07 Thread AP
On Sat, Aug 05, 2017 at 04:41:24PM +0530, Amit Kapila wrote:
> > (On another note, I committed these patches.)
> 
> Thanks.

Seconded. :)

Now uploading data with fillfactor of 90. I'll know in 2-3 days
if the new patches are successful (earlier if they did not help).

I compiled (as apt.postgresql.org does not provide the latest
beta3 version for stretch; only sid which has a different perl
version) 10~beta3~20170805.2225-1~593.git0d1f98b.

AP


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding hook in BufferSync for backup purposes

2017-08-07 Thread Alvaro Herrera
Андрей Бородин wrote:

> ==What==
> I propose to add hook inside BufferSync() function it inform extensions that 
> we
> are going to write pages to disk. Please see patch attached. I pass a 
> timestamp
> of the checkpoint, but it would be good if we could also pass there number of
> checkpoint or something like this to ensure that some checkpoints were not 
> lost
> (this could yield malformed backups).
>  
> ==State==
> This is just an idea to discuss, I could not find something like this in
> pgsql-hackers as for now. Neither I could find similar hooks in the code.
> Is this hook sufficient to implement page tracking for differential backups?
> I’m not sure, but seems like it is.

Hi,

I remember discussing the topic of differential base-backups with
somebody (probably Marco and Gabriele).  The idea we had was to have a
new relation fork which stores an LSN for each group of pages,
indicating the LSN of the newest change to those pages.  The backup tool
"scans" the whole LSN fork, and grabs images of all pages that have LSNs
newer than the one used for the previous base backup.

(I think your sketch above should use LSNs rather than timestamps).
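
To make the idea concrete, here is a rough sketch of the backup-tool side
(every name here is invented, and one-LSN-per-fixed-size-group is only an
assumed layout for the LSN fork): the tool walks the fork and copies only the
page groups whose recorded LSN is newer than the previous base backup's LSN.

#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */

#define PAGES_PER_GROUP 32      /* assumed granularity of the LSN fork */

/*
 * lsn_fork[i] holds the newest LSN that touched page group i.
 * copy_pages() is whatever the backup tool uses to grab page images.
 */
static void
take_differential_backup(const XLogRecPtr *lsn_fork, int ngroups,
                         XLogRecPtr prev_backup_lsn,
                         void (*copy_pages) (int first_page, int npages))
{
    for (int i = 0; i < ngroups; i++)
    {
        if (lsn_fork[i] > prev_backup_lsn)
            copy_pages(i * PAGES_PER_GROUP, PAGES_PER_GROUP);
    }
}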

I suppose your hook idea lets you implement the LSN fork in an
extension, rather than having it be part of core.  The idea of hooking
onto BufferSync makes me uneasy, though -- like it's not the correct
place to do it.  I think it should be at the point where the buffer is
modified (i.e. when WAL is written) rather than when it's checkpointed
out.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expanding inheritance in partition bound order

2017-08-07 Thread Amit Langote
On 2017/08/05 2:25, Robert Haas wrote:
> On Fri, Aug 4, 2017 at 3:38 AM, Amit Langote
>  wrote:
>> The current way to expand inherited tables, including partitioned tables,
>> is to use either find_all_inheritors() or find_inheritance_children()
>> depending on the context.  They return child table OIDs in the (ascending)
>> order of those OIDs, which means the callers that need to lock the child
>> tables can do so without worrying about the possibility of deadlock in
>> some concurrent execution of that piece of code.  That's good.
>>
>> For partitioned tables, there is a possibility of returning child table
>> (partition) OIDs in the partition bound order, which in addition to
>> preventing the possibility of deadlocks during concurrent locking, seems
>> potentially useful for other caller-specific optimizations.  For example,
>> tuple-routing code can utilize that fact to implement binary-search based
>> partition-searching algorithm.  For one more example, refer to the "UPDATE
>> partition key" thread where it's becoming clear that it would be nice if
>> the planner had put the partitions in bound order in the ModifyTable that
>> it creates for UPDATE of partitioned tables [1].
> 
> I guess I don't really understand why we want to change the locking
> order.  That is bound to make expanding the inheritance hierarchy more
> expensive.  If we use this approach in all cases, it seems to me we're
> bound to reintroduce the problem we fixed in commit
> c1e0e7e1d790bf18c913e6a452dea811e858b554 and maybe add a few more in
> the same vein.  But I don't see that there's any necessary relation
> between the order of locking and the order of expansion inside the
> relcache entry/plan/whatever else -- so I propose that we keep the
> existing locking order and only change the other stuff.

Hmm, okay.

I guess I was trying to fit one solution to what might be two (or worse,
more) problems of the current implementation, which is not good.

> While reading related code this morning, I noticed that
> RelationBuildPartitionDesc and RelationGetPartitionDispatchInfo have
> *already* changed the locking order for certain operations, because
> the PartitionDesc's OID array is bound-ordered not OID-ordered.  That
> means that when RelationGetPartitionDispatchInfo uses the
> PartitionDesc's OID arra to figure out what to lock, it's potentially
> going to encounter partitions in a different order than would have
> been the case if it had used find_all_inheritors directly.

I think Amit Khandekar mentioned this on the UPDATE partition key thread [1].

> I'm
> tempted to think that RelationGetPartitionDispatchInfo shouldn't
> really be doing locking at all.  The best way to have the locking
> always happen in the same order is to have only one piece of code that
> determines that order - and I vote for find_all_inheritors.  Aside
> from the fact that it's the way we've always done it (and still do it
> in most other places), that code includes infinite-loop defenses which
> the new code you've introduced lacks.

As long as find_all_inheritors() is a place only to determine the order in
which partitions will be locked, it's fine.  My concern is about the time
of actual locking, which in the current planner implementation is too soon
that we end up needlessly locking all the partitions.

(Also in the current implementation, we open all the partitions to
construct Var translation lists, which are actually unused through most of
the planner stages, but admittedly it's a separate issue.)

The locking-partitions-too-soon issue, I think, is an important one and
may need discussing separately, but thought I'd mention it anyway.  It
also seems somewhat related to this discussion, but I may be wrong.

> Concretely, my proposal is:
> 
> 1. Before calling RelationGetPartitionDispatchInfo, the calling code
> should use find_all_inheritors to lock all the relevant relations (or
> the planner could use find_all_inheritors to get a list of relation
> OIDs, store it in the plan in order, and then at execution time we
> visit them in that order and lock them).

ISTM, we'd want to lock the partitions after we've determined the specific
ones a query needs to scan using the information returned by
RelationGetPartitionDispatchInfo.  That means the latter had better locked
the relations whose cached partition descriptors will be used to determine
the result that it produces.  One way to do that might be to lock all the
tables in the list returned by find_all_inheritors that are partitioned
tables before calling RelationGetPartitionDispatchInfo.  It seems what the
approach you've outlined below will let us do that.

BTW, IIUC, there will be two lists of OIDs we'll  have: one in the
find_all_inheritors order, say, L1 and the other determined by using
partitioning-specific information for the given query, say L2.

To lock, we iterate L1 and if a given member is in L2, we lock it.  It
might be possible to make it as cheap 

Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-07 Thread Etsuro Fujita

On 2017/08/07 15:33, Amit Langote wrote:

On 2017/08/07 15:22, Etsuro Fujita wrote:

On 2017/08/07 13:11, Amit Langote wrote:
The patch looks good too.
Although, looking at the following hunk:


+ Assert(partrel->rd_rel->relkind == RELKIND_RELATION ||
+partrel->rd_rel->relkind == RELKIND_FOREIGN_TABLE);
+
/*
 * Verify result relation is a valid target for the current
operation.
 */
! if (partrel->rd_rel->relkind == RELKIND_RELATION)
! CheckValidResultRel(partrel, CMD_INSERT);

makes me now wonder if we need the CheckValidResultRel check at all.  The
only check currently in place for RELKIND_RELATION is
CheckCmdReplicaIdentity(), which is a noop (returns true) for inserts
anyway.


Good point!  I left the verification for a plain table because that is
harmless but as you mentioned, that is nothing but an overhead.  So, here
is a new version which removes the verification at all from
ExecSetupPartitionTupleRouting.


The updated patch looks good to me, thanks.


Thanks for the review!

Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-07 Thread Amit Langote
On 2017/08/07 15:22, Etsuro Fujita wrote:
> On 2017/08/07 13:11, Amit Langote wrote:
>> The patch looks good too.
>> Although, looking at the following hunk:
>>
>> + Assert(partrel->rd_rel->relkind == RELKIND_RELATION ||
>> +partrel->rd_rel->relkind == RELKIND_FOREIGN_TABLE);
>> +
>>/*
>> * Verify result relation is a valid target for the current
>> operation.
>> */
>> ! if (partrel->rd_rel->relkind == RELKIND_RELATION)
>> ! CheckValidResultRel(partrel, CMD_INSERT);
>>
>> makes me now wonder if we need the CheckValidResultRel check at all.  The
>> only check currently in place for RELKIND_RELATION is
>> CheckCmdReplicaIdentity(), which is a noop (returns true) for inserts
>> anyway.
> 
> Good point!  I left the verification for a plain table because that is
> harmless but as you mentioned, that is nothing but an overhead.  So, here
> is a new version which removes the verification at all from
> ExecSetupPartitionTupleRouting.

The updated patch looks good to me, thanks.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expanding inheritance in partition bound order

2017-08-07 Thread Ashutosh Bapat
On Mon, Aug 7, 2017 at 11:18 AM, Ashutosh Bapat
 wrote:
>>
>> One objection to this line of attack is that there might be a good
>> case for locking only the partitioned inheritors first and then going
>> back and locking the leaf nodes in a second pass, or even only when
>> required for a particular row.  However, that doesn't require putting
>> everything in bound order - it only requires moving the partitioned
>> children to the beginning of the list.  And I think rather than having
>> new logic for that, we should teach find_inheritance_children() to do
>> that directly.  I have a feeling Ashutosh is going to cringe at this
>> suggestion, but my idea is to do this by denormalizing: add a column
>> to pg_inherits indicating whether the child is of
>> RELKIND_PARTITIONED_TABLE.  Then, when find_inheritance_children scans
>> pg_inherits, it can pull that flag out for free along with the
>> relation OID, and qsort() first by the flag and then by the OID.  It
>> can also return the number of initial elements of its return value
>> which have that flag set.
>
> I am always uncomfortable, when we save the same information in two
> places without having a way to make sure that they are in sync. That
> means we have to add explicit code to make sure that that information
> is kept in sync. Somebody forgetting to add that code wherever
> necessary means we have contradictory information persisted in the
> databases without an idea of which of them is correct. Not necessarily
> in this case, but usually it is an indication of something going wrong
> with the way schema is designed. May be it's better to use your idea
> of using get_rel_relkind() or find a way to check that the flag is in
> sync with the relkind, like when building the relcache.

Said all that, I think we will use this code quite often and so the
performance benefits by replicating the information are worth the
trouble of maintaining code to sync and check the duplicate
information.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Adding hook in BufferSync for backup purposes

2017-08-07 Thread Андрей Бородин
Hi, hackers! I want to propose adding a hook in BufferSync.

==Why==
So that extensions could track pages changed between checkpoints. I think this
can allow efficient differential backups taken right after a checkpoint. And
this functionality can be implemented as an extension.

==What==
I propose to add hook inside BufferSync() function it inform extensions that we
are going to write pages to disk. Please see patch attached. I pass a timestamp
of the checkpoint, but it would be good if we could also pass there number of
checkpoint or something like this to ensure that some checkpoints were not lost
(this could yield malformed backups).

==State==
This is just an idea to discuss, I could not find something like this in
pgsql-hackers as for now. Neither I could find similar hooks in the code.
Is this hook sufficient to implement page tracking for differential backups?
I’m not sure, but seems like it is.

==Questions==
Is this call enough to gather information about changed pages for backup purposes?
Can we call extensions reliably from the checkpointer process?
Can we guarantee that the extension won’t miss our call, or will we defer
BufferSync if the extension has failed?
Can we provide more flexibility for this hook?

Any thoughts will be appreciated.

Best regards, Andrey Borodin, Yandex.

From 01d692e01716d6904847e4ce73faabbd7cc9fa97 Mon Sep 17 00:00:00 2001
From: Andrey Borodin 
Date: Sat, 5 Aug 2017 12:36:32 +0500
Subject: [PATCH] Add hook to BuferSync

---
 src/backend/storage/buffer/bufmgr.c | 37 +--
 src/include/storage/buf_internals.h | 39 +++--
 2 files changed, 46 insertions(+), 30 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 15795b0c5a..57bb5b89f0 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -76,34 +76,6 @@ typedef struct PrivateRefCountEntry
 /* 64 bytes, about the size of a cache line on common systems */
 #define REFCOUNT_ARRAY_ENTRIES 8
 
-/*
- * Status of buffers to checkpoint for a particular tablespace, used
- * internally in BufferSync.
- */
-typedef struct CkptTsStatus
-{
-	/* oid of the tablespace */
-	Oid			tsId;
-
-	/*
-	 * Checkpoint progress for this tablespace. To make progress comparable
-	 * between tablespaces the progress is, for each tablespace, measured as a
-	 * number between 0 and the total number of to-be-checkpointed pages. Each
-	 * page checkpointed in this tablespace increments this space's progress
-	 * by progress_slice.
-	 */
-	float8		progress;
-	float8		progress_slice;
-
-	/* number of to-be checkpointed pages in this tablespace */
-	int			num_to_scan;
-	/* already processed pages in this tablespace */
-	int			num_scanned;
-
-	/* current offset in CkptBufferIds for this tablespace */
-	int			index;
-} CkptTsStatus;
-
 /* GUC variables */
 bool		zero_damaged_pages = false;
 int			bgwriter_lru_maxpages = 100;
@@ -1764,6 +1736,10 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
 	}
 }
 
+
+/* Hook for plugins to get control in BufferSync() */
+checkpointer_buffer_sync_hook_type checkpointer_buffer_sync_hook = NULL;
+
 /*
  * BufferSync -- Write out all dirty buffers in the pool.
  *
@@ -1926,6 +1902,11 @@ BufferSync(int flags)
 
 	Assert(num_spaces > 0);
 
+	if(checkpointer_buffer_sync_hook)
+		checkpointer_buffer_sync_hook(CheckpointStats.ckpt_write_t,
+		CkptBufferIds,
+		per_ts_stat);
+
 	/*
 	 * Build a min-heap over the write-progress in the individual tablespaces,
 	 * and compute how large a portion of the total progress a single
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6fc96..b9e4f19f3a 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -278,8 +278,6 @@ extern PGDLLIMPORT WritebackContext BackendWritebackContext;
 /* in localbuf.c */
 extern BufferDesc *LocalBufferDescriptors;
 
-/* in bufmgr.c */
-
 /*
  * Structure to sort buffers per file on checkpoints.
  *
@@ -295,9 +293,46 @@ typedef struct CkptSortItem
 	int			buf_id;
 } CkptSortItem;
 
+/* in bufmgr.c */
 extern CkptSortItem *CkptBufferIds;
 
 /*
+ * Status of buffers to checkpoint for a particular tablespace, used
+ * internally in BufferSync.
+ */
+typedef struct CkptTsStatus
+{
+	/* oid of the tablespace */
+	Oid			tsId;
+
+	/*
+	 * Checkpoint progress for this tablespace. To make progress comparable
+	 * between tablespaces the progress is, for each tablespace, measured as a
+	 * number between 0 and the total number of to-be-checkpointed pages. Each
+	 * page checkpointed in this tablespace increments this space's progress
+	 * by progress_slice.
+	 */
+	float8		progress;
+	float8		progress_slice;
+
+	/* number of to-be checkpointed pages in this tablespace */
+	int			num_to_scan;
+	/* already processed pages in this tablespace */
+	int			num_scanned;
+
+	/* current offset in CkptBufferIds for this tablespace */
+	int			

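For illustration only (this is not part of the patch): a minimal sketch of how an extension might register the proposed hook. The hook type is assumed from the call site in BufferSync() above, i.e. (TimestampTz, CkptSortItem *, CkptTsStatus *); the callback name and what it records are hypothetical.

#include "postgres.h"
#include "fmgr.h"
#include "storage/buf_internals.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static void
track_checkpoint_pages(TimestampTz ckpt_write_time,
					   CkptSortItem *buf_ids,
					   CkptTsStatus *per_ts_stat)
{
	/*
	 * Record which buffers are about to be written, e.g. append
	 * (tablespace, relfilenode, fork, block) entries to a map that a
	 * differential-backup tool reads after the checkpoint completes.
	 */
}

void
_PG_init(void)
{
	/* install the proposed hook; chaining to a previous hook is omitted */
	checkpointer_buffer_sync_hook = track_checkpoint_pages;
}
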
Re: [HACKERS] UPDATE of partition key

2017-08-07 Thread Rajkumar Raghuwanshi
On Fri, Aug 4, 2017 at 10:28 PM, Amit Khandekar wrote:

> >
> > Below are the TODOS at this point :
> >
> > Fix for bug reported by Rajkumar about update with join.
>
> I had explained the root issue of this bug here : [1]
>
> Attached patch includes the fix, which is explained below.
>

Hi Amit,

I have applied the v14 patch and tested it from my side; everything looks good
to me. I am attaching some test cases and an out file for reference.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


update_partition_test.out
Description: Binary data
--===
--creating test dataset
CREATE TABLE pt (a INT, b INT, c INT) PARTITION BY RANGE(a);
CREATE TABLE pt_p1 PARTITION OF pt FOR VALUES FROM (1) to (6) PARTITION BY RANGE (b);
CREATE TABLE pt_p1_p1 PARTITION OF pt_p1 FOR VALUES FROM (11) to (44);
CREATE TABLE pt_p1_p2 PARTITION OF pt_p1 FOR VALUES FROM (44) to (66);
CREATE TABLE pt_p2 PARTITION OF pt FOR VALUES FROM (6) to (11) PARTITION BY LIST (c);
CREATE TABLE pt_p2_p1 PARTITION OF pt_p2 FOR VALUES IN (666,777,888);
CREATE TABLE pt_p2_p2 PARTITION OF pt_p2 FOR VALUES IN (999,NULL);
INSERT INTO pt (a,b,c) VALUES (1,11,111),(2,22,222),(3,33,333),(4,44,444),(5,55,555);
INSERT INTO pt (a,b,c) VALUES (6,66,666),(7,77,777),(8,88,888),(9,99,999),(10,100,NULL);
--test with updating root partition
--move data within same partition, satisfying partition constraint --should pass
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt SET b = 23 WHERE b = 11;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data within same subtree, different partition, satisfying partition constraint --should pass
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt SET b = 45,c = 422 WHERE b = 22;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data to a different subtree, satisfying partition constraint --should pass
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt SET a = 8,c=888 WHERE b = 33;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data within same partition, not satisfying partition constraint --should fail
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt SET b = 23, a=13 WHERE b = 11;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data within same subtree, different partition, not satisfying partition constraint --should fail
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt SET b = 45, a=14 WHERE b = 22;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data to a different subtree, not satisfying partition constraint --should fail
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt SET b = 88, c=198 WHERE b = 33;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--
--test with updating child partition
--move data within same partition, satisfying partition constraint --should pass
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt_p1_p1 SET b = 23 WHERE b = 11;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data within same subtree, different partition, satisfying partition constraint --should pass
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt_p1 SET b = 45,c = 422 WHERE b = 22;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data within same subtree, different partition, satisfying partition constraint, updating leaf child --should fail
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt_p1_p1 SET b = 45 WHERE b = 22;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data to a different subtree, satisfying partition constraint, updating child partition --should fail
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt_p1 SET a = 8,c=888 WHERE b = 33;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data within same partition, not satisfying partition constraint --should fail
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt_p1_p1 SET b = 23, a=13 WHERE b = 11;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--move data within same subtree, different partition, not satisfying partition constraint --should fail
BEGIN;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
UPDATE pt_p1 SET b = 45, a=14 WHERE b = 22;
SELECT TABLEOID::REGCLASS AS PARTITION_NAME,* FROM pt;
ROLLBACK;
--===
--creating test dataset
ALTER TABLE pt_p1 ADD constraint pt_p1_check check(c < 560);
ALTER TABLE pt_p1_p1 add CONSTRAINT pt_p1_p1_uk UNIQUE (c);
ALTER TABLE pt_p1_p2 ADD constraint 

Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-07 Thread Etsuro Fujita
On 2017/08/07 13:11, Amit Langote wrote:
> The patch looks good too.  Although, looking at the following hunk:


+   Assert(partrel->rd_rel->relkind == RELKIND_RELATION ||
+  partrel->rd_rel->relkind == RELKIND_FOREIGN_TABLE);
+
/*
 * Verify result relation is a valid target for the current 
operation.
 */
!   if (partrel->rd_rel->relkind == RELKIND_RELATION)
!   CheckValidResultRel(partrel, CMD_INSERT);

makes me now wonder if we need the CheckValidResultRel check at all.  The
only check currently in place for RELKIND_RELATION is
CheckCmdReplicaIdentity(), which is a noop (returns true) for inserts anyway.


Good point!  I left the verification in for a plain table because it is
harmless, but as you mentioned, it is nothing but overhead.  So here is a
new version which removes the verification from
ExecSetupPartitionTupleRouting altogether.
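
For what it's worth, a rough sketch (not taken from the attached patch) of what that spot might look like once the CheckValidResultRel() call is dropped, leaving only the relkind sanity check:

	Assert(partrel->rd_rel->relkind == RELKIND_RELATION ||
		   partrel->rd_rel->relkind == RELKIND_FOREIGN_TABLE);

	/*
	 * No CheckValidResultRel() call any more: for RELKIND_RELATION its only
	 * effective check, CheckCmdReplicaIdentity(), is a no-op for inserts,
	 * and routing tuples to a foreign table is rejected at execution time
	 * anyway ("cannot route inserted tuples to a foreign table").
	 */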


Best regards,
Etsuro Fujita
*** /dev/null
--- b/contrib/file_fdw/data/list1.csv
***
*** 0 
--- 1,2 
+ 1,foo
+ 1,bar
*** /dev/null
--- b/contrib/file_fdw/data/list2.bad
***
*** 0 
--- 1,2 
+ 2,baz
+ 1,qux
*** /dev/null
--- b/contrib/file_fdw/data/list2.csv
***
*** 0 
--- 1,2 
+ 2,baz
+ 2,qux
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***
*** 162,167  SELECT tableoid::regclass, * FROM agg FOR UPDATE;
--- 162,188 
  ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
  DROP TABLE agg;
  
+ -- declarative partitioning tests
+ SET ROLE regress_file_fdw_superuser;
+ CREATE TABLE pt (a int, b text) partition by list (a);
+ CREATE FOREIGN TABLE p1 partition of pt for values in (1) SERVER file_server
+ OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',');
+ CREATE TABLE p2 partition of pt for values in (2);
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
+ COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
+ INSERT INTO pt VALUES (2, 'xyzzy');
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ DROP TABLE pt;
+ 
  -- privilege tests
  SET ROLE regress_file_fdw_superuser;
  SELECT * FROM agg_text ORDER BY a;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***
*** 289,294  SELECT tableoid::regclass, * FROM agg FOR UPDATE;
--- 289,375 
  
  ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
  DROP TABLE agg;
+ -- declarative partitioning tests
+ SET ROLE regress_file_fdw_superuser;
+ CREATE TABLE pt (a int, b text) partition by list (a);
+ CREATE FOREIGN TABLE p1 partition of pt for values in (1) SERVER file_server
+ OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',');
+ CREATE TABLE p2 partition of pt for values in (2);
+ SELECT tableoid::regclass, * FROM pt;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p1;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p2;
+  tableoid | a | b 
+ --+---+---
+ (0 rows)
+ 
+ COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
+ ERROR:  cannot route inserted tuples to a foreign table
+ CONTEXT:  COPY pt, line 2: "1,qux"
+ COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
+ SELECT tableoid::regclass, * FROM pt;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+  p2   | 2 | baz
+  p2   | 2 | qux
+ (4 rows)
+ 
+ SELECT tableoid::regclass, * FROM p1;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p2;
+  tableoid | a |  b  
+ --+---+-
+  p2   | 2 | baz
+  p2   | 2 | qux
+ (2 rows)
+ 
+ INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
+ ERROR:  cannot route inserted tuples to a foreign table
+ INSERT INTO pt VALUES (2, 'xyzzy');
+ SELECT tableoid::regclass, * FROM pt;
+  tableoid | a |   b   
+ --+---+---
+  p1   | 1 | foo
+  p1   | 1 | bar
+  p2   | 2 | baz
+  p2   | 2 | qux
+  p2   | 2 | xyzzy
+ (5 rows)
+ 
+ SELECT tableoid::regclass, * FROM p1;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p2;
+  tableoid | a |   b   
+ --+---+---
+  p2   

Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-08-07 Thread Craig Ringer
On 7 August 2017 at 14:04, Thomas Munro wrote:

> On Fri, Jul 21, 2017 at 7:17 PM, Kyotaro HORIGUCHI wrote:
> > In vac_truncate_clog, TruncateCLOG is called before
> > SetTransactionIdLimit, which advances
> > ShmemVariableCache->oldestXid. Given that the assertion in
> > TruncateCLOG is valid, they should be called in reverse order. I
> > suppose that CLOG files can be safely truncated after advancing
> > XID limits.
>
> If we keep the assertion by changing the order of changes to match the
> comment like this, then don't we still have a problem if another
> backend moves it backwards because of the data race I mentioned?  That
> too could be fixed (perhaps by teaching SetTransactionIdLimit not to
> overwrite higher values), but it sounds like the assertion might be a
> mistake.
> 
>

I think so - specifically, that it's a leftover from a revision where the
xid limit was advanced before clog truncation.

I'll be finding time in the next couple of days to look more closely and
ensure that's all it is.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-08-07 Thread Thomas Munro
On Fri, Jul 21, 2017 at 7:17 PM, Kyotaro HORIGUCHI wrote:
> In vac_truncate_clog, TruncateCLOG is called before
> SetTransactionIdLimit, which advances
> ShmemVariableCache->oldestXid. Given that the assertion in
> TruncateCLOG is valid, they should be called in reverse order. I
> suppose that CLOG files can be safely truncated after advancing
> XID limits.

If we keep the assertion by changing the order of changes to match the
comment like this, then don't we still have a problem if another
backend moves it backwards because of the data race I mentioned?  That
too could be fixed (perhaps by teaching SetTransactionIdLimit not to
overwrite higher values), but it sounds like the assertion might be a
mistake.
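
To make the idea concrete, a rough sketch (illustrative only, not an actual patch) of the "don't overwrite a higher value" variant for the oldestXid update inside SetTransactionIdLimit might look like this, with the other limit fields handled as before:

	LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
	if (TransactionIdPrecedes(ShmemVariableCache->oldestXid, oldest_datfrozenxid))
		ShmemVariableCache->oldestXid = oldest_datfrozenxid;
	LWLockRelease(XidGenLock);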

-- 
Thomas Munro
http://www.enterprisedb.com

