Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
On Wed, Dec 14, 2016 at 4:34 PM, Ashutosh Sharma wrote:
> Attached is the patch based on Amit's explanation above. Please have a
> look and let me know for any concerns. Thank you Michael and Andres
> for sharing your thoughts and Amit for your valuable inputs.

 }
+event = &set->events[0];
+event->events &= ~(WL_SOCKET_READABLE);
 #ifndef WIN32

I bet that this patch breaks many things on any non-WIN32 platform. What I think you want to do is modify the event flags associated with the socket read/write event so that they are updated in WaitEventAdjustWin32(), which gets called from ModifyWaitEvent(). By the way, position 0 refers to a socket for FeBeWaitSet, but that's a mandatory thing when a list of events is created with AddWaitEventToSet.
-- 
Michael
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
Hi,

> Okay, but I think we need to re-enable the existing event handle for
> required event (FD_READ) by using WSAEventSelect() to make it work
> sanely. We have tried something on those lines and it resolved the
> problem. Ashutosh will post a patch on those lines later today. Let
> us know if you have something else in mind.

Attached is the patch based on Amit's explanation above. Please have a look and let me know for any concerns. Thank you Michael and Andres for sharing your thoughts and Amit for your valuable inputs.

reenable_event_handle.patch
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/09 19:46, Maksim Milyutin wrote:
> I would like to work on two tasks:
> - insert (and eventually update) tuple routing for foreign partition.
> - the ability to create an index on the parent and have all of the
>   children inherit it;
>
> The first one has been implemented in pg_pathman somehow, but the code
> relies on dirty hacks, so the FDW API has to be improved. As for the
> extended index support, it doesn't look like a super-hard task.

That would be great! I'd like to help review the first one.

Best regards,
Etsuro Fujita
Re: [HACKERS] Indirect indexes
On Tue, Dec 13, 2016 at 11:11 PM, Pavan Deolasee wrote: > To be fair to myself, I did try to find patches with equal or more > complexity. But most of them had (multiple) reviewers assigned and were > being discussed for weeks and months. I did not think I could contribute > positively to those discussions, mostly because meaningful review at that > point would require much more in-depth knowledge of the area. So I picked > couple of patches which did not have any reviewers. May be not the best > decision, but I did what I thought was correct. I'm a little tired right now because it's almost 2am here, so maybe I should go to bed instead of writing emails when I might be excessively grumpy, but I just don't buy this. There are indeed a number of patches which have provoked lots of discussion already, but there are also many that have had virtually no review at all. Many of the parallel query patches fall into this category, and there are lots of others. CommitFest 2017-01 currently has 62 patches in the "Needs Review" state, and it is just not the case that all 62 of those are under active discussion and review. I bet there are a dozen that have had no replies at all and another dozen that have had only a handful of fairly casual replies without any in-depth discussion. Maybe more. Now, I will agree that diving into a patch in an area that you've never seen before can be challenging and intimidating. But you're not asking any less for your own patch. You would like someone to dive into your patch and figure it out, whether they know that area well or not, and let's face it: most people don't. I don't. I've read the email threads about WARM, but the concept hasn't sunk in for me yet. I understand HOT and I understand the heap and indexes pretty well so I'll figure it out eventually, but I don't have it figured out now and that's only going to get solved if I go put in a bunch of time to go learn. I imagine virtually everyone is in the same position. 
Similarly, I just spent a ton of time reviewing Amit Kapila's work on hash indexes and I don't know a heck of a lot about hash indexes -- virtually nobody does, because that area has been largely ignored for the last decade or so. Yet if everybody uses that as an excuse, then we don't get anywhere. To put that another way, reviewing patches is hard, but we still have to do it if we want to produce a quality release with good features. If we don't do a good job reviewing and therefore don't commit things, we'll have a disappointing release. If we don't do a good job reviewing but commit things anyway, then we'll have a horribly buggy release. If the patch we fail to review adequately happens to be WARM, then those will very possibly be data-corrupting bugs, which are particularly bad for our project's reputation for reliability. Those aren't good options, so we had better try to make sure that we do lots of high-quality review of that patch and all the others. If we can't turn to the people who are writing the patches to help review them, even though that's difficult, then I think that much-needed review won't happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
On Tue, Dec 13, 2016 at 1:00 PM, Ian Jackson wrote:
> The conversion is as follows: if a scenario is affected by the caveat,
> then there must be at least one transaction T which firstly produces
> "impossible" results I, and in which some later statement S produces a
> serialisation failure.
>
> To exhibit the corresponding unavoidable bug: Execute an identical
> scenario, with exactly the same sequence of steps in the same order,
> up to S. However, instead of S, execute ROLLBACK.

I am having a hard time understanding exactly what the argument on this thread is about, but I want to comment on this point. Saying that a set of transactions is serializable is equivalent to the statement that there is some serial order of execution which would have produced results equivalent to the actual results. That is, there must be at least one schedule (T1, ..., Tn) such that running the transactions one after another in that order would have produced the same results that were obtained by running them concurrently. Any transactions that roll back, whether due to serialization anomalies or manual user intervention or any other reason, are not part of the schedule, which considers only successfully committed transactions. The rolled-back transactions effectively didn't happen, and serializability as a concept makes no claim about the behavior of such transactions. That's not a PostgreSQL thing: that's database theory.

However, in practice, the scenario that you mention should generally work fine in PostgreSQL, I think. If T performed any writes, the rollback throws them away, so imagine removing the actual T from the mix and replacing it with a substitute version T' which performs the same reads but no writes and then tries to COMMIT where T tried to ROLLBACK. T' will succeed, because we never roll back a read-only transaction at commit time. If it were to fail, it would have to fail *while performing one of the reads*, not later. 
But imagine a hypothetical database system in which anomalies are never detected until commit time. We somehow track the global must-happen-before graph and refuse the commit of any transaction which will create a cycle. Let's also suppose that this system uses snapshots to implement MVCC. In such a system, read-only transactions will sometimes fail at commit time if they've seen a view of the world that is inconsistent with the only remaining possible serial schedules. For example, suppose T1 updates A -> A' and reads B. Concurrently, T2 updates B -> B'; thus, T1 must precede T2. Next, T2 commits. Now, T3 begins and reads B', so that T2 must precede T3. Next T1 commits. T3, which took its snapshot before the commit of T1, now reads A. Thus T3 has to precede T1. That's a cycle, so T3 won't be allowed to commit, and yet it's done a couple of reads and hasn't failed yet... because of an implementation detail of the system. That's probably annoying from a user perspective -- if a transaction is certainly going to fail we'd like to report the failure as early as possible -- and it's probably crushingly slow, but according to my understanding it's unarguably a correct implementation of serializability (assuming there are no bugs), yet it doesn't deliver the guarantee you're asking for here.

I have not read any database literature on the interaction of serializability with subtransactions. This seems very thorny. Suppose T1 reads A and B and updates A -> A' while concurrently T2 reads A and B and updates B -> B'. This is obviously not serializable; if either transaction had executed before the other in a serial schedule, the second transaction in the schedule would have had to have seen (A, B') or (A', B) rather than (A, B), but that's not what happened. But what if each of T1 and T2 did the reads in a subtransaction, rolled it back, and then did the write in the main transaction and committed? The database system has two options. 
First, it could assume that the toplevel transaction may have relied on the results of the aborted subtransaction. But if it does that, then any serialization failure which afflicts a subtransaction must necessarily also kill the main transaction, which seems pedantic and unhelpful. If you'd wanted the toplevel transaction to be killed, you wouldn't have used a subtransaction, right? Second, it could assume that the toplevel transaction in no way relied on or used the values obtained from the aborted subtransaction. However, that defeats the whole purpose of having subtransactions in the first place. What's the point of being able to start subtransactions if you can't roll them back and then decide to do something else instead? It seems to me that what the database system should do is make that second assumption, and what the user should do is understand that to the degree that transactions depend on the results of aborted subtransactions, there may be serialization anomalies that go undetected. And the user should put up with that because t
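The commit-time cycle described earlier (T1 must precede T2, T2 must precede T3, T3 must precede T1) can be sketched as a tiny must-happen-before graph. This is an illustrative simulation only, not PostgreSQL's SSI machinery; the graph representation and function names are invented for the example.

```python
# Illustrative sketch (not PostgreSQL code): build the must-happen-before
# graph from the T1/T2/T3 scenario and check for a cycle at commit time.

def has_cycle(edges):
    """Detect a cycle in a directed graph given as {node: set_of_successors};
    every node must appear as a key."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def visit(n):
        color[n] = GRAY
        for m in edges.get(n, ()):
            if color[m] == GRAY:
                return True          # back edge: cycle found
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(visit(n) for n in edges if color[n] == WHITE)

# T1 updates A and reads B   -> T1 must precede T2 (which overwrote B)
# T3 reads B' written by T2  -> T2 must precede T3
# T3 reads pre-T1 A from its snapshot -> T3 must precede T1
must_precede = {
    "T1": {"T2"},
    "T2": {"T3"},
    "T3": {"T1"},
}
print(has_cycle(must_precede))  # True: the cycle T1 -> T2 -> T3 -> T1
```

A hypothetical commit-time checker would refuse T3's commit on seeing this cycle, which is exactly the "fails only at COMMIT, after successful reads" behavior described above.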
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
On Tue, Dec 13, 2016 at 9:49 PM, Andres Freund wrote:
> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>> OK, I think that I have spotted an issue after additional read of the
>> code. When a WSA event is used for read/write activity on a socket,
>> the same WSA event gets reused again and again. That's fine for
>> performance reasons
>
> It's actually also required to not lose events,
> i.e. correctness. Windows events are edge, not level triggered.

Are all Windows events edge triggered? From what I read on MSDN [1], it doesn't appear so. I am quoting text from the MSDN page [1] which seems to be saying the event FD_READ we use in this case is level triggered.

"For FD_READ, FD_OOB, and FD_ACCEPT network events, network event recording and event object signaling are level-triggered."

> The previous implementation wasn't correct. So just resetting things
> ain't ok.

Okay, but I think we need to re-enable the existing event handle for the required event (FD_READ) by using WSAEventSelect() to make it work sanely. We have tried something on those lines and it resolved the problem. Ashutosh will post a patch on those lines later today. Let us know if you have something else in mind.

[1] - https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
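The edge- versus level-triggered distinction being debated above can be shown with a toy model (this is not the Win32 API, just an invented simulation): a level-triggered poll keeps reporting readiness while data remains buffered, whereas an edge-triggered poll fires only on the arrival transition, so a consumer that does a partial read and never re-arms the event silently strands data.

```python
# Toy model (not the Windows API): contrast level- and edge-triggered
# readiness reporting for a buffered "socket".

class ToySocket:
    def __init__(self):
        self.buffered = 0        # bytes waiting to be read
        self._edge_armed = True  # edge mode reports only new arrivals

    def arrive(self, nbytes):
        self.buffered += nbytes
        self._edge_armed = True  # new data re-arms the edge event

    def read(self, nbytes):
        self.buffered = max(0, self.buffered - nbytes)

    def poll_level(self):
        # Level-triggered: ready as long as anything is buffered.
        return self.buffered > 0

    def poll_edge(self):
        # Edge-triggered: ready only once per arrival "edge".
        ready = self._edge_armed and self.buffered > 0
        self._edge_armed = False
        return ready

s = ToySocket()
s.arrive(100)
s.read(10)             # partial read: 90 bytes still buffered
print(s.poll_level())  # True: level mode still reports readable
print(s.poll_edge())   # True: first poll after the arrival fires
print(s.poll_edge())   # False: no new arrival, remaining data stranded
```

Under edge semantics, losing the "armed" state without draining the buffer is exactly the lost-wakeup hazard Andres describes; under level semantics, merely resetting and re-polling would still see the buffered data.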
Re: [HACKERS] Indirect indexes
On Thu, Dec 8, 2016 at 3:11 AM, Robert Haas wrote:
> On Thu, Oct 20, 2016 at 11:30 AM, Pavan Deolasee wrote:
> > We have a design to convert WARM chains back to HOT and that will
> > increase the percentage of WARM updates much beyond 50%. I was waiting
> > for feedback on the basic patch before putting in more efforts, but it
> > went unnoticed last CF.
>
> While you did sign up to review one patch in the last CF, the amount
> of review you did for that patch is surely an order of magnitude less
> than what WARM will require. Maybe more than that. I don't mean to
> point the finger at you specifically -- there are lots of people
> slinging patches into the CommitFest who aren't doing as much review
> as their own patches will require.

I understand the point you're trying to make and I am not complaining that WARM did not receive a review, even though I would very much like that to happen, because I see it as the right step forward in solving some of the problems Postgres users face. But I also understand that every committer and developer is busy with the stuff that they see as important and interesting.

To be fair to myself, I did try to find patches with equal or more complexity. But most of them had (multiple) reviewers assigned and were being discussed for weeks and months. I did not think I could contribute positively to those discussions, mostly because meaningful review at that point would require much more in-depth knowledge of the area. So I picked a couple of patches which did not have any reviewers. Maybe not the best decision, but I did what I thought was correct.

I'll try to do more review in the next CF. Obviously that does not guarantee that WARM will see a reviewer, but hopefully I would have done my bit.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Declarative partitioning - another take
On Mon, Dec 12, 2016 at 3:06 PM, Amit Langote wrote: > > Hi, > > On 2016/12/11 10:02, Venkata B Nagothi wrote: > > On Fri, Dec 9, 2016 at 11:11 PM, Amit Langote > > wrote: > >> On Fri, Dec 9, 2016 at 3:16 PM, Venkata B Nagothi > >> wrote: > >>> I am testing the partitioning feature from the latest master and got > the > >>> following error while loading the data - > >>> > >>> db01=# create table orders_y1993 PARTITION OF orders FOR VALUES FROM > >>> ('1993-01-01') TO ('1993-12-31'); > >>> CREATE TABLE > >>> > >>> db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; > >>> ERROR: could not read block 6060 in file "base/16384/16412": read only > >> 0 of > >>> 8192 bytes > >>> CONTEXT: COPY orders, line 376589: > >>> "9876391|374509|O|54847|1997-07-16|3-MEDIUM > >> |Clerk#01993|0|ithely > >>> regular pack" > >> > >> Hmm. Could you tell what relation the file/relfilenode 16412 belongs > to? > >> > > > > db01=# select relname from pg_class where relfilenode=16412 ; > >relname > > -- > > orders_y1997 > > (1 row) > > > > > > I VACUUMED the partition and then re-ran the copy command and no luck. > > > > db01=# vacuum orders_y1997; > > VACUUM > > > > db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; > > ERROR: could not read block 6060 in file "base/16384/16412": read only 0 > > of 8192 bytes > > CONTEXT: COPY orders, line 376589: > > "9876391|374509|O|54847|1997-07-16|3-MEDIUM > |Clerk#01993|0|ithely > > regular pack" > > > > I do not quite understand the below behaviour as well. I VACUUMED 1997 > > partition and then i got an error for 1992 partition and then after 1996 > > and then after 1994 and so on. > > > > [ ... 
]

> > db01=# vacuum orders_y1997;
> > VACUUM
> > db01=# copy orders from '/data/orders-1993.csv' delimiter '|';
> > ERROR: could not read block 6060 in file "base/16384/16412": read only 0
> > of 8192 bytes
> > CONTEXT: COPY orders, line 376589:
> > "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely
> > regular pack"
> > db01=#
> >
> > Am I not understanding anything here ?
>
> I could not reproduce this issue. Also, I could not say what might have
> gone wrong based only on the information I have seen so far.
>
> Have you tried inserting the same data using insert?

I can load the data into appropriate partitions using INSERT. So, no issues there.

db01=# CREATE TABLE orders2(
         o_orderkey INTEGER,
         o_custkey INTEGER,
         o_orderstatus CHAR(1),
         o_totalprice REAL,
         o_orderdate DATE,
         o_orderpriority CHAR(15),
         o_clerk CHAR(15),
         o_shippriority INTEGER,
         o_comment VARCHAR(79)) partition by (o_orderdate);

db01=# insert into orders2 select * from orders where o_orderdate='1995-10-11';
INSERT 0 3110

> create table orders_unpartitioned (like orders);
> copy orders_unpartitioned from '/data/orders-1993.csv';
> insert into orders select * from orders_unpartitioned;

Loading the data into a normal table is not an issue (in fact the CSV is generated from the table itself). The issue is occurring only when I am trying to load the data from the CSV file into a partitioned table:

db01=# CREATE TABLE orders_y1992 PARTITION OF orders2 FOR VALUES FROM ('1992-01-01') TO ('1992-12-31');
CREATE TABLE
db01=# copy orders2 from '/data/orders-1993.csv' delimiter '|';
ERROR: could not read block 6060 in file "base/16384/16407": read only 0 of 8192 bytes
CONTEXT: COPY orders2, line 376589: "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely regular pack"

Not sure why COPY is failing.

Regards,
Venkata B N
Database Consultant
Re: [HACKERS] pg_background contrib module proposal
On 13 Dec. 2016 20:54, "amul sul" wrote:

postgres=> select * from pg_background_result(67069) as (x text);
ERROR: terminating connection due to administrator command
CONTEXT: background worker, pid 67069
postgres=>

It'll also want to handle cancellation due to conflict with recovery if you intend it to be used on a standby, probably by making use of procsignal_sigusr1_handler. The rest of the work is done by CHECK_FOR_INTERRUPTS(). This only matters if it's meant to work on standbys, of course. I haven't checked if you write to catalogs or otherwise do non-standby-friendly things.
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation
On Mon, Dec 12, 2016 at 4:46 PM, Thomas Munro wrote:
> On Tue, Dec 13, 2016 at 10:47 AM, Kevin Grittner wrote:
>> Barring objections I will back-patch to 9.2 through 9.5 tomorrow.
>> (9.1 is out of support and the fix is already in 9.6 and forward.)
>
> +1

Done.

I will add a test case based on Ian's work to the master branch to show the bug if this were to be reverted, but first want to look for similar cases, so we know the scope of the issue and we fix what we can. I probably won't get to that until early next month, since I'm on vacation the last two weeks in December and am trying to tie up some loose ends on other things first.

Thanks, Ian, for the example to show that this should be considered a bug!

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
On Wed, Dec 14, 2016 at 1:14 AM, Ashutosh Sharma wrote:
> I have been working on this issue for the last few days and it seems like
> I have almost found the reason for this failure. My findings till date
> are as follows.
>
> Just to confirm if the problem is happening due to reusability of the
> WaitEventSet structure, I had replaced the calls to ModifyWaitEvent()
> and WaitEventSetWait(), which make use of an already existing
> WaitEventSet to store event handles, with WaitLatchOrSocket(), and it
> worked. Actually WaitLatchOrSocket() creates a new WaitEventSet
> structure every time it has to wait for an event and is also being
> used in the old code. This clearly shows that we do have some problem
> just because we are reusing the same set of object handles throughout
> a backend session. I am further investigating this and would post
> the final analysis along with the fix very soon. Attached is the
> patch that has the changes described above. Any suggestions or inputs
> would be appreciated.

So it would mean that there is a mismatch between what should be used for the wait event and what is actually used. One simple idea if you want to compare both would be to generate some elog(LOG) entries or whatever on the fields we are interested in and see what is mismatching. That may help in understanding what's wrong in the wait events used.
-- 
Michael
Re: [HACKERS] Logical Replication WIP
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
>
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
>
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.

Okay, I'll address that separately, thanks.

> All issues from (my) previous reviews appear to have been addressed.
>
> Comments besides that:
>
> 0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch
>
> Still wondering about the best workflow with pg_dump, but it seems all
> the pieces are there right now, and the interfaces can be tweaked later.

Right, either way there needs to be some special handling for subscriptions; having to request them specifically seems the safest option to me, but I am open to suggestions there.

> DROP SUBSCRIPTION requires superuser, but should perhaps be owner check
> only?

Hmm, I am not sure that it requires superuser; I actually think it mistakenly didn't require anything. In any case I will make sure it just does an owner check.

> DROP SUBSCRIPTION IF EXISTS crashes if the subscription does not in fact
> exist.

Right, missing return.

> Maybe write the grammar so that SLOT does not need to be a new key word.
> The changes you made for CREATE PUBLICATION should allow that.

Hmm, what would that look like? Would the opt_drop_slot become IDENT IDENT? Or maybe you want me to add the WITH (definition) kind of thing?

> The tests are not added to serial_schedule. Intentional? If so, document?

Not intentional, will fix. I never use it, so it's easy to forget about.

> 0004-Define-logical-replication-protocol-and-output-plugi-v12.patch
>
> Not sure why pg_catalog is encoded as a zero-length string. I guess it
> saves some space. Maybe that could be explained in a brief code comment?
Yes, it's to save space, mainly for built-in types.

> 0005-Add-logical-replication-workers-v12.patch
>
> The way the executor stuff is organized now looks better to me.
>
> The subscriber crashes if max_replication_slots is 0:
>
> TRAP: FailedAssertion("!(max_replication_slots > 0)", File: "origin.c",
> Line: 999)
>
> The documentation says that replication slots are required on the
> subscriber, but from a user's perspective, it's not clear why that is.

Yeah, honestly I think origins should not depend on max_replication_slots. They are not really connected (you can have many of one and none of the other and vice versa). Also max_replication_slots should IMHO default to max_wal_senders at this point. (In an ideal world all three of those would be in DSM instead of SHM and only governed by some implementation maximum, which is probably 2^16, and the GUCs would be removed.) But yes, as it is, we should check for that, probably both during CREATE SUBSCRIPTION and during apply start.

> Dropping a table that is part of a live subscription results in log
> messages like
>
> WARNING: leaked hash_seq_search scan for hash table 0x7f9d2a807238
>
> I was testing replicating into a temporary table, which failed like this:
>
> FATAL: the logical replication target public.test1 not found
> LOG: worker process: (PID 2879) exited with exit code 1
> LOG: starting logical replication worker for subscription 16392
> LOG: logical replication apply for subscription mysub started
>
> That's okay, but those messages were repeated every few seconds or so
> and would create quite some log volume. I wonder if that needs to be
> reined in somewhat.

It retries every 5s or so I think; I am not sure how that could be improved besides using wal_retrieve_retry_interval instead of the hardcoded 5s (or maybe better, adding a GUC for apply). Maybe some kind of backoff algorithm could be added as well. 
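The backoff idea floated above is easy to sketch. This is purely illustrative (not code from the patch, and the initial/cap values are invented): exponential backoff starting from the current 5s retry, capped so a long-broken subscription doesn't back off forever.

```python
# Illustrative sketch of the suggested backoff (not from the patch):
# exponential retry delays for apply-worker restarts, capped at 5 minutes.

def backoff_delays(initial=5.0, cap=300.0, factor=2.0):
    """Yield successive retry delays: 5s, 10s, 20s, ... capped at `cap`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)

gen = backoff_delays()
print([next(gen) for _ in range(8)])
# [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

A successful connection would reset the generator, so transient failures still retry quickly while persistent ones stop flooding the log.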
-- 
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical Replication WIP
On 13/12/16 22:05, Andres Freund wrote:
> Hi,
>
> On 2016-12-13 15:42:17 -0500, Peter Eisentraut wrote:
>> I think this is getting very close to the point where it's committable.
>> So if anyone else has major concerns about the whole approach and
>> perhaps the way the new code in 0005 is organized, now would be the time ...
>
> Uh. The whole cache invalidation thing is completely unresolved, and
> that's just the publication patch. I've not looked in detail at later
> patches. So no, I don't think so.

I already have code for that. I'll submit the next version once I go over PeterE's review. BTW the relcache thing is not as bad as it seems from the publication patch, because the output plugin has to deal with relcache/publication cache invalidations and handles most of the updates correctly. But there was still a problem in terms of the write filtering, so the publications still have to reset the relcache too.

-- 
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Wed, Dec 14, 2016 at 9:25 AM, Kyotaro HORIGUCHI wrote:
> Ah, sorry for the confusion.
>
> At Tue, 13 Dec 2016 22:43:22 +0900, Michael Paquier wrote in
>> On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI wrote:
>> > Thank you for the comment.
>>
>> Er, it is good to see more people interested in this problem... As the
>> former author, still working actively on this patch, perhaps it would
>> be better if I continue to code this patch, no? It is a waste to step
>> on each other's toes.
>
> You are the author of this patch. I addressed that since the
> comments are mainly on typos, not function. I think you do better
> than me so the v6 from me should be abandoned.

No problem. Thanks for the help!
-- 
Michael
Re: [HACKERS] Logical Replication WIP
On 14/12/16 01:26, Peter Eisentraut wrote:
> On 12/12/16 7:33 PM, Andres Freund wrote:
>> +-- test switching between slots in a session
>> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
>> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
>> +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
>> +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);
>>
>> Can we actually output something? Right now this doesn't test that
>> much...
>
> This test was added because an earlier version of the patch would crash
> on this.

I did improve the test as part of the test improvements that were sent to the committers list, btw.

-- 
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Ah, sorry for the confusion.

At Tue, 13 Dec 2016 22:43:22 +0900, Michael Paquier wrote in
> On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI wrote:
> > Thank you for the comment.
>
> Er, it is good to see more people interested in this problem... As the
> former author, still working actively on this patch, perhaps it would
> be better if I continue to code this patch, no? It is a waste to step
> on each other's toes.

You are the author of this patch. I addressed that since the comments are mainly on typos, not function. I think you do better than me, so the v6 from me should be abandoned.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Logical Replication WIP
On 12/12/16 7:33 PM, Andres Freund wrote:
> +-- test switching between slots in a session
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
> +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
> +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);
>
> Can we actually output something? Right now this doesn't test that
> much...

This test was added because an earlier version of the patch would crash on this.

-- 
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)
On Tue, Dec 13, 2016 at 11:40:54AM -0500, Robert Haas wrote:
> Let's confine ourselves to fixing one problem at a time. I think we
> can get where we want to be in this case by adding one new column and
> some new rows to pg_stat_activity.

Agreed. Let's also remove the abuse of WAL senders with the query field at the same time.

> Michael, is that something you're
> going to do? If not, one of my esteemed colleagues here at
> EnterpriseDB will have a try.

If I had received feedback on the other thread, I would have coded a proposal of patch already. But as long as SCRAM is not done I will refrain from taking on an extra project. I am fine to do reviews, as I already looked at ways to solve the problem. So if anybody has room to do it, please be my guest.

Regarding the way to solve things, I think that having in ProcGlobal an array of PGPROC entries for each auxiliary process is the way to go, the position of each entry in the array defining what the process type is. That would waste some shared memory, but we are not talking about that much here. That would as well remove the need of having checkpointerLatch, walWriteLatch and the startup fields in ProcGlobal.
-- 
Michael
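The scheme sketched above — one reserved slot per auxiliary process type, so the array position alone identifies the type — can be illustrated with a toy model (the type names and table structure here are invented for the example, not PostgreSQL's actual ProcGlobal layout):

```python
# Illustrative sketch (invented names, not PostgreSQL code): reserve one
# fixed slot per auxiliary process type; the index identifies the type,
# at the cost of a little memory for slots that stay empty.
from enum import IntEnum

class AuxProcType(IntEnum):
    CHECKPOINTER = 0
    BGWRITER = 1
    WALWRITER = 2
    STARTUP = 3

class AuxProcTable:
    def __init__(self):
        # One preallocated entry per type; None means "not running".
        self.slots = [None] * len(AuxProcType)

    def register(self, proc_type, pid):
        self.slots[proc_type] = {"pid": pid}

    def type_of(self, slot_index):
        # The position alone tells us what kind of process this is.
        return AuxProcType(slot_index)

table = AuxProcTable()
table.register(AuxProcType.WALWRITER, 4242)
print(table.type_of(2).name)                  # WALWRITER
print(table.slots[AuxProcType.CHECKPOINTER])  # None: reserved but empty
```

The trade-off is exactly the one named in the email: a few always-allocated slots in exchange for dropping the per-process latch fields and type bookkeeping.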
Re: [HACKERS] Parallel safety of CURRENT_* family
On Thu, Dec 1, 2016 at 3:46 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Dec 1, 2016 at 2:32 PM, Tom Lane wrote: >>> ... well, they would be if we passed down xactStartTimestamp to parallel >>> workers, but I can't find any code that does that. In view of the fact that >>> transaction_timestamp() is marked as parallel-safe, this is a bug in 9.6. > >> Yeah. Do you think we should arrange to pass that down, or change the >> marking? > > We can't fix the marking in existing 9.6 installations, so I think we > have to pass it down. (Which would be a better response anyway.) I happened across this thread today and took a look at what it would take to fix this. I quickly ran up against the fact that SerializeTransactionState() and RestoreTransactionState() are not exactly brilliantly designed, relying on the notion that each individual value that we want to serialize will be no wider than a TransactionId, which won't be true for timestamps. Even apart from that, the whole design of those functions is pretty lame, and I'm pretty sure I wrote all of that code myself, so I have nobody to blame but me. Anyway, here's a proposed patch to refactor that code into something a little more reasonable. It doesn't fix the actual problem here, but I think it's a good first step. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index d643216..9c13717 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -65,6 +65,23 @@ #include "utils/timestamp.h" #include "pg_trace.h" +/* + * Serialization format for transaction state information. 
+ */ +typedef struct SerializedTransactionState +{ + int xactIsoLevel; + bool xactDeferrable; + TransactionId xactTopTransactionId; + TransactionId currentTransactionId; + CommandId currentCommandId; + int nCurrentXids; + TransactionId currentXids[FLEXIBLE_ARRAY_MEMBER]; +} SerializedTransactionState; + +#define SIZE_OF_SERIALIZED_TRANSACTION_STATE(n_current_xids) \ + offsetof(SerializedTransactionState, currentXids) + \ + (n_current_xids * sizeof(TransactionId)) /* * User-tweakable parameters @@ -4812,8 +4829,7 @@ Size EstimateTransactionStateSpace(void) { TransactionState s; - Size nxids = 6; /* iso level, deferrable, top & current XID, - * command counter, XID count */ + Size nxids = 0; for (s = CurrentTransactionState; s != NULL; s = s->parent) { @@ -4823,7 +4839,7 @@ EstimateTransactionStateSpace(void) } nxids = add_size(nxids, nParallelCurrentXids); - return mul_size(nxids, sizeof(TransactionId)); + return SIZE_OF_SERIALIZED_TRANSACTION_STATE(nxids); } /* @@ -4847,16 +4863,17 @@ SerializeTransactionState(Size maxsize, char *start_address) TransactionState s; Size nxids = 0; Size i = 0; - Size c = 0; TransactionId *workspace; - TransactionId *result = (TransactionId *) start_address; + SerializedTransactionState *result; + + result = (SerializedTransactionState *) start_address; - result[c++] = (TransactionId) XactIsoLevel; - result[c++] = (TransactionId) XactDeferrable; - result[c++] = XactTopTransactionId; - result[c++] = CurrentTransactionState->transactionId; - result[c++] = (TransactionId) currentCommandId; - Assert(maxsize >= c * sizeof(TransactionId)); + Assert(maxsize >= SIZE_OF_SERIALIZED_TRANSACTION_STATE(0)); + result->xactIsoLevel = XactIsoLevel; + result->xactDeferrable = XactDeferrable; + result->xactTopTransactionId = XactTopTransactionId; + result->currentTransactionId = CurrentTransactionState->transactionId; + result->currentCommandId = currentCommandId; /* * If we're running in a parallel worker and launching a parallel worker @@ -4865,9 
+4882,10 @@ SerializeTransactionState(Size maxsize, char *start_address) */ if (nParallelCurrentXids > 0) { - result[c++] = nParallelCurrentXids; - Assert(maxsize >= (nParallelCurrentXids + c) * sizeof(TransactionId)); - memcpy(&result[c], ParallelCurrentXids, + result->nCurrentXids = nParallelCurrentXids; + Assert(maxsize >= + SIZE_OF_SERIALIZED_TRANSACTION_STATE(nParallelCurrentXids)); + memcpy(&result->currentXids, ParallelCurrentXids, nParallelCurrentXids * sizeof(TransactionId)); return; } @@ -4882,7 +4900,7 @@ SerializeTransactionState(Size maxsize, char *start_address) nxids = add_size(nxids, 1); nxids = add_size(nxids, s->nChildXids); } - Assert((c + 1 + nxids) * sizeof(TransactionId) <= maxsize); + Assert(SIZE_OF_SERIALIZED_TRANSACTION_STATE(nxids) <= maxsize); /* Copy them to our scratch space. */ workspace = palloc(nxids * sizeof(TransactionId)); @@ -4900,8 +4918,8 @@ SerializeTransactionState(Size maxsize, char *start_address) qsort(workspace, nxids, sizeof(TransactionId), xidComparator); /* Copy data into output area. */ - result[c++] = (TransactionId) nxids; - memcpy(&result[c], workspace, nxids * sizeof(TransactionId)); + result->nCur
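[For readers unfamiliar with the idiom the patch above relies on: sizing a variable-length struct with a C99 flexible array member via offsetof — what the patch's SIZE_OF_SERIALIZED_TRANSACTION_STATE macro does — is a standard pattern. A minimal, self-contained sketch with hypothetical names, not the actual xact.c code:]

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Xid;

/* Fixed-size header followed by a variable number of xids, mirroring the
 * patch's SerializedTransactionState layout. */
typedef struct DemoState
{
    int  isoLevel;
    int  nXids;
    Xid  xids[];                /* C99 flexible array member */
} DemoState;

/* Space needed for n trailing xids -- the analogue of the patch's
 * SIZE_OF_SERIALIZED_TRANSACTION_STATE(n) macro. */
#define DEMO_STATE_SIZE(n) \
    (offsetof(DemoState, xids) + (n) * sizeof(Xid))

/* Serialize into a caller-provided buffer of at least DEMO_STATE_SIZE(n). */
void demo_serialize(char *buf, int isoLevel, const Xid *xids, int n)
{
    DemoState *s = (DemoState *) buf;

    s->isoLevel = isoLevel;
    s->nXids = n;
    memcpy(s->xids, xids, n * sizeof(Xid));
}

/* Round-trip three xids through a heap buffer and report the count read
 * back (malloc guarantees suitable alignment for the struct overlay). */
int demo_roundtrip(void)
{
    Xid  xs[3] = {10, 11, 12};
    char *buf = malloc(DEMO_STATE_SIZE(3));
    int  n;

    demo_serialize(buf, 1, xs, 3);
    n = ((DemoState *) buf)->nXids;
    free(buf);
    return n;
}
```

[The advantage over the old scheme criticized in the message is that each field keeps its natural type, instead of every value being squeezed through a TransactionId-sized slot.]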
Re: [HACKERS] Logical Replication WIP
On 2016-12-13 06:55:31 +0100, Petr Jelinek wrote: > >> This is a quadratic algorithm - that could bite us... Not sure if we > >> need to care. If we want to fix it, one approach would be to use > >> RangeVarGetRelid() instead, and then do a qsort/deduplicate before > >> actually opening the relations. > >> > > > > I guess it could get really slow only with a big inheritance tree; I'll > > look into how much work the other way of doing things is (this is not > > exactly a hot code path). > > > > Actually, looking at it, it only processes user input, so I don't think > it's very problematic in terms of performance. You'd have to pass many > thousands of tables in a single DDL statement to notice. Well, at least we should put a CHECK_FOR_INTERRUPTS() there. At the moment it's IIRC uninterruptible, which isn't good for something directly triggered by the user. A comment that it's known to be O(n^2), but considered acceptable, would be good too. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
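[For illustration, the qsort/deduplicate approach suggested above replaces an O(n^2) scan-per-element with an O(n log n) sort followed by one linear pass. A generic sketch, not the actual patch code:]

```c
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

static int oid_cmp(const void *a, const void *b)
{
    Oid x = *(const Oid *) a;
    Oid y = *(const Oid *) b;

    return (x > y) - (x < y);
}

/* Sort and de-duplicate in place, returning the new element count.
 * O(n log n) overall, versus the O(n^2) membership scan the thread
 * worries about; a CHECK_FOR_INTERRUPTS()-style cancellation check
 * could be placed in the final loop to keep it interruptible. */
size_t sort_dedup_oids(Oid *oids, size_t n)
{
    size_t out = 0;
    size_t i;

    if (n == 0)
        return 0;
    qsort(oids, n, sizeof(Oid), oid_cmp);
    for (i = 1; i < n; i++)
    {
        if (oids[i] != oids[out])
            oids[++out] = oids[i];
    }
    return out + 1;
}
```

[After this, "already seen" checks become a binary search, or disappear entirely since duplicates are gone before the relations are opened.]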
Re: [HACKERS] Logical Replication WIP
Hi, On 2016-12-13 15:42:17 -0500, Peter Eisentraut wrote: > I think this is getting very close to the point where it's committable. > So if anyone else has major concerns about the whole approach and > perhaps the way the new code in 0005 is organized, now would be the time ... Uh. The whole cache invalidation thing is completely unresolved, and that's just the publication patch. I've not looked in detail at later patches. So no, I don't think so. I think after the invalidation issue is resolved the publication patch might be close to being ready. I'm doubtful the later patches are. Greetings, Andres Freund
Re: [HACKERS] Logical Replication WIP
On 12/10/16 2:48 AM, Petr Jelinek wrote: > Attached new version with your updates and rebased on top of the current > HEAD (the partitioning patch produced quite a few conflicts). I have attached a few more "fixup" patches, mostly with some editing of documentation and comments and some compiler warnings. In 0006 in the protocol documentation I have left a "XXX ???" where I didn't understand what it was trying to say. All issues from (my) previous reviews appear to have been addressed. Comments besides that: 0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch Still wondering about the best workflow with pg_dump, but it seems all the pieces are there right now, and the interfaces can be tweaked later. DROP SUBSCRIPTION requires superuser, but should perhaps be owner check only? DROP SUBSCRIPTION IF EXISTS crashes if the subscription does not in fact exist. Maybe write the grammar so that SLOT does not need to be a new key word. The changes you made for CREATE PUBLICATION should allow that. The tests are not added to serial_schedule. Intentional? If so, document? 0004-Define-logical-replication-protocol-and-output-plugi-v12.patch Not sure why pg_catalog is encoded as a zero-length string. I guess it saves some space. Maybe that could be explained in a brief code comment? 0005-Add-logical-replication-workers-v12.patch The way the executor stuff is organized now looks better to me. The subscriber crashes if max_replication_slots is 0: TRAP: FailedAssertion("!(max_replication_slots > 0)", File: "origin.c", Line: 999) The documentation says that replication slots are required on the subscriber, but from a user's perspective, it's not clear why that is. 
Dropping a table that is part of a live subscription results in log messages like WARNING: leaked hash_seq_search scan for hash table 0x7f9d2a807238 I was testing replicating into a temporary table, which failed like this: FATAL: the logical replication target public.test1 not found LOG: worker process: (PID 2879) exited with exit code 1 LOG: starting logical replication worker for subscription 16392 LOG: logical replication apply for subscription mysub started That's okay, but those messages were repeated every few seconds or so and would create quite some log volume. I wonder if that needs to be reigned in somewhat. I think this is getting very close to the point where it's committable. So if anyone else has major concerns about the whole approach and perhaps the way the new code in 0005 is organized, now would be the time ... -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From df169d3fec6d67c3daf301d9ed9903545358dd7e Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 12 Dec 2016 12:00:00 -0500 Subject: [PATCH 2/8] fixup! Add PUBLICATION catalogs and DDL --- src/backend/commands/publicationcmds.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 954b2bd65d..f1d7404ffe 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -389,7 +389,6 @@ AlterPublication(AlterPublicationStmt *stmt) { Relation rel; HeapTuple tup; - AclResult aclresult; rel = heap_open(PublicationRelationId, RowExclusiveLock); @@ -564,12 +563,11 @@ PublicationAddTables(Oid pubid, List *rels, bool if_not_exists, foreach(lc, rels) { Relation rel = (Relation) lfirst(lc); - AclResult aclresult; ObjectAddress obj; /* Must be owner of the table or superuser. 
*/ if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId())) - aclcheck_error(aclresult, ACL_KIND_CLASS, + aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS, RelationGetRelationName(rel)); obj = publication_add_relation(pubid, rel, if_not_exists); -- 2.11.0 From 2cab724505487663b53f8af09c8e16a70c52014d Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 9 Dec 2016 12:00:00 -0500 Subject: [PATCH 4/8] fixup! Add SUBSCRIPTION catalog and DDL --- doc/src/sgml/catalogs.sgml | 28 ++--- doc/src/sgml/ref/alter_subscription.sgml | 25 +--- doc/src/sgml/ref/create_subscription.sgml | 47 ++ doc/src/sgml/ref/drop_subscription.sgml| 23 ++- doc/src/sgml/ref/psql-ref.sgml | 6 +-- src/backend/catalog/pg_subscription.c | 2 +- src/backend/commands/subscriptioncmds.c| 6 +-- src/backend/parser/gram.y | 2 +- .../libpqwalreceiver/libpqwalreceiver.c| 2 +- src/backend/tcop/utility.c | 5 +++ src/bin/pg_dump/pg_dump.c | 7 +--- src/test/regress/parallel_schedule | 2 +- 12 files changed, 75 insertions(+), 80 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src
Re: [HACKERS] New design for FK-based join selectivity estimation
ronan.dunk...@dalibo.com writes: > On mardi 13 décembre 2016 09:10:47 CET Adrien Nayrat wrote: >> The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduces an >> estimation error : > The problem is, for semi and anti joins, we assume that we have nothing to do > (costsize.c:4253): > else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) > { > /* >* For JOIN_SEMI and JOIN_ANTI, the selectivity is > defined as the >* fraction of LHS rows that have matches. If the > referenced >* table is on the inner side, that means the > selectivity is 1.0 >* (modulo nulls, which we're ignoring for now). We > already >* covered the other case, so no work here. >*/ > } > This results in assuming that the whole outerrel will match, no matter the > selectivity of the innerrel. Yeah. In the terms of this example, the FK means that every outer row would have a match, if the query were select * from t3 where j in (select * from t4); But actually it's select * from t3 where j in (select * from t4 where j<10); so of course we should not expect a match for every row. > If I understand it correctly and the above is right, I think we should ignore > SEMI or ANTI joins altogether when considering FKs, and keep the > corresponding > restrictinfos for later processing since they are already special-cased later > on. That seems like an overreaction. While the old code happens to get this example exactly right, eqjoinsel_semi is still full of assumptions and approximations, and it doesn't do very well at all if it lacks MCV lists for both sides. I'm inclined to think that what we want to have happen in this case is to estimate the fraction of outer rows having a match as equal to the selectivity of the inner query's WHERE clauses, i.e. the semijoin selectivity should be sizeof(inner result) divided by sizeof(inner relation).
regards, tom lane
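[The estimate Tom proposes — fraction of outer rows with a match ≈ sizeof(inner result) / sizeof(inner relation) — is simple enough to sketch in isolation. Hypothetical names, not actual planner code:]

```c
/* Semijoin selectivity under an FK guarantee, per the estimate proposed
 * above: the FK says every outer row matches some row of the *unfiltered*
 * inner relation, so the match fraction is scaled down by the fraction of
 * inner rows that survive the inner query's WHERE clauses. */
double fk_semijoin_selectivity(double inner_result_rows,
                               double inner_relation_rows)
{
    double s;

    if (inner_relation_rows <= 0.0)
        return 1.0;             /* degenerate: assume every outer row matches */
    s = inner_result_rows / inner_relation_rows;
    return (s > 1.0) ? 1.0 : s;
}

/* Expected outer row count surviving the semijoin. */
double fk_semijoin_rows(double outer_rows,
                        double inner_result_rows,
                        double inner_relation_rows)
{
    return outer_rows *
        fk_semijoin_selectivity(inner_result_rows, inner_relation_rows);
}
```

[In the thread's example, if "where j<10" keeps 10 of 100 inner rows, 1000 outer rows are estimated to produce about 100 matches rather than all 1000.]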
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
On Tue, Dec 13, 2016 at 12:00 PM, Ian Jackson wrote: > Kevin Grittner writes: > I still hope to be able to convince you that the definition of > SERIALIZABLE (in the pgsql docs) ought to be a stronger version, which > covers even non-committed transactions. That doesn't seem likely. The stronger definition would perform so poorly that even if you managed to convince me, the PostgreSQL community as a whole has already rejected it. > "you must discard any results from a transaction which later throws >a serialization failure" You argue that saying "If A happens, you must not use the results" means that "If A doesn't happen you can use the results." That does not logically follow; your argument based on it has no merit. Basically, a serializable transaction must successfully commit in order to consider any results from it to be valid. The quote above is discussing one way that can happen, which does not preclude other ways. >>> 1. Declare that all spurious failures, in SERIALIZABLE transactions, >>> are bugs. >> >> It's not clear to me what you mean here. > > I mean that the pgsql documentation should simply say that for > SERIALIZABLE transactions, there is always a coherent serialisation. > Even if the transaction failed due to a constraint violation, or > serialisation failure, or database disconnection, or whatever. > > I think there is no "useful" weaker guarantee. That's wrong on the face of it, and certainly at odds with all research papers and standards in the area. > By which I mean that for any reasonable weaker guarantee: any scenario > in which pgsql violates the stronger guarantee, can be converted into > a scenario where pgsql violates the weaker guarantee. You have failed to provide any evidence of that. Twisting statements and the principles of logical inference don't advance your position. > The fact that pgsql might generate "spurious" constraint violations, > even in SERIALIZABLE transactions, has been known for many years. 
Yes, it was clearly discussed during development. > See the URLs etc. in my patch, which include a reference from 2009. This paper from 2007 was among those cited during development of the PostgreSQL serializable implementation: Automating the Detection of Snapshot Isolation Anomalies. Sudhir Jorwekar, Alan Fekete, Krithi Ramamritham, S. Sudarshan. VLDB ‘07, September 23-28, 2007, Vienna, Austria. https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/VLDB07-snapshot.pdf Among relevant statements in that paper: | The database system ensures the preservation of some integrity | constraints which are explicitly declared to the system in the | schema definition, such as uniqueness of primary key and | referential integrity. Some of the SI anomalies are avoided due | to the dbms enforcement of these constraints. Of course, both pre-date development of the feature itself -- which was released as part of PostgreSQL 9.1, in September, 2011. > Many people have always regarded this as a bug. The pgsql > authors have not agreed. Right. For some people, a system has a bug if it does not behave as they would most like it to. In my world, it is not a bug if it is functioning as defined, intended, and documented. The change requested is a feature request, and a feature I would also like to see. It is not one that anyone has offered to pay to have developed, which is a significant barrier to implementation. >> The behavior we have been providing, modulus the bug you just found >> and any related bugs of the same ilk, is "we don't allow >> serialization anomalies into the database or in results from >> serializable transactions which commit." That is enough to be very >> useful to many. > > As I explain, I think that if you make that promise, you will have to > do all the work necessary to make the stronger promise anyway. 
That is absolutely *not* true of your suggestion that results returned from a serializable transaction which does not subsequently commit must be free from serialization anomalies. You are just factually wrong there. > I think all scenarios of this kind can be shown to allow serialisation > anomalies. Which kind? Catching an exception from a declarative constraint? Not all of them can, especially after the patch that went into 9.6 and that I'm intending to back-patch to other supported branches shortly. Other exceptions? I have seen no evidence that any others can, and I have a lot of reasons to believe that the special properties of the accesses from constraint implementations which allow those exceptions to be problematic are not present for other exceptions. Now, that is not an assertion that it is impossible for the code to have some other bug, but if there is such a bug we would need to find it to be able to fix it. > I'm afraid I have no idea. But if that is a complete description of > the bug (if it doesn't affect "INSERT ... ON CONFLICT" for example) > then that is good. A bug was found in the initial release of INSERT ...
Re: [HACKERS] New design for FK-based join selectivity estimation
On mardi 13 décembre 2016 09:10:47 CET Adrien Nayrat wrote: > Hi hackers, > > The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduces an > estimation error : [] > > The estimated row count is 10x larger since 100340e2d > > Regards, Hello, I think I understand what the problem is. In get_foreign_key_join_selectivity, we remove the restrictinfo clauses which match a foreign key. This is done so that the selectivity is not applied twice (once in the function itself, once when processing the restrictinfos). The problem is, for semi and anti joins, we assume that we have nothing to do (costsize.c:4253): else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) { /* * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the * fraction of LHS rows that have matches. If the referenced * table is on the inner side, that means the selectivity is 1.0 * (modulo nulls, which we're ignoring for now). We already * covered the other case, so no work here. */ } This results in assuming that the whole outerrel will match, no matter the selectivity of the innerrel. If I understand it correctly and the above is right, I think we should ignore SEMI or ANTI joins altogether when considering FKs, and keep the corresponding restrictinfos for later processing since they are already special-cased later on. Regards, -- Ronan Dunklau
Re: [HACKERS] jsonb problematic operators
Nico Williams writes: > On Tue, Dec 13, 2016 at 10:26:24AM +0900, Michael Paquier wrote: >> On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark wrote: >>> One option might be for Postgres to define duplicate operator names >>> using ¿ or something else. >> Are you sure that using a non-ASCII character is a good idea for an >> in-core operator? I would think no. > Eventually language designers will cross that Rubicon in mainstream > languages. And why not? Because it will create all sorts of character-set-conversion hazards. In the current design of Postgres it flat out would not work, because template0 has to be encoding-agnostic. regards, tom lane
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 13, 2016 at 12:22 PM, Ildar Musin wrote: > We've noticed that PartitionDispatch object is built on every INSERT query > and that it could create unnecessary overhead. Wouldn't it be better to keep > it in relcache? You might be able to cache some of that data in the relcache, but List *keystate is pointing to query-lifespan data, so you can't cache that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Indexes
On Mon, Dec 12, 2016 at 9:21 PM, Amit Kapila wrote: > The reason is to make the operations consistent in master and standby. > In the WAL patch, for clearing the SPLIT_CLEANUP flag, we write a WAL > record, and if we don't release the lock after writing it, the operation > can appear on the standby even before it does on the master. Currently, > without WAL, there is no benefit of doing so and we can fix it by using > MarkBufferDirty; however, I thought it might be simpler to keep it > this way as this is required for enabling WAL. OTOH, we can leave > that for the WAL patch. Let me know which way you prefer? It's not required for enabling WAL. You don't *have* to release and reacquire the buffer lock; in fact, that just adds overhead. You *do* have to be aware that the standby will perhaps see the intermediate state, because it won't hold the lock throughout. But that does not mean that the master must release the lock. >> I don't think we should be afraid to call MarkBufferDirty() instead of >> going through these (fairly stupid) hash-AM-specific APIs. > > Right, and anyway we need to use it at many more call sites when we > enable WAL for hash indexes. I propose the attached patch, which removes _hash_wrtbuf() and causes those functions which previously called it to do MarkBufferDirty() directly. Aside from hopefully fixing the bug we're talking about here, this makes the logic in several places noticeably less contorted.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c index f1511d0..0eeb37d 100644 --- a/src/backend/access/hash/hash.c +++ b/src/backend/access/hash/hash.c @@ -635,7 +635,8 @@ loop_top: num_index_tuples = metap->hashm_ntuples; } - _hash_wrtbuf(rel, metabuf); + MarkBufferDirty(metabuf); + _hash_relbuf(rel, metabuf); /* return statistics */ if (stats == NULL) @@ -724,7 +725,6 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, OffsetNumber deletable[MaxOffsetNumber]; int ndeletable = 0; bool retain_pin = false; - bool curr_page_dirty = false; vacuum_delay_point(); @@ -805,7 +805,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, { PageIndexMultiDelete(page, deletable, ndeletable); bucket_dirty = true; - curr_page_dirty = true; + MarkBufferDirty(buf); } /* bail out if there are no more pages to scan. */ @@ -820,15 +820,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, * release the lock on previous page after acquiring the lock on next * page */ - if (curr_page_dirty) - { - if (retain_pin) -_hash_chgbufaccess(rel, buf, HASH_WRITE, HASH_NOLOCK); - else -_hash_wrtbuf(rel, buf); - curr_page_dirty = false; - } - else if (retain_pin) + if (retain_pin) _hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK); else _hash_relbuf(rel, buf); @@ -862,6 +854,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, bucket_opaque = (HashPageOpaque) PageGetSpecialPointer(page); bucket_opaque->hasho_flag &= ~LH_BUCKET_NEEDS_SPLIT_CLEANUP; + MarkBufferDirty(bucket_buf); } /* @@ -873,7 +866,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, _hash_squeezebucket(rel, cur_bucket, bucket_blkno, bucket_buf, bstrategy); else - _hash_chgbufaccess(rel, bucket_buf, HASH_WRITE, HASH_NOLOCK); + _hash_chgbufaccess(rel, bucket_buf, HASH_READ, HASH_NOLOCK); } void diff --git 
a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c index 572146a..59c4213 100644 --- a/src/backend/access/hash/hashinsert.c +++ b/src/backend/access/hash/hashinsert.c @@ -208,11 +208,12 @@ restart_insert: (void) _hash_pgaddtup(rel, buf, itemsz, itup); /* - * write and release the modified page. if the page we modified was an + * dirty and release the modified page. if the page we modified was an * overflow page, we also need to separately drop the pin we retained on * the primary bucket page. */ - _hash_wrtbuf(rel, buf); + MarkBufferDirty(buf); + _hash_relbuf(rel, buf); if (buf != bucket_buf) _hash_dropbuf(rel, bucket_buf); diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c index e2d208e..8fbf494 100644 --- a/src/backend/access/hash/hashovfl.c +++ b/src/backend/access/hash/hashovfl.c @@ -149,10 +149,11 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf, bool retain_pin) /* logically chain overflow page to previous page */ pageopaque->hasho_nextblkno = BufferGetBlockNumber(ovflbuf); + MarkBufferDirty(buf); if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin) - _hash_chgbufaccess(rel, buf, HASH_WRITE, HASH_NOLOCK); + _hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK); else - _hash_wrtbuf(rel, buf); + _hash_relbuf(rel, buf); return ovflbuf; } @@ -304,7 +305,8 @@ found: /* mark page "in use" in t
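[To make the locking discipline behind the patch concrete for readers: the invariant is simply that the buffer must be marked dirty while its content lock is still held, and the lock need not be dropped and re-taken around the dirtying. A toy single-buffer model — hypothetical names, not the real buffer-manager API:]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a shared buffer: dirtying is only legal under the lock. */
typedef struct ToyBuffer
{
    bool locked;
    bool dirty;
    int  value;
} ToyBuffer;

static void toy_lock(ToyBuffer *b)
{
    assert(!b->locked);         /* no recursive locking in this toy */
    b->locked = true;
}

static void toy_unlock(ToyBuffer *b)
{
    assert(b->locked);
    b->locked = false;
}

/* Analogue of MarkBufferDirty(): the caller must hold the lock. */
static void toy_mark_dirty(ToyBuffer *b)
{
    assert(b->locked);
    b->dirty = true;
}

/* The pattern the patch moves to: modify, mark dirty, then release --
 * one lock acquisition, no intermediate unlock/relock round trip. */
bool toy_update_demo(void)
{
    ToyBuffer b = {false, false, 0};

    toy_lock(&b);
    b.value = 7;
    toy_mark_dirty(&b);
    toy_unlock(&b);

    return b.dirty && !b.locked && b.value == 7;
}
```

[This is why removing the separate write-and-release helper simplifies the call sites: every path becomes "dirty if modified, then release once".]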
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]"): > [various things] I get the feeling from your message that I have irritated you. I'm sorry if I have. In particular, I wanted to say that I certainly don't expect bugs to be fixed immediately. I appreciate your attention to detail. I also appreciate you taking the time to deal on your -hackers list with someone who really isn't a database hacker! I still hope to be able to convince you that the definition of SERIALIZABLE (in the pgsql docs) ought to be a stronger version, which covers even non-committed transactions. > [Ian Jackson:] > > But I have just demonstrated a completely general technique which can > > be used to convert any "spurious" constraint failure into a nonsense > > transaction actually committed to the database. > > Really? I see a lot of daylight between "you must discard any > results from a transaction which later throws a serialization > failure" and "if you catch the exception from a constraint > violation within a subtransaction it is possible to create a > serialization anomaly". I don't actually see how one relates to > the other. What am I missing? I don't think it can be useful to state (in the pgsql docs) the caveat you propose, to wit: "you must discard any results from a transaction which later throws a serialization failure" I think any scenario where this caveat declares a behaviour to be "not a bug", can be converted into a scenario where there is undoubtedly a bug. So this caveat does not actually reduce the difficulty of implementing the promises made by the documentation. The conversion is as follows: if a scenario is affected by the caveat, then there must be at least one transaction T which firstly produces "impossible" results I, and in which some later statement S produces a serialisation failure.
To exhibit the corresponding unavoidable bug: Execute an identical scenario, with exactly the same sequence of steps in the same order, up to S. However, instead of S, execute ROLLBACK. Now this anomalous transaction T' is not "a transaction which later throws a serialization failure". There is no requirement to discard the "impossible" results I. The results I are supposed to be correct, but they are not. (I am making the hopefully-warranted assumption that ROLLBACK cannot itself cause a serialisation failure.) In theory this argument does not prove that every bug which the caveat is trying to address, is as important as the same bug without the caveat. This is because perhaps T' is an unrealistic transaction somehow. But I don't really think that the primary documentation should be complicated, and should give weaker statements about the system behaviour, simply as an aid to bug prioritisation. > > 1. Declare that all spurious failures, in SERIALIZABLE transactions, > > are bugs. > > It's not clear to me what you mean here. I mean that the pgsql documentation should simply say that for SERIALIZABLE transactions, there is always a coherent serialisation. Even if the transaction failed due to a constraint violation, or serialisation failure, or database disconnection, or whatever. I think there is no "useful" weaker guarantee. By which I mean that for any reasonable weaker guarantee: any scenario in which pgsql violates the stronger guarantee, can be converted into a scenario where pgsql violates the weaker guarantee. Conversely, any correct implementation of the weaker guarantee necessarily implements the stronger guarantee too. So the weaker guarantee does not make implementation easier. All it does is complicate application code. > > 2. State that the SERIALIZABLE guarantee in Postgresql only applies to > > transactions which (a) complete successfully and (b) contain only > > very simple pgsql constructs.
> > About 24 hours ago you reported a bug which was hard enough to hit > that it took 5 years for anyone to find it. Well, of course I have a slightly different point of view. The fact that pgsql might generate "spurious" constraint violations, even in SERIALIZABLE transactions, has been known for many years. See the URLs etc. in my patch, which include a reference from 2009. Many people have always regarded this as a bug. The pgsql authors have not agreed. From my point of view, I have developed a trick which allows me to convince you that it has been a bug all along :-). > The behavior we have been providing, modulus the bug you just found > and any related bugs of the same ilk, is "we don't allow > serialization anomalies into the database or in results from > serializable transactions which commit." That is enough to be very > useful to many. As I explain, I think that if you make that promise, you will have to do all the work necessary to make the stronger promise anyway. So you may as well make the stronger (and more comprehensible!) promise. > > Of course I understand that these pr
[HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
Based on discussion in https://www.postgresql.org/message-id/CAE1wr-w%3DLE1cK5uG_rmAh-VBxc4_Bnw-gAE3qSqL-%3DtWwvLvjQ%40mail.gmail.com . Tested via regression tests. To be applied in master only and to be included in 10.0. -- Vladimir Rusinov Storage SRE, Google Ireland Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland Registered in Dublin, Ireland Registration Number: 368047 From 2502b9f36f08965a66f4125952f23e1f6ede4d83 Mon Sep 17 00:00:00 2001 From: Vladimir Rusinov Date: Mon, 12 Dec 2016 13:23:32 + Subject: [PATCH 1/1] Rename pg_switch_xlog to pg_switch_wal. Based on discussion in https://www.postgresql.org/message-id/CAE1wr-w%3DLE1cK5uG_rmAh-VBxc4_Bnw-gAE3qSqL-%3DtWwvLvjQ%40mail.gmail.com. Tested via regression tests. To be applied in master only and to be included in 10.0. Signed-off-by: Vladimir Rusinov --- doc/src/sgml/backup.sgml | 2 +- doc/src/sgml/func.sgml | 8 doc/src/sgml/high-availability.sgml| 2 +- src/backend/access/transam/xlogfuncs.c | 6 -- src/backend/catalog/system_views.sql | 2 +- src/include/access/xlog_fn.h | 2 +- src/include/catalog/pg_proc.h | 2 +- src/test/recovery/t/002_archiving.pl | 2 +- src/test/recovery/t/003_recovery_targets.pl| 2 +- src/test/regress/expected/hs_standby_functions.out | 2 +- src/test/regress/sql/hs_primary_extremes.sql | 2 +- src/test/regress/sql/hs_primary_setup.sql | 2 +- src/test/regress/sql/hs_standby_functions.sql | 2 +- 13 files changed, 19 insertions(+), 17 deletions(-) diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 6eaed1e..dc00a00 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -726,7 +726,7 @@ test ! -f /mnt/server/archivedir/000100A90065 && cp pg_wal/0 Also, you can force a segment switch manually with -pg_switch_xlog if you want to ensure that a +pg_switch_wal if you want to ensure that a just-finished transaction is archived as soon as possible. Other utility functions related to WAL management are listed in . 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index eca98df..e11aec9 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -17949,7 +17949,7 @@ SELECT set_config('log_statement_stats', 'off', false); pg_backup_start_time -pg_switch_xlog +pg_switch_wal pg_xlogfile_name @@ -18043,7 +18043,7 @@ SELECT set_config('log_statement_stats', 'off', false); -pg_switch_xlog() +pg_switch_wal() pg_lsn Force switch to a new transaction log file (restricted to superusers by default, but other users can be granted EXECUTE to run the function) @@ -18122,11 +18122,11 @@ postgres=# select pg_start_backup('label_goes_here'); -pg_switch_xlog moves to the next transaction log file, allowing the +pg_switch_wal moves to the next transaction log file, allowing the current file to be archived (assuming you are using continuous archiving). The return value is the ending transaction log location + 1 within the just-completed transaction log file. If there has been no transaction log activity since the last transaction log switch, -pg_switch_xlog does nothing and returns the start location +pg_switch_wal does nothing and returns the start location of the transaction log file currently in use. diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index 6b89507..a1dd93b 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -2201,7 +2201,7 @@ LOG: database system is ready to accept read only connections WAL file control commands will not work during recovery, -e.g. pg_start_backup, pg_switch_xlog etc. +e.g. pg_start_backup, pg_switch_wal etc. 
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 01cbd90..6547b8f 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -277,13 +277,15 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS) } /* - * pg_switch_xlog: switch to next xlog file + * pg_switch_wal: switch to next wal (aka xlog) file + * + * Renamed from pg_switch_xlog in 10.0 * * Permission checking for this function is managed through the normal * GRANT system. */ Datum -pg_switch_xlog(PG_FUNCTION_ARGS) +pg_switch_wal(PG_FUNCTION_ARGS) { XLogRecPtr switchpoint; diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index df59d18..ffd1fad 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1039,7 +1039,7 @@ REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
Re: [HACKERS] Declarative partitioning - another take
Hi hackers, On 08.12.2016 19:44, Dmitry Ivanov wrote: That would be fantastic. I and my colleagues at EnterpriseDB can surely help review; of course, maybe you and some of your colleagues would like to help review our patches, too. Certainly, I'll start reviewing as soon as I get familiar with the code. Do you think this is likely to be something where you can get something done quickly, with the hope of getting it into v10? Yes, I've just cleared my schedule in order to make this possible. I'll bring in the patches ASAP. We've noticed that PartitionDispatch object is built on every INSERT query and that it could create unnecessary overhead. Wouldn't it be better to keep it in relcache? Thanks! -- Ildar Musin i.mu...@postgrespro.ru -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)
On Mon, Dec 12, 2016 at 6:45 PM, Kevin Grittner wrote: > On Mon, Dec 12, 2016 at 10:33 AM, Andres Freund > wrote: > > On 2016-12-12 13:26:32 -0300, Alvaro Herrera wrote: > >> Robert Haas wrote: > > >>> 1. Show all processes that have a PGPROC in pg_stat_activity, > > >>> 2. Add a second view, say pg_stat_system_activity, > > >> I vote 1. > > > > +1 > > +1 > +1 more, if we're still counting. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] jsonb problematic operators
On Tue, Dec 13, 2016 at 10:26:24AM +0900, Michael Paquier wrote: > On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark wrote: > > One option might be for Postgres to define duplicate operator names > > using ¿ or something else. I think ¿ is a good choice because it's a > > common punctuation mark in Spanish so it's probably not hard to find > > on a lot of keyboards or hard to find instructions on how to type one. > > Are you sure that using a non-ASCII character is a good idea for an > in-core operator? I would think no. Eventually language designers will cross that Rubicon in mainstream languages. And why not? It sure would be convenient... from the designer's p.o.v. Of course, _users_ would be annoyed, as most users in the English-speaking world will have no idea how to type such characters, most others also will not know how to, and there will be users still using non-Unicode locales who will be unable to type such characters at all. Cut-n-paste will save the day, no doubt, though mostly/only for users using Unicode locales. But it is tempting. Using non-ASCII Unicode characters for _alternatives_ seems like a possible starting point though, since that leaves users with a universally-available ASCII alternative. Still, now users would then have to recognize multiple equivalent forms... ugh. Nico -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb problematic operators
On Mon, Dec 12, 2016 at 8:26 PM, Michael Paquier wrote: > On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark wrote: >> On 12 December 2016 at 04:59, Craig Ringer wrote: >>> I didn't realise Pg's use of ? was that old, so thanks. That makes >>> offering alternatives much less appealing. >> >> One option might be for Postgres to define duplicate operator names >> using ¿ or something else. I think ¿ is a good choice because it's a >> common punctuation mark in spanish so it's probably not hard to find >> on a lot of keyboards or hard to find instructions on how to type one. >> >> There is always a risk in allowing redundant syntaxes though. For >> example people running grep to find all uses of an operator will miss >> the alternate spelling. There may even be security implications for >> that though to be honest that seems unlikely in this case. > > Are you sure that using a non-ASCII character is a good idea for an > in-core operator? I would think no. I would agree with your thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
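[For context on the collision being discussed: the jsonb `?` operators clash with the `?` placeholder syntax of several client drivers. A commonly suggested workaround — not part of any proposal in this thread — is to call the functions that back these operators (`jsonb_exists`, `jsonb_exists_any`, `jsonb_exists_all`) directly, which avoids the ambiguity entirely:]

```sql
-- Instead of:  SELECT '{"a": 1}'::jsonb ? 'a';
-- call the operator's underlying function, which no driver can
-- mistake for a parameter placeholder:
SELECT jsonb_exists('{"a": 1}'::jsonb, 'a');                   -- same as ?
SELECT jsonb_exists_any('{"a": 1}'::jsonb, ARRAY['a', 'b']);   -- same as ?|
SELECT jsonb_exists_all('{"a": 1}'::jsonb, ARRAY['a', 'b']);   -- same as ?&
```

Note that, unlike the operator forms, these function calls are not matched by the jsonb GIN operator classes, so they will not use an index.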
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
On Tue, Dec 13, 2016 at 9:50 AM, Ian Jackson wrote: > Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: > Retry on constraint violation [and 2 more messages]"): >> On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson >> wrote: >>> Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts >>> before reporting constraint violations" ? >> >> No. It was specifically meant to address duplicate keys, and there >> was one particular set of steps which it was not able to address. >> See post by Thomas Munro. Hopefully, he, I, or someone else will >> have a chance to work on the one known remaining issue and look for >> others. Your efforts have been helpful; it would be great if you >> can find and document any other test cases which show a >> less-than-ideal SQLSTATE or other outright serialization anomalies. > > Well, my own usage of postgresql is not really that advanced and we do > not have very many complicated constraints. So for projects I'm > responsible for, what has been done in 9.6 is going to be good enough > in practice (when it finally bubbles through via my distro etc.); and > I have a workaround for now. I worked in a shop with over 30,000 types of transactions, and *none* of them involved using a subtransaction to catch and ignore an attempted constraint violation. I know of several larger shops which have not experienced such problems, either. I agree you found a bug and it needs to be fixed, but it doesn't seem to be a bug that occurs in common usage patterns; otherwise, it would probably have taken less than five years for it to be found. > But I am trying to save others (who may have more complicated > situations) from being misled. A motive I certainly appreciate. >>> All statements in such transactions, even aborted transactions, need >>> to see results, and have behaviour, which are completely consistent >>> with some serialisation of all involved transactions. 
This must apply >>> up to (but not including) any serialisation failure error. >> >> If I understand what you are saying, I disagree. > > But I have just demonstrated a completely general technique which can > be used to convert any "spurious" constraint failure into a nonsense > transaction actually committed to the database. Really? I see a lot of daylight between "you must discard any results from a transaction which later throws a serialization failure" and "if you catch the exception from a constraint violation within a subtransaction it is possible to create a serialization anomaly". I don't actually see how one relates to the other. What am I missing? > I think there are only two possible coherent positions: > > 1. Declare that all spurious failures, in SERIALIZABLE transactions, > are bugs. It's not clear to me what you mean here. > 2. State that the SERIALIZABLE guarantee in Postgresql only applies to > transactions which (a) complete successfully and (b) contain only > very simple pgsql constructs. About 24 hours ago you reported a bug which was hard enough to hit that it took 5 years for anyone to find it. I think you might consider allowing a bit more time to analyze the situation before declaring that serializable transactions only work "in very simple constructs". Some of the transactions with which I have seen it reliably work involved in excess of 20,000 lines of procedural code. What it seems to me we should do is try to identify the conditions under which a bug can occur (which seems likely to be limited to a subset of cases where errors thrown by declarative constraints are caught within a subtransaction, and the subtransaction work is discarded without throwing another error), and see whether the bugs can be fixed. > I think (2) would be rather a bad concession! (It is probably also > contrary to some standards document somewhere, if that matters.) > Furthermore if you adopt (2) you would have to make a list of the > "safe" pgsql constructs. 
I think the unsafe constructs are likely to be a *much* shorter list. > That would definitely exclude the exception trapping facility; but > what other facilities or statements might have similar effects? > ISTM that very likely INSERT ... ON CONFLICT can be used the same way. > Surely you do not want to say "PostgreSQL does not give a > transactional guarantee when INSERT ... ON CONFLICT is used". I think we've chased down the bugs in this area. Counterexamples welcome. >> To prevent incorrect results from being returned even when a >> transaction later fails with a serialization failure would require >> blocking > > I'm afraid I don't understand enough about database implementation to > answer this with confidence. But this does not seem likely to be > true. Blocking can surely always be avoided, by simply declaring a > serialisation failure instead of blocking. I have no idea whether > that is a practical approach in the general case, or whether it brings > its own problems. The "false positive" rate (throw
Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)
On Mon, Dec 12, 2016 at 8:13 PM, Michael Paquier wrote: > On Tue, Dec 13, 2016 at 10:05 AM, Craig Ringer wrote: >> We should probably expose a proc_type or something, with types: >> >> * client_backend >> * bgworker >> * walsender >> * autovacuum >> * checkpointer >> * bgwriter > > A text field is adapted then, more than a single character. Sure. >> for simpler filtering. >> >> I don't think existing user code is likely to get upset by more >> processes appearing in pg_stat_activity, and it'll be very handy. > > Indeed, for WAL senders now abusing of the query field is definitely > not consistent. Even if having this information is useful, adding such > a column would make sense. Still, one thing that is important to keep > with pg_stat_activity is the ability to count the number of > connections that are part of max_connections for monitoring purposes. > The docs definitely would need an example of such a query counting > only client_backend and WAL senders and tell users that this can be > used to count how many active connections there are. Let's confine ourselves to fixing one problem at a time. I think we can get where we want to be in this case by adding one new column and some new rows to pg_stat_activity. Michael, is that something you're going to do? If not, one of my esteemed colleagues here at EnterpriseDB will have a try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
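[The counting query Michael asks the docs to include might look like the sketch below. The column name `proc_type` and the type labels are the ones Craig proposed upthread; whatever names are eventually committed may differ.]

```sql
-- Count only the processes that consume max_connections slots,
-- assuming the proposed type column (here called proc_type):
SELECT count(*)
FROM pg_stat_activity
WHERE proc_type IN ('client_backend', 'walsender');

-- Or break the full process list down by type for monitoring:
SELECT proc_type, count(*)
FROM pg_stat_activity
GROUP BY proc_type;
```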
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 13, 2016 at 1:58 AM, Amit Langote wrote: > Attaching the above patch, along with some other patches posted earlier, > and one more patch fixing another bug I found. Patch descriptions follow: > > 0001-Miscallaneous-code-and-comment-improvements.patch > > Fixes some obsolete comments while improving others. Also, implements > some of Tomas Vondra's review comments. Committed with some pgindenting. > 0002-Miscallaneous-documentation-improvements.patch > > Fixes inconsistencies and improves some examples in the documentation. > Also, mentions the limitation regarding row movement. Ignored because I committed what I think is the same or a similar patch earlier. Please resubmit any remaining changes. > 0003-Invalidate-the-parent-s-relcache-after-partition-cre.patch > > Fixes a bug reported by Tomas, whereby a parent's relcache was not > invalidated after creation of a new partition using CREATE TABLE PARTITION > OF. This resulted in tuple-routing code not being able to find a > partition that was created by the last command in a given transaction. Shouldn't StorePartitionBound() be responsible for issuing its own invalidations, as StorePartitionKey() already is? Maybe you'd need to pass "parent" as another argument, but that way you know you don't have the same bug at some other place where the function is called. > 0004-Fix-a-bug-of-insertion-into-an-internal-partition.patch > > Fixes a bug I found this morning, whereby an internal partition's > constraint would not be enforced if it is targeted directly. 
See example > below: > > create table p (a int, b char) partition by range (a); > create table p1 partition of p for values from (1) to (10) partition by > list (b); > create table p1a partition of p1 for values in ('a'); > insert into p1 values (0, 'a'); -- should fail, but doesn't I expect I'm missing something here, but why do we need to hoist RelationGetPartitionQual() out of InitResultRelInfo() instead of just having BeginCopy() pass true instead of false? (Also needs a rebase due to the pgindent cleanup.) > 0005-Fix-a-tuple-routing-bug-in-multi-level-partitioned-t.patch > > Fixes a bug discovered by Dmitry Ivanov, whereby wrong indexes were > assigned to the partitions of lower levels (level > 1), causing spurious > "partition not found" error as demonstrated in his email [1]. Committed. It might have been good to include a test case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
On 2016-12-12 16:46:38 +0900, Michael Paquier wrote: > OK, I think that I have spotted an issue after additional read of the > code. When a WSA event is used for read/write activity on a socket, > the same WSA event gets reused again and again. That's fine for > performance reasons It's actually also required to not lose events, i.e. correctness. Windows events are edge-triggered, not level-triggered. The previous implementation wasn't correct. So just resetting things ain't ok. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
Hi All, I have been working on this issue for the last few days and it seems like I have almost found the reason for this failure. My findings to date are as follows. Just to confirm whether the problem is happening due to reuse of the WaitEventSet structure, I replaced the calls to ModifyWaitEvent() and WaitEventSetWait(), which make use of an already existing WaitEventSet to store event handles, with WaitLatchOrSocket(), and it worked. WaitLatchOrSocket() creates a new WaitEventSet structure every time it has to wait for an event, and it is also what the old code used. This clearly shows that we do have some problem simply because we are reusing the same set of object handles throughout a backend session. I am investigating this further and will post the final analysis along with the fix very soon. Attached is the patch that has the changes described above. Any suggestions or inputs would be appreciated. On Tue, Dec 13, 2016 at 9:34 PM, Ashutosh Sharma wrote: > Hi Michael, > >> >> Ashutosh, could you try it and see if it improves things? >> - > > Thanks for your patch. I would like to inform you that I didn't find any > improvement with your patch. > > With Regards, > Ashutosh Sharma > EnterpriseDB: http://www.enterprisedb.com fix_wait_events.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
Hi Michael, > Ashutosh, could you try it and see if it improves things? > - > Thanks for your patch. I would like to inform you that I didn't find any improvement with your patch. With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Patch to implement pg_current_logfile() function
Hi Gilles, I don't plan to be able to get back to this patch until late this week or early next week. My plan is to then go through the whole thing and fix everything I can find. If we're both working at the same time this could lead to wasted effort, so I will write as soon as I start work, and if you are working at the same time I'll send smaller hunks. By the by, my last email contained some stupidity in its code suggestion, because it is structured like:

    while
        if (...)
            do something;
        else
            break;
        do more...;

It is stupid to have an if/else construct there.

    while
        if (!...)
            break;
        do something;
        do more...;

is cleaner. Apologies if my coding out loud is confusing things. I'll fix this in the next go-round if you don't get to it first. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]"): > On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson > wrote: > > Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts > > before reporting constraint violations" ? > > No. It was specifically meant to address duplicate keys, and there > was one particular set of steps which it was not able to address. > See post by Thomas Munro. Hopefully, he, I, or someone else will > have a chance to work on the one known remaining issue and look for > others. Your efforts have been helpful; it would be great if you > can find and document any other test cases which show a > less-than-ideal SQLSTATE or other outright serialization anomalies. Well, my own usage of postgresql is not really that advanced and we do not have very many complicated constraints. So for projects I'm responsible for, what has been done in 9.6 is going to be good enough in practice (when it finally bubbles through via my distro etc.); and I have a workaround for now. But I am trying to save others (who may have more complicated situations) from being misled. > > All statements in such transactions, even aborted transactions, need > > to see results, and have behaviour, which are completely consistent > > with some serialisation of all involved transactions. This must apply > > up to (but not including) any serialisation failure error. > > If I understand what you are saying, I disagree. But I have just demonstrated a completely general technique which can be used to convert any "spurious" constraint failure into a nonsense transaction actually committed to the database. Of course my transactions are contrived, but I don't see a way to rule out the possibility that a real application might do something similar. I think there are only two possible coherent positions: 1. Declare that all spurious failures, in SERIALIZABLE transactions, are bugs. 2. 
State that the SERIALIZABLE guarantee in Postgresql only applies to transactions which (a) complete successfully and (b) contain only very simple pgsql constructs. I think (2) would be rather a bad concession! (It is probably also contrary to some standards document somewhere, if that matters.) Furthermore if you adopt (2) you would have to make a list of the "safe" pgsql constructs. That would definitely exclude the exception trapping facility; but what other facilities or statements might have similar effects? ISTM that very likely INSERT ... ON CONFLICT can be used the same way. Surely you do not want to say "PostgreSQL does not give a transactional guarantee when INSERT ... ON CONFLICT is used". > To prevent incorrect results from being returned even when a > transaction later fails with a serialization failure would require > blocking I'm afraid I don't understand enough about database implementation to answer this with confidence. But this does not seem likely to be true. Blocking can surely always be avoided, by simply declaring a serialisation failure instead of blocking. I have no idea whether that is a practical approach in the general case, or whether it brings its own problems. But whether or not it is true, I think that "it's hard" is not really a good answer to "PostgreSQL should implement behaviour for SERIALIZABLE which can be coherently described and which covers practically useful use cases". > > I am concerned that there are other possible bugs of this form. > > In earlier messages on this topic, it has been suggested that the > > "impossible" unique constraint violation is only one example of a > > possible "leakage". > > As I see it, the main point of serializable transactions is to > prevent serialization anomalies from being persisted in the > database or seen by a serializable transaction which successfully > commits. As I have demonstrated, "spurious error" conditions can result in "serialisation anomaly persisted in the database". 
So there is not a real distinction here. There is another very important use case for serialisable transactions which is read-only report transactions. An application doing such a transaction needs to see a consistent snapshot. Such a transaction is readonly so will never commit. Any reasonable definition of what SERIALIZABLE means needs to give a sensible semantics for readonly transactions. > Well, the test includes commits and teardown, but this gets you to > the problem. Connection2 gets this: > > ERROR: duplicate key value violates unique constraint "invoice_pkey" > DETAIL: Key (year, invoice_number)=(2016, 3) already exists. I think my error trapping technique can be used to convert this into a scenario which commits "impossible" information to the database. Do you agree ? > If connection1 had explicitly read the "gap" into which it inserted > its row (i.e., with a SELECT statement) there would be a > serialization failure instead. Getting the RI index maintenance
Re: [HACKERS] [PATCH] Fix for documentation of timestamp type
On Tue, Dec 13, 2016 at 10:41 AM, Tom Lane wrote: > Robert Haas writes: >> I find this a bit unclear, because the revised text kind of jumps back >> and forth between the floating-point and integer formats. Perhaps >> something like this: > > Your wording seems OK to me, although I'd drop the "instead". Good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Fri, Dec 9, 2016 at 5:46 AM, Maksim Milyutin wrote: > I would like to work on two tasks: > - insert (and eventually update) tuple routing for foreign partition. > - the ability to create an index on the parent and have all of the children > inherit it; > > The first one has been implemented in pg_pathman somehow, but the code > relies on dirty hacks, so the FDW API has to be improved. As for the > extended index support, it doesn't look like a super-hard task. Great! I think that the second one will be fairly tricky. You will need some way of linking the child indexes back to the parent index. And then you will need to prohibit them from being dropped or modified independent of the parent index. And you will need to cascade ALTER commands on the parent index (which will have no real storage, I hope) down to the children. Unfortunately, index names have to be unique on a schema-wide basis, so we'll somehow need to generate names for the "child" indexes. But those names might collide with existing objects, and will need to be preserved across a dump-and-restore. The concept is simple but getting all of the details right is hard. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
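[To make the naming problem concrete: today every index must be created on each partition individually, and each one needs its own schema-unique name. A minimal sketch, with table and index names invented for illustration:]

```sql
-- Today, indexing a partitioned table means one index per child,
-- each with a distinct schema-wide name:
CREATE INDEX measurement_y2016_logdate_idx ON measurement_y2016 (logdate);
CREATE INDEX measurement_y2017_logdate_idx ON measurement_y2017 (logdate);

-- The proposed parent-level form would have to generate and track a
-- name like the above for every existing and future child:
-- CREATE INDEX measurement_logdate_idx ON measurement (logdate);
```

Any generated child-index name can collide with existing objects, and the parent-child linkage would have to survive pg_dump and restore, which is where most of the difficulty lies.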
Re: [HACKERS] [PATCH] Fix for documentation of timestamp type
Robert Haas writes: > I find this a bit unclear, because the revised text kind of jumps back > and forth between the floating-point and integer formats. Perhaps > something like this: Your wording seems OK to me, although I'd drop the "instead". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Thu, Dec 8, 2016 at 11:58 PM, Amit Langote wrote: > Hierarchical lock manager stuff is interesting. Are you perhaps alluding > to a new *intention* lock mode as described in the literature on multiple > granularity locking [1]? Yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix for documentation of timestamp type
On Mon, Dec 12, 2016 at 8:50 AM, Aleksander Alekseev wrote: > I suggest to rewrite the documentation a bit to make it more clear that > by default timestamp is stored in microseconds. Corresponding patch is > attached. I find this a bit unclear, because the revised text kind of jumps back and forth between the floating-point and integer formats. Perhaps something like this: When timestamp values are stored as eight-byte integers (currently the default), microsecond precision is available over the full range of values. In this case, the internal representation is the number of microseconds before or after midnight 2000-01-01. When timestamp values are stored as double precision floating-point numbers instead (a deprecated compile-time option), the internal representation is the number of seconds before or after midnight 2000-01-01. With this representation, the effective limit of precision might be less than 6; in practice, microsecond precision is achieved for dates within a few years of 2000-01-01, but the precision degrades for dates further away. Note that using floating-point datetimes allows a larger range of timestamp values to be represented than shown above: from 4713 BC up to 5874897 AD. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
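[The proposed wording can be checked against behavior directly. With the default integer format, all six fractional digits survive at any point in the range; the arithmetic below for the deprecated float format is a back-of-the-envelope sketch, not a quote from the docs:]

```sql
-- Integer timestamps count microseconds from midnight 2000-01-01,
-- so full microsecond precision holds even far from that epoch:
SELECT timestamp '1999-12-31 23:59:59.999999';   -- one microsecond before the epoch
SELECT timestamp '294000-01-01 00:00:00.000001';

-- With float timestamps the second value could not keep its microsecond:
-- ~292000 years is ~9.2e12 seconds, i.e. 13 integer digits, and a double
-- carries only ~15-16 significant decimal digits, leaving under 3 for
-- the fractional second.
```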
Re: [HACKERS] pg_catversion builtin function
Jesper Pedersen writes: > Attached is a new builtin function that exposes the CATALOG_VERSION_NO > constant under the pg_catversion() function, e.g. I'm pretty sure that we intentionally didn't expose that, reasoning that users should only care about the user-visible version number. What exactly is the argument for exposing this? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Thu, Dec 8, 2016 at 4:34 PM, Karl O. Pinc wrote: >> I do think that the blizzard of small patches that you've >> submitted in some of your emails may possibly be a bit overwhelming. >> We shouldn't really need a stack of a dozen or more patches to >> implement one small feature. Declarative partitioning only had 7. >> Why does this need more than twice that number? > > I'm trying to break up changes into patches which are as small > as possible to make them more easily understood by providing > a guide for the reader/reviewer. So rather than > a single patch I'd make, say, 3. One for documentation to describe > the change. Another which does whatever refactoring is necessary > to the existing code, but which otherwise does not introduce any > functional changes. And another which adds the new code which makes > use of the refactoring. At each stage the code should continue > to work without bugs. The other party can then decide to incorporate > the patches into the main patch, which is itself another attached > patch. Sure, I don't think the intention is bad. It didn't really work out here, perhaps because the individual changes were too small. I think for small changes it's often more helpful to say "change X to be like Y" than to provide a patch, or else it's worth combining several small patches just to keep the number from getting too big. I don't know; I can't point to anything you really did wrong here; the process just didn't seem to be converging. > It is worth noting that the PG pg_current_logfile() function > is dependent upon the content of the current_logfiles file. So > if the file is wrong the function will also give wrong answers. > But I don't care if you don't. Well, I want the current_logfiles file to be correct, but that doesn't mean that it's reasonable to expect people to enumerate all the logfiles that have ever existed by running it. 
That would be fragile for tons of reasons, like whether or not the server hits the connection limit at the wrong time, whether network connectivity is unimpaired, whether the cron job that's supposed to run it periodically hiccups for some stupid reason, and many others. > A related thought is this. There are 3 abstraction levels of > concern. The level beneath PG, the PG level, and the level > of the user's application. When PG detects an error from either > "above" or "below" its job is to describe the problem from > its own perspective. HINTs are attempts to cross the boundary > into the application's perspective. Yes, I agree. EnterpriseDB occasionally gets complaints from our customers saying that PG is buggy because it reported an operating system error. No, really, if the operating system can't read the file, that's not our fault! Call your sysadmin! I also agree with your perspective on HINTs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
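[For readers following this thread: the interface under review resolves the current_logfiles file from SQL. Usage would look roughly like the sketch below; the optional argument accepts the 'stderr' and 'csvlog' format names mentioned in the review downthread, and the returned paths are of course installation-specific:]

```sql
-- Path of the log file currently written to by the syslogger,
-- for the first format listed in log_destination:
SELECT pg_current_logfile();

-- Or request the current file for one specific log format;
-- returns NULL if that format is not in use:
SELECT pg_current_logfile('csvlog');
```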
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Mon, Dec 12, 2016 at 4:31 AM, Gilles Darold wrote: > Applied in last version of the patch v18 as well as removing of an > unused variable. OK, this looks a lot closer to being committable. But: @@ -570,11 +583,13 @@ SysLogger_Start(void) * a time-based rotation. */ first_syslogger_file_time = time(NULL); -filename = logfile_getname(first_syslogger_file_time, NULL); +last_file_name = logfile_getname(first_syslogger_file_time, NULL); + +syslogFile = logfile_open(last_file_name, "a", false); -syslogFile = logfile_open(filename, "a", false); +update_metainfo_datafile(); -pfree(filename); +pfree(last_file_name); #ifdef EXEC_BACKEND switch ((sysloggerPid = syslogger_forkexec())) This leaves last_file_name pointing to free'd storage, which seems like a recipe for bugs, because the next call to update_metainfo_datafile() might try to follow that pointer. I think you need to make a clear decision about what the contract is for last_file_name and last_csv_file_name. Perhaps the idea is to always keep them valid, but then you need to free the old value when reassigning them and not free the current value. The changes to logfile_rotate() appear to be mostly unrelated refactoring which Karl was proposing for separate commit, not to be incorporated into this patch. Please remove whatever deltas are not essential to the new feature being implemented. misc.c no longer needs to add an include of . I hope, anyway. + errmsg("log format not supported, possible values are stderr or csvlog"))); This doesn't follow our message style guidelines. https://www.postgresql.org/docs/devel/static/error-style-guide.html Basically, a comma splice is not an acceptable way of joining together two independent thoughts. Obviously, people speak and write that way informally all the time, but we try to be a bit rigorous about grammar in our documentation and error messages. 
I think the way to do this would be something like errmsg("log format \"%s\" is not supported"), errhint("The supported log formats are \"stderr\" and \"csvlog\"."). +FreeFile(fd); +ereport(ERROR, +(errcode_for_file_access(), + errmsg("could not read file \"%s\": %m", +LOG_METAINFO_DATAFILE))); You don't need to FreeFile(fd) before ereport(ERROR). See header comments for AllocateFile(). +/* report the entry corresponding to the requested format */ +if (strcmp(logfmt, log_format) == 0) +break; +} +/* Close the current log filename file. */ +if (FreeFile(fd)) +ereport(ERROR, +(errcode_for_file_access(), + errmsg("could not close file \"%s\": %m", +LOG_METAINFO_DATAFILE))); + +if (lbuffer[0] == '\0') +PG_RETURN_NULL(); + +/* Recheck requested log format against the one extracted from the file */ +if (logfmt != NULL && (log_format == NULL || +strcmp(logfmt, log_format) != 0)) +PG_RETURN_NULL(); Your email upthread claims that you fixed this per my suggestions, but it doesn't look fixed to me. That check to see whether lbuffer[0] == '\0' is either protecting against nothing at all (in which case it could be removed) or it's protecting against some weirdness that can only happen here because of the strange way this logic is written. It's hard to understand all the cases here because we can exit the loop either having found the entry we want or not, and there's no clear indication of which one it is. Why not change the if-statement at the end of the loop like this: if (logfmt == NULL || strcmp(logfmt, log_format) == 0) { FreeFile(fd); PG_RETURN_TEXT_P(cstring_to_text(log_filepath)); } Then after the loop, just: FreeFile(fd); PG_RETURN_NULL(); + + A file recording the log file(s) currently written to by the syslogger + and the file's log formats, stderr + or csvlog. Each line of the file is a space separated list of + two elements: the log format and the full path to the log file including the + value of . 
The log format must be present + in to be listed in + current_logfiles. + + + + + pg_ctl + and current_logfiles + + + stderr + and current_logfiles + + + log_destination configuration parameter + and current_logfiles + + + +Although logs directed to stderr may be written +to the filesystem, when the writing of stderr is +managed outside of the PostgreSQL database +server the location of such files in the filesystem is not reflected in +the content of current_logfiles. One such case is +when the pg_ctl command is used to start +the postgres database server, capture +the stderr output of the server, and direct it to a +file. + + + +There are other notable situations related +to stderr logging. +Non-Po
Re: [HACKERS] Fixing matching of boolean index columns to sort ordering
On Mon, Dec 12, 2016 at 10:08 PM, Tom Lane wrote: > Every WHERE clause is a > boolean expression, so there's no principled reason why such a rule > wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true", > wreaking havoc on every other optimization we have. Restricting it > to only apply to simple boolean columns is no answer because (a) indexes > can be on boolean expressions not just simple columns, and (b) part > of the point of the canonicalization is to make logically-equivalent > expressions look alike, so declining to apply it in some cases would > ruin that. > Given our reliance on operators in indexing, a canonicalization rule that says all boolean-yielding expressions must contain an operator would yield the desired result. "i = 42" already has an operator, so no further normalization is necessary to make it conform to such a rule. The indexing concern may still come into play here; I'm not familiar enough with indexes on column lists versus indexes on expressions to know off the top of my head. The definition of "looking alike" seems like it would be met, since all such expressions would look alike in having an operator. That said, not adding "= true" is more visually appealing - though given all of the other things we do in ruleutils this seems like a minor addition. David J.
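The rule David proposes, that every boolean-yielding expression must contain a top-level operator, can be sketched as a toy normalizer. The tuple representation below is invented purely for illustration; it is not PostgreSQL's expression-node format.

```python
# Toy sketch of the proposed canonicalization rule: a boolean-yielding
# expression with no top-level operator (a bare boolean column) gets
# "= true" attached, while anything already operator-shaped, like
# "i = 42", is left alone.
def canonicalize(expr):
    """expr is ('op', opname, left, right) or ('var', colname)."""
    if expr[0] == "op":
        return expr                               # "i = 42" stays "i = 42"
    return ("op", "=", expr, ("const", True))     # "b" becomes "b = true"

unchanged = canonicalize(("op", "=", ("var", "i"), ("const", 42)))
wrapped = canonicalize(("var", "active"))         # bare boolean column
```

Under this rule both results contain a top-level operator, which is the "looking alike" property discussed above.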
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Dec 13, 2016 at 6:42 AM, Dilip Kumar wrote: >> This patch is hard to read because it is reorganizing a bunch of code >> as well as adding new functionality. Perhaps you could separate it >> into two patches, one with the refactoring and then the other with the >> new functionality. > > Okay, I can do that. I have created two patches as per the suggestion. 1. mergejoin_refactoring_v2.patch --> Move functionality of considering various merge join path into new function. 2. parallel_mergejoin_v2.patch -> This adds the functionality of supporting partial mergejoin paths. This will apply on top of mergejoin_refactoring_v2.patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com mergejoin_refactoring_v2.patch Description: Binary data parallel_mergejoin_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw bug in 9.6
On Mon, Dec 12, 2016 at 5:45 PM, Tom Lane wrote: > One way that we could make things better is to rely on the knowledge > that EPQ isn't asked to evaluate joins for more than one row per input > relation, and therefore using merge or hash join technology is really > overkill. We could make a tree of simple nestloop joins, which aren't > going to care about sort order, if we could identify the correct join > clauses to apply. At least some of that could be laid off on the FDW, > which if it's gotten this far at all, ought to know what join clauses > need to be enforced by the foreign join. So I'm thinking a little bit > in terms of "just collect the foreign scans for all the base rels > included in the join and then build a cross-product nestloop join tree, > applying all the join clauses at the top". This would have the signal > value that it's guaranteed to succeed and so can be left for later, > rather than having to expensively redo it at each level of join planning. > > (Hm, that does sound a lot like "throw it away and start again", doesn't > it. But what we've got here is busted enough that I'm not sure there > are good alternatives. Maybe for 9.6 we could just rewrite > GetExistingLocalJoinPath, and limp along doing a lot of redundant > computation during planning.) I think there was an earlier version of the patch that actually built up NestLoop joins as it went, but at some point (possibly at my suggestion) it got rewritten to try to fish out existing paths instead. Apparently, the other idea was better. > BTW, what's "existing" about the result of GetExistingLocalJoinPath? It's an existing path - i.e. already added to the RelOptInfo - to perform the join locally. > And for that matter, what's "outer" about the content of fdw_outerpath? It's got the same meaning here that "outer path" does in general. 
Right now, a Foreign Scan only has an outer subpath if it needs one for EPQ rechecks, but in theory I suppose it could be used for something else; every Plan has a lefttree, whether or not we choose to populate it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
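Tom's idea quoted above, collecting the scans for all base rels and building a cross-product nestloop tree with every join clause applied at the top, can be illustrated outside the planner with a toy sketch. This covers inner joins only (as noted later in the thread, outer joins with nullable sides complicate it), and all table and clause names here are invented for illustration.

```python
# Toy illustration of "build a cross-product nestloop tree and apply
# all join clauses at the top": rows are dicts, and a "join clause" is
# a predicate over the combined row.  For EPQ-style rechecks there is
# only one candidate row per base relation, so sort order and join
# strategy (merge/hash) do not matter; a plain nested loop suffices.
from itertools import product

def cross_product_join(relations, join_clauses):
    """relations: list of row lists; join_clauses: predicates on merged rows."""
    result = []
    for combo in product(*relations):
        merged = {}
        for row in combo:
            merged.update(row)
        if all(clause(merged) for clause in join_clauses):
            result.append(merged)
    return result

a = [{"a_id": 1}]                 # one candidate row per base rel, as in EPQ
b = [{"b_id": 1, "a_id": 1}]
c = [{"c_id": 7, "b_id": 1}]
rows = cross_product_join([a, b, c],
                          [lambda r: r["a_id"] == r["b_id"],   # stand-in clauses
                           lambda r: r["b_id"] == r["c_id"] - 6])
```

The point of the sketch is the one Tom makes: this construction is guaranteed to succeed regardless of which paths survived at each join level, because it never depends on input sort order.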
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson wrote: > I am concerned that there are other possible bugs of this form. > In earlier messages on this topic, it has been suggested that the > "impossible" unique constraint violation is only one example of a > possible "leakage". As I see it, the main point of serializable transactions is to prevent serialization anomalies from being persisted in the database or seen by a serializable transaction which successfully commits. It is certainly very nice from a programming perspective if the SQLSTATE permits easy identification of which failures it can be expected to probably yield a different result on retry, but it doesn't seem to me that the standard requires that, and other researchers and developers in this area have taken advantage of the fact that constraints prevent certain types of serialization anomalies from reaching the database. In the initial implementation of serializable transactions we noted papers that described this, and counted on it for correctness. Note that with a one year development cycle for major releases, a feature often initially gets implemented in a "bare bones" format that is useful but not ideal, and later releases build on it. Unfortunately there has not been anyone putting the resources into building on the initial implementation (in 9.1) as the current implementation has worked well enough for people to be focused on other areas. > Earlier you wrote: > > If I recall correctly, the constraints for which there can be > errors appearing due to concurrent transactions are primary key, > unique, and foreign key constraints. I don't remember seeing it > happen, but it would not surprise me if an exclusion constraint can > also cause an error due to a concurrent transaction's interaction > with the transaction receiving the error. > > Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts > before reporting constraint violations" ? No. 
It was specifically meant to address duplicate keys, and there was one particular set of steps which it was not able to address. See post by Thomas Munro. Hopefully, he, I, or someone else will have a chance to work on the one known remaining issue and look for others. Your efforts have been helpful; it would be great if you can find and document any other test cases which show a less-than-ideal SQLSTATE or other outright serialization anomalies. > I can try playing around with other kinds of constraints, to try to > discover different aspects or versions of this bug, but my knowledge > of the innards of databases is very limited and I may not be > particularly effective. Certainly if I try and fail, I wouldn't have > confidence that no such bug existed. Right. Proving the absence of any bug when dealing with race conditions is notoriously hard. > All statements in such transactions, even aborted transactions, need > to see results, and have behaviour, which are completely consistent > with some serialisation of all involved transactions. This must apply > up to (but not including) any serialisation failure error. If I understand what you are saying, I disagree. To prevent incorrect results from being returned even when a transaction later fails with a serialization failure would require blocking, and would have a major detrimental effect on both concurrency and throughput compared to the techniques PostgreSQL is using. As far as I'm aware, the guarantee you seek can only be provided by strict two-phase locking (S2PL). Benchmarks by Dan R. K. Ports of MIT CSAIL showed S2PL to be consistently slower than the SSI techniques used by PostgreSQL -- with throughput up to almost 5x worse in some workloads, even with SSI's requirement that results from a transaction which later gets a serialization failure must be ignored. 
Please see Section 8, and particularly the performance of S2PL in Figure 4 of this paper, which Dan and I presented to the VLDB conference in Istanbul: http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf > It would be nice if the documentation stated the error codes that > might be generated. AFAICT that's just 40P01 and 40001 ? Those are the only ones I know of. > (I'm not sure what 40002 is.) That doesn't appear to me to be related to transaction serialization issues. >> For the record, read-write-unique-4.spec's permutation r2 w1 w2 c1 c2 >> remains an open question for further work. > > Is this another possible bug of this form ? Yes. See the last specified permutation in this file: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/isolation/specs/read-write-unique-4.spec;h=ec447823484cff99a61f2c2e67ed3aef2210d680;hb=refs/heads/master If it's not clear how to read that to construct the problem case, the README file might help: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/isolation/README;h=bea278a856fffc911a7819fe8055f7e328fcac5f;hb=refs/heads/master I guess it's short enough to jus
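On the application side, the practical consequence of the discussion above is a retry loop keyed on SQLSTATE 40001/40P01 that discards anything computed by a failed attempt. A minimal generic sketch follows; the DatabaseError stub stands in for a real driver exception (psycopg2, for example, exposes the SQLSTATE on its exceptions as pgcode), so the class and attribute names here are assumptions for illustration.

```python
# Generic retry loop for serialization failures: rerun the transaction
# function from scratch on each attempt and ignore results from attempts
# that fail with a retryable SQLSTATE, as the discussion above requires.
RETRYABLE_SQLSTATES = {"40001", "40P01"}   # serialization_failure, deadlock_detected

class DatabaseError(Exception):
    """Stub for a driver exception carrying an SQLSTATE."""
    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate

def run_serializable(txn, max_retries=5):
    for _ in range(max_retries):
        try:
            return txn()
        except DatabaseError as e:
            if e.sqlstate not in RETRYABLE_SQLSTATES:
                raise          # not a serialization failure; do not retry
    raise RuntimeError("transaction failed after %d retries" % max_retries)

# Simulate a transaction that hits a serialization failure twice, then succeeds.
attempts = {"n": 0}
def txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DatabaseError("40001")
    return "committed"

result = run_serializable(txn)
```

Note that each retry must start the transaction over; per Kevin's point, results read by an attempt that later receives a serialization failure carry no correctness guarantee.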
Re: [HACKERS] postgres_fdw bug in 9.6
On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane wrote: > Jeff Janes writes: >> I have a test case where I made the fdw connect back to itself, and >> stripped out all the objects that I could and still reproduce the case. It >> is large, 21MB compressed, 163MB uncompressed, so I am linking it here: >> https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc > > Thanks for the test case. I believe I understand the fundamental problem, > or one of the fundamental problems --- I'm not exactly convinced there > aren't several here. > > The key issue is that GetExistingLocalJoinPath believes it can return > any random join path as what to use for the fdw_outerpath of a foreign > join. You can get away with that, perhaps, as long as you only consider > two-way joins. But in a nest of foreign joins, this fails to consider > the interrelationships of join paths and their children. In particular, > what we have happening in this example is that GetExistingLocalJoinPath > seizes randomly on a MergePath for an upper join relation, clones it, > sees that the outer child is a join ForeignPath, and replaces that outer > child with its fdw_outerpath ... which was chosen equally cavalierly by > some previous execution of GetExistingLocalJoinPath, and does not have > the sort ordering expected by the MergePath. So we'd generate an invalid > EPQ plan, except that the debug cross-check in create_mergejoin_plan > notices the discrepancy. Thanks for the detailed explanation. > > I believe there are probably more problems here, or at least if there > aren't, it's not clear why not. Because of GetExistingLocalJoinPath's > lack of curiosity about what's underneath the join pathnode it picks, > it seems to me that it's possible for it to return a path tree that > *isn't* all local joins. 
If we're looking at, say, a hash path for > a 4-way join, whose left input is a hash path for a 3-way join, whose > left input is a 2-way foreign join, what's stopping that from being > returned as a "local" path tree? Right, the so-called "local join tree" can contain ForeignPaths representing joins between foreign relations. AFAIU what protects us from getting into trouble is that postgresRecheckForeignScan calls ExecProcNode() on the outer plan. So, even if there are ForeignPaths in the fdw_outerpath tree, the corresponding plans would use a local join plan to apply the EPQ recheck. The code in GetExistingLocalJoinPath() avoids the redirection at least for the inner or outer ForeignPaths. Any comment claiming that the path tree under fdw_outerpath is entirely "local" should be removed, if it survives the fix. > > Likewise, it seems like the code is trying to reject any custom-path join > types, or at least this barely-intelligible comment seems to imply that: > > * Just skip anything else. We don't know if corresponding > * plan would build the output row from whole-row references > * of base relations and execute the EPQ checks. > > But this coding fails to notice any such join type that's below the > level of the immediate two join inputs. Similarly, here we assume that any custom plan would tell us whether the given row survives the EPQ recheck. As long as a custom plan does that correctly, it shouldn't be a problem. > > We've probably managed to not notice this so far because foreign joins > generally ought to dominate any local join method, so that there wouldn't > often be cases where the surviving paths use local joins for input > sub-joins. But Jeff's test case proves it can happen. > > I kind of wonder why this infrastructure exists at all; it's not the way > I'd have foreseen handling EPQ for remote joins. However, while "throw > it away and start again" might be good advice going forward, I suppose > it won't be very popular for applying to 9.6.
> > One way that we could make things better is to rely on the knowledge > that EPQ isn't asked to evaluate joins for more than one row per input > relation, and therefore using merge or hash join technology is really > overkill. We could make a tree of simple nestloop joins, which aren't > going to care about sort order, if we could identify the correct join > clauses to apply. At least some of that could be laid off on the FDW, > which if it's gotten this far at all, ought to know what join clauses > need to be enforced by the foreign join. So I'm thinking a little bit > in terms of "just collect the foreign scans for all the base rels > included in the join and then build a cross-product nestloop join tree, > applying all the join clauses at the top". This would have the signal > value that it's guaranteed to succeed and so can be left for later, > rather than having to expensively redo it at each level of join planning. I am not able to understand how this strategy of applying all join clauses on top of cross product would work if there are OUTER joins involved. Wouldn't nullable sides change the result? > > (Hm, that does sound a lot
Re: [HACKERS] Hash Indexes
On 12/11/2016 11:37 PM, Amit Kapila wrote: On Sun, Dec 11, 2016 at 11:54 AM, Amit Kapila wrote: On Wed, Dec 7, 2016 at 2:02 AM, Jeff Janes wrote: With above fixes, the test ran successfully for more than a day. There was a small typo in the previous patch which is fixed in attached. I don't think it will impact the test results if you have already started the test with the previous patch, but if not, then it is better to test with attached. A mixed workload (INSERT, DELETE and VACUUM primarily) is successful here too using _v2. Thanks ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI wrote: > Thank you for the comment. Er, it is good to see more people interested in this problem... As the former author, still working actively on this patch, perhaps it would be better if I continue to code this patch, no? It is a waste to step on each other's toes. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_catversion builtin function
Hi Hackers, Attached is a new builtin function that exposes the CATALOG_VERSION_NO constant under the pg_catversion() function, e.g. test=# SELECT pg_catversion(); pg_catversion --- 201612121 (1 row) Although it is mostly useful during the development cycle to verify if a dump/restore is needed, it could have other use-cases. I'm unsure of the OID assignment rules - feel free to point me towards information regarding this. I'll register this patch with the next CF. Best regards, Jesper >From 39d52f5389bf3ef1814c1f201df6531feb2a5c7f Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 13 Dec 2016 08:27:18 -0500 Subject: [PATCH] pg_catversion builtin function --- doc/src/sgml/func.sgml | 6 ++ src/backend/utils/adt/version.c | 10 ++ src/include/catalog/pg_proc.h | 3 +++ src/include/utils/builtins.h| 1 + 4 files changed, 20 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0f9c9bf..6fc78ab 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15497,6 +15497,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); text PostgreSQL version information. See also for a machine-readable version. 
+ + + pg_catversion() + int + returns the catalog version number + diff --git a/src/backend/utils/adt/version.c b/src/backend/utils/adt/version.c index f247992..55f97fa 100644 --- a/src/backend/utils/adt/version.c +++ b/src/backend/utils/adt/version.c @@ -14,6 +14,7 @@ #include "postgres.h" +#include "catalog/catversion.h" #include "utils/builtins.h" @@ -22,3 +23,12 @@ pgsql_version(PG_FUNCTION_ARGS) { PG_RETURN_TEXT_P(cstring_to_text(PG_VERSION_STR)); } + +/* + * Return the catalog version number + */ +Datum +pgsql_catversion(PG_FUNCTION_ARGS) +{ + PG_RETURN_INT32(CATALOG_VERSION_NO); +} diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index cd7b909..b23f54a 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -235,6 +235,9 @@ DATA(insert OID = 1258 ( textcat PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 DATA(insert OID = 84 ( boolne PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "16 16" _null_ _null_ _null_ _null_ _null_ boolne _null_ _null_ _null_ )); DATA(insert OID = 89 ( version PGNSP PGUID 12 1 0 0 0 f f f f t f s s 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pgsql_version _null_ _null_ _null_ )); DESCR("PostgreSQL version string"); +DATA(insert OID = 7000 ( pg_catversionPGNSP PGUID 12 1 0 0 0 f f f f t f s s 0 0 23 "" _null_ _null_ _null_ _null_ _null_ pgsql_catversion _null_ _null_ _null_ )); +DESCR("PostgreSQL catalog version number"); + DATA(insert OID = 86 ( pg_ddl_command_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 32 "2275" _null_ _null_ _null_ _null_ _null_ pg_ddl_command_in _null_ _null_ _null_ )); DESCR("I/O"); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 7ed1623..83b2846 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -893,6 +893,7 @@ extern Datum text_format_nv(PG_FUNCTION_ARGS); /* version.c */ extern Datum pgsql_version(PG_FUNCTION_ARGS); +extern Datum pgsql_catversion(PG_FUNCTION_ARGS); /* xid.c */ extern Datum 
xidin(PG_FUNCTION_ARGS); -- 2.7.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao wrote: Thanks for the review. > Isn't it better to mention "an exclusive backup" here? What about > > EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an > exclusive > backup. > EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive > backup. OK, changed this way. > I think that it's worth explaining why ExclusiveBackupState is necessary, > in the comment. Changed like that at the top of the declaration of ExclusiveBackupState: * State of an exclusive backup, necessary to control concurrent activities * across sessions when working on exclusive backups. > - * exclusiveBackup is true if a backup started with pg_start_backup() is > - * in progress, and nonExclusiveBackups is a counter indicating the > number > + * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the > + * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS > once > + * it is done, and nonExclusiveBackups is a counter indicating the number > > Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE? > Basically it's better to explain fully how the state changes. Let's simplify things instead and just say that the meaning of the values is described where ExclusiveBackupState is declared. > +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("a backup is already starting"))); > +} > +if (XLogCtl->Insert.exclusiveBackupState == > EXCLUSIVE_BACKUP_STOPPING) > +{ > +WALInsertLockRelease(); > +ereport(ERROR, > +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("a backup is currently stopping"))); > > Isn't it better to use "an exclusive backup" explicitly rather than > "a backup" here? Yes. It would be better to not touch the original in-progress messages though. > You changed the code so that pg_stop_backup() removes backup_label before > it marks the exclusive-backup-state as not-in-progress. Could you tell me > why you did this change? 
It's better to explain it in the comment. That's actually mentioned in the comments :) This is to ensure that there cannot be any other pg_stop_backup() running in parallel and trying to remove the backup label file. The key point here is that the backup_label file is removed during EXCLUSIVE_BACKUP_STOPPING, that the removal of the backup_label file is kept last on purpose (that's what HEAD does anyway), and that we can roll back to an in-progress state in case of failure. I have rewritten this block like that. Does that sound better? +* Remove backup_label for an exclusive backup. Before doing anything +* the status of the exclusive backup is switched to +* EXCLUSIVE_BACKUP_STOPPING to ensure that there cannot be concurrent +* operations. In case of failure, the status of the +* exclusive backup is switched back to in-progress. The removal of the +* backup_label file is kept last as it is the critical piece proving +* if an exclusive backup is running or not in the event of a system +* crash. Thanks, -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 084401d..8fa9cf4 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -463,6 +463,29 @@ typedef union WALInsertLockPadded } WALInsertLockPadded; /* + * State of an exclusive backup, necessary to control concurrent activities + * across sessions when working on exclusive backups. + * + * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually + * running, to be more precise pg_start_backup() is not being executed for + * an exclusive backup and there is no exclusive backup in progress. + * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an + * exclusive backup. + * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an + * exclusive backup. + * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished + * running and an exclusive backup is in progress. 
pg_stop_backup() is + * needed to finish it. + */ +typedef enum ExclusiveBackupState +{ + EXCLUSIVE_BACKUP_NONE = 0, + EXCLUSIVE_BACKUP_STARTING, + EXCLUSIVE_BACKUP_STOPPING, + EXCLUSIVE_BACKUP_IN_PROGRESS +} ExclusiveBackupState; + +/* * Shared state data for WAL insertion. */ typedef struct XLogCtlInsert @@ -503,13 +526,14 @@ typedef struct XLogCtlInsert bool fullPageWrites; /* - * exclusiveBackup is true if a backup started with pg_start_backup() is - * in progress, and nonExclusiveBackups is a counter indicating the number - * of streaming base backups currently in progress. forcePageWrites is set - * to true when either of these is non-zero. lastBackupStart is the latest - * checkpoint redo location used as a starting point for an online backup. + * The possible values of exclusiveBackupStat
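The transition rules spelled out in the patch comments above can be modeled as a tiny state machine. This is only an illustration of the protocol (pg_start_backup() moves NONE to STARTING to IN_PROGRESS, pg_stop_backup() moves IN_PROGRESS to STOPPING to NONE, and a failure while stopping rolls back to IN_PROGRESS); it is not the shared-memory code in xlog.c.

```python
# Toy model of the ExclusiveBackupState protocol described in the patch.
from enum import Enum, auto

class ExclusiveBackupState(Enum):
    NONE = auto()
    STARTING = auto()
    STOPPING = auto()
    IN_PROGRESS = auto()

VALID_TRANSITIONS = {
    (ExclusiveBackupState.NONE, ExclusiveBackupState.STARTING),
    (ExclusiveBackupState.STARTING, ExclusiveBackupState.IN_PROGRESS),
    (ExclusiveBackupState.STARTING, ExclusiveBackupState.NONE),         # start failed
    (ExclusiveBackupState.IN_PROGRESS, ExclusiveBackupState.STOPPING),
    (ExclusiveBackupState.STOPPING, ExclusiveBackupState.NONE),
    (ExclusiveBackupState.STOPPING, ExclusiveBackupState.IN_PROGRESS),  # stop failed
}

class BackupStateMachine:
    def __init__(self):
        self.state = ExclusiveBackupState.NONE

    def transition(self, new_state):
        if (self.state, new_state) not in VALID_TRANSITIONS:
            raise RuntimeError(f"invalid transition {self.state} -> {new_state}")
        self.state = new_state

m = BackupStateMachine()
m.transition(ExclusiveBackupState.STARTING)
m.transition(ExclusiveBackupState.IN_PROGRESS)
m.transition(ExclusiveBackupState.STOPPING)
m.transition(ExclusiveBackupState.IN_PROGRESS)  # e.g. backup_label removal failed
```

The point the model makes explicit is the one Michael argues: STARTING and STOPPING exist so that a concurrent pg_start_backup()/pg_stop_backup() cannot slip in mid-operation, and a failed stop lands back in IN_PROGRESS rather than in an inconsistent state.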
Re: [HACKERS] Broken SSL tests in master
On Mon, Dec 12, 2016 at 1:34 AM, Heikki Linnakangas wrote: > On 12/01/2016 09:45 PM, Andres Freund wrote: >> >> And nobody has added a buildfarm module to run it manually on their >> servers either :( > > I just added a module to run "make -C src/test/ssl check" in chipmunk. So at > least that's covered now. > > http://pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=chipmunk&dt=2016-12-10%2012%3A08%3A31&stg=test-ssl-check Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background contrib module proposal
On Tue, Dec 13, 2016 at 2:05 PM, Andrew Borodin wrote: > 2016-12-13 12:55 GMT+05:00 amul sul : >> I think background-session code is not that much deviated from >> pg_background code, > It is not derived, though it is not much deviated. The background sessions > code does not have detailed exceptions and code comments, but it is > doing somewhat similar things. >> IIUC the background-session patch we can manage to >> reuse it, as suggested by Robert. This will allow us to maintain a >> session in the long run; we could reuse this session to run subsequent >> queries instead of maintaining a separate worker pool. Thoughts? > > One API to deal with both features would be much better, sure. > An "object" like a session pool is much easier to implement on top of a > session "object" than on top of worker processes, PIDs etc. > >>> 4. Query as a future (actually it is implemented already by >>> pg_background_result) >>> 5. Promised result. Declare that you are waiting for data of a specific >>> format, execute a query, someone (from another backend) will >>> eventually place data into the promised result and your query continues >>> execution. >> >> Could you please elaborate more? >> Do you mean two-way communication between foreground & background processes? > > It is from C++11 threading: future, promise and async - these are > related but different kinds of acquiring a result between threads. > Future - is when caller Cl starts thread T (or dequeues a thread from a > pool) and Cl can wait until T finishes and provides a result. > Here waiting for the result is just like selecting from pg_background_result(). > Promise - is when you declare a variable and the caller does not know which > thread will put the data into this variable. But any thread reading the > promise will wait until another thread puts data into the promise. > There are more parallelism patterns there, like async, deferred, lazy, > but futures and promises are, from my point of view, the most used. > Nice, thanks for the detailed explanation. 
We can use the shm_mq infrastructure to share any kind of message between two processes, but perhaps we might end up overestimating what pg_background was originally meant to be used for - the user backend will launch workers and give them an initial set of instructions, and then results will stream back from the workers to the user backend. >>> 6. Cancelation: a way to signal to a background query that it's time to >>> quit gracefully. >> Let me check this too. > I think Craig is right: any background query must be ready to be shut > down. That's what databases are about, you can safely pull the plug at > any moment. SIGTERM is handled in the current pg_background patch; the user can terminate backend execution using pg_cancel_backend() or pg_terminate_backend() as shown below: postgres=> select pg_background_launch('insert into foo values(generate_series(1,1))'); pg_background_launch -- 67069 (1 row) postgres=> select pg_terminate_backend(67069); pg_terminate_backend -- t (1 row) postgres=> select * from pg_background_result(67069) as (x text); ERROR: terminating connection due to administrator command CONTEXT: background worker, pid 67069 postgres=> Thanks & Regards, Amul -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
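Andrew's future/promise distinction maps onto Python's concurrent.futures much as onto C++11. The sketch below is purely an illustration of the two patterns, not pg_background code: a "future" wraps a task whose result the caller later blocks on, while a bare Future object acts as a "promise" that some other thread fulfills.

```python
# "Future": the caller submits work and later blocks on the result,
# analogous to selecting from pg_background_result().
from concurrent.futures import Future, ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(lambda: 6 * 7)   # worker computes in the background
    answer = fut.result()              # caller blocks until it is done

# "Promise": the variable is declared first, and the reader does not know
# which thread will fulfill it; result() blocks until someone set_result()s.
promise = Future()

def producer():
    promise.set_result("data from another backend")

with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(producer)
    value = promise.result()           # waits until the producer fills it
```

In pg_background terms, the first pattern is what pg_background_launch()/pg_background_result() already provide; the second is the "promised result" idea, where any backend could fill in the declared slot.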
Re: [HACKERS] ToDo: no blocking (waiting) DDL
2016-12-13 13:03 GMT+01:00 Alvaro Herrera : > Pavel Stehule wrote: > > Hi > > > > I don't remember well, but there was maybe a similar ToDo. > > > > Yesterday I had an incident on a high-load system when I executed a DROP INDEX > > cmd. > > > > This statement waited for a long transaction to finish, but it blocked any > > other statements. > > > > It would be nice if a statement waiting on a lock didn't block other > > statements. > > https://postgr.es/m/CAKddOFDz0+F2uspVN5BZgtN7Wp+ > vqbeluqpwfvgwap5jzvx...@mail.gmail.com Yes, that's it. It can be applied to some DDL statements too. Thank you regards Pavel > > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] ToDo: no blocking (waiting) DDL
Pavel Stehule wrote: > Hi > > I don't remember well, but there was maybe a similar ToDo. > > Yesterday I had an incident on a high-load system when I executed a DROP INDEX > cmd. > > This statement waited for a long transaction to finish, but it blocked any > other statements. > > It would be nice if a statement waiting on a lock didn't block other > statements. https://postgr.es/m/cakddofdz0+f2uspvn5bzgtn7wp+vqbeluqpwfvgwap5jzvx...@mail.gmail.com -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ToDo: no blocking (waiting) DDL
Hi

I don't remember well - there was maybe a similar ToDo already.

Yesterday I got an incident on a high-load system when I executed a DROP
INDEX command.

This statement waited for a long transaction to finish, but meanwhile it
blocked all other statements.

It would be nice if a statement waiting on a lock didn't block other
statements.

Regards

Pavel
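Not a fix for the ToDo itself, but for completeness: two existing ways to limit the collateral blocking described above (a sketch only; `some_index` is a placeholder name, `lock_timeout` requires 9.3+ and DROP INDEX CONCURRENTLY requires 9.2+):

```sql
-- Bound how long the DDL may sit in the lock queue (and thus block others):
SET lock_timeout = '1s';
DROP INDEX some_index;  -- gives up with an error after 1s instead of queuing

-- For indexes specifically, avoid the blocking ACCESS EXCLUSIVE lock entirely:
RESET lock_timeout;
DROP INDEX CONCURRENTLY some_index;
```

Neither makes lock waits non-blocking in general, which is what the ToDo asks for; they only mitigate the DROP INDEX case.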
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
Thanks to everyone for your attention. Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation"): > On Mon, Dec 12, 2016 at 8:45 AM, Ian Jackson > wrote: > > AIUI the documented behavour is that "every set of successful > > transactions is serialisable". > > Well, in context that is referring to serializable transactions. > No such guarantee is provided for other isolation levels. Indeed. > I didn't [get the same results]. First, I got this when I tried to > start the concurrent transactions using the example as provided: I'm sorry, I didn't actually try my paraphrased repro recipe, so it contained an error: > test=# SELECT count(*) FROM t WHERE k=1; -- save value My Perl code said "k=?" and of course ? got (effectively) substituted with "'1'" rather than "1". I introduced the error when transcribing the statements from my Perl script to my prose. Sorry about that. Next time I will test my alleged recipe with psql. I should also have told you that I was running this on 9.1. Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation"): > On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner wrote: > > As you can see, this generated a serialization failure. > > That was on 9.6. On earlier versions it does indeed allow the > transaction on connection 2 to commit, yielding a non-serializable > result. This makes a pretty strong case for back-patching this > commit: Right. That's part of my point. Thanks. [back to the first message] > If you have some way to cause a set of concurrent serializable > transactions to generate results from those transactions which > commit which is not consistent with some one-at-a-time order of > execution, I would be very interested in seeing the test case. > The above, however, is not it. I am concerned that there are other possible bugs of this form. 
In earlier messages on this topic, it has been suggested that the
"impossible" unique constraint violation is only one example of a possible
"leakage". Earlier you wrote:

  If I recall correctly, the constraints for which there can be errors
  appearing due to concurrent transactions are primary key, unique, and
  foreign key constraints. I don't remember seeing it happen, but it would
  not surprise me if an exclusion constraint can also cause an error due
  to a concurrent transaction's interaction with the transaction receiving
  the error.

Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts
before reporting constraint violations" ?

I can try playing around with other kinds of constraints, to try to
discover different aspects or versions of this bug, but my knowledge of
the innards of databases is very limited and I may not be particularly
effective. Certainly if I try and fail, I wouldn't have confidence that no
such bug existed.

And, Thomas Munro writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL
db: Retry on constraint violation"):
> Ian's test case uses an exception handler to convert a difference in
> error code into a difference in committed effect, thereby converting a
> matter of programmer convenience into a bug.

Precisely. I think this is a fully general technique, which means that any
situation where a transaction can "spuriously" fail is a similar bug.

So I think that ISOLATION LEVEL SERIALIZABLE needs to do what a naive
programmer would expect: all statements in such transactions, even aborted
transactions, need to see results, and have behaviour, which are
completely consistent with some serialisation of all involved
transactions. This must apply up to (but not including) any serialisation
failure error.

If that is the behaviour of 9.6 then I would like to submit a
documentation patch which says so. If the patch is to be backported, then
this ought to apply to all (patched) 9.x versions ?
It would be nice if the documentation stated the error codes that might be
generated. AFAICT that's just 40P01 and 40001 ? (I'm not sure what 40002
is.)

> For the record, read-write-unique-4.spec's permutation r2 w1 w2 c1 c2
> remains an open question for further work.

Is this another possible bug of this form ?

Ian.
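For readers following along, the kind of anomaly under discussion can be reproduced with a classic write-skew schedule (a sketch; it assumes a table `t(k int)` seeded so each session's SELECT sees the other's key, and under SSI one of the two commits is expected to fail with SQLSTATE 40001):

```sql
-- session 1                             -- session 2
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM t WHERE k = 2;
                                         BEGIN ISOLATION LEVEL SERIALIZABLE;
                                         SELECT count(*) FROM t WHERE k = 1;
INSERT INTO t VALUES (1);
                                         INSERT INTO t VALUES (2);
COMMIT;
                                         COMMIT;  -- serialization failure (40001)
```

The point of the thread is that an application catching the failing commit must see only error codes and states consistent with some serial order, so it can safely retry.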
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Thank you for the comment. At Tue, 13 Dec 2016 12:49:12 +0900, Fujii Masao wrote in > On Tue, Nov 8, 2016 at 11:18 AM, Kyotaro HORIGUCHI > wrote: > >> By the way, as long as I look at that, the patch applies cleanly on > >> master and 9.6 but not further down. If a committer shows up to push > >> this patch, I can prepare versions for back branches as needed. > > > > Ok, the attached is the version just rebased to the current > > master. It is clieanly appliable without whitespace errors on > > master and 9.6 with some displacements but fails on 9.5. > > > > I will mark this as "Ready for Committer". > > Thanks for updating the patch! Here are the review comments; > > + * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually > + * running, to be more precise pg_start_backup() is not being executed for > + * an exclusive backup and there is exclusive backup in progress. > > "there is exclusive" should be "there is no exclusive"? > ISTM that the description following "to be more" doesn't seem to be necessary. I agree. One can know that by reading the description on EXCLUSIVE_BACKUP_STARTING (and STOPPING, which is not mentioned here) > + * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and > + * that is is not done yet. > + * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is running and > + * that is is not done yet. > > Typo: "is is" Oops. Sorry for overlooking such a silly typo. > Isn't it better to mention "an exclusive backup" here? What about > > EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an > exclusive > backup. > EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive > backup. It seems fine. Done in the attached version. > I think that it's worth explaining why ExclusiveBackupState is necessary, > in the comment. The attached version contains the following comment. | * State of an exclusive backup. 
| *
| * EXCLUSIVE_BACKUP_STARTING and EXCLUSIVE_BACKUP_STOPPING block
| * pg_start_backup and pg_stop_backup to prevent a race condition. Both
| * functions return an error in these states.

Does this make sense?

> - * exclusiveBackup is true if a backup started with pg_start_backup() is
> - * in progress, and nonExclusiveBackups is a counter indicating the number
> + * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
> + * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
> + * it is done, and nonExclusiveBackups is a counter indicating the number
>
> Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE?
> Basically it's better to explain fully how the state changes.

Hmm, it is modified as the following in the attached.

| * exclusiveBackupState indicates the state of an exclusive backup, see
| * ExclusiveBackupState for details.

and the comment for ExclusiveBackupState explains the state transition.
Does this work for you?

> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("a backup is already starting")));
> +}
> +if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
> +{
> +WALInsertLockRelease();
> +ereport(ERROR,
> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("a backup is currently stopping")));
>
> Isn't it better to use "an exclusive backup" explicitly rather than
> "a backup" here? pg_stop_backup already says like that.

To unify with them, the messages of pg_start_backup are modified as
follows.

  exclusive backup is already starting
  exclusive backup is currently stopping
  exclusive backup is currently in progress

> You changed the code so that pg_stop_backup() removes backup_label before
> it marks the exclusive-backup-state as not-in-progress. Could you tell me
> why you did this change? It's better to explain it in the comment.

Previously, backup_label was the traffic signal for the backup state, and
that was the cause of this problem.
Now the exclusiveBackupState takes the place of it, so it naturally should
be _NONE after all steps have been finished. It seems so natural to me
that no additional explanation is needed.

If they are done in the opposite order, the existence check of
backup_label still left in pg_start_backup may fire.

| * Check for existing backup label --- implies a backup is already
| * running. (XXX given that we checked exclusiveBackupState above,
| * maybe it would be OK to just unlink any such label file?)
| */
| if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0)
| {
|     if (errno != ENOENT)
|         ereport(ERROR,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 084401d..fd2d809 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/trans
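As a user-visible illustration of what the state machine serializes (a sketch; the second call is expected to fail even before this patch, while the patch additionally closes the race window while pg_start_backup()/pg_stop_backup() are themselves still executing):

```sql
SELECT pg_start_backup('base1');
-- A concurrent exclusive backup attempt is rejected while one is in progress:
SELECT pg_start_backup('base2');  -- expected to fail: a backup is already in progress
SELECT pg_stop_backup();
```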
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Hello, it seems to me to work as expected.

At Thu, 08 Dec 2016 15:58:10 -0500, Tom Lane wrote in
<7083.1481230...@sss.pgh.pa.us>
> I've pushed the previous patch to HEAD. Attached is a proposed patch
> (against 9.6) that we could use for the back branches; it takes the
> brute force approach of just computing the correct value on-demand
> in the two functions at issue. The key question of course is whether
> this is acceptable from a performance standpoint. I did a simple test
> using a 1000-entry VALUES list:
>
> select count(a) from (
> values
> ('0'::varchar(3), '0'::varchar(4)),
> ('1'::varchar(3), '1'::varchar(4)),
> ('2'::varchar(3), '2'::varchar(4)),
> ('3'::varchar(3), '3'::varchar(4)),
> ('4'::varchar(3), '4'::varchar(4)),
> ...
> ('996'::varchar(3), '996'::varchar(4)),
> ('997'::varchar(3), '997'::varchar(4)),
> ('998'::varchar(3), '998'::varchar(4)),
> ('999'::varchar(3), '999'::varchar(4))
> ) v(a,b);
>
> Since all the rows do have the same typmod, this represents the worst
> case where we have to scan all the way to the end to confirm the typmod,
> and it has about as little overhead otherwise as I could think of doing.
> I ran it like this:
>
> pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression
>
> and could not see any above-the-noise-level difference --- in fact,
> it seemed like it was faster *with* the patch, which is obviously
> impossible; I blame that on chance realignments of loops vs. cache
> line boundaries.
>
> So I think this is an okay candidate for back-patching. If anyone
> wants to do their own performance tests, please do.

Despite the additional computation, I saw the same result on 9.6 - any
difference with the patch is within the noise.
9.6 without the patch   103.2 tps
9.6 with the patch      108.3 tps

For the master branch,

master      102.9 tps
0b78106c^   103.4 tps

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/13 0:17, Tomas Vondra wrote: > On 12/12/2016 07:37 AM, Amit Langote wrote: >> >> Hi Tomas, >> >> On 2016/12/12 10:02, Tomas Vondra wrote: >>> >>> 2) I'm wondering whether having 'table' in the catalog name (and also in >>> the new relkind) is too limiting. I assume we'll have partitioned indexes >>> one day, for example - do we expect to use the same catalogs? >> >> I am not sure I understand your idea of partitioned indexes, but I doubt >> it would require entries in the catalog under consideration. Could you >> perhaps elaborate more? >> > > OK, let me elaborate. Let's say we have a partitioned table, and I want to > create an index. The index may be either "global" i.e. creating a single > relation for data from all the partitions, or "local" (i.e. partitioned > the same way as the table). > > Local indexes are easier to implement (it's essentially what we have now, > except that we need to create the indexes manually for each partition), > and don't work particularly well for some use cases (e.g. unique > constraints). This is what I mean by "partitioned indexes". > > If the index is partitioned just like the table, we probably won't need to > copy the partition key info (so, nothing in pg_partitioned_table). > I'm not sure it makes sense to partition the index differently than the > table - I don't see a case where that would be useful. > > The global indexes would work better for the unique constraint use case, > but it clearly contradicts our idea of TID (no information about which > partition that references). > > So maybe the catalog really only needs to track info about tables? Not > sure. I'm just saying it'd be unfortunate to have _table in the name, and > end up using it for indexes too. Hmm, I didn't quite think of the case where the index is partitioned differently from the table, but perhaps that's possible with some other databases. 
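For reference, the "local index" status quo being discussed looks roughly like this with the partitioning syntax from the patch (a sketch; table, column, and bound values are made up, and CREATE INDEX has to be issued per partition since issuing it on the parent does not yet cascade):

```sql
CREATE TABLE measurement (logdate date, value int) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_2016 PARTITION OF measurement
    FOR VALUES FROM ('2016-01-01') TO ('2017-01-01');
CREATE TABLE measurement_2017 PARTITION OF measurement
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

-- "Local" indexes today: one CREATE INDEX per partition, by hand.
CREATE INDEX ON measurement_2016 (logdate);
CREATE INDEX ON measurement_2017 (logdate);
```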
What you describe as "local indexes" or "locally partitioned indexes" is
something I would like to see being pursued in the near term. In that
case, we would allow defining indexes on the parent that are recursively
defined on the partitions and marked as inherited indexes, just like we
have inherited check constraints and NOT NULL constraints. I have not
studied whether we could implement (globally) *unique* indexes with this
scheme though, wherein the index key is a superset of the partition key.

>>> 5) Half the error messages use 'child table' while the other half uses
>>> 'partition'. I think we should be using 'partition' as child tables really
>>> have little meaning outside inheritance (which is kinda hidden behind the
>>> new partitioning stuff).
>>
>> One way to go about it may be to check all sites that can possibly report
>> an error involving child tables (aka "partitions") whether they know from
>> the context which name to use. I think it's possible, because we have
>> access to the parent relation in all such sites and looking at the relkind
>> can tell whether to call child tables "partitions".
>>
> Clearly, this is a consequence of building the partitioning on top of
> inheritance (not objecting to that approach, merely stating a fact).
>
> I'm fine with whatever makes the error messages more consistent, if it
> does not make the code significantly more complex. It's a bit confusing
> when some use 'child tables' and others 'partitions'. I suspect even a
> single DML command may return a mix of those, depending on where exactly
> it fails (old vs. new code).

So, we have mostly some old DDL (CREATE/ALTER TABLE) and maintenance
commands that understand inheritance. All of their error messages apply to
partitions as well, wherein they will be referred to as "child tables"
using the old terms. We now have some cases where the commands cause
additional error messages for only partitions because of additional
restrictions that apply to them.
We use "partitions" for them because they are essentially new error messages. There won't be a case where single DML command would mix the two terms, because we do not allow mixing partitioning and regular inheritance. Maybe I misunderstood you though. >>> So if I understand the plan correctly, we first do a parallel scan of the >>> parent, then partition_1, partition_2 etc. But AFAIK we scan the tables in >>> Append sequentially, and each partition only has 1000 rows each, making >>> the parallel execution rather inefficient. Unless we scan the partitions >>> concurrently. >>> >>> In any case, as this happens even with plain inheritance, it's probably >>> more about the parallel query than about the new partitioning patch. >> >> Yes, I have seen some discussion [2] about a Parallel Append, which would >> result in plans you're probably thinking of. >> > > Actually, I think that thread is about allowing partial paths even if only > some appendrel members support it. My point is that in this case it's a > bit silly to even build the partial paths, when the partitions only have > 100
Re: [HACKERS] pg_background contrib module proposal
2016-12-13 12:55 GMT+05:00 amul sul :
> I think background-session code is not that much deviated from
> pg_background code,

It is not derived, though it is not much deviated: the background sessions
code does not have detailed exceptions and code comments, but it is doing
somewhat similar things.

> IIUC background-session patch we can manage to
> reuse it, as suggested by Robert. This will allow us to maintain
> session in long run, we could reuse this session to run subsequent
> queries instead of maintaining separate worker pool. Thoughts?

One API to deal with both features would be much better, sure. "Objects"
like session pools are much easier to implement on top of a session
"object" than on top of worker processes, PIDs etc.

>> 4. Query as a future (actually it is implemented already by
>> pg_background_result)
>> 5. Promised result. Declare that you are waiting for data of specific
>> format, execute a query, someone (from another backend) will
>> eventually place a data to promised result and your query continues
>> execution.
>
> Could you please elaborate more?
> Do you mean two way communication between foreground & background process?

These are from C++11 threading: future, promise and async are related but
different kinds of acquiring a result between threads.

A future is when caller Cl starts thread T (or dequeues a thread from a
pool) and Cl can wait until T finishes and provides a result. Here,
waiting for the result is just like selecting from pg_background_result().

A promise is when you declare a variable and the caller does not know
which thread will put the data into this variable. But any thread reading
the promise will wait until another thread puts data into it.

There are more parallelism patterns, like async, deferred and lazy, but
futures and promises are, from my point of view, the most used.

>> 6. Cancelation: a way to signal to background query that it's time to
>> quit gracefully.
>
> Let me check this too.
I think Craig is right: any background query must be ready to be shut
down. That's what databases are about, you can safely pull the plug at any
moment.

I've remembered one more parallelism pattern: critical region. It's when
you ask the system "plz no TERM now, and, if you can, postpone the
vacuums, OOMKs and that kind of stuff" from user code. But I do not think
it's a usable pattern here.

Thank you for your work.

Best regards, Andrey Borodin.
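The "future" analogy above maps directly onto the pg_background API in a psql session (a sketch using the function names from the patch under discussion; the query and result type are made up):

```sql
-- Launch returns immediately with the worker PID - this is the "future":
SELECT pg_background_launch('SELECT count(*) FROM some_big_table') AS pid \gset

-- ... the caller is free to do other work here ...

-- Reading the result resolves the future, blocking until the worker is done:
SELECT * FROM pg_background_result(:pid) AS (count bigint);
```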
Re: [HACKERS] New design for FK-based join selectivity estimation
Hi hackers,

The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduces an
estimation error:

create table t3 as select j from generate_series(1,1) i,generate_series(1,100) j ;
create table t4 as select j from generate_series(1,100) j ;
create unique index ON t4(j);
alter table t3 add constraint fk foreign key (j) references t4(j);
analyze;

On 9.5.5:

explain analyze select * from t3 where j in (select * from t4 where j<10);
                               QUERY PLAN
 Hash Semi Join  (cost=2.36..18053.61 rows=9 width=4) (actual time=0.217..282.325 rows=9 loops=1)
   Hash Cond: (t3.j = t4.j)
   ->  Seq Scan on t3  (cost=0.00..14425.00 rows=100 width=4) (actual time=0.112..116.063 rows=100 loops=1)
   ->  Hash  (cost=2.25..2.25 rows=9 width=4) (actual time=0.083..0.083 rows=9 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t4  (cost=0.00..2.25 rows=9 width=4) (actual time=0.019..0.074 rows=9 loops=1)
               Filter: (j < 10)
               Rows Removed by Filter: 91
 Planning time: 0.674 ms
 Execution time: 286.043 ms

On 9.6 HEAD:

explain analyze select * from t3 where j in (select * from t4 where j<10);
                               QUERY PLAN
 Hash Semi Join  (cost=2.36..18053.61 rows=100 width=4) (actual time=0.089..232.327 rows=9 loops=1)
   Hash Cond: (t3.j = t4.j)
   ->  Seq Scan on t3  (cost=0.00..14425.00 rows=100 width=4) (actual time=0.047..97.926 rows=100 loops=1)
   ->  Hash  (cost=2.25..2.25 rows=9 width=4) (actual time=0.032..0.032 rows=9 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t4  (cost=0.00..2.25 rows=9 width=4) (actual time=0.008..0.030 rows=9 loops=1)
               Filter: (j < 10)
               Rows Removed by Filter: 91
 Planning time: 0.247 ms
 Execution time: 235.295 ms
(10 rows)

The estimated row count of the join is 10x larger since 100340e2d.

Regards,

--
Adrien NAYRAT
http://dalibo.com - http://dalibo.org
Re: [HACKERS] Quorum commit for multiple synchronous replication.
At Tue, 13 Dec 2016 08:46:06 +0530, Amit Kapila wrote in > On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada > wrote: > > On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao wrote: > >> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada > >> wrote: > >>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila > >>> wrote: > > Few comments: > >>> > >>> Thank you for reviewing. > >>> > + * In 10.0 we support two synchronization methods, priority and > + * quorum. The number of synchronous standbys that transactions > + * must wait for replies from and synchronization method are specified > + * in synchronous_standby_names. This parameter also specifies a list > + * of standby names, which determines the priority of each standby for > + * being chosen as a synchronous standby. In priority method, the > standbys > + * whose names appear earlier in the list are given higher priority > + * and will be considered as synchronous. Other standby servers > appearing > + * later in this list represent potential synchronous standbys. If any > of > + * the current synchronous standbys disconnects for whatever reason, > + * it will be replaced immediately with the next-highest-priority > standby. > + * In quorum method, the all standbys appearing in the list are > + * considered as a candidate for quorum commit. > > In the above description, is priority method represented by FIRST and > quorum method by ANY in the synchronous_standby_names syntax? If so, > it might be better to write about it explicitly. > >>> > >>> Added description. > >>> > > + * specified in synchronous_standby_names. The priority method is > + * represented by FIRST, and the quorum method is represented by ANY > > Full stop is missing after "ANY". > > >>> > 6. > + int sync_method; /* synchronization method */ > /* member_names contains nmembers consecutive nul-terminated C strings > */ > char member_names[FLEXIBLE_ARRAY_MEMBER]; > } SyncRepConfigData; > > Can't we use 1 or 2 bytes to store sync_method information? 
> >>> > >>> I changed it to uint8. > >>> > > + int8 sync_method; /* synchronization method */ > > > I changed it to uint8. > > In mail, you have mentioned uint8, but in the code it is int8. I > think you want to update code to use uint8. > > > > > >> +standby_list{ $$ = > >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } > >> +| NUM '(' standby_list ')'{ $$ = > >> create_syncrep_config($1, $3, SYNC_REP_QUORUM); } > >> +| ANY NUM '(' standby_list ')'{ $$ = > >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); } > >> +| FIRST NUM '(' standby_list ')'{ $$ = > >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } > >> > >> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works > >> differently from curent version while "list" works in the same way as > >> current one) very confusing? > >> > >> I prefer to either of > >> > >> 1. break the backward-compatibility, i.e., treat the first syntax of > >> "standby_list" as quorum commit > >> 2. keep the backward-compatibility, i.e., treat the second syntax of > >> "NUM (standby_list)" as sync rep with the priority > > > > +1. > > > There were some comments when I proposed the quorum commit. If we do > > #1 it breaks the backward-compatibility with 9.5 or before as well. I > > don't think it's a good idea. On the other hand, if we do #2 then the > > behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM > > (standby_list)''. But it would not what most of user will want and > > would confuse the users of future version who will want to use the > > quorum commit. Since many hackers thought that the sensible default > > behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the > > current patch chose to changes the behaviour of s_s_names and document > > that changes thoroughly. > > > > Your arguments are sensible, but I think we should address the point > of confusion raised by Fujii-san. 
> As a developer, I feel breaking
> backward compatibility (go with Option-1 mentioned above) here is a
> good move as it can avoid confusions in future. However, I know many
> a time users are so accustomed to the current usage that they feel
> irritated with the change in behavior even it is for their betterment,
> so it is advisable to do so only if it is necessary or we have
> feedback from a couple of users. So in this case, if we don't want to
> go with Option-1, then I think we should go with Option-2. If we go
> with Option-2, then we can anyway comeback to change the behavior
> which is more sensible for future after feedback from users.

This implicitly puts an assumption that the replication configuration is
defined by s_s_names. But in the past discussion, some people suggested
that quorum commit should be configured by another