Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
Amit Kapila writes: > Any other ideas? Given that the crash is so far down inside __dlopen(), and that there's a clear reference to the string we presumably passed to that: #11 0x7f518485e489 in _dl_open (file=0x55b692f2d2b0 "/home/smith/postgres/inst/master/lib/pgcrypto.so", mode=-2147483390, caller_dlopen=0x55b691cb4c7e < I don't actually believe that this is Postgres' fault. I suspect that what we're looking at here is a low-memory bug in dlopen itself, probably something strdup'ing an input string and forgetting to check for a null result. Presumably somebody could dig into the libc source code and prove or disprove this, though it would sure help to know exactly what platform and version Andreas is testing on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Fri, Sep 29, 2017 at 9:12 PM, Robert Haas wrote: > > It's possible that we might find that neither of the above approaches > are practical and that the performance benefits of resolving the > transaction from the original connection are large enough that we want > to try to make it work anyhow. However, I think we can postpone that > work to a future time. Any general solution to this problem at least > needs to be ABLE to resolve transactions at a later time from a > different session, so let's get that working first, and then see what > else we want to do. > +1. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
On Tue, Oct 3, 2017 at 8:31 AM, Amit Kapila wrote: > On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich > wrote: >> Hi, >> >> doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced >> a couple of parallel worker core dumps with the backtrace below. >> Although most of the backtrace is inside the dynamic linker, it looks >> like it was passed a pointer to gone-away shared memory. >> > > It appears to be some dangling pointer, but not sure how it is > possible. Can you provide some more details, like do you have any > other library which you want to get loaded in the backend (like by > using shared_preload_libraries or by some other way)? I think without > that we shouldn't try to load anything in the parallel worker. > Another possibility could be that the memory for the library space has been overwritten, either in the master backend or in the worker backend. I think that is possible under low-memory conditions if somewhere we write to memory without first ensuring that space is allocated. I have browsed the nearby code and didn't find any such instance. One idea to narrow down the problem is to see if the other members in the worker backend are sane; for example, can you try printing the value of MyFixedParallelState, as we get that value from shared memory similar to the library space. It seems from the call stack that the memory of the library space is corrupted, so we can move the call to lookup/RestoreLibraryState immediately after we assign MyFixedParallelState. If after this the memory for the library space is still corrupted, then probably something bad has happened in the master backend. Any other ideas? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] PoC: full merge join on comparison clause
On Thu, Sep 28, 2017 at 8:57 PM, Alexander Kuzmenkov wrote: > Hi Ashutosh, > > Thanks for the review. > > Jeff, I'm copying you because this is relevant to our discussion about what > to do with mergeopfamilies when adding new merge join types. > > You have renamed RestrictInfo member mergeopfamilies as > equivopfamilies. I don't think that's a good name; it doesn't convey > that these are opfamilies containing merge operators. The changes in > check_mergejoinable() suggest that an operator may act as equality > operator in few operator families and comparison operator in others. > That looks odd. Actually an operator family contains operators other > than equality operators, so you may want to retain this member and add > a new member to specify whether the clause is an equality clause or > not. > > > For mergeopfamilies, I'm not sure what is the best thing to do. I'll try to > explain my understanding of the situation, please correct me if I'm wrong. > > Before the patch, mergeopfamilies was used for two things: creating > equivalence classes and performing merge joins. > > For equivalence classes: we look at the restriction clauses, and if they > have mergeopfamilies set, it means that these clauses are based on an > equality operator, and the left and right variables must be equal. To record > this fact, we create an equivalence class. The variables might be equal for > one equality operator and not equal for another, so we record the particular > operator families to which our equality operator belongs. > > For merge join: we look at the join clauses, and if they have > mergeopfamilies set, it means that these clauses are based on an equality > operator, and we can try performing this particular join as merge join. > These opfamilies are also used beforehand to create the equivalence classes > for left and right variables. The equivalence classes are used to match the > join clauses to pathkeys describing the ordering of join inputs. 
> > So, if we want to start doing merge joins for operators other than equality, > we still need to record their opfamilies, but only use them for the second > case and not the first. I chose to put these opfamilies to different > variables, and > name the one used for equivalence classes 'equivopfamilies' and the one used > for merge joins 'mergeopfamilies'. The equality operators are used for both > cases, so we put their opfamilies into both of these variables. > > I agree this might look confusing. Indeed, we could keep a single variable > for opfamilies, and add separate flags that show how they can be used, be > that for equivalence classes, merge joins, range joins or some combination > of them. This is similar to what Jeff did in his range merge join patch [1]. > I will think more about this and try to produce an updated patch. > I think we have (ab?)used mergeopfamilies to indicate equality condition, which needs some changes. Maybe these two patches are where we can do those changes. > > In mergejoinscansel() you have just removed Assert(op_strategy == > BTEqualStrategyNumber); Probably this function is written considering > only equality operators. But now that we are using this for all other > operators, we will need more changes to this function. That may be the > reason why INNER join in your earlier example doesn't choose right > costing. > > > I changed mergejoinscansel() slightly to reflect the fact that the inner > relation is scanned from the beginning if we have an inequality merge > clause. > > > The comment change in final_cost_mergejoin() needs more work. n1, n2, > n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 + > n2 + n3 + ... = size of inner relation is correct. In that context I > am not able to understand your change > +* If the merge clauses contain inequality, (n1 + n2 + ...) ~= > +* (size of inner relation)^2. > > > I extended the comment in final_cost_mergejoin(). 
Not sure if that > approximation makes any sense, but this is the best I could think of. > > Style problems are fixed. > > Attached please find the new version of the patch that addresses all the > review comments except mergeopfamilies. > > The current commitfest is ending, but I'd like to continue working on this > patch, so I am moving it to the next one. > > Thanks for working on the comments. I am interested to continue reviewing it in the next commitfest. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Conversion error
> Tatsuo Ishii writes: >> I saw this while restoring 9.6 database to 10.0 database. >> \connect: FATAL: conversion between UTF8 and MULE_INTERNAL is not supported >> Is this expected? I saw nothing related to this in the release notes. > > Don't think that's ever been supported. Sorry, that was my misunderstanding. All versions of PostgreSQL allow creating a MULE_INTERNAL database while executing a restore, but after that the restore fails while loading rows into the MULE_INTERNAL database. So, if there's no table in the database, the restore succeeds. That's why I got the wrong impression. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Possible SSL improvements for a newcomer to tackle
On Tue, Oct 3, 2017 at 1:15 PM, Zeus Kronion wrote: > I previously made one minuscule contribution to the project two years ago. > I'm interested in doing some more, and I'm trying to figure out what to > focus on. Two SSL-related projects caught my attention: > 1) Allow automatic selection of SSL client certificates from a certificate > store (https://www.postgresql.org/message-id/8766.1241799...@sss.pgh.pa.us). > It seems relatively straightforward to support an additional file format for > key-value pairs in postgresql.crt/.key, and I think this is something I > could take on if it's still desired. > 2) I was surprised to learn the following from the docs: One other thing that could be improved, and that has already been asked for, is passphrase handling, particularly since SSL parameters can now be reloaded: for example, a new GUC parameter that calls a shell command which outputs the passphrase to stdout. It could be tricky to implement, as the postmaster should still be able to handle requests while launching the command. But I think you get the idea. >> By default, PostgreSQL will not perform any verification of the server >> certificate. This means that it is possible to spoof the server identity >> (for example by modifying a DNS record or by taking over the server IP >> address) without the client knowing. In order to prevent spoofing, SSL >> certificate verification must be used. > > Is there a technical reason to perform no verification by default? Wouldn't > a safer default be desirable? It would be nice to get to a stronger default, with "require" at least; the recommendation is to use at least "verify-ca" for any serious deployment. Note that not long ago there were arguments about how the default value of sslmode, 'prefer', is good at giving a false sense of security, but this led nowhere (can't put my hands on that thread now..). 
-- Michael
Re: [HACKERS] Possible SSL improvements for a newcomer to tackle
Zeus Kronion writes: > 2) I was surprised to learn the following from the docs: >> By default, PostgreSQL will not perform any verification of the server >> certificate. > Is there a technical reason to perform no verification by default? Wouldn't > a safer default be desirable? I'm not an SSL expert, so insert appropriate grain of salt, but AIUI the question is what are you going to verify against? You need some local notion of which are your trusted root certificates before you can verify anything. So to default to verification would be to default to failing to connect at all until user has created a ~/.postgresql/root.crt file with valid, relevant entries. That seems like a nonstarter. It's possible that we could adopt some policy like "if the root.crt file exists then default to verify" ... but that seems messy and unreliable, so I'm not sure it would really add any security. regards, tom lane
Re: [HACKERS] Conversion error
Tatsuo Ishii writes: > I saw this while restoring 9.6 database to 10.0 database. > \connect: FATAL: conversion between UTF8 and MULE_INTERNAL is not supported > Is this expected? I saw nothing related to this in the release notes. Don't think that's ever been supported. regards, tom lane
Re: [HACKERS] Conversion error
> I saw this while restoring 9.6 database to 10.0 database. > > \connect: FATAL: conversion between UTF8 and MULE_INTERNAL is not supported > > Is this expected? I saw nothing related to this in the release notes. This had been allowed in 9.6. So I think 10.0 silently drops the feature. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Logging idle checkpoints
At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier wrote in > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost wrote: > > I certainly don't care for the idea of adding log messages saying we > > aren't doing anything just to match a count that's incorrectly claiming > > that checkpoints are happening when they aren't. > > > > The down-thread suggestion of keeping track of skipped checkpoints might > > be interesting, but I'm not entirely convinced it really is. We have > > time to debate that, of course, but I don't really see how that's > > helpful. At the moment, it seems like the suggestion to add that column > > is based on the assumption that we're going to start logging skipped > > checkpoints and having that column would allow us to match up the count > > between the new column and the "skipped checkpoint" messages in the logs > > and I can not help but feel that this is a ridiculous amount of effort > > being put into the analysis of something that *didn't* happen. > > Being able to look at how many checkpoints are skipped can be used as > a tuning indicator of max_wal_size and checkpoint_timeout, or in short > increase them if those remain idle. We usually adjust the GUCs based on how often checkpoints are *executed* and how many of the executed checkpoints have been triggered by xlog progress (or with a shorter interval than timeout). That seems enough. Counting skipped checkpoints gives just a rough estimate of how long the system was getting no substantial updates. I doubt that users get something valuable by counting skipped checkpoints. > Since their introduction in > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the > number of checkpoint requests, not if a checkpoint has been actually > executed or not, I am not sure that this should be changed after 10 > years. 
So, to put it in other words, wouldn't we want a way to track > checkpoints that are *executed*, meaning that we could increment a > counter after doing the skip checks in CreateRestartPoint() and > CreateCheckPoint()? This sounds reasonable to me. CreateRestartPoint() is already returning ckpt_performed, which is used to let the checkpointer retry in 15 seconds rather than waiting for the next checkpoint_timeout. Checkpoint might deserve the same treatment on skipping. By the way, CreateRestartPoint emits DEBUG2 messages on skipping. Although restartpoint has different characteristics from checkpoint, if we change the message level for CreateCheckPoint (currently DEBUG1), CreateRestartPoint should probably get the same change. (Otherwise, at least they ought to have the same message level?) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
[HACKERS] Possible SSL improvements for a newcomer to tackle
I previously made one minuscule contribution to the project two years ago. I'm interested in doing some more, and I'm trying to figure out what to focus on. Two SSL-related projects caught my attention: 1) Allow automatic selection of SSL client certificates from a certificate store (https://www.postgresql.org/message-id/8766.1241799...@sss.pgh.pa.us). It seems relatively straightforward to support an additional file format for key-value pairs in postgresql.crt/.key, and I think this is something I could take on if it's still desired. 2) I was surprised to learn the following from the docs: > By default, PostgreSQL will not perform any verification of the server certificate. This means that it is possible to spoof the server identity (for example by modifying a DNS record or by taking over the server IP address) without the client knowing. In order to prevent spoofing, SSL certificate verification must be used. Is there a technical reason to perform no verification by default? Wouldn't a safer default be desirable?
Re: [HACKERS] path toward faster partition pruning
Hi David. Thanks a lot for your review comments and sorry it took me a while to reply. On 2017/09/28 18:16, David Rowley wrote: > On 27 September 2017 at 14:22, Amit Langote > wrote: >> - 0001 includes refactoring that Dilip proposed upthread [1] (added him as >> an author). I slightly tweaked his patch -- renamed the function >> get_matching_clause to match_clauses_to_partkey, similar to >> match_clauses_to_index. > > Hi Amit, > > I've made a pass over the 0001 patch while trying to get myself up to > speed with the various developments that are going on in partitioning > right now. > > These are just my thoughts from reading over the patch. I understand > that there's quite a bit up in the air right now about how all this is > going to work, but I have some thoughts about that too, which I > wouldn't mind some feedback on as my line of thought may be off. > > Anyway, I will start with some small typos that I noticed while reading: > > partition.c: > > + * telling what kind of NullTest has been applies to the corresponding > > "applies" should be "applied" Will fix. > plancat.c: > > * might've occurred to satisfy the query. Rest of the fields are set in > > "Rest of the" should be "The remaining" Will fix. > And onto more serious stuff: > > allpaths.c: > > get_rel_partitions() > > I wonder if this function does not belong in partition.c. I'd have > expected a function to exist per partition type, RANGE and LIST, I'd > imagine should have their own function in partition.c to eliminate > partitions > which cannot possibly match, and return the remainder. I see people > speaking of HASH partitioning, but we might one day also want > something like RANDOM or ROUNDROBIN (I'm not really kidding, imagine > routing records to be processed into foreign tables where you never > need to query them again). It would be good if we could easily expand > this list and not have to touch files all over the optimizer to do > that. 
> Of course, there would be other work to do in the executor to > implement any new partitioning method too. I think there will have to be at least some code in the optimizer. That is, the code to match the query to the table's partition keys and to use the constant values therein to look up the table's partitions. More importantly, once the partitions (viz. their offsets in the table's partition descriptor) have been looked up by partition.c, we must be able to then map them to their planner data structures viz. their AppendRelInfo, and subsequently RelOptInfo. This last part will have to be in the optimizer. In fact, that was the role of get_rel_partitions in the initial versions of the patch, when neither of the code for matching keys and for pruning using constants was implemented. One may argue that the first part, that is, matching clauses to the partition key and subsequent lookup of partitions using constants, could both be implemented in partition.c, but it seems better to me to keep at least the part of matching clauses to the partition keys within the planner (just like matching clauses to indexes is done by the planner). Looking up partitions using constants cannot be done outside partition.c anyway, so that's something we have to implement there. > I know there's mention of it somewhere about get_rel_partitions() not > being so smart about WHERE partkey > 20 AND partkey > 10, the code > does not understand what's more restrictive. I think you could > probably follow the same rules here that are done in > eval_const_expressions(). Over there I see that evaluate_function() > will call anything that's not marked as volatile. Hmm, unless I've missed it, I don't see in evaluate_function() anything about determining which clause is more restrictive. 
AFAIK, such determination depends on the btree operator class semantics (at least in the most common cases where, say, ">" means greater than in a sense that btree code uses it), so I was planning to handle it the way btree code handles it in _bt_preprocess_keys(). In fact, the existing code in predtest.c, which makes similar decisions, also relies on btree semantics. It's OK to do so, because partitioning methods that exist today and for which we'd like to have such smarts use btree semantics to partition the data. Also, we won't need to optimize such cases for hash partitioning anyway. > I'd imagine, for > each partition key, you'd want to store a Datum with the minimum and > maximum possible value based on the quals processed. If either the > minimum or maximum is still set to NULL, then it's unbounded at that > end. If you encounter partkey = Const, then minimum and maximum can be > set to the value of that Const. From there you can likely ignore any > other quals for that partition key, as if the query did contain > another qual with partkey = SomeOtherConst, then that would have > become a gating qual during the constant folding process. This way
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freund writes: > On 2017-10-02 19:50:51 -0400, Tom Lane wrote: >> What I saw was that the backend process was consuming 100% of (one) CPU, >> while the I/O transaction rate viewed by "iostat 1" started pretty low >> --- under 10% of what the machine is capable of --- and dropped from >> there as the copy proceeded. I did not think to check if that was user >> or kernel-space CPU, but I imagine it has to be the latter. > So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or > msync() that's the problem here. I guess you didn't run a profile? Interestingly, profiling with Activity Monitor seems to blame the problem entirely on munmap() ... which squares with the place I hit every time when randomly stopping the process with gdb^Hlldb, so I'm inclined to believe it. This still offers no insight as to why CREATE DATABASE is hitting the problem while regular flush activity doesn't. > One interesting thing here is that in the CREATE DATABASE case there'll > probably be a lot larger contiguous mappings than in *_flush_after > cases. So it might be related to the size of the mapping / flush "unit". Meh, the mapping is only 64K in this case vs. 8K in the other. Hard to credit that it breaks that easily. regards, tom lane
Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich wrote: > Hi, > > doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced > a couple of parallel worker core dumps with the backtrace below. > Although most of the backtrace is inside the dynamic linker, it looks > like it was passed a pointer to gone-away shared memory. > It appears to be some dangling pointer, but not sure how it is possible. Can you provide some more details, like do you have any other library which you want to get loaded in the backend (like by using shared_preload_libraries or by some other way)? I think without that we shouldn't try to load anything in the parallel worker. Also, if you can get the failed query (check in server log), it would be great. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Tests for reloptions
On Sun, Oct 1, 2017 at 5:18 AM, Nikolay Shaplov wrote: > src/backend/access/common/reloptions.c got only 7 lines, it was quite covered > by existing tests, but almost all of the access methods get some coverage > increase: > > src/backend/access/brin 1268 -> 1280 (+18) > src/backend/access/gin 2924 -> 2924 (0) > src/backend/access/gist 1673 -> 2108 (+435) > src/backend/access/hash 1580 -> 1638 (+58) > src/backend/access/heap 2863 -> 2866 (+3) > src/backend/access/nbtree 2565 -> 2647 (+82) > src/backend/access/spgist 2066 -> 2068 (+2) > > Though I should say that the incredible coverage boost for gist is the result of > unsteady results of the test run. The real value should be much less... +-- Test buffering and fillfactor reloption takes valid values +create index gist_pointidx2 on gist_point_tbl using gist(p) with (buffering = on, fillfactor=50); +create index gist_pointidx3 on gist_point_tbl using gist(p) with (buffering = off); +create index gist_pointidx4 on gist_point_tbl using gist(p) with (buffering = auto); Those are the important bits that visibly produce the boost in gist. To be kept. -CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops); +CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops) WITH (fillfactor=60); Woah. So that alone increases hash by 58 steps. +CREATE INDEX bloomidx2 ON tst USING bloom (i, t) WITH (length=0); +ERROR: value 0 out of bounds for option "length" +DETAIL: Valid values are between "1" and "4096". contrib/bloom contributes to the coverage of reloptions.c thanks to its coverage of the add_ routines when the library is loaded. And those don't actually improve any coverage, right? > Nevertheless the tests touch the reloptions related code, check for proper > error handling, and that is good. Per your measurements reloptions.c is improved by 1.7%, or 7 lines as you say. Five of them are in parse_one_reloption for integers (2) and reals (2), plus one error at the top. 
Two are in transformRelOptions for valid namespace handling. In your patch reloptions.sql is 141 lines. Couldn't it be shorter with the same improvements? It looks like a lot of the tests overlap with existing ones. > I think we should commit it. My take is that a lighter version could be committed instead. My suggestion is to keep the new reloptions test minimal so that it tests the relopt types and their bounds, and I think that you could remove the same bounding checks in the other tests that you added: bloom, gist, etc. -- Michael
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Thu, Sep 21, 2017 at 8:21 AM, Ashutosh Bapat wrote: > Here's set of rebased patches. The patch with extra tests is not for > committing. All other patches, except the last one, will need to be > committed together. The last patch may be committed along with other > patches or as a separate patch. In set_append_rel_size, is it necessary to set attr_needed = bms_copy(rel->attr_needed[index]) rather than just pointing to the existing value? If so, perhaps the comments should explain the reasons. I would have thought that the values wouldn't change after this point, in which case it might not be necessary to copy them. Regarding nomenclature and my previous griping about wisdom, I was wondering about just calling this a "partition join" like you have in the regression test. So the GUC would be enable_partition_join, you'd have generate_partition_join_paths(), etc. Basically just delete "wise" throughout. The elog(DEBUG3) in try_partition_wise_join() doesn't follow message style guidelines and I think should just be removed. It was useful for development, I'm sure, but it's time for it to go. +elog(ERROR, "unrecognized path node type %d", (int) nodeTag(path)); I think we should use the same formulation as elsewhere, namely "unrecognized node type: %d". And likewise probably "unexpected join type: %d". partition_join_extras.sql has a bunch of whitespace damage, although it doesn't really matter since, as you say, that's not for commit. (This is not a full review, just a few thoughts.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Tue, Oct 3, 2017 at 12:54 AM, Alvaro Herrera wrote: > Michael Paquier wrote: >> On Mon, Oct 2, 2017 at 11:44 PM, Robert Haas wrote: >> > On Mon, Oct 2, 2017 at 10:19 AM, Michael Paquier >> > wrote: >> >> So those bits could be considered for integration. >> > >> > Yes, and they also tend to suggest that the rest of the patch has value. >> >> Well, there are cases where you don't need any locking checks, and the >> proposed patch ignores that. Take for example pageinspect which works >> on a copy of a page, or just WAL replay which serializes everything, >> and in both cases PageGetLSN can be used safely. So for compatibility >> reasons I think that PageGetLSN should be kept alone to not surprise >> anybody using it, or at least an equivalent should be introduced. It >> would be interesting to make BufferGetLSNAtomic hold tighter checks, >> like something that makes use of LWLockHeldByMe for example. Jacob, here are some ideas to make this thread move on. I would suggest to produce a set of patches that do things incrementally: 1) One patch that changes the calls of PageGetLSN to BufferGetLSNAtomic which are now not appropriate. You have spotted some of them in the btree and gist code, but not all based on my first lookup. There is still one in gistFindCorrectParent and one in btree _bt_split. The monitoring of the other calls (sequence.c and vacuumlazy.c) looked safe. There is another one in XLogRecordAssemble that should be fixed I think. 2) A second patch that strengthens a bit checks around BufferGetLSNAtomic. One idea would be to use LWLockHeldByMe, as you are originally suggesting. A comment could be as well added in bufpage.h for PageGetLSN to let users know that it should be used carefully, in the vein of what is mentioned in src/backend/access/transam/README. > I'd argue about this in the same direction I argued about > BufferGetPage() needing an LSN check that's applied separately: if it's > too easy for a developer to do the wrong thing (i.e. 
fail to apply the > separate LSN check after reading the page) then the routine should be > patched to do the check itself, and another routine should be offered > for the cases that explicitly can do without the check. > > I was eventually outvoted in the BufferGetPage() saga, though, so I'm > not sure that that discussion sets precedent. Oh... I don't recall this discussion. A quick lookup at the archives does not show me a specific thread either. -- Michael
[HACKERS] Combining expr{Type,Typmod,Collation}() into one function.
Hi, I'd recently noticed that expr* functions actually show up in profiles because we use them at some very common paths (e.g. ExecTypeFromTLInternal()) and that we commonly call all the three variants from $subject in sequence. Looking at their code I was wondering whether it's reasonable to combine them into one function. The performance effects in my case were negligible, but it might still be worth it just to reduce duplication. I've attached my *WIP* patch to do so. Unless somebody finds this interesting and prods me, I don't plan to do something further with this. If we were to go with this, some cleanup would be needed. Greetings, Andres Freund >From 04dc797d6e99d7fe5902a31ae5fdd73950c8c48f Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 21 Sep 2017 12:13:38 -0700 Subject: [PATCH] WIP: Combine expr{Type,Typmod,Collation} into one function. Slight speedup, removal of some duplication. Not entirely sure it's worth it. --- src/backend/executor/execTuples.c | 12 +- src/backend/nodes/nodeFuncs.c | 929 +- src/include/nodes/nodeFuncs.h | 1 + 3 files changed, 435 insertions(+), 507 deletions(-) diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c index 51d2c5d166..1035efb4aa 100644 --- a/src/backend/executor/execTuples.c +++ b/src/backend/executor/execTuples.c @@ -919,18 +919,24 @@ ExecTypeFromTLInternal(List *targetList, bool hasoid, bool skipjunk) foreach(l, targetList) { TargetEntry *tle = lfirst(l); + Oid type; + int32 typmod; + Oid collation; if (skipjunk && tle->resjunk) continue; + exprTypeInfo((Node *) tle->expr, + &type, &typmod, &collation); + TupleDescInitEntry(typeInfo, cur_resno, tle->resname, - exprType((Node *) tle->expr), - exprTypmod((Node *) tle->expr), + type, + typmod, 0); TupleDescInitEntryCollation(typeInfo, cur_resno, - exprCollation((Node *) tle->expr)); + collation); cur_resno++; } diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 8e6f27e153..caf5ee6a42 100644 --- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c @@ -35,69 +35,134 @@ static bool planstate_walk_members(List *plans, PlanState **planstates, /* - * exprType - - * returns the Oid of the type of the expression's result. + * exprTypeInfo - + * lookup an expression result's type, typmod, collation. */ -Oid -exprType(const Node *expr) +void +exprTypeInfo(const Node *expr, Oid *type, int32 *typmod, Oid *collation) { - Oid type; + /* initialize to defaults, overridden where appropriate */ + *type = InvalidOid; + *typmod = -1; + *collation = InvalidOid; if (!expr) - return InvalidOid; + return; switch (nodeTag(expr)) { case T_Var: - type = ((const Var *) expr)->vartype; - break; + { +const Var *var = (const Var *) expr; +*type = var->vartype; +*typmod = var->vartypmod; +*collation = var->varcollid; +break; + } case T_Const: - type = ((const Const *) expr)->consttype; - break; + { +const Const *c = (const Const *) expr; +*type = c->consttype; +*typmod = c->consttypmod; +*collation = c->constcollid; +break; + } case T_Param: - type = ((const Param *) expr)->paramtype; - break; + { +const Param *p = (const Param *) expr; +*type = p->paramtype; +*typmod = p->paramtypmod; +*collation = p->paramcollid; +break; + } case T_Aggref: - type = ((const Aggref *) expr)->aggtype; - break; + { +const Aggref *a = (const Aggref *) expr; +*type = a->aggtype; +*collation = a->aggcollid; +break; + } case T_GroupingFunc: - type = INT4OID; + *type = INT4OID; break; case T_WindowFunc: - type = ((const WindowFunc *) expr)->wintype; - break; + { +const WindowFunc *w = (const WindowFunc *) expr; +*type = w->wintype; +*collation = w->wincollid; +break; + } case T_ArrayRef: { const ArrayRef *arrayref = (const ArrayRef *) expr; /* slice and/or store operations yield the array type */ if (arrayref->reflowerindexpr || arrayref->refassgnexpr) - type = arrayref->refarraytype; + *type = arrayref->refarraytype; else - type = arrayref->refelemtype; + *type = arrayref->refelemtype; + +*typmod = arrayref->reftypmod; 
+*collation = arrayref->refcollid; } break; case T_FuncExpr: - type = ((const FuncExpr *) expr)->funcresulttype; - break; + { +const FuncExpr *func = (const FuncExpr *) expr; +int32 coercedTypmod; + +*type = func->funcresulttype; + +/* Be smart about length-coercion functions... */ +if (exprIsLengthCoercion(expr, &coercedTypmod)) + *typmod = coercedTypmod; +else + *typmod = -1; +*collation = func->funccollid; +break; + } case T_NamedArgExpr: - type = exprType((Node *) ((const NamedArgExpr *)
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On Tue, Oct 3, 2017 at 1:30 AM, Robert Haaswrote: > On Fri, Sep 15, 2017 at 6:29 PM, Michael Paquier > wrote: >> I would like to point out that per the RFC, if the client attempts a >> SSL connection with SCRAM and that the server supports channel >> binding, then it has to publish the SASL mechanism for channel >> binding, aka SCRAM-PLUS. If the client tries to force the use of SCRAM >> even if SCRAM-PLUS is specified, this is seen as a downgrade attack by >> the server which must reject the connection. So this parameter has >> meaning only if you try to connect to a PG10 server using a PG11 >> client (assuming that channel binding gets into PG11). If you connect >> with a PG11 client to a PG11 server with SSL, the server publishes >> SCRAM-PLUS, the client has to use it, hence this turns out to make >> cbind=disable and prefer meaningless in the long-term. If the client >> does not use SSL, then there is no channel binding, and cbind=require >> loses its value. So cbind's fate is actually linked to sslmode. > > That seems problematic. What if the client supports SCRAM but not > channel binding? Peter has outlined here that my interpretation of the RFC was wrong on the client side to begin with: https://www.postgresql.org/message-id/f74525e4-6c53-c653-6860-a8cc8d7c8...@2ndquadrant.com If a client does not support channel binding (it is not compiled with SSL or the connection is done without SSL), it should not send 'y' but 'n'. It should be up to the client to decide if it wants to use channel binding or not. libpq is also going to need some extra logic to send 'y' when it thinks that the server should have channel binding support. This can be done by looking at the backend version. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
Hi, Sorry, I saw this once but somehow my attention was drawn away along the way. At Tue, 3 Oct 2017 02:41:34 +0300, Alexander Korotkov wrote in > On Tue, Oct 3, 2017 at 12:20 AM, Maksim Milyutin > wrote: > > > I have tested the following case: > > > > create type pair as (x int, y int); > > prepare test as select json_populate_record(null::pair, '{"x": 1, "y": > > 2}'::json); > > drop type pair cascade; > > > > execute test; > > > > -- The following output is obtained before patch > > ERROR: cache lookup failed for type 16419 > > > > -- After applying patch > > ERROR: type "pair" does not exist > > > > But after recreating 'pair' type I'll get the following message: > > ERROR: cached plan must not change result type > > > > I don't know whether it's right behavior. Anyhow your point is a good > > motivation to experiment and investigate different scenarios of work with > > cached plan that depends on non-stable type. Thanks for that. > > > > I think ideally, cached plan should be automatically invalidated and stored > procedure should work without error. > Not really sure if it's feasible... Without the patch, dropping a table used in a prepared query results in a similar error. So I suppose it's the desired behavior in this case. execute test; | ERROR: relation "t3" does not exist The first thought that patch gave me is that the problem is not limited to constants. Actually the following sequence also reproduces a similar failure even with this patch. create table t2 (x int , y int); create type pair as (x int, y int); prepare test as select row(x, y)::pair from t2; drop type pair; execute test; | ERROR: cache lookup failed for type 16410 In this case the causal expression is in the following form.
TargetEntry ( expr = ( RowExpr: typeid = 16410, row_format = COERCE_EXPLICIT_CAST, args = List (Var(t2.x), Var(t2.y)) ) ) regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
On Tue, Oct 3, 2017 at 3:12 AM, Robert Haas wrote: > On Mon, Oct 2, 2017 at 11:57 AM, Tom Lane wrote: > > Daniel Gustafsson writes: > >> Thanks to everyone who participated, and to everyone who have responded > to my > >> nagging via the CF app email function. This is clearly an awesome > community. > > > > And thanks to you for your hard work as CFM! That's tedious and > > largely thankless work, but it's needed to keep things moving. > > +1. > > I think we need to figure out some plan for tackling the backlog of > Ready for Committer patches, though. There are currently 42 of them, > 10 of which are listed (maybe not all correctly) as bug fixes. That's > not great. How about having an extra week (a "commit week", perhaps a new state in the CF app) after the commitfest closes, dedicated to the Ready for Committer patches, so they can get feedback from committers? If some rework is required, it could then be ready by the time the next commitfest begins. I feel something along these lines may get earlier feedback to the patches. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Logging idle checkpoints
On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frostwrote: > I certainly don't care for the idea of adding log messages saying we > aren't doing anything just to match a count that's incorrectly claiming > that checkpoints are happening when they aren't. > > The down-thread suggestion of keeping track of skipped checkpoints might > be interesting, but I'm not entirely convinced it really is. We have > time to debate that, of course, but I don't really see how that's > helpful. At the moment, it seems like the suggestion to add that column > is based on the assumption that we're going to start logging skipped > checkpoints and having that column would allow us to match up the count > between the new column and the "skipped checkpoint" messages in the logs > and I can not help but feel that this is a ridiculous amount of effort > being put into the analysis of something that *didn't* happen. Being able to look at how many checkpoints are skipped can be used as a tuning indicator of max_wal_size and checkpoint_timeout, or in short increase them if those remain idle. Since their introduction in 335feca4, m_timed_checkpoints and m_requested_checkpoints track the number of checkpoint requests, not if a checkpoint has been actually executed or not, I am not sure that this should be changed after 10 years. So, to put it in other words, wouldn't we want a way to track checkpoints that are *executed*, meaning that we could increment a counter after doing the skip checks in CreateRestartPoint() and CreateCheckPoint()? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On Tue, Oct 3, 2017 at 9:07 AM, Alexander Korotkov wrote: > +1, > I see 3 options there: > 1) Drop high-order bit, as you proposed. > 2) Allow negative queryIds. > 3) Implement unsigned 64-type. > > #1 causes minor loss of precision which looks rather insignificant in given > context. > #2 might be rather unexpected for users who previously had non-negative > queryIds. Changing queryId from 32-bit to 64-bit itself might require some > adaptation from monitoring software. But queryIds are user-visible, and > negative queryIds would look rather nonlogical. Per the principle of least astonishment perhaps: https://en.wikipedia.org/wiki/Principle_of_least_astonishment Negative values tend to be considered as error codes as well. > #3 would be taking on a hard, long-term problem for insufficient reason. > Thus, #1 looks like the most harmless solution. In this case going for #1 looks like the safest bet. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
On Tue, Oct 3, 2017 at 6:54 AM, Tom Lane wrote: > I wrote: >> If this is the only problem then I'd agree we should stick to a spinlock >> (I assume the strings in question can't be very long). I was thinking >> more about what to do if we find other violations that are harder to fix. I don't think that there is any need to switch to an LWLock. Any issues that need to be dealt with here don't require it, if we are fine with the memcpy method of course. > I took a quick look through walreceiver.c, and there are a couple of > obvious problems of the same ilk in WalReceiverMain, which is doing this: > > walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime = > walrcv->latestWalEndTime = GetCurrentTimestamp(); > > (ie, a potential kernel call) inside a spinlock. But there seems no > real problem with just collecting the timestamp before we enter that > critical section. No problems seen either from here. > I also don't especially like the fact that just above there it reaches > elog(PANIC) with the lock still held, though at least that's a case that > should never happen. This part has been around since the beginning in 1bb2558. I agree that the lock should be released before doing the logging. > Further down, it's doing a pfree() inside the spinlock, apparently > for no other reason than to save one "if (tmp_conninfo)". Check. > I don't especially like the Asserts inside spinlocks, either. Personally, > I think if those conditions are worth testing then they're worth testing > for real (in production). Variables that are manipulated by multiple > processes are way more likely to assume unexpected states than local > variables. Those could be replaced by similar checks using some PG_USED_FOR_ASSERTS_ONLY out of the spinlock sections, though I am not sure if those are worth worrying about. What do others think? > I'm also rather befuddled by the fact that this code sets and clears > walrcv->latch outside the critical sections.
If that field is used > by any other process, surely that's completely unsafe. If it isn't, > why is it being kept in shared memory? Do you mean the introduction of WalRcvForceReply by 314cbfc? This is more recent, and has been discussed during the review of the remote_apply patch here to avoid sending SIGUSR1 too much from the startup process to the WAL receiver: https://www.postgresql.org/message-id/CA+TgmobPsROS-gFk=_KJdW5scQjcKtpiLhsH9Cw=uwh1htf...@mail.gmail.com I am attaching a patch that addresses the bugs for the spin lock sections. -- Michael walreceiver-spin-calls.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Conversion error
I saw this while restoring a 9.6 database to a 10.0 database. \connect: FATAL: conversion between UTF8 and MULE_INTERNAL is not supported Is this expected? I saw nothing related to this in the release notes. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
On 2017/10/03 7:16, Michael Paquier wrote: > On Tue, Oct 3, 2017 at 1:23 AM, Alexander Korotkov >wrote: >> On Mon, Oct 2, 2017 at 6:57 PM, Tom Lane wrote: >>> >>> Daniel Gustafsson writes: Thanks to everyone who participated, and to everyone who have responded to my nagging via the CF app email function. This is clearly an awesome community. >>> >>> And thanks to you for your hard work as CFM! That's tedious and >>> largely thankless work, but it's needed to keep things moving. >> >> +1, >> Thank you very much, Daniel! It was a pleasure working with you at >> commitfest. > > Thanks for doing a bunch of work, Daniel. This is a difficult task. +1. Thank you, Daniel! Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane wrote: > Peter Geoghegan writes: > > You need to change the SQL interface as well, although I'm not sure > > exactly how. The problem is that you are now passing a uint64 queryId > > to Int64GetDatumFast() within pg_stat_statements_internal(). That > > worked when queryId was a uint32, because you can easily represent > > values <= UINT_MAX as an int64/int8. However, you cannot represent the > > second half of the range of uint64 within a int64/int8. I think that > > this will behave different depending on USE_FLOAT8_BYVAL, if nothing > > else. > > Maybe intentionally drop the high-order bit, so that it's a 63-bit ID? > +1, I see 3 options there: 1) Drop high-order bit, as you proposed. 2) Allow negative queryIds. 3) Implement unsigned 64-type. #1 causes minor loss of precision which looks rather insignificant in given context. #2 might be rather unexpected for users who previously had non-negative queryIds. Changing queryId from 32-bit to 64-bit itself might require some adaptation from monitoring software. But queryIds are user-visible, and negative queryIds would look rather nonlogical. #3 would be taking on a hard, long-term problem for insufficient reason. Thus, #1 looks like the most harmless solution. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
On 2017-10-02 19:50:51 -0400, Tom Lane wrote: > Andres Freundwrites: > > On 2017-10-02 18:33:17 -0400, Tom Lane wrote: > >> I'm kind of surprised that machine B doesn't show obvious tanking in this > >> test given that msync() makes it suck so badly at copying a database. > >> I wonder what is different from the kernel's standpoint ... maybe the > >> sheer number of different files mmap'd by a single process during the > >> copy? > > > Yea, that's curious. I've really no clue about OSX, so pardon my > > question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu > > blocked? > > What I saw was that the backend process was consuming 100% of (one) CPU, > while the I/O transaction rate viewed by "iostat 1" started pretty low > --- under 10% of what the machine is capable of --- and dropped from > there as the copy proceeded. I did not think to check if that was user > or kernel-space CPU, but I imagine it has to be the latter. So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or msync() that's the problem here. I guess you didn't run a profile? One interesting thing here is that in the CREATE DATABASE case there'll probably be a lot larger contiguous mappings than in *_flush_after cases. So it might be related to the size of the mapping / flush "unit". Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freundwrites: > On 2017-10-02 18:33:17 -0400, Tom Lane wrote: >> I'm kind of surprised that machine B doesn't show obvious tanking in this >> test given that msync() makes it suck so badly at copying a database. >> I wonder what is different from the kernel's standpoint ... maybe the >> sheer number of different files mmap'd by a single process during the >> copy? > Yea, that's curious. I've really no clue about OSX, so pardon my > question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu > blocked? What I saw was that the backend process was consuming 100% of (one) CPU, while the I/O transaction rate viewed by "iostat 1" started pretty low --- under 10% of what the machine is capable of --- and dropped from there as the copy proceeded. I did not think to check if that was user or kernel-space CPU, but I imagine it has to be the latter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
On Tue, Oct 3, 2017 at 12:20 AM, Maksim Milyutinwrote: > I have tested the following case: > > create type pair as (x int, y int); > prepare test as select json_populate_record(null::pair, '{"x": 1, "y": > 2}'::json); > drop type pair cascade; > > execute test; > > -- The following output is obtained before patch > ERROR: cache lookup failed for type 16419 > > -- After applying patch > ERROR: type "pair" does not exist > > But after recreating 'pair' type I'll get the following message: > ERROR: cached plan must not change result type > > I don't know whether it's right behavior. Anyhow your point is a good > motivation to experiment and investigate different scenarios of work with > cached plan that depends on non-stable type. Thanks for that. > I think ideally, cached plan should be automatically invalidated and stored procedure should work without error. Not really sure if it's feasible... -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Hi, On 2017-10-02 18:33:17 -0400, Tom Lane wrote: > Andres Freundwrites: > > To demonstrate what I'm observing here, on linux with a fairly fast ssd: > > ... > > I tried to replicate this test as closely as I could on the Mac hardware > I have laying about. Thanks! > I only bothered with the synchronous_commit=off case, though, since > you say that shows the worst effects. That's the case on linux at least, but I'd suspect it's a much more general thing - you just can't get that much data dirty with pgbench otherwise. > I'm kind of surprised that machine B doesn't show obvious tanking in this > test given that msync() makes it suck so badly at copying a database. > I wonder what is different from the kernel's standpoint ... maybe the > sheer number of different files mmap'd by a single process during the > copy? Yea, that's curious. I've really no clue about OSX, so pardon my question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu blocked? > If we could arrange to not use pg_flush_after in copydir.c on macOS, > I'd be okay with leaving it alone for the configurable flush_after > calls. But I can't think of a way to do that that wouldn't be a > complete kluge. I don't much want to do > > +#ifndef __darwin__ > pg_flush_data(dstfd, offset, nbytes); > +#endif > > but I don't see any better alternative ... What we could do is introduce a guc (~create_database_flush_data) that controls whether we flush here, and have the defaults set differently on OSX. Not sure if that's actually any better. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] list of credits for release notes
On Mon, Oct 2, 2017 at 02:12:50PM -0400, Stephen Frost wrote: > > How should this be handled for the Postgres 11 release notes? > > Ideally, we would let the individuals choose how to be recognized in > release notes, and anywhere else we recognize them. We have the start > of that in https://postgresql.org/account/profile but that isn't very > easily tied to things in the commit history or elsewhere, yet. I'd > suggest that we try to improve on that by: My smaller question is how will this list be generated in PG 11? From the commit log when the release notes are created, or some other method? -- Bruce Momjianhttp://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freundwrites: > To demonstrate what I'm observing here, on linux with a fairly fast ssd: > ... I tried to replicate this test as closely as I could on the Mac hardware I have laying about. I only bothered with the synchronous_commit=off case, though, since you say that shows the worst effects. I used the same parameters you did and the same pgbench settings. I attach the pgbench output for six cases, flush_after disabled or enabled on three different machines: (A) 2016 MacBook Pro, 2.7GHz i7 + SSD, Sierra, HFS+ file system (B) 2013 MacBook Pro, 2.3GHz i7 + SSD, High Sierra, APFS file system (C) 2012 Mac Mini, 2.3GHz i7 + 5400-RPM SATA, High Sierra, HFS+ file system There is some benefit on the SSD machines, but it's in the range of a few percent --- clearly, these kernels are not as subject to the basic I/O-scheduling problem as Linux is. The spinning-rust machine shows a nice gain in overall TPS with flush enabled, but it's actually a bit worse off in terms of the worst-case slowdown --- note that only that case shows things coming to a complete halt. It'd be interesting to check the behavior of a pre-High-Sierra kernel with spinning rust, but I don't have any remotely modern machine answering that description. I'm kind of surprised that machine B doesn't show obvious tanking in this test given that msync() makes it suck so badly at copying a database. I wonder what is different from the kernel's standpoint ... maybe the sheer number of different files mmap'd by a single process during the copy? > What I'm basically wondering is whether we're screwing somebody over > that made the effort to manually configure this on OSX. It's fairly > obvious we need to find a way to disable the msync() by default. I suspect that anybody who cares about DB performance on macOS will be running it on SSD-based hardware these days. 
The benefit seen on the Mac Mini would have been worth the trouble of a custom configuration a few years ago, but I'm dubious that it matters in the real world anymore. If we could arrange to not use pg_flush_after in copydir.c on macOS, I'd be okay with leaving it alone for the configurable flush_after calls. But I can't think of a way to do that that wouldn't be a complete kluge. I don't much want to do +#ifndef __darwin__ pg_flush_data(dstfd, offset, nbytes); +#endif but I don't see any better alternative ... regards, tom lane [Attachment: per-second pgbench progress output (tps and latency), roughly 7,500-9,900 tps; truncated here.]
Re: [HACKERS] Commitfest 201709 is now closed
On Tue, Oct 3, 2017 at 1:23 AM, Alexander Korotkovwrote: > On Mon, Oct 2, 2017 at 6:57 PM, Tom Lane wrote: >> >> Daniel Gustafsson writes: >> > Thanks to everyone who participated, and to everyone who have responded >> > to my >> > nagging via the CF app email function. This is clearly an awesome >> > community. >> >> And thanks to you for your hard work as CFM! That's tedious and >> largely thankless work, but it's needed to keep things moving. > > +1, > Thank you very much, Daniel! It was a pleasure working with you at > commitfest. Thanks for doing a bunch of work, Daniel. This is a difficult task. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.
On 2017-10-02 17:57:51 -0400, Tom Lane wrote: > Andres Freundwrites: > > Done that way. It's a bit annoying, because we've to take care to > > initialize the "unused" part of the array with a valid signalling it's > > an unused mapping. Can't use 0 for that because fmgr_builtins[0] is a > > valid entry. > > The prototype code I posted further upthread just used -1 as the "unused" > marker. There's no reason the array can't be int16 rather than uint16, > and "if (index < 0)" is probably a faster test anyway. Right, but whether we use -1 or UINT16_MAX or such doesn't matter. The relevant bit is that we can't use 0, so we can't rely on the rest of the array being zero initialized, but instead of to initialize all of it explicitly. I've no real feelings about using -1 or UINT16_MAX - I'd be very surprised if there's any sort of meaningful performance difference. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.
Andres Freundwrites: > Done that way. It's a bit annoying, because we've to take care to > initialize the "unused" part of the array with a valid signalling it's > an unused mapping. Can't use 0 for that because fmgr_builtins[0] is a > valid entry. The prototype code I posted further upthread just used -1 as the "unused" marker. There's no reason the array can't be int16 rather than uint16, and "if (index < 0)" is probably a faster test anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
I wrote:
> If this is the only problem then I'd agree we should stick to a spinlock
> (I assume the strings in question can't be very long). I was thinking
> more about what to do if we find other violations that are harder to fix.

I took a quick look through walreceiver.c, and there are a couple of
obvious problems of the same ilk in WalReceiverMain, which is doing this:

	walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime =
		walrcv->latestWalEndTime = GetCurrentTimestamp();

(ie, a potential kernel call) inside a spinlock. But there seems no real
problem with just collecting the timestamp before we enter that critical
section.

I also don't especially like the fact that just above there it reaches
elog(PANIC) with the lock still held, though at least that's a case that
should never happen.

Further down, it's doing a pfree() inside the spinlock, apparently for no
other reason than to save one "if (tmp_conninfo)".

I don't especially like the Asserts inside spinlocks, either. Personally,
I think if those conditions are worth testing then they're worth testing
for real (in production). Variables that are manipulated by multiple
processes are way more likely to assume unexpected states than local
variables.

I'm also rather befuddled by the fact that this code sets and clears
walrcv->latch outside the critical sections. If that field is used by any
other process, surely that's completely unsafe. If it isn't, why is it
being kept in shared memory?

regards, tom lane
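The suggested fix — collect the timestamp before entering the critical section — looks roughly like this toy model. It is not the actual walreceiver code: the struct, the busy-wait lock, and time() here stand in for WalRcvData, SpinLockAcquire(), and GetCurrentTimestamp().

```c
#include <assert.h>
#include <time.h>

/* Mock of the shared walreceiver state; fields mirror the ones named above. */
typedef struct
{
	time_t		lastMsgSendTime;
	time_t		lastMsgReceiptTime;
	time_t		latestWalEndTime;
} WalRcvMock;

static volatile int mock_lock = 0;

static void
set_initial_timestamps(WalRcvMock *walrcv)
{
	/* The potentially slow kernel call happens with no lock held... */
	time_t		now = time(NULL);

	/* ...so only cheap assignments occur inside the critical section. */
	while (__sync_lock_test_and_set(&mock_lock, 1))
		;						/* stands in for SpinLockAcquire */
	walrcv->lastMsgSendTime = now;
	walrcv->lastMsgReceiptTime = now;
	walrcv->latestWalEndTime = now;
	__sync_lock_release(&mock_lock); /* stands in for SpinLockRelease */
}
```

The point of the reordering is purely to minimize what runs while the lock is held; the three fields still get one consistent timestamp.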
Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.
On 2017-09-28 19:06:27 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-09-28 18:52:28 -0400, Tom Lane wrote:
> >> Uh, what? Access to fmgr_nbuiltins shouldn't be part of any critical path
> >> anymore after this change.
>
> > Indeed. But the size of the the oid -> fmgr_builtins index array is
> > relevant now. We could of course just make that dependent on
> > FirstBootstrapObjectId, but that'd waste some memory.
>
> Not enough to notice, considering there are pg_proc OIDs up in the 8K
> range already. We blow 2KB of never-accessed space for far less good
> reason than this.

Done that way. It's a bit annoying, because we've to take care to
initialize the "unused" part of the array with a valid signalling it's
an unused mapping. Can't use 0 for that because fmgr_builtins[0] is a
valid entry. We could introduce a dummy element at that position, but
that doesn't really seem nice either.

Therefore the first attached commit moves find_defined_symbol from
genbki to Catalog.pm, so we can easily extract FirstBootstrapObjectId in
Gen_fmgrtab.pl.

Greetings,

Andres Freund

>From 16e35356cead73291d676764072abfebc2efa79b Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Mon, 2 Oct 2017 14:14:42 -0700
Subject: [PATCH 1/2] Move genbki.pl's find_defined_symbol to Catalog.pm.

Will be used in Gen_fmgrtab.pl in a followup commit.
---
 src/backend/catalog/Catalog.pm | 35 ++-
 src/backend/catalog/genbki.pl  | 34 --
 src/backend/utils/Makefile     |  2 +-
 3 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 7abfda3d3a..54f83533b6 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -19,7 +19,7 @@ use warnings;
 require Exporter;
 our @ISA = qw(Exporter);
 our @EXPORT= ();
-our @EXPORT_OK = qw(Catalogs SplitDataLine RenameTempFile);
+our @EXPORT_OK = qw(Catalogs SplitDataLine RenameTempFile FindDefinedSymbol);
 
 # Call this function with an array of names of header files to parse.
 # Returns a nested data structure describing the data in the headers.
@@ -252,6 +252,39 @@ sub RenameTempFile
 	rename($temp_name, $final_name) || die "rename: $temp_name: $!";
 }
 
+
+# Find a symbol defined in a particular header file and extract the value.
+#
+# The include path has to be passed as a reference to an array.
+sub FindDefinedSymbol
+{
+	my ($catalog_header, $include_path, $symbol) = @_;
+
+	for my $path (@$include_path)
+	{
+
+		# Make sure include path ends in a slash.
+		if (substr($path, -1) ne '/')
+		{
+			$path .= '/';
+		}
+		my $file = $path . $catalog_header;
+		next if !-f $file;
+		open(my $find_defined_symbol, '<', $file) || die "$file: $!";
+		while (<$find_defined_symbol>)
+		{
+			if (/^#define\s+\Q$symbol\E\s+(\S+)/)
+			{
+				return $1;
+			}
+		}
+		close $find_defined_symbol;
+		die "$file: no definition found for $symbol\n";
+	}
+	die "$catalog_header: not found in any include directory\n";
+}
+
+
 # verify the number of fields in the passed-in DATA line
 sub check_natts
 {
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 2eebb061b7..256a9c9c93 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -87,9 +87,11 @@ open my $shdescr, '>', $shdescrfile . $tmpext
 # NB: make sure that the files used here are known to be part of the .bki
 # file's dependencies by src/backend/catalog/Makefile.
 my $BOOTSTRAP_SUPERUSERID =
-	find_defined_symbol('pg_authid.h', 'BOOTSTRAP_SUPERUSERID');
+	Catalog::FindDefinedSymbol('pg_authid.h', \@include_path,
+		'BOOTSTRAP_SUPERUSERID');
 my $PG_CATALOG_NAMESPACE =
-	find_defined_symbol('pg_namespace.h', 'PG_CATALOG_NAMESPACE');
+	Catalog::FindDefinedSymbol('pg_namespace.h', \@include_path,
+		'PG_CATALOG_NAMESPACE');
 
 # Read all the input header files into internal data structures
 my $catalogs = Catalog::Catalogs(@input_files);
@@ -500,34 +502,6 @@ sub emit_schemapg_row
 	return $row;
 }
 
-# Find a symbol defined in a particular header file and extract the value.
-sub find_defined_symbol
-{
-	my ($catalog_header, $symbol) = @_;
-	for my $path (@include_path)
-	{
-
-		# Make sure include path ends in a slash.
-		if (substr($path, -1) ne '/')
-		{
-			$path .= '/';
-		}
-		my $file = $path . $catalog_header;
-		next if !-f $file;
-		open(my $find_defined_symbol, '<', $file) || die "$file: $!";
-		while (<$find_defined_symbol>)
-		{
-			if (/^#define\s+\Q$symbol\E\s+(\S+)/)
-			{
-				return $1;
-			}
-		}
-		close $find_defined_symbol;
-		die "$file: no definition found for $symbol\n";
-	}
-	die "$catalog_header: not found in any include directory\n";
-}
-
 sub usage
 {
	die <
[HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
Hi,

doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced
a couple of parallel worker core dumps with the backtrace below.
Although most of the backtrace is inside the dynamic linker, it looks
like it was passed a pointer to gone-away shared memory.

regards,
Andreas

Core was generated by `postgres: bgworker: parallel worker for PID 24326 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x7f5184852a36 in fillin_rpath (rpath=, rpath@entry=0x55b692f0d360 "/home/smith/postgres/inst/master/lib", result=result@entry=0x55b692f1b380, sep=sep@entry=0x7f5184868060 ":", check_trusted=check_trusted@entry=0, what=what@entry=0x7f51848683bd "RUNPATH", where=where@entry=0x55b692f2d2f0 "/home/smith/postgres/inst/master/lib/pgcrypto.so", l=0x55b692f2d330) at dl-load.c:444
#2  0x7f5184852daf in decompose_rpath (sps=sps@entry=0x55b692f2d6d8, rpath=, l=l@entry=0x55b692f2d330, what=what@entry=0x7f51848683bd "RUNPATH") at dl-load.c:618
#3  0x7f5184852ef7 in cache_rpath (l=l@entry=0x55b692f2d330, sp=sp@entry=0x55b692f2d6d8, tag=tag@entry=29, what=what@entry=0x7f51848683bd "RUNPATH") at dl-load.c:652
#4  0x7f5184853c62 in cache_rpath (what=0x7f51848683bd "RUNPATH", tag=29, sp=0x55b692f2d6d8, l=0x55b692f2d330) at dl-load.c:2307
#5  _dl_map_object (loader=0x55b692f2d330, name=0x7f517f300cc3 "libz.so.1", type=2, trace_mode=0, mode=, nsid=) at dl-load.c:2314
#6  0x7f5184857e70 in openaux (a=a@entry=0x7ffd4f686130) at dl-deps.c:63
#7  0x7f518485a4f4 in _dl_catch_error (objname=objname@entry=0x7ffd4f686128, errstring=errstring@entry=0x7ffd4f686120, mallocedp=mallocedp@entry=0x7ffd4f68611f, operate=operate@entry=0x7f5184857e40 , args=args@entry=0x7ffd4f686130) at dl-error.c:187
#8  0x7f51848580df in _dl_map_object_deps (map=map@entry=0x55b692f2d330, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:254
#9  0x7f518485ea02 in dl_open_worker (a=a@entry=0x7ffd4f6863c0) at dl-open.c:280
#10 0x7f518485a4f4 in _dl_catch_error (objname=objname@entry=0x7ffd4f6863b0, errstring=errstring@entry=0x7ffd4f6863b8, mallocedp=mallocedp@entry=0x7ffd4f6863af, operate=operate@entry=0x7f518485e8f0 , args=args@entry=0x7ffd4f6863c0) at dl-error.c:187
#11 0x7f518485e489 in _dl_open (file=0x55b692f2d2b0 "/home/smith/postgres/inst/master/lib/pgcrypto.so", mode=-2147483390, caller_dlopen=0x55b691cb4c7e, nsid=-2, argc=, argv=, env=0x55b692eef880) at dl-open.c:660
#12 0x7f5184020ee9 in dlopen_doit (a=a@entry=0x7ffd4f6865f0) at dlopen.c:66
#13 0x7f518485a4f4 in _dl_catch_error (objname=0x55b692eef6d0, errstring=0x55b692eef6d8, mallocedp=0x55b692eef6c8, operate=0x7f5184020e90 , args=0x7ffd4f6865f0) at dl-error.c:187
#14 0x7f5184021521 in _dlerror_run (operate=operate@entry=0x7f5184020e90 , args=args@entry=0x7ffd4f6865f0) at dlerror.c:163
#15 0x7f5184020f82 in __dlopen (file=, mode=mode@entry=258) at dlopen.c:87
#16 0x55b691cb4c7e in internal_load_library (libname=libname@entry=0x7f51848be7f8 ) at dfmgr.c:231
#17 0x55b691cb5928 in RestoreLibraryState (start_address=0x7f51848be7f8 ) at dfmgr.c:754
#18 0x55b6919459d9 in ParallelWorkerMain (main_arg=) at parallel.c:1030
#19 0x55b691b23746 in StartBackgroundWorker () at bgworker.c:835
#20 0x55b691b2faf5 in do_start_bgworker (rw=0x55b692f0e050) at postmaster.c:5680
#21 maybe_start_bgworkers () at postmaster.c:5884
#22 0x55b691b305c8 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:5073
#23
#24 0x7f5183a5f273 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#25 0x55b6918b8c0b in ServerLoop () at postmaster.c:1717
#26 0x55b691b31c65 in PostmasterMain (argc=3, argv=0x55b692eea5f0) at postmaster.c:1361
#27 0x55b6918bac4d in main (argc=3, argv=0x55b692eea5f0) at main.c:228
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Andres Freund writes:
> On 2017-10-02 17:30:25 -0400, Tom Lane wrote:
>> Or replace the spinlock with an LWLock?

> That'd probably be a good idea, but I'm loathe to do so in the back
> branches. Not at this callsite, but some others, there's some potential
> for contention.

If this is the only problem then I'd agree we should stick to a spinlock
(I assume the strings in question can't be very long). I was thinking
more about what to do if we find other violations that are harder to fix.

regards, tom lane
Re: [HACKERS] 64-bit queryId?
Peter Geoghegan writes:
> You need to change the SQL interface as well, although I'm not sure
> exactly how. The problem is that you are now passing a uint64 queryId
> to Int64GetDatumFast() within pg_stat_statements_internal(). That
> worked when queryId was a uint32, because you can easily represent
> values <= UINT_MAX as an int64/int8. However, you cannot represent the
> second half of the range of uint64 within a int64/int8. I think that
> this will behave different depending on USE_FLOAT8_BYVAL, if nothing
> else.

Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?

regards, tom lane
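The masking idea above can be sketched in a few lines. This is an illustrative fragment, not the actual pg_stat_statements change; the function name is made up, and plain C99 integer macros are used in place of PostgreSQL's UINT64CONST.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clear the high-order bit so every queryId fits in a signed int64 and
 * can be handed to an int8-returning SQL interface (Int64GetDatumFast
 * in the real code) without ever appearing negative.
 */
static int64_t
queryid_to_int64(uint64_t queryId)
{
	uint64_t	masked = queryId & UINT64_C(0x7FFFFFFFFFFFFFFF);

	return (int64_t) masked;	/* provably non-negative */
}
```

The cost is one bit of hash space, which barely moves the collision math compared with going from 32 to 64 bits in the first place.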
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
On 2017-10-02 17:30:25 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Yes, that'd be a bad idea. It's not great to have memcpys in a critical
> > section, but it's way better than pallocs. So we need to use some local
> > buffers that this get copied to.
>
> Or replace the spinlock with an LWLock?

That'd probably be a good idea, but I'm loathe to do so in the back
branches. Not at this callsite, but some others, there's some potential
for contention.

Greetings,

Andres Freund
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Andres Freund writes:
> On 2017-10-02 22:56:49 +0200, Andreas Seltenreich wrote:
>> low-memory testing with REL_10_STABLE at 1f19550a87 produced the
>> following PANIC:
>> stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397

> Ugh.

Egad.

> Yes, that'd be a bad idea. It's not great to have memcpys in a critical
> section, but it's way better than pallocs. So we need to use some local
> buffers that this get copied to.

Or replace the spinlock with an LWLock?

In any case, I think it would be a good idea to look at every other
critical section touching that lock to see if there are any other
blatant coding-rule violations.

regards, tom lane
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
Hi, Alexander!

Thanks for the comments.

02.10.17 20:02, Alexander Korotkov wrote:
> Please, register this patch at upcoming commitfest to ensure it
> wouldn't be forgotten.
> Regression test changes (both .sql and .out) are essential parts of the
> patch. I see no point in posting them separately. Please, incorporate
> them into your patch.

OK, I'll take your advice.

> Did you check this patch with manually created composite type (made by
> CREATE TYPE typname AS ...)?

I have tested the following case:

create type pair as (x int, y int);
prepare test as select json_populate_record(null::pair, '{"x": 1, "y": 2}'::json);
drop type pair cascade;
execute test;

-- The following output is obtained before patch
ERROR: cache lookup failed for type 16419

-- After applying patch
ERROR: type "pair" does not exist

But after recreating the 'pair' type I get the following message:

ERROR: cached plan must not change result type

I don't know whether that's the right behavior. Anyhow, your point is a
good motivation to experiment with and investigate different scenarios
of work with a cached plan that depends on a non-stable type. Thanks
for that.

--
Regards, Maksim Milyutin
Re: [HACKERS] 64-bit queryId?
On Mon, Oct 2, 2017 at 9:10 AM, Robert Haas wrote:
> On Mon, Oct 2, 2017 at 11:02 AM, Joshua D. Drake wrote:
>> +1 to both of these as well.
>
> OK, so here's a patch. Review appreciated.

You need to change the SQL interface as well, although I'm not sure
exactly how. The problem is that you are now passing a uint64 queryId
to Int64GetDatumFast() within pg_stat_statements_internal(). That
worked when queryId was a uint32, because you can easily represent
values <= UINT_MAX as an int64/int8. However, you cannot represent the
second half of the range of uint64 within an int64/int8. I think that
this will behave differently depending on USE_FLOAT8_BYVAL, if nothing
else.

The background here is that we used int8 as the output type for the
function when queryId was first exposed via the SQL interface because
there was no 32-bit unsigned int type that we could have used (except
possibly Oid, but that's weird). You see the same pattern in a couple
of other places.

--
Peter Geoghegan
Re: [HACKERS] list of credits for release notes
> On Wed, Sep 27, 2017 at 8:29 PM, Michael Paquier wrote:
>> Looking at this list, the first name is followed by the family name,
>> so there are inconsistencies with some Japanese names:
>> - Fujii Masao should be Masao Fujii.
>> - KaiGai Kohei should be Kohei Kaigai.
>
> But his emails say Fujii Masao, not Masao Fujii.

Michael is correct. Sometimes people choose family name first in their
emails. However, I am sure "Fujii" is the family name and "Masao" is
the first name.

> KaiGai's case is a bit trickier, as his email name has changed over time.

Michael is correct about Kaigai too.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
On 2017-10-02 22:56:49 +0200, Andreas Seltenreich wrote:
> Hi,
>
> low-memory testing with REL_10_STABLE at 1f19550a87 produced the
> following PANIC:
>
> stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397

Ugh.

> I was about to wrap the pstrdup()s with a PG_TRY block, but I can't find
> a spinlock being released in a PG_CATCH block anywhere, so maybe that's
> a bad idea?

Yes, that'd be a bad idea. It's not great to have memcpys in a critical
section, but it's way better than pallocs. So we need to use some local
buffers that this get copied to.

This seems to have been introduced as part of b1a9bad9e74 and then
9ed551e0a4f. Authors CCed.

Greetings,

Andres Freund
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
On Tue, Oct 3, 2017 at 9:56 AM, Andreas Seltenreich wrote:
> low-memory testing with REL_10_STABLE at 1f19550a87 produced the
> following PANIC:
>
> stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397
>
> I was about to wrap the pstrdup()s with a PG_TRY block, but I can't find
> a spinlock being released in a PG_CATCH block anywhere, so maybe that's
> a bad idea?

No comment on what might be holding the spinlock there, but perhaps the
spinlock-protected code should strncpy into stack-local buffers instead
of calling pstrdup()? The buffers could be statically sized with
NAMEDATALEN and MAXCONNINFO.

--
Thomas Munro
http://www.enterprisedb.com
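The pattern Thomas suggests can be sketched as follows. This is an illustrative mock, not the real pg_stat_get_wal_receiver: the struct and function names are invented, the spinlock calls are shown as comments, and only the sizing constants mirror PostgreSQL's actual limits. The key property is that while the (commented-out) spinlock would be held, nothing runs that can allocate or elog() and thereby longjmp away with the lock still taken.

```c
#include <assert.h>
#include <string.h>

/* Sizes mirroring the PostgreSQL limits named above. */
#define NAMEDATALEN 64
#define MAXCONNINFO 1024

/* Hypothetical stand-in for the shared walreceiver state. */
typedef struct
{
	char		slotname[NAMEDATALEN];
	char		conninfo[MAXCONNINFO];
} WalRcvShared;

static void
snapshot_walrcv(const WalRcvShared *shared,
				char *slotname_out, char *conninfo_out)
{
	/* SpinLockAcquire(&walrcv->mutex); */

	/* Copy into fixed-size local buffers: no allocation, cannot fail. */
	strncpy(slotname_out, shared->slotname, NAMEDATALEN - 1);
	slotname_out[NAMEDATALEN - 1] = '\0';
	strncpy(conninfo_out, shared->conninfo, MAXCONNINFO - 1);
	conninfo_out[MAXCONNINFO - 1] = '\0';

	/* SpinLockRelease(&walrcv->mutex); */

	/* ...only after the release is it safe to pstrdup() the copies. */
}
```

Any pstrdup()/palloc() then operates on the local copies after the lock is gone, so an out-of-memory elog() can no longer leave the spinlock stuck.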
[HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Hi,

low-memory testing with REL_10_STABLE at 1f19550a87 produced the
following PANIC:

stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397

I was about to wrap the pstrdup()s with a PG_TRY block, but I can't find
a spinlock being released in a PG_CATCH block anywhere, so maybe that's
a bad idea?

regards,
Andreas
Re: [HACKERS] 64-bit queryId?
On 03/10/17 04:02, Joshua D. Drake wrote:
> On 10/01/2017 04:22 PM, Robert Haas wrote:
>> On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark wrote:
>>> Well these kinds of monitoring systems tend to be used by operations
>>> people who are a lot more practical and a lot less worried about
>>> theoretical concerns like that.
>>
>> +1, well said.
>>
>>> In context the point was merely that the default
>>> pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
>>> values are enough. It wouldn't be hard for there to be 64k different
>>> queries over time and across all the databases in a fleet and at that
>>> point it becomes likely there'll be a 32-bit collision.
>>
>> Yeah. I think Alexander Korotkov's points are quite good, too.
>
> +1 to both of these as well.
>
> jD

Did a calculation:

# statements    probability of collision
54561           0.43
54562           0.55

Essentially, you hit a greater than 50% chance of a collision before
you get to 55 thousand statements.

Cheers,
Gavin
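For anyone wanting to rerun the arithmetic, here is a self-contained check using the exact birthday-problem product for n values drawn uniformly from N = 2^32 possible 32-bit IDs. (The figures quoted in the message above may have been computed with different assumptions; this is just the standard model.)

```c
#include <assert.h>

/*
 * Exact probability that at least two of n uniformly random 32-bit IDs
 * collide: 1 - prod_{k=1}^{n-1} (N - k) / N, with N = 2^32.
 */
static double
collision_probability(long n)
{
	const double N = 4294967296.0;	/* 2^32 */
	double		p_all_distinct = 1.0;

	for (long k = 1; k < n; k++)
		p_all_distinct *= (N - k) / N;
	return 1.0 - p_all_distinct;
}
```

Under this model, 65,536 distinct statements already carry roughly a 39% chance of at least one collision, and the 50% point falls near 77,000 values — broadly consistent with the upthread argument that fleet-wide workloads make 32-bit collisions likely, whatever the precise cutoff.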
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
On 2017-10-02 15:59:05 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
> >> Should I expect there to be any difference at all? We don't enable
> >> *_flush_after by default on non-Linux platforms.
>
> > Right, you'd have to enable that. But your patch would neuter an
> > intentionally enabled config too, no?
>
> Well, if you want to suggest a specific scenario to try, I'm happy to.
> I am not going to guess as to what will satisfy you.

To demonstrate what I'm observing here, on linux with a fairly fast ssd:

with:
-c autovacuum_analyze_threshold=2147483647 # to avoid analyze snapshot issue
-c fsync=on -c synchronous_commit=on
-c shared_buffers=4GB
-c max_wal_size=30GB
-c checkpoint_timeout=30s
-c checkpoint_flush_after=0
-c bgwriter_flush_after=0

and

pgbench -i -s 100 -q
pgbench -M prepared -c 8 -j 8 -n -P1 -T 100

often has periods like:

synchronous_commit = on:
progress: 73.0 s, 395.0 tps, lat 20.029 ms stddev 4.001
progress: 74.0 s, 289.0 tps, lat 23.730 ms stddev 23.337
progress: 75.0 s, 88.0 tps, lat 104.029 ms stddev 178.038
progress: 76.0 s, 400.0 tps, lat 20.055 ms stddev 4.844
latency average = 21.599 ms
latency stddev = 13.865 ms
tps = 370.346506 (including connections establishing)
tps = 370.372550 (excluding connections establishing)

with synchronous_commit=off those periods are a lot worse:

progress: 57.0 s, 21104.3 tps, lat 0.379 ms stddev 0.193
progress: 58.0 s, 9994.1 tps, lat 0.536 ms stddev 3.140
progress: 59.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 60.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 61.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 62.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 63.0 s, 3319.6 tps, lat 12.860 ms stddev 253.664
progress: 64.0 s, 20997.0 tps, lat 0.381 ms stddev 0.190
progress: 65.0 s, 20409.1 tps, lat 0.392 ms stddev 0.303
...
latency average = 0.745 ms
latency stddev = 20.470 ms
tps = 10743.53 (including connections establishing)
tps = 10743.815591 (excluding connections establishing)

contrasting that to checkpoint_flush_after=256kB and
bgwriter_flush_after=512kB:

synchronous_commit=on worst:
progress: 87.0 s, 298.0 tps, lat 26.874 ms stddev 26.691
latency average = 21.898 ms
latency stddev = 6.416 ms
tps = 365.308180 (including connections establishing)
tps = 365.318793 (excluding connections establishing)

synchronous_commit=off worst:
progress: 30.0 s, 7026.8 tps, lat 1.137 ms stddev 11.070
latency average = 0.550 ms
latency stddev = 5.599 ms
tps = 14547.842213 (including connections establishing)
tps = 14548.325102 (excluding connections establishing)

If you do the same on rotational disks, the stall periods can get a
*lot* worse (multi-minute stalls with pretty much no activity).

What I'm basically wondering is whether we're screwing somebody over
that made the effort to manually configure this on OSX. It's fairly
obvious we need to find a way to disable the msync() by default.

Greetings,

Andres Freund
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freund writes:
> On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
>> Should I expect there to be any difference at all? We don't enable
>> *_flush_after by default on non-Linux platforms.

> Right, you'd have to enable that. But your patch would neuter an
> intentionally enabled config too, no?

Well, if you want to suggest a specific scenario to try, I'm happy to.
I am not going to guess as to what will satisfy you.

regards, tom lane
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
> >> I experimented with this further by seeing whether the msync() code path
> >> is of any value on Sierra either. The answer seems to be "no": cloning
> >> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> >> laptop using unmodified HEAD, but if I dike out the msync() logic then
> >> it takes 16-17 seconds. Both numbers jump around a little, but using
> >> msync is strictly worse.
>
> > Well, that's only measuring one type of workload. Could you run a normal
> > pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
> > latency differences are?
>
> Should I expect there to be any difference at all? We don't enable
> *_flush_after by default on non-Linux platforms.

Right, you'd have to enable that. But your patch would neuter an
intentionally enabled config too, no?

Greetings,

Andres Freund
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freund writes:
> On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
>> I experimented with this further by seeing whether the msync() code path
>> is of any value on Sierra either. The answer seems to be "no": cloning
>> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
>> laptop using unmodified HEAD, but if I dike out the msync() logic then
>> it takes 16-17 seconds. Both numbers jump around a little, but using
>> msync is strictly worse.

> Well, that's only measuring one type of workload. Could you run a normal
> pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
> latency differences are?

Should I expect there to be any difference at all? We don't enable
*_flush_after by default on non-Linux platforms.

regards, tom lane
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
> I wrote:
> > In short, therefore, APFS cannot cope with the way we're using msync().
>
> I experimented with this further by seeing whether the msync() code path
> is of any value on Sierra either. The answer seems to be "no": cloning
> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> laptop using unmodified HEAD, but if I dike out the msync() logic then
> it takes 16-17 seconds. Both numbers jump around a little, but using
> msync is strictly worse.

Well, that's only measuring one type of workload. Could you run a normal
pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
latency differences are?

> I propose therefore that an appropriate fix is to unconditionally disable
> the msync code path on Darwin, as we have already done for Windows. When
> and if Apple changes their kernel so that this path is actually of some
> value, we can figure out how to detect whether to use it.

I'm inclined to think you're right. This is a surprisingly massive
regression for a new OS release - we're not the only database that uses
msync...

Greetings,

Andres Freund
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Thanks for this breakdown Tom!

FWIW - I'm on Postgres 9.6.5 as bundled with Postgres.app (2.0.5)
running on a 2013 MBP (2.7GHz i7 / 16GB / SSD) setup. It looks like this
might be a priority for an upcoming release, so I might try to hold out
for downstream, but thanks for the patch. It will help if we need to get
custom builds out to fellow devs if this becomes too unbearable.

On Mon, Oct 2, 2017 at 1:42 PM, Tom Lane wrote:
> I wrote:
> > In short, therefore, APFS cannot cope with the way we're using msync().
>
> I experimented with this further by seeing whether the msync() code path
> is of any value on Sierra either. The answer seems to be "no": cloning
> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> laptop using unmodified HEAD, but if I dike out the msync() logic then
> it takes 16-17 seconds. Both numbers jump around a little, but using
> msync is strictly worse.
>
> I propose therefore that an appropriate fix is to unconditionally disable
> the msync code path on Darwin, as we have already done for Windows. When
> and if Apple changes their kernel so that this path is actually of some
> value, we can figure out how to detect whether to use it.
>
> The msync logic seems to date back to this thread:
>
> https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.150601132.28433%40sto
>
> wherein Andres opined
> >> I think this patch primarily needs:
> >> * Benchmarking on FreeBSD/OSX to see whether we should enable the
> >> mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
> >> inclined to leave it off till then.
>
> but so far as I can tell from the thread, only testing on FreeBSD ever
> got done. So there's no evidence that this was ever beneficial on macOS,
> and we now have evidence that it's between counterproductive and
> catastrophic depending on which kernel version you look at.
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
On Mon, Oct 2, 2017 at 12:44 PM, Tom Lane wrote:
>> And I'm saying - that argument is bogus. Regardless of what people
>> want or what we have historically done in the case where the
>> record/row is the only INTO target, when there are multiple targets it
>> seems clear that they want to match up the query's output columns with
>> the INTO targets 1:1. So let's just do that.
>
> Arguing that that's what people want (even if I granted your argument,
> which I do not) does not make the inconsistency magically disappear,
> nor will it stop people from being confused by that inconsistency.
> Furthermore, if we do it like this, we will be completely backed into
> a backwards-compatibility corner if someone does come along and say
> "hey, I wish I could do the other thing".

That is not really true. Even if we define INTO a, b, c as I am
proposing (and Pavel, too, I think), I think we can later define INTO
*a, INTO (a), INTO a ..., INTO a MULTIPLE, INTO a STRANGELY, and INTO
%@!a??ppp#zorp to mean anything we like (unless one or more of those
already have some semantics already, in which case pick something that
doesn't). If we're really on the fence about which behavior people will
want, we could implement both with a syntax marker for each, say SELECT
... INTO a #rhaas for the behavior I like and SELECT ... INTO a #tgl_ftw
for the other behavior, and require specifying one of those decorations.
Maybe that's more or less what you were proposing.

If we're going to have a default, though, I think it should be the one
you described as "inconsistent with the single row case" rather than the
one you described as "very bug-prone", because I agree with those
characterizations but feel that the latter is a much bigger problem than
the former and, again, not what anybody actually wants.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
I wrote:
> In short, therefore, APFS cannot cope with the way we're using msync().

I experimented with this further by seeing whether the msync() code path
is of any value on Sierra either. The answer seems to be "no": cloning
a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
laptop using unmodified HEAD, but if I dike out the msync() logic then
it takes 16-17 seconds. Both numbers jump around a little, but using
msync is strictly worse.

I propose therefore that an appropriate fix is to unconditionally disable
the msync code path on Darwin, as we have already done for Windows. When
and if Apple changes their kernel so that this path is actually of some
value, we can figure out how to detect whether to use it.

The msync logic seems to date back to this thread:

https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.150601132.28433%40sto

wherein Andres opined
>> I think this patch primarily needs:
>> * Benchmarking on FreeBSD/OSX to see whether we should enable the
>> mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
>> inclined to leave it off till then.

but so far as I can tell from the thread, only testing on FreeBSD ever
got done. So there's no evidence that this was ever beneficial on macOS,
and we now have evidence that it's between counterproductive and
catastrophic depending on which kernel version you look at.

regards, tom lane
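The shape of the proposed fix — a compile-time platform guard that forces the msync() path off on Darwin the way it already is on Windows — might look something like the sketch below. This is not the actual patch; the function and strategy names are invented purely to illustrate the selection logic.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical compile-time choice of flush-hint implementation,
 * mirroring the idea above: prefer sync_file_range() on Linux, and
 * force the no-op path on Windows and Darwin even if msync() exists,
 * because on those platforms it is useless or actively harmful.
 */
static const char *
flush_strategy(void)
{
#if defined(__linux__)
	return "sync_file_range";
#elif defined(WIN32) || defined(__darwin__) || defined(__APPLE__)
	return "none";				/* msync() path deliberately disabled */
#else
	return "msync";				/* other POSIX platforms keep msync(MS_ASYNC) */
#endif
}
```

The important property is that the decision is made once, per platform, rather than asking users on macOS to know that they must avoid the *_flush_after settings.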
Re: [HACKERS] generated columns
So yes, distinguishing stored vs. not stored computed columns is useful, especially if the expression can refer to other columns of the same row, though not only then. Examples: -- useful only if stored (assuming these never get updated) inserted_at TIMESTAMP WITHOUT TIME ZONE AS (clock_timestamp()) -- useful only if stored uuid uuid AS (uuid_generate_v4()) -- useful only if stored who_done_it TEXT AS (current_user) -- useful especially if not stored user_at_host TEXT AS (user || '@' || host) -- useful if stored original_user_at_host TEXT AS (user || '@' || host) I assume once set, a stored computed column cannot be updated, though maybe being able to allow this would be ok. Obviously all of this can be done with triggers and VIEWs... The most useful case is where a computed column is NOT stored, because it saves you having to have a table and a view, while support for the stored case merely saves you having to have triggers. Of course, triggers for computing columns are rather verbose, so not having to write those would be convenient. Similarly with RLS. RLS is not strictly necessary since VIEWs and TRIGGERs allow one to accomplish much the same results, but it's a lot of work to get that right, while RLS makes most policies very pithy. (RLS for *update* policies, however, still can't refer to NEW and OLD, so one still has to resort to triggers for updates in many cases). Nico
Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
On Mon, Oct 2, 2017 at 9:11 PM, Andrew Borodin wrote: > On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov > wrote: > > What happens if exactly this "continue" fires? > > > >> if (GistFollowRight(stack->page)) > >> { > >> if (!xlocked) > >> { > >> LockBuffer(stack->buffer, GIST_UNLOCK); > >> LockBuffer(stack->buffer, GIST_EXCLUSIVE); > >> xlocked = true; > >> /* someone might've completed the split when we unlocked */ > >> if (!GistFollowRight(stack->page)) > >> continue; > > > > > > In this case we might get xlocked == true without calling > > CheckForSerializableConflictIn(). > Indeed! I've overlooked it. I remember this issue; we were > considering not fixing splits in Serializable isolation, but > dropped the idea. > Yeah, the current insert algorithm assumes that a split must be fixed before we can correctly traverse the tree downwards. > CheckForSerializableConflictIn() must be after every exclusive lock. > I'm not sure that fixing a split is a case that necessarily requires calling CheckForSerializableConflictIn(). This lock on the leaf page is not taken to modify the page. It's just taken to ensure that nobody else is fixing this split at the same time. After fixing the split, it might appear that the insert would go to another page. > I think it would be rather safe and easy for understanding to move > CheckForSerializableConflictIn() directly before gistinserttuple(). > The difference is that after lock we have conditions to change page, > and before gistinserttuple() we have actual intent to change page. > > From the point of future development the first version is better (if some > new calls occasionally spawn in), but it has 3 calls while your > proposal has 2 calls. > It seems to me that CheckForSerializableConflictIn() before > gistinserttuple() is better as for now. > Agree. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] generated columns
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote: > Nico Williams writes: > > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > >> So for me, i'd rather default to compute on read, as long storing the > >> pre-computed value is an option when necessary. > > > Sure, I agree. I was just wondering whether there might be any other > > difference besides performance characteristics. The answer to that is, > > I think, "no". > > What about non-immutable functions in the generation expression? Aha, thanks! Yes, that would be noticeable. Nico
Re: [HACKERS] generated columns
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote: > Nico Williams writes: > > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > >> So for me, i'd rather default to compute on read, as long storing the > >> pre-computed value is an option when necessary. > > > Sure, I agree. I was just wondering whether there might be any other > > difference besides performance characteristics. The answer to that is, > > I think, "no". > > What about non-immutable functions in the generation expression? Assuming they're permitted, which...well, I could make a case, they should be mutually exclusive with the cached option. I guess documenting the behavior in the manual would suffice, tempting as it would be to include a NOTICE when the table goes from having 0 or more generated columns all of which are immutable to having at least one that's not. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] generated columns
Nico Williams writes: > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: >> So for me, i'd rather default to compute on read, as long storing the >> pre-computed value is an option when necessary. > Sure, I agree. I was just wondering whether there might be any other > difference besides performance characteristics. The answer to that is, > I think, "no". What about non-immutable functions in the generation expression? regards, tom lane
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Brent Dearth writes: > I just recently "upgraded" to High Sierra and experiencing horrendous CREATE > DATABASE performance. Creating a database from a 3G template DB used to > take ~1m but post-upgrade is taking ~22m at a sustained write of around > 4MB/s. Occasionally, attempting to create an empty database hangs > indefinitely as well. When this happens, restarting the Postgres server > allows empty database initialization in ~1s. What PG version are you running? I tried to reproduce this, using HEAD and what I had handy: (a) Late 2016 MacBook Pro, 2.7GHz i7, still on Sierra (b) Late 2013 MacBook Pro, 2.3GHz i7, High Sierra, drive is converted to APFS I made a ~7.5GB test database using "pgbench -i -s 500 bench" and then cloned it with "create database b2 with template bench". Case 1: fsync off. Machine A did the clone in 5.6 seconds, machine B in 12.9 seconds. Considering the CPU speed difference and the fact that Apple put significantly faster SSDs into the 2016 models, I'm not sure this difference is due to anything but better hardware. Case 2: fsync on. Machine A did the clone in 7.5 seconds, machine B in 2523.5 sec (42 min!). So something is badly busted in APFS' handling of fsync, and/or we're doing it in a bad way. Interestingly, pg_test_fsync shows only about a factor-of-2 difference in the timings for regular file fsyncs. So I poked into non-fsync logic that we'd added recently, and after awhile found that diking out the msync code path in pg_flush_data reduces machine B's time to an entirely reasonable 11.5 seconds. In short, therefore, APFS cannot cope with the way we're using msync(). I observe that the copy gets slower and slower as it runs (watching the transfer rate with "iostat 1"), which makes me wonder if there's some sort of O(N^2) issue in the kernel logic for this.
But anyway, as a short-term workaround you might try diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index b0c174284b..af35de5f7d 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -451,7 +451,7 @@ pg_flush_data(int fd, off_t offset, off_t nbytes) return; } #endif -#if !defined(WIN32) && defined(MS_ASYNC) +#if 0 { void *p; static int pagesize = 0; regards, tom lane
Re: [HACKERS] list of credits for release notes
Bruce, * Bruce Momjian (br...@momjian.us) wrote: > On Fri, Sep 29, 2017 at 12:00:05PM -0400, Peter Eisentraut wrote: > > On 9/29/17 11:35, Robert Haas wrote: > > > On Wed, Sep 27, 2017 at 8:29 PM, Michael Paquier > > > wrote: > > >> Looking at this list, the first name is followed by the family name, > > >> so there are inconsistencies with some Japanese names: > > >> - Fujii Masao should be Masao Fujii. > > >> - KaiGai Kohei should be Kohei Kaigai. > > > > > > But his emails say Fujii Masao, not Masao Fujii. > > > > > > KaiGai's case is a bit trickier, as his email name has changed over time. > > > > Yes, I used the form that the person used in their emails. > > How should this be handled for the Postgres 11 release notes? Ideally, we would let the individuals choose how to be recognized in release notes, and anywhere else we recognize them. We have the start of that in https://postgresql.org/account/profile but that isn't very easily tied to things in the commit history or elsewhere, yet. I'd suggest that we try to improve on that by: - Allowing anyone to include contributor information in their .Org account, even if they aren't listed on the Contributors page. - Adding a field along the lines of "Name to be used publicly" and letting them decide what they'd like, possibly even with the option to not be publicly listed at all. - Adding tracking of multiple email addresses into the .Org profile (somehow sync'd with the pglister system) - Starting to include contributor email addresses in commit messages along with names. I don't think we're that far off from this being possible to do in a more formal way that minimizes the risk of getting things incorrect, missing someone, or mis-attributing something. This all involves mostly work on the .Org system, which we do have some folks working on now but is also open source and it certainly wouldn't hurt to have more people involved, if there are others who are interested.
The place to actually start the discussion of such changes would be -www though. Thanks! Stephen
Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov wrote: > What happens if exactly this "continue" fires? > >> if (GistFollowRight(stack->page)) >> { >> if (!xlocked) >> { >> LockBuffer(stack->buffer, GIST_UNLOCK); >> LockBuffer(stack->buffer, GIST_EXCLUSIVE); >> xlocked = true; >> /* someone might've completed the split when we unlocked */ >> if (!GistFollowRight(stack->page)) >> continue; > > > In this case we might get xlocked == true without calling > CheckForSerializableConflictIn(). Indeed! I've overlooked it. I remember this issue; we were considering not fixing splits in Serializable isolation, but dropped the idea. CheckForSerializableConflictIn() must be after every exclusive lock. > I think it would be rather safe and easy for understanding to move > CheckForSerializableConflictIn() directly before gistinserttuple(). The difference is that after lock we have conditions to change page, and before gistinserttuple() we have actual intent to change page. From the point of future development the first version is better (if some new calls occasionally spawn in), but it has 3 calls while your proposal has 2 calls. It seems to me that CheckForSerializableConflictIn() before gistinserttuple() is better as for now. Best regards, Andrey Borodin.
Re: [HACKERS] generated columns
On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > I know that for my use-cases, having both options available would be very > appreciated. The vast majority of the computed columns I would use in my > database would be okay to compute on read. But there are for sure some > which would be performance prohibitive to have compute on read, so i'd > rather have those stored. > > So for me, i'd rather default to compute on read, as long storing the > pre-computed value is an option when necessary. Sure, I agree. I was just wondering whether there might be any other difference besides performance characteristics. The answer to that is, I think, "no".
Re: [HACKERS] list of credits for release notes
On Fri, Sep 29, 2017 at 12:00:05PM -0400, Peter Eisentraut wrote: > On 9/29/17 11:35, Robert Haas wrote: > > On Wed, Sep 27, 2017 at 8:29 PM, Michael Paquier > > wrote: > >> Looking at this list, the first name is followed by the family name, > >> so there are inconsistencies with some Japanese names: > >> - Fujii Masao should be Masao Fujii. > >> - KaiGai Kohei should be Kohei Kaigai. > > > > But his emails say Fujii Masao, not Masao Fujii. > > > > KaiGai's case is a bit trickier, as his email name has changed over time. > > Yes, I used the form that the person used in their emails. How should this be handled for the Postgres 11 release notes? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
On Mon, Oct 2, 2017 at 9:42 AM, Peter Eisentraut wrote: > I understand where you are coming from. My experience with developing > this feature has been that it is quite fragile in some ways. We have > had this out there for testing for many months, and we have fixed many > bugs, and follow-up bugs, up to very recently. We don't have good > automatic test coverage, so we need to rely on user testing. Making > significant code and design changes a week or two before the final > release is just too late, even if the changes were undisputed. And this > wasn't just one specific isolated change, it was a set of overlapping > changes that seemingly kept changing and expanding. For the record, I don't accept that summary. We could have not bothered with the sanitization thing at all, and I wouldn't have cared very much. I even revised my proposal such that you would never have needed to do the language tag mapping at all [1] (albeit while desupporting ICU versions prior to 4.6). And, as recently as 7 days ago, you yourself said: "Reading over this code again, it is admittedly not quite clear why this "canonicalization" is in there right now. I think it had something to do with how we built the keyword variants at one point. It might not make sense. I'd be glad to take that out and use the result straight from uloc_getAvailable() for collcollate." [2] This would have been far more radical than what I proposed. > I'm satisfied that what we are shipping is "good enough". It has been > in development for over a year, it has been reviewed and tested for > months, and a lot of credit for that goes to you. It is "good enough", I suppose. I am disappointed by this particular outcome, but that's mostly because it could have been avoided. I'm still very happy that you took on this project, and I don't want that to be overshadowed by this particular disagreement.
> I'm available to discuss the changes you are envisioning, preferably one > at a time, in the near future as part of the PG11 development process. I don't really think that there is anything more to be done about the issues raised on this thread, with the possible exception of the DEBUG1 display name string thing. The sanitization just isn't valuable enough to add after the fact. Apart from the risks of adding it after the fact, another reason not to bother with it is that it's *incredibly* forgiving. So forgiving that you could reasonably argue that we might as well not have it at all. I may well do more on ICU with you in the future, but not in the area of how things are canonicalized or sanitized. It's too late for that now. [1] https://www.postgresql.org/message-id/CAH2-WzmVtRyNg2gT=u=ktEC-jM3aKq4bYzJ0u2=osxe+o3k...@mail.gmail.com [2] https://www.postgresql.org/message-id/f6c0fca7-e277-3f46-c0c1-adc001bff...@2ndquadrant.com -- Peter Geoghegan
[HACKERS] Issues with logical replication
I have faced two issues with logical replication. Repro scenario: 1. start two Postgres instances (I start both at local machine). 2. Initialize pgbench tables at both instances: pgbench -i -s 10 postgres 3. Create publication of pgbench_accounts table at one node: create publication pub for table pgbench_accounts; 4. Create subscription at another node: delete from pgbench_accounts; CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost port=5432 sslmode=disable' publication pub; CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ BEGIN -- RETURN NEW; END $$ LANGUAGE plpgsql; CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE replication_trigger_proc(); ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger; 5. Start pgbench at primary node: pgbench -T 1000 -P 2 -c 10 postgres Please notice the commented "return new" statement. Invocation of this function causes an error, and in the log we see repeated messages: 2017-10-02 16:47:46.764 MSK [32129] LOG: logical replication table synchronization worker for subscription "sub", table "pgbench_accounts" has started 2017-10-02 16:47:46.771 MSK [32129] ERROR: control reached end of trigger procedure without RETURN 2017-10-02 16:47:46.771 MSK [32129] CONTEXT: PL/pgSQL function replication_trigger_proc() COPY pgbench_accounts, line 1: "1 1 0 " 2017-10-02 16:47:46.772 MSK [28020] LOG: worker process: logical replication worker for subscription 17264 sync 17251 (PID 32129) exited with exit code 1 ... After a few minutes of work the primary node (with the publication) crashes with the following stack trace: Program terminated with signal SIGABRT, Aborted. #0 0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt #0 0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x7f3608f92028 in __GI_abort () at abort.c:89 #2 0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30 "!(((xid) != ((TransactionId) 0)))", errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", lineNumber=582) at assert.c:54 #3 0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:582 #4 0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, cutoff=898498) at snapbuild.c:1400 #5 0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1311 #6 0x0081c339 in SnapBuildProcessRunningXacts (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106 #7 0x0080a1c7 in DecodeStandbyOp (ctx=0x27ef870, buf=0x7ffd301858d0) at decode.c:314 #8 0x00809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870, record=0x27efb30) at decode.c:117 #9 0x0080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870) at logical.c:458 #10 0x00823968 in CreateReplicationSlot (cmd=0x27483a8) at walsender.c:934 #11 0x008246ee in exec_replication_command ( cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT \"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1511 #12 0x0088eb44 in PostgresMain (argc=1, argv=0x275b738, dbname=0x275b578 "postgres", username=0x272b800 "knizhnik") at postgres.c:4086 #13 0x007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357 #14 0x007ef0cb in BackendStartup (port=0x27532a0) at postmaster.c:4029 #15 0x007eb68b in ServerLoop () at postmaster.c:1753 #16 0x007eac77 in PostmasterMain (argc=3, argv=0x2729670) at postmaster.c:1361 #17 0x00728552 in main (argc=3, argv=0x2729670) at main.c:228 Now fix the trigger function: CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ BEGIN RETURN NEW; END $$ LANGUAGE plpgsql; And manually perform at master two updates inside one transaction: postgres=# begin; BEGIN 
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26; UPDATE 1 postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26; UPDATE 1 postgres=# commit; and in the replica's log we see: 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply worker for subscription "sub" has started 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical replication worker for subscription 16403 (PID 2954) exited with exit code 1 Error happens in trigger.c: #3 0x0069bddb in GetTupleForTrigger (estate=0x2e36b50, epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac, lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at trigger.c:3103 #4 0x0069b259 in ExecBRUpdateTriggers
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
2017-10-02 18:44 GMT+02:00 Tom Lane: > Robert Haas writes: > > On Mon, Oct 2, 2017 at 12:28 PM, Tom Lane wrote: > >> I'm not sure if that's true or not. I am sure, though, that since > >> we've done B for twenty years we can't just summarily change to A. > > > I agree, but so what? You said that we couldn't adopt Pavel's > > proposal for this reason: > > > === > > IIRC, the reason for disallowing that is that it's totally unclear what > > the semantics ought to be. Is that variable a single target (demanding > > a compatible composite-valued column from the source query), or does it > > eat one source column per field within the record/row? The former is > 100% > > inconsistent with what happens if the record/row is the only INTO target; > > while the latter would be very bug-prone, and it's especially unclear > what > > ought to happen if it's an as-yet-undefined record variable. > > === > > > And I'm saying - that argument is bogus. Regardless of what people > > want or what we have historically done in the case where the > > record/row is the only INTO target, when there are multiple targets it > > seems clear that they want to match up the query's output columns with > > the INTO targets 1:1. So let's just do that. > > Arguing that that's what people want (even if I granted your argument, > which I do not) does not make the inconsistency magically disappear, > nor will it stop people from being confused by that inconsistency. > Furthermore, if we do it like this, we will be completely backed into > a backwards-compatibility corner if someone does come along and say > "hey, I wish I could do the other thing". > > I'm fine with doing something where we add new notation to dispel > the ambiguity. I don't want to put in another layer of inconsistency > and then have even more backwards-compatibility problems constraining > our response to the inevitable complaints. > I didn't talk about record type. 
I talked only about composite variables (ROW in our terminology). I don't think special syntax is necessary for this case, although we could use a parallel assignment with different semantics here. What is the motivation for this thread? I had to migrate a lot of Oracle procedures where there were usually two OUT variables - first, a known composite type (some state variable), and second, a result (text or int variable). Now, the CALL of this function in Postgres is: SELECT fx() INTO rec; var_state := rec.state; var_result := rec.result; It works, Ora2pg supports it, plpgsql_check is able to check it, but it is not elegant and is less readable. So, when the target is not a plain REC or ROW, I think we can allow assignment with a few limits: 1. The REC type should not be used. 2. The target and source fields should be the same - this assignment should not be tolerant like now. Because this situation is not supported now, there is no compatibility risk. Some modern and now well-known languages like Go support parallel assignment. Could that be the special syntax requested by Tom? So there are two proposals: 1. Implement a safe, restrictive SELECT INTO where the target can be a combination of REC or scalars. 2. Parallel assignment with new behavior, which allows any list of REC, ROW, or scalar as target - but a composite should be attached to a composite var, and a scalar to a scalar. A list of scalars as target for a composite value should be a) always disallowed, or b) disallowed when some target var is a composite. The differences between the assignment command and the INTO command can be messy too. So the best solution would be one set of rules for := and INTO - but I am not sure if that is possible. Comments? Regards Pavel > regards, tom lane >
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
On Mon, Oct 2, 2017 at 3:43 PM, Maksim Milyutin wrote: > On 26.09.2017 23:25, Maksim Milyutin wrote: > > Updated patchset contains more transparent definition of composite type > for constant node and regression test for described above buggy case. > > Is there any interest on the problem in this thread? > Probably everybody is too busy with the upcoming release and other patches. Since this patch is a bug fix, it definitely should be considered. Please register this patch at the upcoming commitfest to ensure it won't be forgotten. Regression test changes (both .sql and .out) are essential parts of the patch. I see no point in posting them separately. Please incorporate them into your patch. Did you check this patch with a manually created composite type (made by CREATE TYPE typname AS ...)? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Issues with logical replication
On 02/10/17 17:49, Konstantin Knizhnik wrote: > I have faced two issues with logical replication. > Reproducing scenario: > > 1. start two Postgres instances (I start both at local machine). > 2. Initialize pgbench tables at both instances: > pgbench -i -s 10 postgres > 3. Create publication of pgbench_accounts table at one node: > create publication pub for table pgbench_accounts; > 4. Create subscription at another node: > delete from pgbench_accounts; > CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost > port=5432 sslmode=disable' publication pub; > CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS > TRIGGER AS $$ > BEGIN > -- RETURN NEW; > END $$ LANGUAGE plpgsql; > CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE > ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE > replication_trigger_proc(); > ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger; > 5. Start pgbench at primary node: > pgbench -T 1000 -P 2 -c 10 postgres > > > Please notice commented "return new" statement. Invocation of this > function cause error and in log we see repeated messages: > > 2017-10-02 16:47:46.764 MSK [32129] LOG: logical replication table > synchronization worker for subscription "sub", table "pgbench_accounts" > has started > 2017-10-02 16:47:46.771 MSK [32129] ERROR: control reached end of > trigger procedure without RETURN > 2017-10-02 16:47:46.771 MSK [32129] CONTEXT: PL/pgSQL function > replication_trigger_proc() > COPY pgbench_accounts, line 1: "1 1 0 " > 2017-10-02 16:47:46.772 MSK [28020] LOG: worker process: logical > replication worker for subscription 17264 sync 17251 (PID 32129) exited > with exit code 1 > ... > > > After few minutes of work primary node (with publication) is crashed > with the following stack trace: > > Program terminated with signal SIGABRT, Aborted. 
> #0 0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. > (gdb) bt > #0 0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > #1 0x7f3608f92028 in __GI_abort () at abort.c:89 > #2 0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30 > "!(((xid) != ((TransactionId) 0)))", > errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", > lineNumber=582) at assert.c:54 > #3 0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, > oper=XLTW_None) at lmgr.c:582 > #4 0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, > cutoff=898498) at snapbuild.c:1400 > #5 0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, > lsn=798477760, running=0x2749198) at snapbuild.c:1311 > #6 0x0081c339 in SnapBuildProcessRunningXacts > (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106 Hmm this would suggest that XLOG_RUNNING_XACTS record contains invalid transaction ids but we specifically skip those in GetRunningTransactionData(). Can you check pg_waldump output for that record (lsn is shown in the backtrace)? 
> > Now fix the trigger function: > CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ > BEGIN > RETURN NEW; > END $$ LANGUAGE plpgsql; > > And manually perform at master two updates inside one transaction: > > postgres=# begin; > BEGIN > postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26; > UPDATE 1 > postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26; > UPDATE 1 > postgres=# commit; > > > and in replica log we see: > 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply > worker for subscription "sub" has started > 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible > tuple > 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical > replication worker for subscription 16403 (PID 2954) exited with exit > code 1 > > Error happens in trigger.c: > > #3 0x0069bddb in GetTupleForTrigger (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac, > lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at > trigger.c:3103 > #4 0x0069b259 in ExecBRUpdateTriggers (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac, > fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748 > #5 0x006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240) > at execReplication.c:461 > #6 0x00820894 in apply_handle_update (s=0x7ffc442163b0) at > worker.c:736 We have locked the same tuple in RelationFindReplTupleByIndex() just before this gets called and didn't get the same error. I guess we do something wrong with snapshots. Will need to investigate more. -- Petr Jelinek
Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
On Sun, Oct 1, 2017 at 11:53 AM, Shubham Barai wrote: > Yes, page-level predicate locking should happen only when fast update is > off. > Actually, I forgot to put conditions in the updated patch. Does everything > else look ok? > I think the isolation tests should be improved. It doesn't seem that any posting tree would be generated by the tests that you've provided, because all the TIDs could fit in a single posting list. Note that you can get some insight into GIN physical structure using the pageinspect contrib module. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] generated columns
I know that for my use cases, having both options available would be much appreciated. The vast majority of the computed columns I would use in my database would be fine to compute on read, but there are certainly some for which computing on read would be prohibitively expensive, so I'd rather have those stored. So for me, I'd default to compute-on-read, as long as storing the precomputed value is an option when necessary.

Just my $0.02.

Thanks,
-Adam
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
Robert Haas writes:
> On Mon, Oct 2, 2017 at 12:28 PM, Tom Lane wrote:
>> I'm not sure if that's true or not. I am sure, though, that since
>> we've done B for twenty years we can't just summarily change to A.

> I agree, but so what? You said that we couldn't adopt Pavel's
> proposal for this reason:
> ===
> IIRC, the reason for disallowing that is that it's totally unclear what
> the semantics ought to be. Is that variable a single target (demanding
> a compatible composite-valued column from the source query), or does it
> eat one source column per field within the record/row? The former is 100%
> inconsistent with what happens if the record/row is the only INTO target;
> while the latter would be very bug-prone, and it's especially unclear what
> ought to happen if it's an as-yet-undefined record variable.
> ===
> And I'm saying - that argument is bogus. Regardless of what people
> want or what we have historically done in the case where the
> record/row is the only INTO target, when there are multiple targets it
> seems clear that they want to match up the query's output columns with
> the INTO targets 1:1. So let's just do that.

Arguing that that's what people want (even if I granted your argument, which I do not) does not make the inconsistency magically disappear, nor will it stop people from being confused by that inconsistency. Furthermore, if we do it like this, we will be completely backed into a backwards-compatibility corner if someone does come along and say "hey, I wish I could do the other thing".

I'm fine with doing something where we add new notation to dispel the ambiguity. I don't want to put in another layer of inconsistency and then have even more backwards-compatibility problems constraining our response to the inevitable complaints.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
On 10/1/17 14:26, Peter Geoghegan wrote: > It does seem too late. It's disappointing that we did not do better > here. This problem was entirely avoidable. I understand where you are coming from. My experience with developing this feature has been that it is quite fragile in some ways. We have had this out there for testing for many months, and we have fixed many bugs, and follow-up bugs, up to very recently. We don't have good automatic test coverage, so we need to rely on user testing. Making significant code and design changes a week or two before the final release is just too late, even if the changes were undisputed. And this wasn't just one specific isolated change, it was a set of overlapping changes that seemingly kept changing and expanding. Analyzing and reviewing that under pressure, and then effectively owning it going forward, too, is not something I could commit to. I'm satisfied that what we are shipping is "good enough". It has been in development for over a year, it has been reviewed and tested for months, and a lot of credit for that goes to you. The "old" locale support took many years to take shape, and this one probably will, too, but at some point I feel we have to let it be for a while and take a breather and get some feedback and field experience. I'm available to discuss the changes you are envisioning, preferably one at a time, in the near future as part of the PG11 development process. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
On Mon, Oct 2, 2017 at 12:28 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Sep 19, 2017 at 3:18 PM, Tom Lane wrote:
>>> That's certainly a case that we ought to support somehow. The problem is
>>> staying reasonably consistent with the two-decades-old precedent of the
>>> existing behavior for one target variable.
>
>> My point is that you objected to Pavel's proposal saying "it's not
>> clear whether users want A or B". But I think they always want A.
>
> I'm not sure if that's true or not. I am sure, though, that since
> we've done B for twenty years we can't just summarily change to A.

I agree, but so what? You said that we couldn't adopt Pavel's proposal for this reason:

===
IIRC, the reason for disallowing that is that it's totally unclear what the semantics ought to be. Is that variable a single target (demanding a compatible composite-valued column from the source query), or does it eat one source column per field within the record/row? The former is 100% inconsistent with what happens if the record/row is the only INTO target; while the latter would be very bug-prone, and it's especially unclear what ought to happen if it's an as-yet-undefined record variable.
===

And I'm saying - that argument is bogus. Regardless of what people want or what we have historically done in the case where the record/row is the only INTO target, when there are multiple targets it seems clear that they want to match up the query's output columns with the INTO targets 1:1. So let's just do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
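The A-vs-B ambiguity quoted above can be made concrete with a small model. This is a purely illustrative Python sketch — the function names are invented and this is not PL/pgSQL's implementation — contrasting option A (each INTO target consumes exactly one output column, so a record target must receive one composite-valued column) with option B (a record target eats one source column per field):

```python
# Illustrative model (not PL/pgSQL itself) of the two candidate semantics
# for "SELECT ... INTO x, rec" when one target is a record/row variable.

def into_one_to_one(columns, targets):
    """Option A: each INTO target consumes exactly one output column."""
    if len(columns) != len(targets):
        raise ValueError("column count must match target count")
    return dict(zip(targets, columns))

def into_field_expansion(columns, targets, record_fields):
    """Option B: a record target eats one source column per field."""
    result, i = {}, 0
    for t in targets:
        if t in record_fields:
            width = len(record_fields[t])
            result[t] = columns[i:i + width]   # record absorbs `width` columns
        else:
            width = 1
            result[t] = columns[i]             # scalar absorbs one column
        i += width
    if i != len(columns):
        raise ValueError("column count does not match expanded targets")
    return result

# "SELECT a, b, c INTO x, rec" where rec has two fields:
cols = ["a", "b", "c"]
print(into_field_expansion(cols, ["x", "rec"], {"rec": ["f1", "f2"]}))
# → {'x': 'a', 'rec': ['b', 'c']}
# Under option A the same query errors out: 3 columns, 2 targets.
```

The bug-proneness Tom quotes falls out directly: under option B, dropping a field from `rec` silently reassigns which source columns go where.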
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On Fri, Sep 15, 2017 at 6:29 PM, Michael Paquier wrote:
> I would like to point out that per the RFC, if the client attempts an
> SSL connection with SCRAM and the server supports channel binding,
> then it has to publish the SASL mechanism for channel binding, aka
> SCRAM-PLUS. If the client tries to force the use of SCRAM even if
> SCRAM-PLUS is specified, this is seen as a downgrade attack by the
> server, which must reject the connection. So this parameter has
> meaning only if you try to connect to a PG10 server using a PG11
> client (assuming that channel binding gets into PG11). If you connect
> with a PG11 client to a PG11 server with SSL, the server publishes
> SCRAM-PLUS, the client has to use it, hence this turns out to make
> cbind=disable and prefer meaningless in the long term. If the client
> does not use SSL, then there is no channel binding, and cbind=require
> loses its value. So cbind's fate is actually linked to sslmode.

That seems problematic. What if the client supports SCRAM but not channel binding?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
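The downgrade rule Michael describes can be sketched as a tiny negotiation check. This is a simplified, hypothetical model of that behavior — not libpq or server code, and the function name is invented; mechanism names follow the SCRAM-SHA-256 / SCRAM-SHA-256-PLUS convention of RFC 7677:

```python
# Simplified model of the negotiation rule described above: if the server
# advertised the -PLUS (channel-binding) mechanism over SSL, a client that
# nevertheless picks plain SCRAM is treated as a downgrade attack and
# rejected.

def server_accepts(server_mechs, client_choice, over_ssl):
    if client_choice not in server_mechs:
        return False                    # unknown mechanism
    if (over_ssl
            and "SCRAM-SHA-256-PLUS" in server_mechs
            and client_choice == "SCRAM-SHA-256"):
        return False                    # downgrade: -PLUS was available
    return True

# Robert's question in this model: a client that supports SCRAM but not
# channel binding cannot connect over SSL to a -PLUS-capable server.
```

Usage: `server_accepts({"SCRAM-SHA-256", "SCRAM-SHA-256-PLUS"}, "SCRAM-SHA-256", True)` returns False under this rule, which is exactly the case Robert flags as problematic.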
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
Robert Haas writes:
> On Tue, Sep 19, 2017 at 3:18 PM, Tom Lane wrote:
>> That's certainly a case that we ought to support somehow. The problem is
>> staying reasonably consistent with the two-decades-old precedent of the
>> existing behavior for one target variable.

> My point is that you objected to Pavel's proposal saying "it's not
> clear whether users want A or B". But I think they always want A.

I'm not sure if that's true or not. I am sure, though, that since we've done B for twenty years we can't just summarily change to A.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
On Mon, Oct 2, 2017 at 6:57 PM, Tom Lane wrote:
> Daniel Gustafsson writes:
>> Thanks to everyone who participated, and to everyone who have responded to my
>> nagging via the CF app email function. This is clearly an awesome community.
>
> And thanks to you for your hard work as CFM! That's tedious and
> largely thankless work, but it's needed to keep things moving.

+1. Thank you very much, Daniel! It was a pleasure working with you this commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
On Tue, Sep 19, 2017 at 3:18 PM, Tom Lane wrote:
> Robert Haas writes:
>> I think the fact that single-target INTO lists and multiple-target
>> INTO lists are handled completely differently is extremely poor
>> language design. It would have been far better, as you suggested
>> downthread, to have added some syntax up front to let people select
>> the behavior that they want, but I think there's little hope of
>> changing this now without creating even more pain.
>
> How so? The proposal I gave is fully backwards-compatible. It's
> likely not the way we'd do it in a green field, but we don't have
> a green field.
>
>> I have a really hard time, however, imagining that anyone writes
>> SELECT a, b, c, d, e, f, g, h, i, j, k INTO x, y, z and wants some of
>> a-k to go into x, some more to go into y, and some more to go into z
>> (and heaven help you if you drop a column from x or y -- now the whole
>> semantics of the query change, yikes). What's reasonable is to write
>> SELECT a, b, c INTO x, y, z and have those correspond 1:1.
>
> That's certainly a case that we ought to support somehow. The problem is
> staying reasonably consistent with the two-decades-old precedent of the
> existing behavior for one target variable.

My point is that you objected to Pavel's proposal saying "it's not clear whether users want A or B". But I think they always want A.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
On 2017-10-02 23:24:30, "Alexander Korotkov" wrote:

> On Sun, Oct 1, 2017 at 8:27 PM, chenhj wrote:
>
> Now, this patch looks good for me. It applies cleanly, builds cleanly,
> passes regression tests, and the new functionality is covered by
> regression tests. Code is OK for me, and docs too. I'm marking this
> patch as "Ready for Committer". BTW, the authors field in the
> commitfest app is empty (https://commitfest.postgresql.org/15/1302/).
> Please put your name there. I hope this patch will be committed during
> the 2017-11 commitfest. Be ready to rebase this patch if needed.
> Thank you for your work.

I have filled in the authors field for this patch in the commitfest app, and will rebase the patch if needed. Thank you for your help!

--
Best Regards,
Chen Huajun
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Mon, Oct 2, 2017 at 11:05 AM, Michael Paquier wrote:
> Well, there are cases where you don't need any locking checks, and the
> proposed patch ignores that.

I understand that, but shouldn't we then look for a way to adjust the patch so that it doesn't have that issue any longer, rather than just kicking it to the curb? I mean, just saying "patch suxxor, next" doesn't seem like the right approach to something that has apparently already found real problems.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
On Mon, Oct 2, 2017 at 11:57 AM, Tom Lane wrote:
> Daniel Gustafsson writes:
>> Thanks to everyone who participated, and to everyone who have responded to my
>> nagging via the CF app email function. This is clearly an awesome community.
>
> And thanks to you for your hard work as CFM! That's tedious and
> largely thankless work, but it's needed to keep things moving.

+1. I think we need to figure out some plan for tackling the backlog of Ready for Committer patches, though. There are currently 42 of them, 10 of which are listed (maybe not all correctly) as bug fixes. That's not great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On Mon, Oct 2, 2017 at 11:02 AM, Joshua D. Drake wrote:
> +1 to both of these as well.

OK, so here's a patch. Review appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

64-bit-queryid-v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
Daniel Gustafsson writes:
> Thanks to everyone who participated, and to everyone who have responded to my
> nagging via the CF app email function. This is clearly an awesome community.

And thanks to you for your hard work as CFM! That's tedious and largely thankless work, but it's needed to keep things moving.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Michael Paquier wrote:
> On Mon, Oct 2, 2017 at 11:44 PM, Robert Haas wrote:
>> On Mon, Oct 2, 2017 at 10:19 AM, Michael Paquier wrote:
>>> So those bits could be considered for integration.
>>
>> Yes, and they also tend to suggest that the rest of the patch has value.
>
> Well, there are cases where you don't need any locking checks, and the
> proposed patch ignores that. Take for example pageinspect which works
> on a copy of a page, or just WAL replay which serializes everything,
> and in both cases PageGetLSN can be used safely. So for compatibility
> reasons I think that PageGetLSN should be kept alone to not surprise
> anybody using it, or at least an equivalent should be introduced. It
> would be interesting to make BufferGetLSNAtomic hold tighter checks,
> like something that makes use of LWLockHeldByMe for example.

I'd argue about this in the same direction I argued about BufferGetPage() needing an LSN check that's applied separately: if it's too easy for a developer to do the wrong thing (i.e. fail to apply the separate LSN check after reading the page), then the routine should be patched to do the check itself, and another routine should be offered for the cases that explicitly can do without the check.

I was eventually outvoted in the BufferGetPage() saga, though, so I'm not sure that that discussion sets precedent.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Horrible CREATE DATABASE Performance in High Sierra
I just recently "upgraded" to High Sierra and am experiencing horrendous CREATE DATABASE performance. Creating a database from a 3 GB template DB used to take ~1m, but post-upgrade it takes ~22m at a sustained write rate of around 4 MB/s. Occasionally, attempting to create an empty database hangs indefinitely as well; when this happens, restarting the Postgres server allows empty database initialization in ~1s.

I had been running on an encrypted APFS volume (FileVault), but after dropping encryption the task dropped to about ~15m per run. Still a far cry from the previous ~1m threshold. A multi-threaded pg_restore sustains writes of ~38 MB/s and completes in about the same time as pre-upgrade (Sierra), so I'm not sure it's entirely related to APFS / disk IO.

I've completely rebuilt the Postgres data directory, re-installed Postgres (Postgres.app 2.0.5), etc. I don't have any reasonable explanation for what could have broken so catastrophically. A coworker has seen the exact same issue. Has anyone else experienced this yet, or have any insight as to what could be happening?

Thanks in advance!
[HACKERS] Issues with logical replication
I have faced two issues with logical replication. Reproducing scenario:

1. Start two Postgres instances (I start both on the local machine).
2. Initialize pgbench tables at both instances:
   pgbench -i -s 10 postgres
3. Create a publication of the pgbench_accounts table at one node:
   create publication pub for table pgbench_accounts;
4. Create a subscription at the other node:
   delete from pgbench_accounts;
   CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost port=5432 sslmode=disable' publication pub;
   CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
   BEGIN
       -- RETURN NEW;
   END $$ LANGUAGE plpgsql;
   CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE replication_trigger_proc();
   ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
5. Start pgbench at the primary node:
   pgbench -T 1000 -P 2 -c 10 postgres

Please note the commented-out "RETURN NEW" statement. Invoking this function causes an error, and in the log we see repeated messages:

2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table synchronization worker for subscription "sub", table "pgbench_accounts" has started
2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of trigger procedure without RETURN
2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function replication_trigger_proc()
	COPY pgbench_accounts, line 1: "110 "
2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical replication worker for subscription 17264 sync 17251 (PID 32129) exited with exit code 1
...

After a few minutes of work, the primary node (with the publication) crashes with the following stack trace:

Program terminated with signal SIGABRT, Aborted.
#0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f3608f92028 in __GI_abort () at abort.c:89
#2  0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30 "!(((xid) != ((TransactionId) 0)))", errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", lineNumber=582) at assert.c:54
#3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:582
#4  0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, cutoff=898498) at snapbuild.c:1400
#5  0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1311
#6  0x0081c339 in SnapBuildProcessRunningXacts (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
#7  0x0080a1c7 in DecodeStandbyOp (ctx=0x27ef870, buf=0x7ffd301858d0) at decode.c:314
#8  0x00809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870, record=0x27efb30) at decode.c:117
#9  0x0080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870) at logical.c:458
#10 0x00823968 in CreateReplicationSlot (cmd=0x27483a8) at walsender.c:934
#11 0x008246ee in exec_replication_command (cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT \"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1511
#12 0x0088eb44 in PostgresMain (argc=1, argv=0x275b738, dbname=0x275b578 "postgres", username=0x272b800 "knizhnik") at postgres.c:4086
#13 0x007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357
#14 0x007ef0cb in BackendStartup (port=0x27532a0) at postmaster.c:4029
#15 0x007eb68b in ServerLoop () at postmaster.c:1753
#16 0x007eac77 in PostmasterMain (argc=3, argv=0x2729670) at postmaster.c:1361
#17 0x00728552 in main (argc=3, argv=0x2729670) at main.c:228

Now fix the trigger function:

CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
    RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform at master two updates inside one transaction:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;

and in replica log we see:

2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible tuple
2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical replication worker for subscription 16403 (PID 2954) exited with exit code 1

Error happens in trigger.c:

#3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50, epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac, lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at trigger.c:3103
#4  0x0069b259 in ExecBRUpdateTriggers
[HACKERS] Commitfest 201709 is now closed
We have now entered October, which marks the end of Commitfest 201709. All patches have now been dealt with and the commitfest is closed.

Before starting on moving and closing patches, the stats for this commitfest were:

Needs review: 69.
Waiting on Author: 22.
Ready for Committer: 40.
Committed: 88.
Moved to next CF: 11.
Rejected: 8.
Returned with Feedback: 18.
===
Total: 256.

29 patches were in need of review but didn't have a reviewer assigned (and no review posted on -hackers); these have all been moved to the next CF. On top of that, 40 patches were Ready for Committer and have been moved to the next commitfest. The next commitfest is thus quite the smorgasbord for committers already.

A few patches had been "Waiting on Author" during the entire commitfest without seeing any updates, and were thus closed as "Returned with Feedback". A few other patches were Returned with Feedback too, due to either not moving from Waiting on Author, or due to review comments being made. And last, but not least, a lot of patches were pushed to the next commitfest. What I've tried to do is move patches such that every patch submitted has a fair chance at a good review. Also, patches being actively discussed and worked on were of course moved. With the volumes in mind, I'm absolutely certain I've botched one or two up though. If you have a patch that was moved, and you intend to rewrite enough of it to warrant a resubmission, then please go in and close your entry.

In summary, this commitfest saw a large number of patches, and while lots of patches got reviewed and committed, there is a lot of improvement to be had when it comes to closing them. We clearly need more bandwidth among reviewers and committers to cope with the increasing size of the commitfests. There is a clear risk that too much gets moved to the next commitfest, making each new commitfest progressively larger and harder to handle.
I don’t have any good answers, but we probably need to think of something going forward. Thanks to everyone who participated, and to everyone who have responded to my nagging via the CF app email function. This is clearly an awesome community. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
On Sun, Oct 1, 2017 at 8:27 PM, chenhj wrote:
> On 2017-10-01 04:09:19, "Alexander Korotkov" wrote:
>
> On Sat, Sep 30, 2017 at 8:18 PM, chenhj wrote:
>
>> On 2017-09-30 02:17:54, "Alexander Korotkov" wrote:
>>
>> Great. Now code of this patch looks good for me.
>> However, we forgot about documentation.
>>
>>> The result is equivalent to replacing the target data directory with the
>>> source one. Only changed blocks from relation files are copied;
>>> all other files are copied in full, including configuration files. The
>>> advantage of pg_rewind over taking a new base backup, or
>>> tools like rsync, is that pg_rewind does
>>> not require reading through unchanged blocks in the cluster. This makes
>>> it a lot faster when the database is large and only a small
>>> fraction of blocks differ between the clusters.
>>
>> At least, this paragraph needs to be adjusted, because it states which
>> files are copied. And probably the later paragraphs which talk about WAL
>> files.
>>
>> You are right.
>> I wrote a draft as follows, but I'm afraid whether the English
>> statement is accurate.
>
> I'm not a native English speaker too :(
>
>> Only the WAL files between the point of divergence and the current WAL
>> insert location of the source server are copied, *for* other WAL files are
>> useless for the target server.
>
> I'm not sure about this usage of the word *for*. For me, it probably should
> be just removed. Rest of the changes looks good for me. Please integrate
> them into the patch.
>
> I have removed the *for*; please check the new patch again.

Now, this patch looks good for me. It applies cleanly, builds cleanly, passes regression tests, and the new functionality is covered by regression tests. Code is OK for me, and docs too. I'm marking this patch as "Ready for Committer".

BTW, the authors field in the commitfest app is empty (https://commitfest.postgresql.org/15/1302/). Please put your name there.
I hope this patch will be committed during 2017-11 commitfest. Be ready to rebase this patch if needed. Thank you for your work. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
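The filtering rule the patch's documentation change describes — copy only the WAL files between the point of divergence and the source's current insert location — can be sketched roughly as follows. This is an illustrative Python model, not pg_rewind's actual code: it reduces WAL segments to sequential numbers, assumes the default 16 MB segment size, and the helper names are invented:

```python
# Illustrative sketch of the copy filter discussed above: skip WAL
# segment files outside [divergence point, current insert location],
# since the rewound target has no use for them.

WAL_SEG_SIZE = 16 * 1024 * 1024   # default segment size (assumption)

def seg_no(lsn):
    """Map an LSN (byte position in the WAL stream) to a segment number."""
    return lsn // WAL_SEG_SIZE

def segment_needed(seg, divergence_lsn, insert_lsn):
    """Keep a segment only if it overlaps the divergence..insert range."""
    return seg_no(divergence_lsn) <= seg <= seg_no(insert_lsn)

def filter_segments(segments, divergence_lsn, insert_lsn):
    return [s for s in segments
            if segment_needed(s, divergence_lsn, insert_lsn)]
```

For example, with a divergence point inside segment 2 and an insert location at the start of segment 5, only segments 2 through 5 out of 0..9 would be copied.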
Re: [HACKERS] generated columns
On Thu, Aug 31, 2017 at 12:16:43AM -0400, Peter Eisentraut wrote: > In previous discussions, it has often been a source of confusion whether > these generated columns are supposed to be computed on insert/update and > stored, or computed when read. The SQL standard is not explicit, but > appears to lean toward stored. DB2 stores. Oracle computes on read. Question: How would one know the difference between storing computed columns vs. computing them on read? Answer?: Performance. If the computation is slow, then you'll really notice on read. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
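The performance difference Nico points to can be illustrated with a toy model — this is not PostgreSQL internals, just a hypothetical sketch in which a "stored" generated column evaluates its expression once per write while a "virtual" one evaluates it on every read:

```python
# Toy model of the stored-vs-computed-on-read tradeoff for generated
# columns: same visible value, different number of expression evaluations.

class Row:
    def __init__(self, base, compute, stored):
        self.compute = compute
        self.stored = stored
        self.calls = 0            # how often the expression was evaluated
        self.set(base)

    def set(self, base):
        self.base = base
        if self.stored:
            self._cached = self._eval()   # pay the cost at write time

    def _eval(self):
        self.calls += 1
        return self.compute(self.base)

    def generated(self):
        # stored: return the write-time result; virtual: recompute now
        return self._cached if self.stored else self._eval()

expensive = lambda x: x * x       # stand-in for a slow expression

stored = Row(3, expensive, stored=True)
virtual = Row(3, expensive, stored=False)
for _ in range(5):
    assert stored.generated() == virtual.generated() == 9
print(stored.calls, virtual.calls)   # 1 vs 5 evaluations
```

Which is cheaper depends exactly on the read/write ratio and the cost of the expression, which is the tradeoff the thread is weighing.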
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Mon, Oct 2, 2017 at 11:44 PM, Robert Haas wrote:
> On Mon, Oct 2, 2017 at 10:19 AM, Michael Paquier wrote:
>> So those bits could be considered for integration.
>
> Yes, and they also tend to suggest that the rest of the patch has value.

Well, there are cases where you don't need any locking checks, and the proposed patch ignores that. Take for example pageinspect, which works on a copy of a page, or just WAL replay, which serializes everything; in both cases PageGetLSN can be used safely. So for compatibility reasons I think that PageGetLSN should be kept alone so as not to surprise anybody using it, or at least an equivalent should be introduced. It would be interesting to make BufferGetLSNAtomic hold tighter checks, like something that makes use of LWLockHeldByMe for example.

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On 10/01/2017 04:22 PM, Robert Haas wrote:
> On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark wrote:
>> Well these kinds of monitoring systems tend to be used by operations
>> people who are a lot more practical and a lot less worried about
>> theoretical concerns like that.
>
> +1, well said.
>
>> In context the point was merely that the default pg_stat_statements.max
>> of 5000 isn't sufficient to argue that 32-bit values are enough. It
>> wouldn't be hard for there to be 64k different queries over time and
>> across all the databases in a fleet and at that point it becomes likely
>> there'll be a 32-bit collision.
>
> Yeah. I think Alexander Korotkov's points are quite good, too.

+1 to both of these as well.

jD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
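Greg's 64k-queries point is just the birthday bound. A quick back-of-the-envelope check (the standard approximation, nothing Postgres-specific) shows why 32 bits is uncomfortable at that scale while 64 bits is not:

```python
import math

def collision_prob(n, bits):
    """Birthday approximation: P(at least one collision) among n
    uniformly random `bits`-bit hash values."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2 ** bits))

n = 64 * 1024   # 64k distinct queries across a fleet, as suggested above
p32 = collision_prob(n, 32)
p64 = collision_prob(n, 64)
print(f"32-bit: {p32:.2%}, 64-bit: {p64:.2e}")
```

With 64k distinct queries the chance of at least one 32-bit collision is roughly 39%, while a 64-bit queryId keeps it around 1e-10.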
Re: [HACKERS] Logging idle checkpoints
Vik, all, * Vik Fearing (vik.fear...@2ndquadrant.com) wrote: > I recently had a sad because I noticed that checkpoint counts were > increasing in pg_stat_bgwriter, but weren't accounted for in my logs > with log_checkpoints enabled. > After some searching, I found that it was the idle checkpoints that > weren't being logged. I think this is a missed trick in 6ef2eba3f57. > Attached is a one-liner fix. I realize how imminent we are from > releasing v10 but I hope there is still time for such a minor issue as this. Idle checkpoints aren't, well, really checkpoints though. If anything, seems like we shouldn't be including skipped checkpoints in the pg_stat_bgwriter count because we aren't actually doing a checkpoint. I certainly don't care for the idea of adding log messages saying we aren't doing anything just to match a count that's incorrectly claiming that checkpoints are happening when they aren't. The down-thread suggestion of keeping track of skipped checkpoints might be interesting, but I'm not entirely convinced it really is. We have time to debate that, of course, but I don't really see how that's helpful. At the moment, it seems like the suggestion to add that column is based on the assumption that we're going to start logging skipped checkpoints and having that column would allow us to match up the count between the new column and the "skipped checkpoint" messages in the logs and I can not help but feel that this is a ridiculous amount of effort being put into the analysis of something that *didn't* happen. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
Hi, Andrew!

On Mon, Oct 2, 2017 at 1:40 PM, Andrew Borodin wrote:
> Thanks for looking into the patch!
>
> On Thu, Sep 28, 2017 at 3:59 PM, Alexander Korotkov wrote:
>> In gistdoinsert() you do CheckForSerializableConflictIn() only if the page
>> wasn't exclusively locked before (xlocked is false).
>>
>>> if (!xlocked)
>>> {
>>>     LockBuffer(stack->buffer, GIST_UNLOCK);
>>>     LockBuffer(stack->buffer, GIST_EXCLUSIVE);
>>>     CheckForSerializableConflictIn(r, NULL, stack->buffer);
>>>     xlocked = true;
>>
>> However, the page might be exclusively locked before. And in this case
>> CheckForSerializableConflictIn() would be skipped. That happens very
>> rarely (someone fixes an incomplete split before we do), but nevertheless.
>
> If xlocked = true, the page was already checked for conflict after setting
> an exclusive lock on its buffer. I still do not see any problem here...

What happens if exactly this "continue" fires?

> if (GistFollowRight(stack->page))
> {
>     if (!xlocked)
>     {
>         LockBuffer(stack->buffer, GIST_UNLOCK);
>         LockBuffer(stack->buffer, GIST_EXCLUSIVE);
>         xlocked = true;
>         /* someone might've completed the split when we unlocked */
>         if (!GistFollowRight(stack->page))
>             continue;

In this case we might get xlocked == true without calling CheckForSerializableConflictIn(). This is a very rare codepath, but still... I think it would be rather safe and easy to understand to move CheckForSerializableConflictIn() directly before gistinserttuple().

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Mon, Oct 2, 2017 at 10:19 AM, Michael Paquier wrote:
> So those bits could be considered for integration.

Yes, and they also tend to suggest that the rest of the patch has value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Thu, Sep 14, 2017 at 10:45 PM, Amit Kapila wrote:
> The latest patch again needs to be rebased. Find rebased patch
> attached with this email.

I read through this patch this morning. Copying Tom in the hopes that he might chime in on the following two issues in particular:

1. Is there any superior alternative to adding ptype to ParamExecData? I think the reason why we don't have this already is that the plan tree that populates the Param must output the right type and the plan tree that reads the Param must expect the right type, and there's no need for anything else. But for serialization and deserialization this seems to be necessary. I wonder whether it would be better to try to capture this at the time the Param is generated (e.g. var->vartype in assign_param_for_var) rather than deriving it at execution time by applying exprType(), but I'm not sure how that would work exactly, or whether it's really any better.

2. Do max_parallel_hazard_walker and set_param_references() have the right idea about which parameters are acceptable candidates for this optimization? The idea seems to be to collect the setParam sets for all initplans between the current query level and the root. That looks correct to me, but I'm not an expert on this parameter stuff.

Some other comments:

- I think parallel_hazard_walker should likely work by setting safe_param_ids to the right set of parameter IDs, rather than testing whether the parameter is safe by checking either whether it is in safe_param_ids or some other condition is met.

- contains_parallel_unsafe_param() sounds like a function that merely returns true or false, but it actually has major side effects. Those side effects also look unsafe: mutating previously-generated paths can corrupt the rel's pathlist, because it will no longer have the sort order and other characteristics that add_path() creates and upon which other code relies.
- Can't is_initplan_is_below_current_query_level() be confused when there are multiple subqueries in the tree? For example, if the toplevel query has subqueries a and b, and a has a sub-subquery aa which has an initplan, won't this function think that b has an initplan below the current query level? If not, maybe a comment is in order explaining why not?

- Why do we even need contains_parallel_unsafe_param() and is_initplan_is_below_current_query_level() in the first place, anyway? I think there's been some discussion of that on this thread, but I'm not sure I completely understand it, and the comments in the patch don't help me understand why we now need this restriction.

- The new code in ExplainPrintPlan() needs a comment.

- I have typically referred in comments to "Gather or Gather Merge" rather than "gather or gather merge".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Mon, Oct 2, 2017 at 9:09 PM, Robert Haas wrote:
> I think the first question we ought to be asking ourselves is whether
> any of the PageGetLSN -> BufferGetLSNAtomic changes the patch
> introduces are live bugs. If they are, then we ought to fix those
> separately (and probably back-patch). If they are not, then we need
> to think about how to adjust the patch so that it doesn't complain
> about things that are in fact OK.

Looking one by one at each item the patch touches, based on the contract defined in transam/README...

--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -640,7 +640,7 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
         }
         stack->page = (Page) BufferGetPage(stack->buffer);
-        stack->lsn = PageGetLSN(stack->page);
+        stack->lsn = BufferGetLSNAtomic(stack->buffer);

There is an incorrect use of PageGetLSN here: a shared lock can be taken on the page, but no buffer header lock is used when calling PageGetLSN.

@@ -890,7 +890,7 @@ gistFindPath(Relation r, BlockNumber child, OffsetNumber *downlinkoffnum)
             break;
         }
-        top->lsn = PageGetLSN(page);
+        top->lsn = BufferGetLSNAtomic(buffer);

Here as well, only a shared lock is taken on the page.

@@ -61,7 +61,7 @@ gistkillitems(IndexScanDesc scan)
      * read. killedItems could be not valid so LP_DEAD hints applying is not
      * safe.
      */
-    if (PageGetLSN(page) != so->curPageLSN)
+    if (BufferGetLSNAtomic(buffer) != so->curPageLSN)
     {
         UnlockReleaseBuffer(buffer);
         so->numKilled = 0;      /* reset counter */

@@ -384,7 +384,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
      * safe to apply LP_DEAD hints to the page later. This allows us to drop
      * the pin for MVCC scans, which allows vacuum to avoid blocking.
      */
-    so->curPageLSN = PageGetLSN(page);
+    so->curPageLSN = BufferGetLSNAtomic(buffer);

Those two as well only use a shared lock.
@@ -249,7 +249,7 @@ gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
             ptr = (GistBDItem *) palloc(sizeof(GistBDItem));
             ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
-            ptr->parentlsn = PageGetLSN(page);
+            ptr->parentlsn = BufferGetLSNAtomic(buffer);
             ptr->next = stack->next;
             stack->next = ptr;

Here also only a shared lock is taken; that's a VACUUM code path for GiST indexes.

+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1224,7 +1224,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
      * safe to apply LP_DEAD hints to the page later. This allows us to drop
      * the pin for MVCC scans, which allows vacuum to avoid blocking.
      */
-    so->currPos.lsn = PageGetLSN(page);
+    so->currPos.lsn = BufferGetLSNAtomic(so->currPos.buf);

Here the caller of _bt_readpage should have taken a lock, but the page is not necessarily pinned. Still, _bt_getbuf makes sure that the pin is taken.

+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1772,7 +1772,7 @@ _bt_killitems(IndexScanDesc scan)
         return;
     page = BufferGetPage(buf);
-    if (PageGetLSN(page) == so->currPos.lsn)
+    if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
         so->currPos.buf = buf;

Same here, thanks to _bt_getbuf.

So those bits could be considered for integration. Also, if I read the GiST code correctly, there is one other case in gistFindCorrectParent. And in btree, there is one occurrence in _bt_split. In XLogRecordAssemble, there could be more checks regarding the locks taken on the registered buffers.

--
Michael