Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2018-01-20 Thread Jing Wang
Hi Stephen and Thomas,

Thanks for your review comments.
Enclosed please find the latest patch.

>/src/backend/parser/gram.y: In function ‘base_yyparse’:
>/src/backend/parser/gram.y:1160:19: warning: assignment from incompatible
pointer type [-Wincompatible-pointer-types]
>| IN_P DATABASE db_spec_name { $$ = $3; }
The warning has been eliminated.

>When it should be:
>ALTER DATABASE { name | CURRENT_DATABASE } OWNER TO
>    { new_owner | CURRENT_USER | SESSION_USER }
Yes. It should be.

>Please don't include whitespace-only hunks, like this one:
Ok.

>The TAP regression tests for pg_dump are failing.
The test case has been updated.

>make makeDbSpec() return a DbSpec and then try to minimize the
>forced-casting happening.
Makes sense. It has been changed.
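
(For reference, a rough sketch of what that reshaping might look like --
DbSpec is a node type introduced by this patch, so the field names below
are assumptions, not the final code:)

static DbSpec *
makeDbSpec(DbSpecType spectype, char *dbname, int location)
{
    DbSpec     *spec = makeNode(DbSpec);

    spec->spectype = spectype;  /* CURRENT_DATABASE or an explicit name */
    spec->dbname = dbname;      /* NULL when spectype is CURRENT_DATABASE */
    spec->location = location;
    return spec;
}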


Regards,
Jing Wang
Fujitsu Australia


support_CURRENT_DATABASE_keyword_v4.6.patch
Description: Binary data


Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables

2018-01-20 Thread Michael Paquier
On Fri, Jan 19, 2018 at 06:28:41PM +0900, Amit Langote wrote:
> Do you mean pg_partition_tree(regclass), that returns all partitions in
> the partition tree whose root is passed as the parameter?
> 
> Perhaps, like the following (roughly implemented in the attached)?
> 
> select  pg_partition_root(p) as root_parent,
>         pg_partition_parent(p) as parent,
>         p as relname,
>         pg_total_relation_size(p) as size
> from    pg_partition_tree_tables('p') p
> order by 4;
>  root_parent | parent | relname |  size
> -------------+--------+---------+---------
>  p           |        | p       |       0
>  p           | p      | p123    |       0
>  p           | p123   | p12     |       0
>  p           | p123   | p3      | 3653632
>  p           | p12    | p1      | 3653632
>  p           | p12    | p2      | 3653632
> (6 rows)

It seems that you caught the idea. As long as each leaf and root is
listed uniquely, that would be acceptable to me, and useful for users.
If pg_partition_tree_tables() uses the top of the partition as input,
all the tree should show up. If you use a leaf, anything under the leaf
should show up. If a leaf is defined and it has no underlying leaves,
then only this outmost leaf should be listed.

+/*
+ * Returns Oids of tables in a publication.
+ */
Close enough, but that's not a publication.

>> Documentation, as well as regression tests, would be welcome :)
> 
> OK, I will add those things in the next version.

Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Supporting huge pages on Windows

2018-01-20 Thread Magnus Hagander
On Fri, Dec 1, 2017 at 5:02 AM, Michael Paquier 
wrote:

> On Fri, Nov 24, 2017 at 9:28 AM, Tsunakawa, Takayuki
>  wrote:
> > From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
> >> I hope Tsunakawa-san doesn't mind me posting another rebased version of
> >> his patch.  The last version conflicted with the change from SGML to XML
> >> that just landed in commit 3c49c6fa.
> >
> > Thank you very much for keeping it fresh.  I hope the committer could
> have some time.
>
> I have moved this patch to next CF. Magnus, you are registered as a
> reviewer of this patch. Are you planning to look at it and potentially
> commit it?
>

Apologies for the delays here. Yes, I was and am.

I took a look at this patch again. I've made some small updates to comments
etc. There was also from what I can tell one actual bug in the code -- a
missing free() of delPrivs.

The debug message for ERROR_NO_SYSTEM_RESOURCES said it turned off huge
pages because of not enough huge pages, but AIUI it can be because of other
resources as well. So I dropped that specific claim from the message.


However, reading through a few things -- we now use the
API GetLargePageMinimum() unconditionally. This API appeared in Windows
Vista and Windows 2003. That means that if we apply this patch, those are
now our minimum versions of Windows. At least unless Microsoft backpatched
things.

This is probably equally true of some other things we do, but those are
resolved at runtime, and we would just give an error on old platforms. If
I'm thinking right, then linking directly to GetLargePageMinimum() will
simply make it impossible to start a PostgreSQL backend, with or without
huge pages, on previous versions, because it will give a linker error.

I wonder if we need to do something similar to what we already do for
CreateRestrictedToken() in pg_ctl.c for example. That one actually is done
for compatibility with NT4/2000 -- CreateRestrictedToken was added in XP.
So while we could consider getting rid of that workaround now, perhaps we
need to put a similar one in place for this?
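
(Something along these lines, perhaps -- a minimal sketch of the
runtime-lookup workaround, analogous to the CreateRestrictedToken dance in
pg_ctl.c; the function name and the fallback behavior are assumptions, not
tested code:)

typedef SIZE_T (WINAPI * GetLargePageMinimum_t) (void);

static SIZE_T
pg_get_large_page_minimum(void)
{
    HMODULE     kernel32 = GetModuleHandle(TEXT("kernel32.dll"));
    GetLargePageMinimum_t fn;

    if (kernel32 == NULL)
        return 0;
    fn = (GetLargePageMinimum_t) GetProcAddress(kernel32,
                                                "GetLargePageMinimum");
    if (fn == NULL)
        return 0;               /* pre-Vista/2003: no large page support */
    return fn();
}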

I don't have a Windows build box around ATM, or a Windows XP, so if
somebody could try the attached version, build a postgres and see if it
even starts on a Windows XP machine, that would be useful input!

If we can confirm/deny the situation around XP and decide what to do about
it, I am now happy to commit the rest of this patch.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 37a61a13c8..cc156c6385 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1369,14 +1369,26 @@ include_dir 'conf.d'

 

-At present, this feature is supported only on Linux. The setting is
-ignored on other systems when set to try.
+At present, this feature is supported only on Linux and Windows. The
+setting is ignored on other systems when set to try.

 

 The use of huge pages results in smaller page tables and less CPU time
-spent on memory management, increasing performance. For more details,
-see .
+spent on memory management, increasing performance. For more details about
+using huge pages on Linux, see .
+   
+
+   
+Huge pages are known as large pages on Windows.  To use them, you need to
+assign the user right Lock Pages in Memory to the Windows user account
+that runs PostgreSQL.
+You can use Windows Group Policy tool (gpedit.msc) to assign the user right
+Lock Pages in Memory.
+To start the database server on the command prompt as a standalone process,
+not as a Windows service, the command prompt must be run as an administrator or
+User Access Control (UAC) must be disabled. When the UAC is enabled, the normal
+command prompt revokes the user right Lock Pages in Memory when started.

 

diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 4991ed46f1..9362f1c4a4 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -21,6 +21,7 @@ HANDLE		UsedShmemSegID = INVALID_HANDLE_VALUE;
 void	   *UsedShmemSegAddr = NULL;
 static Size UsedShmemSegSize = 0;
 
+static bool EnableLockPagesPrivilege(int elevel);
 static void pgwin32_SharedMemoryDelete(int status, Datum shmId);
 
 /*
@@ -103,6 +104,66 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2)
 	return true;
 }
 
+/*
+ * EnableLockPagesPrivilege
+ *
+ * Try to acquire SeLockMemoryPrivilege so we can use large pages.
+ */
+static bool
+EnableLockPagesPrivilege(int elevel)
+{
+	HANDLE hToken;
+	TOKEN_PRIVILEGES tp;
+	LUID luid;
+
+	if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
+	{
+		ereport(elevel,
+(errmsg("could not enable Lock Pages in Memory user right: error code %lu", GetL

Re: [HACKERS] log_destination=file

2018-01-20 Thread Magnus Hagander
On Tue, Nov 14, 2017 at 5:33 PM, Robert Haas  wrote:

> On Sun, Sep 10, 2017 at 5:29 AM, Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> > average latency:
> >
> > clients patch   master
> > 10  0.321   0.286
> > 20  0.669   0.602
> > 30  1.016   0.942
> > 40  1.358   1.280
> > 50  1.727   1.637
>
> That's still a noticeable slowdown, though.  And we've had previous
> reports of the overhead of logging being significant as well:
>
> http://postgr.es/m/CACLsApsA7U0GCFpojVQem6SGTEkv8
> vnwdbfhvi+dqo+gu5g...@mail.gmail.com
>
> I seem to recall a discussion, perhaps in-person, around the time Theo
> submitted that patch where it was reported that the logging collector
> could not be used on some systems he was working with because it
> became a major performance bottleneck.  With each backend writing its
> own messages to a file, it was tolerable, but when you tried to funnel
> everything through a single process, the back-pressure slowed down the
> entire system unacceptably.
>

Finally found myself back at this one, because I still think this is a
problem we definitely need to address (whether with this file or not).

The funneling into a single process is definitely an issue.

But we don't really solve that problem today with logging to stderr, do we?
Because somebody has to pick up the log as it came from stderr. Yes, you
get more overhead when sending the log to devnull, but that isn't really a
realistic scenario. The question is what to do when you actually want to
collect that much logging that quickly.

If each backend could actually log to *its own file*, then things would get
sped up. But we can't do that today. Unless you use the hooks and build it
yourself.
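
(To make the hook route concrete: a rough sketch of a per-backend file
writer built on the existing emit_log_hook, with error handling, rotation,
and the path scheme all elided or made up:)

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "utils/elog.h"

PG_MODULE_MAGIC;

static emit_log_hook_type prev_emit_log_hook = NULL;
static FILE *backend_log_file = NULL;

static void
per_backend_log_hook(ErrorData *edata)
{
    if (backend_log_file == NULL)
    {
        char        path[MAXPGPATH];

        /* one file per backend, so no cross-process funneling */
        snprintf(path, sizeof(path), "log/backend_%d.log", MyProcPid);
        backend_log_file = fopen(path, "a");
    }
    if (backend_log_file != NULL)
    {
        fprintf(backend_log_file, "%s\n", edata->message);
        fflush(backend_log_file);
        edata->output_to_server = false;    /* skip the shared destination */
    }
    if (prev_emit_log_hook)
        prev_emit_log_hook(edata);
}

void
_PG_init(void)
{
    prev_emit_log_hook = emit_log_hook;
    emit_log_hook = per_backend_log_hook;
}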

Per the thread referenced, using the hooks to handle the
very-high-rate-logging case seems to be the conclusion. But is that still
the conclusion, or do we feel we need to also have a native solution?

And if the conclusion is that hooks is the way to go for that, then is the
slowdown of this patch actually a relevant problem to it?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: improve type conversion of SPI_processed in Python

2018-01-20 Thread Peter Eisentraut
On 1/12/18 11:06, Tom Lane wrote:
>> Would that even be necessary?  Why not use the LongLong variant all the
>> time then?
> 
> I've not looked at the Python internals to see if one is noticeably faster
> than the other, but if it isn't, yeah we could simplify that.  In any case
> my main point is let's keep these two functions using similar approaches.

I ran extensive tests on this and went ahead with the simplified versions.
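
(Presumably the simplification amounts to always using the 64-bit
conversion -- an untested guess at its shape, not the committed code:)

    /* SPI_processed is uint64, so use the widest conversion unconditionally */
    PyObject   *count = PyLong_FromUnsignedLongLong(SPI_processed);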

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2018-01-20 Thread Amit Kapila
On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov
 wrote:
> On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro 
> wrote:
>>
>> Hi Shubham,
>>
>> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai 
>> wrote:
>> > If these two hash keys (78988658 and 546789888) mapped to the same
>> > bucket, this will result in false serialization failure.
>> > Please correct me if this assumption about false positives is wrong.
>>
>> I wonder if there is an opportunity to use computed hash values
>> directly in predicate lock tags, rather than doing it on the basis of
>> buckets.  Perhaps I'm missing something important about the way that
>> locks need to escalate that would prevent that from working.
>
>
> +1,
> Very nice idea!  Locking hash values directly seems to be superior over
> locking hash index pages.
> Shubham, do you have any comment on this?
>

As Shubham seems to be running out of time, I thought of helping him
by looking into the above-suggested idea.  I think one way to lock a
particular hash value is we can use TID of heap tuple associated with
each index entry (index entry for the hash index will be hash value).
However, do we really need it for implementing predicate locking for
hash indexes?  If we look at the "Index AM implementations" section of
README-SSI, it doesn't seem to be required.  Basically, if we look at
the strategy of predicate locks in btree [1], it seems to me that locking
at page level is the right direction for hash indexes as well: as in
btree, the corresponding heap tuple read will be locked.

What do you think?

[1] -
* B-tree index searches acquire predicate locks only on the index
*leaf* pages needed to lock the appropriate index range. If, however,
a search discovers that no root page has yet been created, a predicate
lock on the index relation is required.
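
(For reference, the page-level approach would mirror the call nbtree makes
when a scan lands on a leaf page -- roughly the following in the hash scan
code; the exact call site is an assumption:)

    /* predicate-lock the bucket page the scan arrived at (storage/predicate.h) */
    PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot);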


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: master make check fails on Solaris 10

2018-01-20 Thread Marina Polyakova

On 18-01-2018 20:49, Marina Polyakova wrote:

On 18-01-2018 20:34, Tom Lane wrote:

...
What you could do in the meantime is work on finding a variation of
Victor's test that will detect the bug regardless of -O level.
If we do have hope that future gcc versions will handle this correctly,
we'll need a better test rather than just summarily dismissing
host_cpu = sparc.


Thanks, I'll try..


I tried different gcc options, but it did not help.
Perhaps searching in the source code of gcc will clarify something, but
I'm sorry that I'm now too busy for this.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATCH] Atomic pgrename on Windows

2018-01-20 Thread Magnus Hagander
On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier 
wrote:

> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
>  wrote:
> > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> > appears to be possible to atomically replace file on Windows –
> ReplaceFile()
> > does that.  ReplaceFiles() requires target file to exist, this is why we
> > still need to call MoveFileEx() when it doesn't exist.
>
> Do you think that it could be safer to unlink the target file first
> with pgunlink()? This way you make sure that the target file is
> removed and not locked. This change makes me worrying about the
> introduction of more race conditions.


Unlinking it first seems dangerous, as pointed out by Andres.

What about first trying ReplaceFile(), and if it fails with "target
doesn't exist", then calling MoveFileEx()?

Or the other way around -- try MoveFileEx() first since that seems to work
most of the time today (if it mostly broke we'd be in trouble already), and
if it fails with a sharing violation, try ReplaceFile()? And perhaps end up
doing something similar to what we do with shared memory, which is just
to loop over it and try each a couple of times, before giving up and
failing?
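
(A rough sketch of that combined strategy -- the retry count, delay, and
flag choices below are guesses, not a tested patch:)

static int
pgrename_sketch(const char *from, const char *to)
{
    int         loops;

    for (loops = 0; loops < 100; loops++)
    {
        /* the plain rename works when the target is not held open */
        if (MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
            return 0;

        /* on a sharing violation, try the atomic replacement path */
        if (GetLastError() == ERROR_SHARING_VIOLATION &&
            ReplaceFile(to, from, NULL, REPLACEFILE_IGNORE_MERGE_ERRORS,
                        NULL, NULL))
            return 0;

        Sleep(100);             /* wait a bit and retry, as the shmem code does */
    }
    return -1;
}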

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: WIP Patch: Precalculate stable functions, infrastructure v1

2018-01-20 Thread Marina Polyakova

As I said, thank you so much for your comments!!

On 17-01-2018 0:30, Tom Lane wrote:

...
This is indeed quite a large patch, but it seems to me it could become
smaller. After a bit of review:

1. I do not like what you've done with ParamListInfo. The changes around
that are invasive, accounting for a noticeable part of the patch bulk,
and I don't think they're well designed. Having to cast back and forth
between ParamListInfo and ParamListInfoCommon and so on is ugly and prone
to cause (or hide) errors. And I don't really understand why
ParamListInfoPrecalculationData exists at all. Couldn't you have gotten
the same result with far fewer notational changes by defining another
PARAM_FLAG bit in the existing pflags field? (Or alternatively, maybe
the real need here is for another ParamKind value for Param nodes?)

I also dislike this approach because it effectively throws away the
support for "virtual" param arrays that I added in commit 6719b238e:
ParamListInfoPrecalculationData has no support for dynamically determined
parameter properties, which is surely something that somebody will need.
(It's just luck that the patch doesn't break plpgsql today.) I realize
that that's a recent commit and the code I'm complaining about predates
it, but we need to adjust this so that it fits in with the new approach.
See comment block at lines 25ff in params.h.


I'll try to use ParamListInfoData for generic plans (= to get cached 
expressions for params of prepared statements where possible) without 
changing its infrastructure.


2. I don't follow the need for the also-rather-invasive changes to domain
constraint data structures. I do see that the patch attempts to make
CoerceToDomain nodes cacheable, which is flat wrong and has to be ripped
out. You *cannot* assume that the planner has access to the same domain
constraints that will apply at runtime.


I'm sorry, I did not know about this :-[


I've occasionally thought that we should hook domain constraint changes
into the plan invalidation mechanism, which would make it possible for
the planner to assume that the constraints seen at planning time will
apply at execution. Whereupon we could have the planner insert domain
constraint expressions into the plan rather than leaving those to be
collected at query startup by execExpr.c, and then do things like
constant-folding and cacheing CoerceToDomain nodes. But that would be
a rather large and very domain-specific change, and so it would be fit
material for a different patch IMO. I recommend that for now you just
treat CoerceToDomain as an uncacheable expression type and rip all the
domain-related changes out of this patch.


I'll fix this.


3. I think you should also try hard to get rid of the need for
PlannedStmt.hasCachedExpr. AFAICS there's only one place that is
using that flag, which is exec_simple_check_plan, and I have to
think there are better ways we could deal with that. In particular,
I don't understand why you haven't simply set up plpgsql parameter
references to be noncacheable. Or maybe what we'd better do is
disable CacheExpr insertions into potentially-simple plans in the
first place. As you have it here, it's possible for recompilation
of an expression to result in a change in whether it should be deemed
simple or not, which will break things (cf commit 00418c612).


I'm sorry, I'll fix the use of parameters in this case. And I'll think
about how to get rid of the need for PlannedStmt.hasCachedExpr when there
are possible cached expressions without parameters.



4. I don't like the way that you've inserted
"replace_qual_cached_expressions" and
"replace_pathtarget_cached_expressions" calls into seemingly random places
in the planner. Why isn't that being done uniformly during expression
preprocessing? There's no apparent structure to where you've put these
calls, and so they seem really vulnerable to errors of omission.


I'll fix this.


Also, if this were done in expression preprocessing, there'd be a chance
of combining it with some existing pass over expression trees instead of
having to do a separate (and expensive) expression tree mutation.
I can't help suspecting that eval_const_expressions could take this on
as an additional responsibility with a lot less than a thousand new lines
of code.


From a quick look I see no contradictions, so I'll try to implement it.

5. BTW, cost_eval_cacheable_expr seems like useless restructuring as well.
Why aren't you just recursively applying the regular costing function?


Such a stupid mistake :( I'll fix this.


If you did all of the above it would result in a pretty significant
reduction of the number of places touched by the patch, which would make
it easier to see what's going on. Then we could start to discuss, for
instance, what does the "isConstParam" flag actually *mean* and why
is it different from PARAM_FLAG_CONST?


AFAIU they do not differ, and as I said above I'll try not to change the 
infrastructure of ParamListInfoData.



An

Re: WIP Patch: Precalculate stable functions, infrastructure v1

2018-01-20 Thread Marina Polyakova

On 17-01-2018 1:05, Tom Lane wrote:

[ I'm sending this comment separately because I think it's an issue
Andres might take an interest in. ]

Marina Polyakova  writes:

[ v7-0001-Precalculate-stable-and-immutable-functions.patch ]


Another thing that's bothering me is that the execution semantics
you're proposing for CachedExpr seem rather inflexible.  AFAICS, once a
CachedExpr has run once, it will hang on to the result value and keep
returning that for the entire lifespan of the compiled expression.
We already noted that that breaks plpgsql's "simple expression"
logic, and it seems inevitable to me that it will be an issue for
other places as well.  I think it'd be a better design if we had some
provision for resetting the cached values, short of recompiling the
expression from scratch.

One way that occurs to me to do this is to replace the simple boolean
isExecuted flags with a generation counter, and add a master generation
counter to ExprState.  The rule for executing CachedExpr would be "if my
generation counter is different from the ExprState's counter, then
evaluate the subexpression and copy the ExprState's counter into mine".
Then the procedure for forcing recalculation of cached values is just to
increment the ExprState's counter.  There are other ways one could imagine
doing this --- for instance, I initially thought of keeping the master
counter in the ExprContext being used to run the expression.  But you need
some way to remember what counter value was used last with a particular
expression, so probably keeping it in ExprState is better.

Or we could just brute-force it by providing a function that runs through
a compiled expression step list and resets the isExecuted flag for each
EEOP_CACHEDEXPR_IF_CACHED step it finds.  A slightly less brute-force
way is to link those steps together in a list, so that the function
doesn't have to visit irrelevant steps.  If the reset function were seldom
used then the extra cycles for this wouldn't be very expensive.  But I'm
not sure it will be seldom used --- it seems like plpgsql simple
expressions will be doing this every time --- so I think the counter
approach might be a better idea.


Thank you very much! I'll try to implement something from this.
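
(A standalone toy version of the counter rule, just to pin it down -- all
names here are stand-ins, not the executor's real structures:)

#include <stdint.h>

typedef struct ExprStateStub
{
    uint64_t    cache_generation;   /* master counter; bump to invalidate */
} ExprStateStub;

typedef struct CachedExprStub
{
    uint64_t    generation;         /* counter value when value was cached */
    int         value;              /* stand-in for the cached Datum */
} CachedExprStub;

static int
eval_cached(ExprStateStub *es, CachedExprStub *ce, int (*subexpr) (void))
{
    if (ce->generation != es->cache_generation)
    {
        ce->value = subexpr();                  /* re-evaluate once... */
        ce->generation = es->cache_generation;  /* ...and mark as current */
    }
    return ce->value;                           /* otherwise reuse the cache */
}

/* forcing recalculation of all cached values: es->cache_generation++; */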


I'm curious to know whether Andres has some other ideas, or whether he
feels this is all a big wart on the compiled-expression concept.  I don't
think there are any existing cases where we keep any meaningful state
across executions of a compiled-expression data structure; maybe that's
a bad idea in itself.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Use of term hostaddrs for multiple hostaddr values

2018-01-20 Thread Magnus Hagander
On Tue, Jan 9, 2018 at 5:34 AM, Michael Paquier 
wrote:

> Hi all,
>
> While looking at the documentation of libpq, I have noticed that the
> term hostaddrs is used to qualify multiple values of hostaddr. This
> looks incorrect to me, as this is not the name of a connection
> parameter. Please find attached a patch to address this
> inconsistency. One error message is also touched, impacting
> translability.
>

These are both clear bugs, and the docs one should definitely be both
applied and backpatched.

How much do we care about the error message when it comes to backpatching?
Maybe we should leave that one to 11 only, to avoid breaking that, as the
way it's written it's actually less wrong there.

Thoughts?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-20 Thread Peter Geoghegan
On Fri, Jan 19, 2018 at 9:29 PM, Amit Kapila  wrote:
> I think I can see why this patch needs that.  Is it mainly for the
> work you are doing in _bt_leader_heapscan where you are waiting for
> all the workers to be finished?

Yes, though it's also needed for the leader tuplesort. It needs to be
able to discover worker runs by looking for temp files named 0 through
to $NWORKERS - 1.

The problem with seeing who shows up after a period of time, and
having the leader arbitrarily determine that to be the total number of
participants (while blocking further participants from joining) is
that I don't know *how long to wait*. This would probably work okay
for parallel CREATE INDEX, when the leader participates as a worker,
because you can check only when the leader is finished acting as a
worker. It stands to reason that that's enough time for worker
processes to at least show up, and be seen to show up. We can use the
duration of the leader's participation as a worker as a natural way to
decide how long to wait.

But what about when the leader doesn't participate as a worker, for whatever
reason? Other uses for parallel tuplesort might typically have much
less leader participation as compared to parallel CREATE INDEX. In
short, ISTM that seeing who shows up is a bad strategy for parallel
tuplesort.

> I think till now we don't have any such requirement, but if it is must
> for this patch, then I don't think it is tough to do that.  We need to
> write an API WaitForParallelWorkerToAttach() and then call for each
> launched worker or maybe WaitForParallelWorkersToAttach() which can
> wait for all workers to attach and report how many have successfully
> attached.   It will have functionality of
> WaitForBackgroundWorkerStartup and additionally it needs to check if
> the worker is attached to the error queue.  We already have similar
> API (WaitForReplicationWorkerAttach) for logical replication workers
> as well.  Note that it might have a slight impact on the performance
> because with this you need to wait for the workers to startup before
> doing any actual work, but I don't think it should be noticeable for
> large operations especially for operations like parallel create index.

Actually, though it doesn't really look like it from the way things
are structured within nbtsort.c, I don't need to wait for workers to
start up (call the WaitForParallelWorkerToAttach() function you
sketched) before doing any real work within the leader. The leader can
participate as a worker, and only do this check afterwards. That will
work because the leader Tuplesortstate has yet to do any real work.
Nothing stops me from adding a new function to tuplesort, for the
leader, that lets the leader say: "New plan -- you should now expect
this many participants" (leader takes this reliable number from
eventual call to WaitForParallelWorkerToAttach()).

I admit that I had no idea that there is this issue with
nworkers_launched until very recently. But then, that field has
absolutely no comments.

--
Peter Geoghegan



Re: [HACKERS] WIP: Covering + unique indexes.

2018-01-20 Thread Peter Geoghegan
On Wed, Jan 17, 2018 at 12:45 AM, Andrey Borodin  wrote:
> Unfortunately, amcheck_next does not work currently on HEAD (there are 
> problems with AllocSetContextCreate() signature), but I've tested 
> bt_index_check() before, during and after pgbench, on primary and on slave. 
> Also, I've checked bt_index_parent_check() on master.

I fixed that recently. It should be fine now.

-- 
Peter Geoghegan



Re: Use of term hostaddrs for multiple hostaddr values

2018-01-20 Thread Michael Paquier
On Sat, Jan 20, 2018 at 08:30:43PM +0200, Magnus Hagander wrote:
> On Tue, Jan 9, 2018 at 5:34 AM, Michael Paquier 
> wrote:
> These are both clear bugs, and the docs one should definitely be both
> applied and backpatched.
> 
> How much do we care about the error message when it comes to backpatching?
> Maybe we should leave that one to 11 only, to avoid breaking that, as the
> way it's written it's actually less wrong there.
> 
> Thoughts?

Thanks for your input!

Applying the error message portion only on HEAD is a good plan; there is
no point in making the life of translators unnecessarily painful.
--
Michael


signature.asc
Description: PGP signature


Re: Use of term hostaddrs for multiple hostaddr values

2018-01-20 Thread Peter Eisentraut
On 1/20/18 17:39, Michael Paquier wrote:
> On Sat, Jan 20, 2018 at 08:30:43PM +0200, Magnus Hagander wrote:
>> On Tue, Jan 9, 2018 at 5:34 AM, Michael Paquier 
>> wrote:
>> These are both clear bugs, and the docs one should definitely be both
>> applied and backpatched.
>>
>> How much do we care about the error message when it comes to backpatching?
>> Maybe we should leave that one to 11 only, to avoid breaking that, as the
>> way it's written it's actually less wrong there.
>>
>> Thoughts?
> 
> Thanks for your input!
> 
> Applying the error message portion only on HEAD is a good plan, there is
> no point to make the life of translaters unnecessary painful.

I would backpatch both.  The updated error message is arguably easier to
translate.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-01-20 Thread Michael Paquier
On Fri, Jan 19, 2018 at 06:54:23PM -0300, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
> If the only problem is that buildfarm would run tests twice, then I
> think we should just press forward with this regardless of that: it
> seems a chicken-and-egg problem, because buildfarm cannot be upgraded to
> use some new test method if the method doesn't exist yet.  As a
> solution, let's just live with some run duplication for a period until
> the machines are upgraded to a future buildfarm client code.
> 
> If there are other problems, let's see what they are so that we can fix
> them.

The main complaint I received about this move of the pg_upgrade tests to
TAP is that they don't easily support cross-major-version upgrades,
removing some of what a developer or the buildfarm can actually do.
Making this possible would require first some refactoring of
PostgresNode.pm, so that a node is aware of the binary paths it uses and
multiple instances with different install paths can be manipulated at the
same time.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-20 Thread Michael Paquier
On Wed, Jan 17, 2018 at 12:36:31AM -0800, Noah Misch wrote:
> For me, verifiability is the crucial benefit of inline asm.  Anyone with an
> architecture manual can thoroughly review an inline asm implementation.  Given
> intrinsics and __xlc_ver__ conditionals, the same level of review requires
> access to every xlc version.

Okay.

> The most recent patch version is Returned with Feedback.  As a matter of
> procedure, I discourage creating commitfest entries as a tool to solicit new
> patch versions.  If I were the author of a RwF patch, I would dislike finding
> a commitfest entry that I did not create with myself listed as author.

Per my understanding, this is a bug, and we don't want to lose track of
it.

> If you do choose to proceed, the entry should be Waiting on Author.

Right.

> Note that fixing this bug is just the start of accepting XLC 13.1 as a
> compiler of PostgreSQL.  If we get a buildfarm member with a few dozen clean
> runs (blocked by, at a minimum, fixing this and the inlining bug), we'll have
> something.  Until then, support for XLC 13.1 is an anti-feature.

Per my understanding of this thread, this is a bug. My point is that the
documentation states that AIX is supported from 4.3.3 to 6.1; however,
there are no restrictions related to the compiler, hence I would have
thought that the docs imply that XLC 13.1 is a supported compiler. And IBM
states that XLC 13.1 is supported from AIX 6.1:
https://www-01.ibm.com/support/docview.wss?uid=swg21326972

True, the docs also tell you to look at the buildfarm animals, which
don't use XLC 13.1 -- though you won't see that unless you look at them
closely. Perhaps an explicit mention of compiler compatibility in the docs
would help make the supported range clear to anybody. I would expect more
people to look at the docs than at the buildfarm's internal contents.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-20 Thread Amit Kapila
On Sun, Jan 21, 2018 at 1:39 AM, Peter Geoghegan  wrote:
> On Fri, Jan 19, 2018 at 9:29 PM, Amit Kapila  wrote:
>
> Actually, though it doesn't really look like it from the way things
> are structured within nbtsort.c, I don't need to wait for workers to
> start up (call the WaitForParallelWorkerToAttach() function you
> sketched) before doing any real work within the leader. The leader can
> participate as a worker, and only do this check afterwards. That will
> work because the leader Tuplesortstate has yet to do any real work.
> Nothing stops me from adding a new function to tuplesort, for the
> leader, that lets the leader say: "New plan -- you should now expect
> this many participants" (leader takes this reliable number from
> eventual call to WaitForParallelWorkerToAttach()).
>
> I admit that I had no idea that there is this issue with
> nworkers_launched until very recently. But then, that field has
> absolutely no comments.
>

It would have been better if there were some comments alongside that
field, but I think it has been covered at another place in the code.
See comments in LaunchParallelWorkers().

/*
 * Start workers.
 *
 * The caller must be able to tolerate ending up with fewer workers than
 * expected, so there is no need to throw an error here if registration
 * fails.  It wouldn't help much anyway, because registering the worker in
 * no way guarantees that it will start up and initialize successfully.
 */

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables

2018-01-20 Thread David Rowley
On 20 January 2018 at 23:14, Michael Paquier  wrote:
> If pg_partition_tree_tables() uses the top of the partition as input,
> all the tree should show up. If you use a leaf, anything under the leaf
> should show up. If a leaf is defined and it has no underlying leaves,
> then only this outmost leaf should be listed.

hmm, that's thoroughly confusing. Just in case anyone else is stuck on
that, I just need to mention that a leaf is the part that does not have
branches, in nature or in computer science.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] UPDATE of partition key

2018-01-20 Thread Tom Lane
Robert Haas  writes:
> Committed with a bunch of mostly-cosmetic revisions.

Buildfarm member skink has been unhappy since this patch went in.
Running the regression tests under valgrind easily reproduces the
failure.  Now, I might be wrong about which of the patches committed
on Friday caused the unhappiness, but the valgrind backtrace sure
looks like it's to do with partition routing:

==00:00:05:49.683 17549== Invalid read of size 4
==00:00:05:49.683 17549==    at 0x62A8BA: ExecCleanupTupleRouting (execPartition.c:483)
==00:00:05:49.683 17549==    by 0x6483AA: ExecEndModifyTable (nodeModifyTable.c:2682)
==00:00:05:49.683 17549==    by 0x627139: standard_ExecutorEnd (execMain.c:1604)
==00:00:05:49.683 17549==    by 0x7780AF: ProcessQuery (pquery.c:206)
==00:00:05:49.683 17549==    by 0x7782E4: PortalRunMulti (pquery.c:1286)
==00:00:05:49.683 17549==    by 0x778AAF: PortalRun (pquery.c:799)
==00:00:05:49.683 17549==    by 0x774E4C: exec_simple_query (postgres.c:1120)
==00:00:05:49.683 17549==    by 0x776C17: PostgresMain (postgres.c:4143)
==00:00:05:49.683 17549==    by 0x6FA419: PostmasterMain (postmaster.c:4412)
==00:00:05:49.683 17549==    by 0x66E51F: main (main.c:228)
==00:00:05:49.683 17549==  Address 0xe25e298 is 2,088 bytes inside a block of size 32,768 alloc'd
==00:00:05:49.683 17549==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==00:00:05:49.683 17549==    by 0x89EB15: AllocSetAlloc (aset.c:945)
==00:00:05:49.683 17549==    by 0x8A7577: palloc (mcxt.c:848)
==00:00:05:49.683 17549==    by 0x671969: new_list (list.c:68)
==00:00:05:49.683 17549==    by 0x672859: lappend_oid (list.c:169)
==00:00:05:49.683 17549==    by 0x55330E: find_inheritance_children (pg_inherits.c:144)
==00:00:05:49.683 17549==    by 0x553447: find_all_inheritors (pg_inherits.c:203)
==00:00:05:49.683 17549==    by 0x62AC76: ExecSetupPartitionTupleRouting (execPartition.c:68)
==00:00:05:49.683 17549==    by 0x64949D: ExecInitModifyTable (nodeModifyTable.c:2232)
==00:00:05:49.683 17549==    by 0x62BBE8: ExecInitNode (execProcnode.c:174)
==00:00:05:49.683 17549==    by 0x627B53: standard_ExecutorStart (execMain.c:1043)
==00:00:05:49.683 17549==    by 0x778046: ProcessQuery (pquery.c:156)
(This is my local result, but skink's log looks about the same.)

regards, tom lane



Re: MCV lists for highly skewed distributions

2018-01-20 Thread John Naylor
I wrote:

>> I have a slight reservation about whether 1.25x is still a sensible
>> heuristic.
>
> This was also discussed in [1], but no patch came out of it. I was
> just now turning the formulas discussed there into code, but I'll
> defer to someone with more expertise. FWIW, I suspect that a solution
> that doesn't take into account a metric like coefficient of variation
> will have the wrong behavior sometimes, whether for highly uniform or
> highly non-uniform distributions.

I spent a few hours hacking on this, and it turns out calculating the
right number of MCVs taking into account both uniform and highly
non-uniform distributions is too delicate a problem for me to solve
right now. The logic suggested by Dean Rasheed in [1] always produces
no MCVs for a perfectly uniform distribution (which is good), but very
often also for other distributions, which is not good. My efforts to
tweak that didn't work, so I didn't get as far as adapting it for the
problem Jeff is trying to solve.
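
(For anyone wanting to experiment, the coefficient-of-variation metric
mentioned above is trivial to compute -- an illustrative helper, not code
from any of the patches discussed:)

#include <math.h>

/*
 * Coefficient of variation of per-value sample counts: stddev / mean.
 * Near zero for a perfectly uniform distribution, large for skewed ones.
 */
static double
coeff_of_variation(const int *counts, int n)
{
    double      mean = 0.0;
    double      var = 0.0;
    int         i;

    if (n <= 0)
        return 0.0;
    for (i = 0; i < n; i++)
        mean += counts[i];
    mean /= n;
    for (i = 0; i < n; i++)
        var += (counts[i] - mean) * (counts[i] - mean);
    var /= n;
    return (mean > 0.0) ? sqrt(var) / mean : 0.0;
}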

I have not been able to come up with a more compelling alternative, so
I have nothing further to say about the patch under review.

> [1]
> https://www.postgresql.org/message-id/flat/32261.1496611829%40sss.pgh.pa.us#32261.1496611...@sss.pgh.pa.us

-John Naylor



Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2018-01-20 Thread Jing Wang
Hi Stephen and Thomas,

Thanks your review comments.
Enclosed please find the latest patch.

>/src/backend/parser/gram.y: In function ‘base_yyparse’:
>/src/backend/parser/gram.y:1160:19: warning: assignment from incompatible
pointer type [-Wincompatible-pointer-types]
>| IN_P DATABASE db_spec_name { $$ = $3; }
The warning has been dismissed.

>When it should be:
>ALTER DATABASE { name |
CURRENT_DATABASE } OWNER TO { new_owner |
CURRENT_USER | SESSION_USER }
Yes. It should be.

>Please don't include whitespace-only hunks, like this one:
Ok.

>The TAP regression tests for pg_dump are failing.
The test case has been updated.

>make makeDbSpec() return a DbSpec and then try to minimize the
>forced-casting happening.
Makes sense. It has been changed.


Regards,
Jing Wang
Fujitsu Australia


support_CURRENT_DATABASE_keyword_v4.6.patch
Description: Binary data


Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables

2018-01-20 Thread Michael Paquier
On Fri, Jan 19, 2018 at 06:28:41PM +0900, Amit Langote wrote:
> Do you mean pg_partition_tree(regclass), that returns all partitions in
> the partition tree whose root is passed as the parameter?
> 
> Perhaps, like the following (roughly implemented in the attached)?
> 
> select  pg_partition_root(p) as root_parent,
> pg_partition_parent(p) as parent,
> p as relname,
> pg_total_relation_size(p) as size
> frompg_partition_tree_tables('p') p
> order by 4;
>  root_parent | parent | relname |  size
> -++-+-
>  p   || p   |   0
>  p   | p  | p123|   0
>  p   | p123   | p12 |   0
>  p   | p123   | p3  | 3653632
>  p   | p12| p1  | 3653632
>  p   | p12| p2  | 3653632
> (6 rows)

It seems that you caught the idea. As long as each leaf and root is
listed uniquely, that would be acceptable to me, and useful for users.
If pg_partition_tree_tables() uses the top of the partition as input,
all the tree should show up. If you use a leaf, anything under the leaf
should show up. If a leaf is defined and it has no underlying leaves,
then only this outmost leaf should be listed.

+/*
+ * Returns Oids of tables in a publication.
+ */
Close enough, but that's not a publication.

>> Documentation, as well as regression tests, would be welcome :)
> 
> OK, I will add those things in the next version.

Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Supporting huge pages on Windows

2018-01-20 Thread Magnus Hagander
On Fri, Dec 1, 2017 at 5:02 AM, Michael Paquier 
wrote:

> On Fri, Nov 24, 2017 at 9:28 AM, Tsunakawa, Takayuki
>  wrote:
> > From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
> >> I hope Tsunakawa-san doesn't mind me posting another rebased version of
> >> his patch.  The last version conflicted with the change from SGML to XML
> >> that just landed in commit 3c49c6fa.
> >
> > Thank you very much for keeping it fresh.  I hope the committer could
> have some time.
>
> I have moved this patch to next CF. Magnus, you are registered as a
> reviewer of this patch. Are you planning to look at it and potentially
> commit it?
>

Apologies for the delays here. Yes, I was and am.

I took a look at this patch again. I've made some small updates to comments
etc. There was also from what I can tell one actual bug in the code -- a
missing free() of delPrivs.

The debug message for ERROR_NO_SYSTEM_RESOURCES said it turned off huge
pages because of not enough huge pages, but AIUI it can be because of other
resources as well. So I dropped that specific.


However, reading through a few things -- we now use the
API GetLargePageMinimum() unconditionally. This API appeared in Windows
Vista and Windows 2003. That means that if we apply this patch, those are
now our minimum versions of Windows. At least unless Microsoft backpatched
things.

This is probably equally true of some other things we do, but those are
runtime, and we would just give an error on old platforms. If I'm thinking
right, then direct linking to GetLargePageMinimum() will simply make it
impossible to start a PostgreSQL backend with or without huge pages on
previous versions, because it will give a linker error.

I wonder if we need to do something similar to what we already do for
CreateRestrictedToken() in pg_ctl.c for example. That one actually is done
for compatibility with NT4/2000 -- CreateRestrictedToken was added in XP.
So while we could consider getting rid of that workaround now, perhaps we
need to put a similar one in place for this?

I don't have a Windows build box around ATM, or a Windows XP, so if
somebody could try the attached version, build a postgres and see if it
even starts on a Windows XP machine, that would be useful input!

If we can confirm/deny the situation around XP and decide what to do about
it, I am now happy to commit the rest of this patch.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 37a61a13c8..cc156c6385 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1369,14 +1369,26 @@ include_dir 'conf.d'

 

-At present, this feature is supported only on Linux. The setting is
-ignored on other systems when set to try.
+At present, this feature is supported only on Linux and Windows. The
+setting is ignored on other systems when set to try.

 

 The use of huge pages results in smaller page tables and less CPU time
-spent on memory management, increasing performance. For more details,
-see .
+spent on memory management, increasing performance. For more details about
+using huge pages on Linux, see .
+   
+
+   
+Huge pages are known as large pages on Windows.  To use them, you need to
+assign the user right Lock Pages in Memory to the Windows user account
+that runs PostgreSQL.
+You can use Windows Group Policy tool (gpedit.msc) to assign the user right
+Lock Pages in Memory.
+To start the database server on the command prompt as a standalone process,
+not as a Windows service, the command prompt must be run as an administrator
+User Access Control (UAC) must be disabled. When the UAC is enabled, the normal
+command prompt revokes the user right Lock Pages in Memory when started.

 

diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 4991ed46f1..9362f1c4a4 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -21,6 +21,7 @@ HANDLE		UsedShmemSegID = INVALID_HANDLE_VALUE;
 void	   *UsedShmemSegAddr = NULL;
 static Size UsedShmemSegSize = 0;
 
+static bool EnableLockPagesPrivilege(int elevel);
 static void pgwin32_SharedMemoryDelete(int status, Datum shmId);
 
 /*
@@ -103,6 +104,66 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2)
 	return true;
 }
 
+/*
+ * EnableLockPagesPrivilege
+ *
+ * Try to acquire SeLockMemoryPrivilege so we can use large pages.
+ */
+static bool
+EnableLockPagesPrivilege(int elevel)
+{
+	HANDLE hToken;
+	TOKEN_PRIVILEGES tp;
+	LUID luid;
+
+	if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
+	{
+		ereport(elevel,
+(errmsg("could not enable Lock Pages in Memory user right: error code %lu", GetL

Re: [HACKERS] log_destination=file

2018-01-20 Thread Magnus Hagander
On Tue, Nov 14, 2017 at 5:33 PM, Robert Haas  wrote:

> On Sun, Sep 10, 2017 at 5:29 AM, Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> > average latency:
> >
> > clients patch   master
> > 10  0.321   0.286
> > 20  0.669   0.602
> > 30  1.016   0.942
> > 40  1.358   1.280
> > 50  1.727   1.637
>
> That's still a noticeable slowdown, though.  And we've had previous
> reports of the overhead of logging being significant as well:
>
> http://postgr.es/m/CACLsApsA7U0GCFpojVQem6SGTEkv8
> vnwdbfhvi+dqo+gu5g...@mail.gmail.com
>
> I seem to recall a discussion, perhaps in-person, around the time Theo
> submitted that patch where it was reported that the logging collector
> could not be used on some systems he was working with because it
> became a major performance bottleneck.  With each backend writing its
> own messages to a file, it was tolerable, but when you tried to funnel
> everything through a single process, the back-pressure slowed down the
> entire system unacceptably.
>

Finally found myself back at this one, because I still think this is a
problem we definitely need to adress (whether with this file or not).

The funneling into a single process is definitely an issue.

But we don't really solve that problem today wit logging to stderr, do we?
Because somebody has to pick up the log as it came from stderr. Yes, you
get more overhead when sending the log to devnull, but that isn't really a
realistic scenario. The question is what to do when you actually want to
collect that much logging that quickly.

If each backend could actually log to *its own file*, then things would get
sped up. But we can't do that today. Unless you use the hooks and build it
yourself.

Per the thread referenced, using the hooks to handle the
very-high-rate-logging case seems to be the conclusion. But is that still
the conclusion, or do we feel we need to also have a native solution?

And if the conclusion is that hooks is the way to go for that, then is the
slowdown of this patch actually a relevant problem to it?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: improve type conversion of SPI_processed in Python

2018-01-20 Thread Peter Eisentraut
On 1/12/18 11:06, Tom Lane wrote:
>> Would that even be necessary?  Why not use the LongLong variant all the
>> time then?
> 
> I've not looked at the Python internals to see if one is noticeably faster
> than the other, but if it isn't, yeah we could simplify that.  In any case
> my main point is let's keep these two functions using similar approaches.

I ran extensive tests on this and went ahead with the simplified versions.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2018-01-20 Thread Amit Kapila
On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov
 wrote:
> On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro 
> wrote:
>>
>> Hi Shubham,
>>
>> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai 
>> wrote:
>> > If these two hash keys (78988658 and 546789888) mapped to the same
>> > bucket, this will result in false serialization failure.
>> > Please correct me if this assumption about false positives is wrong.
>>
>> I wonder if there is an opportunity to use computed hash values
>> directly in predicate lock tags, rather than doing it on the basis of
>> buckets.  Perhaps I'm missing something important about the way that
>> locks need to escalate that would prevent that from working.
>
>
> +1,
> Very nice idea!  Locking hash values directly seems to be superior over
> locking hash index pages.
> Shubham, do you have any comment on this?
>

As Shubham seems to be running out of time, I thought of helping him
by looking into the above-suggested idea.  I think one way to lock a
particular hash value is we can use TID of heap tuple associated with
each index entry (index entry for the hash index will be hash value).
However, do we really need it for implementing predicate locking for
hash indexes?  If we look at the "Index AM implementations" section of
README-SSI, it doesn't seem to be required.  Basically, if we look at
the strategy of predicate locks in btree [1], it seems to me locking
at page level for hash index seems to be a right direction as similar
to btree, the corresponding heap tuple read will be locked.

What do you think?

[1] -
* B-tree index searches acquire predicate locks only on the index
*leaf* pages needed to lock the appropriate index range. If, however,
a search discovers that no root page has yet been created, a predicate
lock on the index relation is required.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: master make check fails on Solaris 10

2018-01-20 Thread Marina Polyakova

On 18-01-2018 20:49, Marina Polyakova wrote:

On 18-01-2018 20:34, Tom Lane wrote:

...
What you could do in the meantime is work on finding a variation of
Victor's test that will detect the bug regardless of -O level.
If we do have hope that future gcc versions will handle this 
correctly,

we'll need a better test rather than just summarily dismissing
host_cpu = sparc.


Thanks, I'll try..


I tried different options of gcc but it did not help..
Perhaps searching in the source code of gcc will clarify something, but 
I'm sorry that I'm now too busy for this..


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATCH] Atomic pgrename on Windows

2018-01-20 Thread Magnus Hagander
On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier 
wrote:

> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
>  wrote:
> > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> > appears to be possible to atomically replace file on Windows –
> ReplaceFile()
> > does that.  ReplaceFiles() requires target file to exist, this is why we
> > still need to call MoveFileEx() when it doesn't exist.
>
> Do you think that it could be safer to unlink the target file first
> with pgunlink()? This way you make sure that the target file is
> removed and not locked. This change makes me worrying about the
> introduction of more race conditions.


Unlinking it first seems dangerous, as pointed out by Andres.

What about first trying ReplaceFile() and then if it fails with "target
doesn't exist", then call MoveFileEx().

Or the other way around -- try MoveFileEx() first since that seems to work
most of the time today (if it mostly broke we'd be in trouble already), and
if it fails with a sharing violation, try ReplaceFile()? And perhaps end up
doing it something similar to what we do with shared memory which is just
to loop over it and try  each a couple of time, before giving up and
failing?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: WIP Patch: Precalculate stable functions, infrastructure v1

2018-01-20 Thread Marina Polyakova

As I said, thank you so much for your comments!!

On 17-01-2018 0:30, Tom Lane wrote:

...
This is indeed quite a large patch, but it seems to me it could become
smaller. After a bit of review:

1. I do not like what you've done with ParamListInfo. The changes 
around

that are invasive, accounting for a noticeable part of the patch bulk,
and I don't think they're well designed. Having to cast back and forth
between ParamListInfo and ParamListInfoCommon and so on is ugly and 
prone

to cause (or hide) errors. And I don't really understand why
ParamListInfoPrecalculationData exists at all. Couldn't you have gotten
the same result with far fewer notational changes by defining another
PARAM_FLAG bit in the existing pflags field? (Or alternatively, maybe
the real need here is for another ParamKind value for Param nodes?)

I also dislike this approach because it effectively throws away the
support for "virtual" param arrays that I added in commit 6719b238e:
ParamListInfoPrecalculationData has no support for dynamically 
determined
parameter properties, which is surely something that somebody will 
need.

(It's just luck that the patch doesn't break plpgsql today.) I realize
that that's a recent commit and the code I'm complaining about predates
it, but we need to adjust this so that it fits in with the new 
approach.

See comment block at lines 25ff in params.h.


I'll try to use ParamListInfoData for generic plans (= to get cached 
expressions for params of prepared statements where possible) without 
changing its infrastructure.


2. I don't follow the need for the also-rather-invasive changes to 
domain

constraint data structures. I do see that the patch attempts to make
CoerceToDomain nodes cacheable, which is flat wrong and has to be 
ripped

out. You *cannot* assume that the planner has access to the same domain
constraints that will apply at runtime.


I'm sorry, I did not know about this :-[


I've occasionally thought that we should hook domain constraint changes
into the plan invalidation mechanism, which would make it possible for
the planner to assume that the constraints seen at planning time will
apply at execution. Whereupon we could have the planner insert domain
constraint expressions into the plan rather than leaving those to be
collected at query startup by execExpr.c, and then do things like
constant-folding and cacheing CoerceToDomain nodes. But that would be
a rather large and very domain-specific change, and so it would be fit
material for a different patch IMO. I recommend that for now you just
treat CoerceToDomain as an uncacheable expression type and rip all the
domain-related changes out of this patch.


I'll fix this.


3. I think you should also try hard to get rid of the need for
PlannedStmt.hasCachedExpr. AFAICS there's only one place that is
using that flag, which is exec_simple_check_plan, and I have to
think there are better ways we could deal with that. In particular,
I don't understand why you haven't simply set up plpgsql parameter
references to be noncacheable. Or maybe what we'd better do is
disable CacheExpr insertions into potentially-simple plans in the
first place. As you have it here, it's possible for recompilation
of an expression to result in a change in whether it should be deemed
simple or not, which will break things (cf commit 00418c612).


I'm sorry, I'll fix the use of parameters in this case. And I'll think 
how to get rid of the need for PlannedStmt.hasCachedExpr when there're 
possible cached expressions without parameters.



4. I don't like the way that you've inserted
"replace_qual_cached_expressions" and
"replace_pathtarget_cached_expressions" calls into seemingly random 
places

in the planner. Why isn't that being done uniformly during expression
preprocessing? There's no apparent structure to where you've put these
calls, and so they seem really vulnerable to errors of omission.


I'll fix this.


Also,
if this were done in expression preprocessing, there'd be a chance of
combining it with some existing pass over expression trees instead of
having to do a separate (and expensive) expression tree mutation.
I can't help suspecting that eval_const_expressions could take this on
as an additional responsibility with a lot less than a thousand new 
lines

of code.


From quick look I see no contradictions so I'll try to implement it.

5. BTW, cost_eval_cacheable_expr seems like useless restructuring as 
well.

Why aren't you just recursively applying the regular costing function?


Such a stupid mistake :( I'll fix this.


If you did all of the above it would result in a pretty significant
reduction of the number of places touched by the patch, which would 
make

it easier to see what's going on. Then we could start to discuss, for
instance, what does the "isConstParam" flag actually *mean* and why
is it different from PARAM_FLAG_CONST?


AFAIU they do not differ, and as I said above I'll try not to change the 
infrastructure of ParamListInfoData.



An

Re: WIP Patch: Precalculate stable functions, infrastructure v1

2018-01-20 Thread Marina Polyakova

On 17-01-2018 1:05, Tom Lane wrote:

> [ I'm sending this comment separately because I think it's an issue
> Andres might take an interest in. ]
>
> Marina Polyakova  writes:
>> [ v7-0001-Precalculate-stable-and-immutable-functions.patch ]
>
> Another thing that's bothering me is that the execution semantics
> you're proposing for CachedExpr seem rather inflexible.  AFAICS, once a
> CachedExpr has run once, it will hang on to the result value and keep
> returning that for the entire lifespan of the compiled expression.
> We already noted that that breaks plpgsql's "simple expression"
> logic, and it seems inevitable to me that it will be an issue for
> other places as well.  I think it'd be a better design if we had some
> provision for resetting the cached values, short of recompiling the
> expression from scratch.
>
> One way that occurs to me to do this is to replace the simple boolean
> isExecuted flags with a generation counter, and add a master generation
> counter to ExprState.  The rule for executing CachedExpr would be "if my
> generation counter is different from the ExprState's counter, then
> evaluate the subexpression and copy the ExprState's counter into mine".
> Then the procedure for forcing recalculation of cached values is just to
> increment the ExprState's counter.  There are other ways one could
> imagine doing this --- for instance, I initially thought of keeping the
> master counter in the ExprContext being used to run the expression.  But
> you need some way to remember what counter value was used last with a
> particular expression, so probably keeping it in ExprState is better.
>
> Or we could just brute-force it by providing a function that runs
> through a compiled expression step list and resets the isExecuted flag
> for each EEOP_CACHEDEXPR_IF_CACHED step it finds.  A slightly less
> brute-force way is to link those steps together in a list, so that the
> function doesn't have to visit irrelevant steps.  If the reset function
> were seldom used then the extra cycles for this wouldn't be very
> expensive.  But I'm not sure it will be seldom used --- it seems like
> plpgsql simple expressions will be doing this every time --- so I think
> the counter approach might be a better idea.


Thank you very much! I'll try to implement something from this.
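
As a minimal standalone model of the counter rule described above (the
struct layout and all names here are illustrative, not actual
PostgreSQL code):

#include <stdio.h>
#include <stdint.h>

typedef struct CachedExpr
{
    uint64_t cached_gen;    /* generation at which value was computed */
    int      value;         /* the cached result */
} CachedExpr;

typedef struct ExprState
{
    uint64_t master_gen;    /* bump this to invalidate every cached value */
} ExprState;

static int
eval_cached(ExprState *state, CachedExpr *ce)
{
    if (ce->cached_gen != state->master_gen)
    {
        ce->value = 42;     /* stand-in for evaluating the subexpression */
        ce->cached_gen = state->master_gen;
        printf("recomputed\n");
    }
    return ce->value;
}

int
main(void)
{
    ExprState  state = {1};
    CachedExpr ce = {0, 0};

    eval_cached(&state, &ce);   /* recomputes */
    eval_cached(&state, &ce);   /* served from the cache */
    state.master_gen++;         /* the reset: one counter increment */
    eval_cached(&state, &ce);   /* recomputes again */
    return 0;
}

The attraction over the flag-walking variant is that invalidation is
O(1) no matter how many cached steps the expression contains.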


> I'm curious to know whether Andres has some other ideas, or whether he
> feels this is all a big wart on the compiled-expression concept.  I
> don't think there are any existing cases where we keep any meaningful
> state across executions of a compiled-expression data structure; maybe
> that's a bad idea in itself.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Use of term hostaddrs for multiple hostaddr values

2018-01-20 Thread Magnus Hagander
On Tue, Jan 9, 2018 at 5:34 AM, Michael Paquier 
wrote:

> Hi all,
>
> While looking at the documentation of libpq, I have noticed that the
> term hostaddrs is used to qualify multiple values of hostaddr. This
> looks incorrect to me, as this is not the name of a connection
> parameter. Please find attached a patch to address this
> inconsistency. One error message is also touched, impacting
> translability.
>

These are both clear bugs, and the docs one should definitely be both
applied and backpatched.

How much do we care about the error message when it comes to backpatching?
Maybe we should leave that one to 11 only, to avoid breaking that, as the
way it's written it's actually less wrong there.

Thoughts?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-20 Thread Peter Geoghegan
On Fri, Jan 19, 2018 at 9:29 PM, Amit Kapila  wrote:
> I think I can see why this patch needs that.  Is it mainly for the
> work you are doing in _bt_leader_heapscan where you are waiting for
> all the workers to be finished?

Yes, though it's also needed for the leader tuplesort. It needs to be
able to discover worker runs by looking for temp files named 0 through
to $NWORKERS - 1.

The problem with seeing who shows up after a period of time, and
having the leader arbitrarily determine that to be the total number of
participants (while blocking further participants from joining) is
that I don't know *how long to wait*. This would probably work okay
for parallel CREATE INDEX, when the leader participates as a worker,
because you can check only when the leader is finished acting as a
worker. It stands to reason that that's enough time for worker
processes to at least show up, and be seen to show up. We can use the
duration of the leader's participation as a worker as a natural way to
decide how long to wait.

But what about when the leader doesn't participate as a worker, for whatever
reason? Other uses for parallel tuplesort might typically have much
less leader participation as compared to parallel CREATE INDEX. In
short, ISTM that seeing who shows up is a bad strategy for parallel
tuplesort.

> I think till now we don't have any such requirement, but if it is must
> for this patch, then I don't think it is tough to do that.  We need to
> write an API WaitForParallelWorkerToAttach() and then call for each
> launched worker or maybe WaitForParallelWorkersToAttach() which can
> wait for all workers to attach and report how many have successfully
> attached.   It will have functionality of
> WaitForBackgroundWorkerStartup and additionally it needs to check if
> the worker is attached to the error queue.  We already have similar
> API (WaitForReplicationWorkerAttach) for logical replication workers
> as well.  Note that it might have a slight impact on the performance
> because with this you need to wait for the workers to startup before
> doing any actual work, but I don't think it should be noticeable for
> large operations especially for operations like parallel create index.

Actually, though it doesn't really look like it from the way things
are structured within nbtsort.c, I don't need to wait for workers to
start up (call the WaitForParallelWorkerToAttach() function you
sketched) before doing any real work within the leader. The leader can
participate as a worker, and only do this check afterwards. That will
work because the leader Tuplesortstate has yet to do any real work.
Nothing stops me from adding a new function to tuplesort, for the
leader, that lets the leader say: "New plan -- you should now expect
this many participants" (the leader takes this reliable number from an
eventual call to WaitForParallelWorkerToAttach()).
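
Roughly, the leader-side flow would be the following (a sketch only:
LaunchParallelWorkers() exists today, but the other functions are the
hypothetical ones from this discussion):

LaunchParallelWorkers(pcxt);
_bt_leader_sort_as_worker(buildstate);  /* leader does its share first */
nparticipants = WaitForParallelWorkersToAttach(pcxt);   /* reliable count */
tuplesort_update_nparticipants(leaderstate, nparticipants); /* "new plan" */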

I admit that I had no idea that there is this issue with
nworkers_launched until very recently. But then, that field has
absolutely no comments.

--
Peter Geoghegan



Re: [HACKERS] WIP: Covering + unique indexes.

2018-01-20 Thread Peter Geoghegan
On Wed, Jan 17, 2018 at 12:45 AM, Andrey Borodin  wrote:
> Unfortunately, amcheck_next does not work currently on HEAD (there are 
> problems with AllocSetContextCreate() signature), but I've tested 
> bt_index_check() before, during and after pgbench, on primary and on slave. 
> Also, I've checked bt_index_parent_check() on master.

I fixed that recently. It should be fine now.

-- 
Peter Geoghegan



Re: Use of term hostaddrs for multiple hostaddr values

2018-01-20 Thread Michael Paquier
On Sat, Jan 20, 2018 at 08:30:43PM +0200, Magnus Hagander wrote:
> On Tue, Jan 9, 2018 at 5:34 AM, Michael Paquier 
> wrote:
> These are both clear bugs, and the docs one should definitely be both
> applied and backpatched.
> 
> How much do we care about the error message when it comes to backpatching?
> Maybe we should leave that one to 11 only, to avoid breaking that, as the
> way it's written it's actually less wrong there.
> 
> Thoughts?

Thanks for your input!

Applying the error message portion only on HEAD is a good plan; there is
no point in making the life of translators unnecessarily painful.
--
Michael




Re: Use of term hostaddrs for multiple hostaddr values

2018-01-20 Thread Peter Eisentraut
On 1/20/18 17:39, Michael Paquier wrote:
> On Sat, Jan 20, 2018 at 08:30:43PM +0200, Magnus Hagander wrote:
>> On Tue, Jan 9, 2018 at 5:34 AM, Michael Paquier 
>> wrote:
>> These are both clear bugs, and the docs one should definitely be both
>> applied and backpatched.
>>
>> How much do we care about the error message when it comes to backpatching?
>> Maybe we should leave that one to 11 only, to avoid breaking that, as the
>> way it's written it's actually less wrong there.
>>
>> Thoughts?
> 
> Thanks for your input!
> 
> Applying the error message portion only on HEAD is a good plan, there is
> no point to make the life of translaters unnecessary painful.

I would backpatch both.  The updated error message is arguably easier to
translate.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-01-20 Thread Michael Paquier
On Fri, Jan 19, 2018 at 06:54:23PM -0300, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
> If the only problem is that buildfarm would run tests twice, then I
> think we should just press forward with this regardless of that: it
> seems a chicken-and-egg problem, because buildfarm cannot be upgraded to
> use some new test method if the method doesn't exist yet.  As a
> solution, let's just live with some run duplication for a period until
> the machines are upgraded to a future buildfarm client code.
> 
> If there are other problems, let's see what they are so that we can fix
> them.

The main complaint I received about this move of the pg_upgrade tests to
be a TAP one is that they don't easily support cross-major-version
upgrades, removing some of what a developer or the buildfarm can
actually do. Making this possible would first require some refactoring
of PostgresNode.pm so that a node is aware of the binary paths it uses,
making it possible to manipulate multiple instances with different
install paths at the same time.
--
Michael




Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-20 Thread Michael Paquier
On Wed, Jan 17, 2018 at 12:36:31AM -0800, Noah Misch wrote:
> For me, verifiability is the crucial benefit of inline asm.  Anyone with an
> architecture manual can thoroughly review an inline asm implementation.  Given
> intrinsics and __xlc_ver__ conditionals, the same level of review requires
> access to every xlc version.

Okay.
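
To make the verifiability argument concrete, here is a rough sketch of
the familiar lwarx/stwcx. test-and-set pattern on POWER (simplified for
illustration; this is not the actual s_lock.h code):

/* Returns the previous lock value: 0 means the lock was free and is
 * now ours.  Every line can be checked against the POWER ISA manual. */
static inline int
tas_sketch(volatile int *lock)
{
    int old;
    int one = 1;

    __asm__ __volatile__(
"1:     lwarx   %0,0,%2     \n"     /* load lock word, set reservation */
"       stwcx.  %1,0,%2     \n"     /* conditionally store 1 */
"       bne-    1b          \n"     /* reservation lost: retry */
"       isync               \n"     /* acquire barrier */
        : "=&r"(old)
        : "r"(one), "r"(lock)
        : "cr0", "memory");

    return old;
}

With compiler intrinsics, by contrast, the reviewer has to trust (or
inspect) the code generation of every xlc version that may compile it.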

> The most recent patch version is Returned with Feedback.  As a matter of
> procedure, I discourage creating commitfest entries as a tool to solicit new
> patch versions.  If I were the author of a RwF patch, I would dislike finding
> a commitfest entry that I did not create with myself listed as author.

Per my understanding, this is a bug, and we don't want to lose track of
it.

> If you do choose to proceed, the entry should be Waiting on Author.

Right.

> Note that fixing this bug is just the start of accepting XLC 13.1 as a
> compiler of PostgreSQL.  If we get a buildfarm member with a few dozen clean
> runs (blocked by, at a minimum, fixing this and the inlining bug), we'll have
> something.  Until then, support for XLC 13.1 is an anti-feature.

Per my understanding of this thread, this is a bug. My point is that the
documentation states that AIX is supported from 4.3.3 to 6.1; however,
there are no restrictions related to the compiler, so I would have
thought that the docs imply that XLC 13.1 is a supported compiler. And
IBM states that XLC 13.1 is supported from AIX 6.1:
https://www-01.ibm.com/support/docview.wss?uid=swg21326972

True, the docs also tell readers to look at the buildfarm animals, but
you won't notice that none of them uses XLC 13.1 unless you look at them
closely. Perhaps an explicit mention of compiler compatibility in the
docs would help make the supported range clear. I would expect more
people to look at the docs than at the buildfarm's internal contents.
--
Michael




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-20 Thread Amit Kapila
On Sun, Jan 21, 2018 at 1:39 AM, Peter Geoghegan  wrote:
> On Fri, Jan 19, 2018 at 9:29 PM, Amit Kapila  wrote:
>
> Actually, though it doesn't really look like it from the way things
> are structured within nbtsort.c, I don't need to wait for workers to
> start up (call the WaitForParallelWorkerToAttach() function you
> sketched) before doing any real work within the leader. The leader can
> participate as a worker, and only do this check afterwards. That will
> work because the leader Tuplesortstate has yet to do any real work.
> Nothing stops me from adding a new function to tuplesort, for the
> leader, that lets the leader say: "New plan -- you should now expect
> this many participants" (leader takes this reliable number from
> eventual call to WaitForParallelWorkerToAttach()).
>
> I admit that I had no idea that there is this issue with
> nworkers_launched until very recently. But then, that field has
> absolutely no comments.
>

It would have been better if there were a comment beside that field, but
I think it is covered elsewhere in the code.
See the comments in LaunchParallelWorkers():

/*
* Start workers.
*
* The caller must be able to tolerate ending up with fewer workers than
* expected, so there is no need to throw an error here if registration
* fails.  It wouldn't help much anyway, because registering the worker in
* no way guarantees that it will start up and initialize successfully.
*/
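
So the contract today covers launch, not attachment. The caller-side
tolerance it describes looks roughly like this (a simplified sketch, not
a complete excerpt):

LaunchParallelWorkers(pcxt);
if (pcxt->nworkers_launched < pcxt->nworkers)
{
    /*
     * Some registrations failed; carry on with the smaller set, or
     * entirely in the leader if nworkers_launched is zero.
     */
}

The gap Peter is pointing at is that even nworkers_launched only counts
successful registrations, not workers that actually started up and
attached.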

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables

2018-01-20 Thread David Rowley
On 20 January 2018 at 23:14, Michael Paquier  wrote:
> If pg_partition_tree_tables() uses the top of the partition as input,
> all the tree should show up. If you use a leaf, anything under the leaf
> should show up. If a leaf is defined and it has no underlying leaves,
> then only this outmost leaf should be listed.

Hmm, that's thoroughly confusing. Just in case anyone else is stuck on
that, I just need to mention that a leaf is a node that does not have
branches, in nature or in computer science.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] UPDATE of partition key

2018-01-20 Thread Tom Lane
Robert Haas  writes:
> Committed with a bunch of mostly-cosmetic revisions.

Buildfarm member skink has been unhappy since this patch went in.
Running the regression tests under valgrind easily reproduces the
failure.  Now, I might be wrong about which of the patches committed
on Friday caused the unhappiness, but the valgrind backtrace sure
looks like it's to do with partition routing:

==00:00:05:49.683 17549== Invalid read of size 4
==00:00:05:49.683 17549==    at 0x62A8BA: ExecCleanupTupleRouting (execPartition.c:483)
==00:00:05:49.683 17549==    by 0x6483AA: ExecEndModifyTable (nodeModifyTable.c:2682)
==00:00:05:49.683 17549==    by 0x627139: standard_ExecutorEnd (execMain.c:1604)
==00:00:05:49.683 17549==    by 0x7780AF: ProcessQuery (pquery.c:206)
==00:00:05:49.683 17549==    by 0x7782E4: PortalRunMulti (pquery.c:1286)
==00:00:05:49.683 17549==    by 0x778AAF: PortalRun (pquery.c:799)
==00:00:05:49.683 17549==    by 0x774E4C: exec_simple_query (postgres.c:1120)
==00:00:05:49.683 17549==    by 0x776C17: PostgresMain (postgres.c:4143)
==00:00:05:49.683 17549==    by 0x6FA419: PostmasterMain (postmaster.c:4412)
==00:00:05:49.683 17549==    by 0x66E51F: main (main.c:228)
==00:00:05:49.683 17549==  Address 0xe25e298 is 2,088 bytes inside a block of size 32,768 alloc'd
==00:00:05:49.683 17549==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==00:00:05:49.683 17549==    by 0x89EB15: AllocSetAlloc (aset.c:945)
==00:00:05:49.683 17549==    by 0x8A7577: palloc (mcxt.c:848)
==00:00:05:49.683 17549==    by 0x671969: new_list (list.c:68)
==00:00:05:49.683 17549==    by 0x672859: lappend_oid (list.c:169)
==00:00:05:49.683 17549==    by 0x55330E: find_inheritance_children (pg_inherits.c:144)
==00:00:05:49.683 17549==    by 0x553447: find_all_inheritors (pg_inherits.c:203)
==00:00:05:49.683 17549==    by 0x62AC76: ExecSetupPartitionTupleRouting (execPartition.c:68)
==00:00:05:49.683 17549==    by 0x64949D: ExecInitModifyTable (nodeModifyTable.c:2232)
==00:00:05:49.683 17549==    by 0x62BBE8: ExecInitNode (execProcnode.c:174)
==00:00:05:49.683 17549==    by 0x627B53: standard_ExecutorStart (execMain.c:1043)
==00:00:05:49.683 17549==    by 0x778046: ProcessQuery (pquery.c:156)

(This is my local result, but skink's log looks about the same.)

regards, tom lane



Re: MCV lists for highly skewed distributions

2018-01-20 Thread John Naylor
I wrote:

>> I have a slight reservation about whether 1.25x is still a sensible
>> heuristic.
>
> This was also discussed in [1], but no patch came out of it. I was
> just now turning the formulas discussed there into code, but I'll
> defer to someone with more expertise. FWIW, I suspect that a solution
> that doesn't take into account a metric like coefficient of variation
> will have the wrong behavior sometimes, whether for highly uniform or
> highly non-uniform distributions.

I spent a few hours hacking on this, and it turns out calculating the
right number of MCVs taking into account both uniform and highly
non-uniform distributions is too delicate a problem for me to solve
right now. The logic suggested by Dean Rasheed in [1] always produces
no MCVs for a perfectly uniform distribution (which is good), but very
often also for other distributions, which is not good. My efforts to
tweak that didn't work, so I didn't get as far as adapting it for the
problem Jeff is trying to solve.
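
For what it's worth, the metric itself is cheap to compute. This little
standalone program (purely illustrative, not the analyze code) shows why
the coefficient of variation separates the two cases:

#include <stdio.h>
#include <math.h>

/* Coefficient of variation of per-value sample counts: near zero for a
 * uniform distribution, large when the distribution is skewed. */
static double
coeff_of_variation(const int *counts, int n)
{
    double mean = 0.0, var = 0.0;
    int i;

    for (i = 0; i < n; i++)
        mean += counts[i];
    mean /= n;
    for (i = 0; i < n; i++)
        var += (counts[i] - mean) * (counts[i] - mean);
    var /= n;
    return sqrt(var) / mean;
}

int
main(void)
{
    int uniform[] = {100, 101, 99, 100, 100};
    int skewed[] = {500, 5, 3, 2, 1};

    printf("uniform cv = %.3f\n", coeff_of_variation(uniform, 5)); /* ~0.006 */
    printf("skewed  cv = %.3f\n", coeff_of_variation(skewed, 5));  /* ~1.95 */
    return 0;
}

The hard part, as noted above, is turning such a number into a cutoff
that behaves sensibly across the whole spectrum of distributions.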

I have not been able to come up with a more compelling alternative, so
I have nothing further to say about the patch under review.

> [1]
> https://www.postgresql.org/message-id/flat/32261.1496611829%40sss.pgh.pa.us#32261.1496611...@sss.pgh.pa.us

-John Naylor