Re: [HACKERS] analyzeCTE is too strict about typmods?

2017-09-21 Thread Kyotaro HORIGUCHI
At Thu, 03 Aug 2017 19:29:40 -0400, Tom Lane  wrote in 
<28993.1501802...@sss.pgh.pa.us>
> I wrote:
> > In short, therefore, it's looking to me like analyzeCTE() is wrong here.
> > It should allow the case where the recursive result has typmod -1 while
> > the non-recursive output column has some more-specific typmod, so long
> > as they match on type OID.  That would correspond to what we do in
> > regular non-recursive UNION situations.
> 
> Oh, scratch that.  I was thinking that we could simply relax the error
> check, but that really doesn't work at all.  The problem is that by now,
> we have probably already generated Vars referencing the outputs of the
> recursive CTE, and those will have the more-specific typmod, which is
> wrong in this scenario.  Usually that wouldn't matter too much, but
> there are cases where it would matter.
> 
> We could imagine dealing with this by re-parse-analyzing the recursive
> term using typmod -1 for the CTE output column, but I don't much want
> to go there.  It wouldn't be cheap, and I'm not quite sure it's guaranteed
> to converge anyway.

Agreed.

> What's seeming like an attractive compromise is to change the HINT
> to recommend manually coercing the recursive term, instead of the
> non-recursive one.  Adjusting the error cursor to point to that side
> might be a bit painful, but it's probably doable.
> 
> Thoughts?

I agree with the direction, but unless I'm missing something,
transformSetOperationTree has enough information, and we wouldn't get
the expected pain there. (Moving the check itself might be the
painful part, though..)

| ERROR:  recursive query "foo" column 1 has type numeric(7,3) in non-recursive 
term but type numeric overall
| LINE 4: select f1+1  from foo
|^
| HINT:  Cast the output of the recursive term to the type of the non-recursive 
term.

# The hint gets a bit longer..

By the way, a wrong collation still results in the previous hint,
but that won't be a problem.

| ERROR:  recursive query "foo" column 1 has collation "it_IT" in non-recursive 
term but collation "ja_JP" overall
| LINE 2: select a from bar
|^
| HINT:  Use the COLLATE clause to set the collation of the non-recursive term.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/parser/analyze.c
--- b/src/backend/parser/analyze.c
***
*** 42,47 
--- 42,49 
  #include "parser/parse_target.h"
  #include "parser/parsetree.h"
  #include "rewrite/rewriteManip.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
  #include "utils/rel.h"
  
  
***
*** 1796,1801  transformSetOperationTree(ParseState *pstate, SelectStmt *stmt,
--- 1798,1804 
  		  bool isTopLevel, List **targetlist)
  {
  	bool		isLeaf;
+ 	int			varattno;
  
  	Assert(stmt && IsA(stmt, SelectStmt));
  
***
*** 1980,1985  transformSetOperationTree(ParseState *pstate, SelectStmt *stmt,
--- 1983,1989 
  		op->colTypmods = NIL;
  		op->colCollations = NIL;
  		op->groupClauses = NIL;
+ 		varattno = 0;
  		forboth(ltl, ltargetlist, rtl, rtargetlist)
  		{
  			TargetEntry *ltle = (TargetEntry *) lfirst(ltl);
***
*** 2008,2013  transformSetOperationTree(ParseState *pstate, SelectStmt *stmt,
--- 2012,2019 
  			else
  rescoltypmod = -1;
  
+ 			varattno++;
+ 
  			/*
  			 * Verify the coercions are actually possible.  If not, we'd fail
  			 * later anyway, but we want to fail now while we have sufficient
***
*** 2110,2115  transformSetOperationTree(ParseState *pstate, SelectStmt *stmt,
--- 2116,2161 
  			}
  
  			/*
+ 			 * Verify that the previously determined output column types and
+ 			 * collations match what the query really produced.  We have to
+ 			 * check this because the recursive term could have overridden the
+ 			 * non-recursive term, and we don't have any easy way to fix that.
+ 			 */
+ 			if (isTopLevel &&
+ pstate->p_parent_cte &&
+ pstate->p_parent_cte->cterecursive)
+ 			{
+ Oid	lcolcoll = exprCollation((Node *)ltle->expr);
+ 
+ /*
+  * This might be somewhat confusing, but we suggest fixing the
+  * recursive term, since the non-recursive term may have the same
+  * type without a typmod.
+  */
+ if (rescoltype != lcoltype || rescoltypmod != lcoltypmod)
+ 	ereport(ERROR,
+ 			(errcode(ERRCODE_DATATYPE_MISMATCH),
+ 			 errmsg("recursive query \"%s\" column %d has type %s in non-recursive term but type %s overall",
+ 	pstate->p_parent_cte->ctename, varattno,
+ 	format_type_with_typemod(lcoltype,
+ 			 lcoltypmod),
+ 	format_type_with_typemod(rescoltype,
+ 			 rescoltypmod)),
+ 			 errhint("Cast the output of the recursive term to the type of the non-recursive term."),
+ 			 parser_errposition(pstate,
+ exprLocation((Node *) rtle->expr))));
+ if (rescolcoll != lcolcoll)
+ 	ereport(ERROR,
+ 			

Re: [HACKERS] Generate wait event list and docs from text file

2017-09-21 Thread Schneider
On Mon, Aug 14, 2017 at 11:58 PM, Michael Paquier
 wrote:
> 1) It is easy to add a wait event and...
> 2) It is easy to not update its documentation.
> 3) the documentation tables get easily broken.
>
> As I have participated in the situation present now, I propose to
> write a patch to make the maintenance of the whole thing easier,
> because wait events are important, and because I see more of them
> added in the future.

Seems an apt occasion to point out that 10rc1 is missing documentation
for a couple events.

From src/backend/storage/lmgr/lwlocknames.txt: CLogTruncationLock was
added to the
docs but BackendRandomLock and LogicalRepWorkerLock are missing. (Maybe there's
a reason for this... I just checked the diffs from 9.6 to 10 then
checked the docs for
completeness.)


> Using this set of meta-data, it is possible to generate the SGML
> tables and the set of include tables.

The lock-related events might present a challenge here since they come
from a different
place.  In particular, there is no single location to find the
descriptions of tranche locks - you
have to search through the source code and find the
LWLockRegisterTranche() call to see
what text it used for the lock name!  (Consolidating that seems like a
great candidate for a
patch...)

-Jeremy

P.S. The number of wait events has gone from 69 in 9.6 up to 184 in 10rc1.  IMHO
this is very much worth mentioning as a new feature for postgresql 10!
 I counted
67 new I/O related events, 31 new latch-related events and 8 new
replication-related
events!  Seems like a big deal to me.  :)

-- 
http://about.me/jeremy_schneider


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

2017-09-21 Thread Noah Misch
On Thu, Sep 21, 2017 at 05:38:13PM -0400, Tom Lane wrote:
> I wrote:
> > Noah Misch  writes:
> >> Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing
> >> USE_WIDE_UPPER_LOWER?  Every buildfarm fossil has both.
> 
> > +1 ... if nothing else, there's the problem that untested code is likely
> > to be broken.  You just proved it *is* broken, of course, but my point
> > is that even if we repaired the immediate damage we could have little
> > confidence in it staying fixed.

That would be easy enough to ensure by adding ac_cv_func_towlower=no to some
buildfarm animal.  But the real-world need for said code is clearly dead and
gone.  Good riddance.

> Meanwhile, I see that Peter has posted a fix for the immediate problem.
> I propose that Peter should apply his fix in HEAD and v10, and then
> I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only.

That works for me.




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-09-21 Thread Michael Paquier
On Fri, Sep 22, 2017 at 7:26 AM, Tom Lane  wrote:
> "Bossart, Nathan"  writes:
>> [ 0001-error_on_duplicate_columns_in_analyze_v6.patch ]
>
> I've pushed (and back-patched) the 0001 patch, with some significant
> changes:

Thanks. Arrived here too late to give feedback on the proposed patch.

> I'll take a look at the updated 0002 tomorrow, but if anyone living in
> a different timezone wants to review it meanwhile, feel free.

Here is some feedback. 0002's regression tests fail to apply now that
0001 has been committed; that can easily be resolved.

A couple of tests could be removed:
+VACUUM (FREEZE, ANALYZE) vactst (i);
+VACUUM (FREEZE, ANALYZE) vactst (i, i, i);
+VACUUM vactst (i);

 void
-vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
-  List *va_cols, BufferAccessStrategy bstrategy, bool isTopLevel)
+vacuum(int options, VacuumRelation *relation, VacuumParams *params,
+  BufferAccessStrategy bstrategy, bool isTopLevel)
[...]
-   relations = get_rel_oids(relid, relation);
+   relations = get_rel_oids(relation);
I still think that ExecVacuum() should pass a list of VacuumRelation
objects to vacuum(), and get_rel_oids() should take a list as input
and return a completed list. This way the decision-making about doing
everything in the same transaction would happen once, in vacuum().
And actually, if several relations are given to VACUUM, your patch
would not use one transaction per table, as use_own_xacts would be set
to false. I think that Tom meant that relations whose processing has
finished should be committed immediately. Per your patch, the commit
happens only once all relations are processed.
-- 
Michael




Re: [HACKERS] Re: [COMMITTERS] pgsql: Make new crash restart test a bit more robust.

2017-09-21 Thread Thomas Munro
 On Wed, Sep 20, 2017 at 4:42 PM, Andres Freund  wrote:
> On 2017-09-19 19:00:38 -0700, Andres Freund wrote:
>> Given this fact pattern, I'll allow the case without a received error
>> message in the recovery test. Objections?
>
> Hearing none. Pushed.
>
> While debugging this, I've also introduced a pump wrapper so that we now
> get:
> ok 4 - exactly one process killed with SIGQUIT
> # aborting wait: program died
> # stream contents: >>psql::9: WARNING:  terminating connection because 
> of crash of another server process
> # DETAIL:  The postmaster has commanded this server process to roll back the 
> current transaction and exit, because another server process exited 
> abnormally and possibly corrupted shared memory.
> # HINT:  In a moment you should be able to reconnect to the database and 
> repeat your command.
> # psql::9: server closed the connection unexpectedly
> #   This probably means the server terminated abnormally
> #   before or while processing the request.
> # psql::9: connection to server was lost
> # <<
> # pattern searched for: (?^m:MAYDAY:  terminating connection because of crash 
> of another server process)
> not ok 5 - psql query died successfully after SIGQUIT

Seeing these failures in 013_crash_restart.pl from time to time on
Travis CI.  Examples:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/278419122
https://travis-ci.org/postgresql-cfbot/postgresql/builds/278247756

There are a couple of other weird problems in the TAP test that
probably belong on another thread (see build IDs 278302509 and
278247756 which are for different CF patches but exhibit the same
symptom: some test never returns control but we can't see its output,
maybe due to -Otarget, before the whole job is nuked by Travis for not
making progress).

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Masahiko Sawada
On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
 wrote:
>
> Yep, but the deficiency is caused by the use before_shmem_exit() in
> the SQL-level functions present in 9.6~ which make the cleanup happen
> potentially twice. If you look at the code of basebackup.c, you would
> notice that the cleanups of the counters only happen within the
> PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
> enough.

Thank you for explaining. I understand and agree.

On Fri, Sep 22, 2017 at 8:02 AM, Michael Paquier
 wrote:
> On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
>  wrote:
>> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada  
>> wrote:
>>> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
>>>  wrote:
 +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
 +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
 +   XLogCtl->Insert.nonExclusiveBackups--;
 Hm, no, I don't agree. I think that instead you should just leave
 do_pg_abort_backup() immediately if sessionBackupState is set to
 SESSION_BACKUP_NONE. This variable is the link between the global
 counters and the session stopping the backup so I don't think that we
 should touch this assertion of this counter. I think that this method
 would be safe as well for backup start as pg_start_backup_callback
 takes care of any cleanup. Also because the counters are incremented
 before entering in the PG_ENSURE_ERROR_CLEANUP block, and
 sessionBackupState is updated just after leaving the block.
>>>
>>> I think that the assertion failure still can happen if the process
>>> aborts after decremented the counter and before setting to
>>> SESSION_BACKUP_NONE. Am I missing something?
>>
>> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
>> the cleanup at this moment. So this happens when waiting for the
>> archives to be done, and the session flag is set to NONE at this
>> point.
>
> And actually, with two non-exclusive backups taken in parallel, it is
> still possible to fail on another assertion in do_pg_stop_backup()
> even with your patch. Imagine the following:
> 1) Two backups are taken, counter for non-exclusive backups is set at 2.
> 2) One backup is stopped, then interrupted. This causes the counter to
> be decremented twice, once in do_pg_stop_backup, and once when
> aborting. The counter is now at 0, and forcePageWrites is also switched
> to false.
> 3) The second backup stops, a new assertion failure is triggered.
> Without assertions, the counter would end up at -1.

You're right. I updated the patch so that do_pg_abort_backup exits
early if the state is NONE, and so that do_pg_stop_backup sets the
state to NONE before releasing the WAL insert lock.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_do_pg_abort_backup_v2.patch
Description: Binary data



Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-09-21 Thread Kyotaro HORIGUCHI
At Thu, 21 Sep 2017 20:35:01 -0400, Robert Haas  wrote 
in 
> On Thu, Sep 21, 2017 at 8:16 PM, Kyotaro HORIGUCHI
>  wrote:
> > Though I don't see it as a bug and agree that the message is not
> > proper, currently we can create hash indexes without any warning
> > on unlogged tables and it causes a problem with replication.
> 
> That's true, but I don't believe it's a sufficient reason to make a change.
> 
> Before 84aa8ba128a08e6fdebb2497c7a79ebf18093e12 (2014), we didn't
> issue a warning about hash indexes in any case whatsoever; we relied
> on people reading the documentation to find out about the limitations
> of hash indexes.  They can still do that in any cases that the warning
> doesn't adequately cover.  I really don't think it's worth kibitzing
> the cases where this message is emitted in released branches, or the
> text of the message, just as we didn't back-patch the message itself
> into older releases that are still supported.  We need a compelling
> reason to change things in stable branches, and the fact that a
> warning message added in 2014 doesn't cover every limitation of a
> pre-1996 hash index implementation is not an emergency.  Let's save
> back-patching for actual bugs, or we'll forever be fiddling with
> things in stable branches that would be better left alone.

Sorry for the noise, and thank you. I agree with that, now that I know
the reason is not precisely (3) (we already have a WARNING for the
problematic ops).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Michael Paquier
On Fri, Sep 22, 2017 at 10:41 AM, Craig Ringer  wrote:
> Another one to watch out for is that elog(...) and ereport(...) invoke
> CHECK_FOR_INTERRUPTS. That's given me exciting surprises before when
> combined with assertion checking and various exit cleanup hooks.

Ahah. Good point. In this case LWLockWaitForVar() is one thing to
worry about if LWLock tracing is enabled, because it can log things
before interrupts are held off. This can easily be avoided for this
issue by updating sessionBackupState before calling
WALInsertLockRelease, while still holding the WAL insert lock. Surely
this merits a comment in the patch we want here.
-- 
Michael




Re: [HACKERS] Windows warnings from VS 2017

2017-09-21 Thread Andres Freund
On 2017-09-21 09:30:31 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > The speed of memset is hardly going to be the dominating factor in a
> > 'CREATE DATABASE' command, so we could certainly afford to change to
> > plain memset calls here.
> 
> Another thought is that it may be time for our decennial debate about
> whether MemSet is worth the electrons it's printed on.  I continue to
> think that any modern compiler+libc ought to do an equivalent or better
> optimization given just a plain memset().  If that seems to be true
> for recent MSVC, we could consider putting an #if into c.h to define
> MemSet as just memset for MSVC.  Maybe later that could be extended
> to other compilers.

+many. glibc's memset is nearly an order of magnitude faster than MemSet
on my machine for medium+ length copies, and still 3-4 times faster at
~100 bytes. And both gcc and clang optimize away the memset entirely when
the length is a fixed short length.

Greetings,

Andres Freund




Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Craig Ringer
On 21 September 2017 at 16:56, Michael Paquier 
wrote:

> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada 
> wrote:
> > On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
> >  wrote:
> >> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada 
> wrote:
> >>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
> >>> backup has been introduced, so we should back-patch to the all
> >>> supported versions.
> >>
> >> There is a typo here right? Non-exclusive backups have been introduced
> >> in 9.6. Why would a back-patch further down be needed?
> >
> > I think the non-exclusive backups infrastructure has been introduced
> > in 9.1 for pg_basebackup. I've not checked yet that it can be
> > reproduced using pg_basebackup in PG9.1 but I thought it could happen
> > as far as I checked the code.
>
> Yep, but the deficiency is caused by the use before_shmem_exit() in
> the SQL-level functions present in 9.6~ which make the cleanup happen
> potentially twice. If you look at the code of basebackup.c, you would
> notice that the cleanups of the counters only happen within the
> PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
> enough.
>
> >> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
> >> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
> >> +   XLogCtl->Insert.nonExclusiveBackups--;
> >> Hm, no, I don't agree. I think that instead you should just leave
> >> do_pg_abort_backup() immediately if sessionBackupState is set to
> >> SESSION_BACKUP_NONE. This variable is the link between the global
> >> counters and the session stopping the backup so I don't think that we
> >> should touch this assertion of this counter. I think that this method
> >> would be safe as well for backup start as pg_start_backup_callback
> >> takes care of any cleanup. Also because the counters are incremented
> >> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
> >> sessionBackupState is updated just after leaving the block.
> >
> > I think that the assertion failure still can happen if the process
> > aborts after decremented the counter and before setting to
> > SESSION_BACKUP_NONE. Am I missing something?
>
> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
> the cleanup at this moment. So this happens when waiting for the
> archives to be done, and the session flag is set to NONE at this
> point.
>

Another one to watch out for is that elog(...) and ereport(...) invoke
CHECK_FOR_INTERRUPTS. That's given me exciting surprises before when
combined with assertion checking and various exit cleanup hooks.




-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Windows warnings from VS 2017

2017-09-21 Thread Michael Paquier
On Thu, Sep 21, 2017 at 10:30 PM, Tom Lane  wrote:
> Another thought is that it may be time for our decennial debate about
> whether MemSet is worth the electrons it's printed on.  I continue to
> think that any modern compiler+libc ought to do an equivalent or better
> optimization given just a plain memset().  If that seems to be true
> for recent MSVC, we could consider putting an #if into c.h to define
> MemSet as just memset for MSVC.  Maybe later that could be extended
> to other compilers.

+1. Things change quickly in IT.
-- 
Michael




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-21 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 7:01 PM, Peter Geoghegan  wrote:
> I really think we need to add some kind of debug mode that makes ICU
> optionally spit out a locale display name at key points. We need this
> to gain confidence that the behavior that ICU provides actually
> matches what is expected across ICU different versions for different
> locale formats. We talked about this as a user-facing feature before,
> which can wait till v11; I just want this to debug problems like this
> one.

I patched CREATE COLLATION to show ICU display name, which produces
output like this:

postgres=# create collation basque (provider=icu,
locale='eu-u-kf-upper-kr-latn-digit-em-emoji-kn-true-co-eor');
NOTICE:  0: ICU collation description is "Basque
(colcasefirst=upper, Sort Order=European Ordering Rules,
colnumeric=yes, colreorder=latn-digit, em=emoji)"
CREATE COLLATION

I used an ISO 639-1 language code (2 letter language code) above,
which, as we can see, is recognized as Basque. ICU is also fine with
the 3 letter 639-2 code "eus-", recognizing that as Basque, too. If I
use an ISO 639-2 code for Basque that ICU/CLDR doesn't like, "baq-*",
I can see that my expectations have only partially been met, since the
notice doesn't say anything about the language Basque:

postgres=# create collation actually_not_basque (provider=icu,
locale='baq-u-kf-upper-kr-latn-digit-em-emoji-kn-true-co-eor');
NOTICE:  0: ICU collation description is "baq (colcasefirst=upper,
Sort Order=European Ordering Rules, colnumeric=yes,
colreorder=latn-digit, em=emoji)"
CREATE COLLATION

Functionality like this is starting to look essential to me, rather
than just a nice-to-have. Having this NOTICE would have made me
realize our problems with ICU versions < 54 much earlier, if nothing
else. If the purpose of NOTICE messages is to "Provide[s] information
that might be helpful to users", then I'd say that this definitely
qualifies. And, the extra code is trivial (we already get display name
in the context of initdb). I strongly recommend that we slip this into
v10, as part of fixing the problem with language tags that earlier ICU
versions have.

-- 
Peter Geoghegan




Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-09-21 Thread Robert Haas
On Thu, Sep 21, 2017 at 8:16 PM, Kyotaro HORIGUCHI
 wrote:
> Though I don't see it as a bug and agree that the message is not
> proper, currently we can create hash indexes without any warning
> on unlogged tables, and it causes a problem with replication.

That's true, but I don't believe it's a sufficient reason to make a change.

Before 84aa8ba128a08e6fdebb2497c7a79ebf18093e12 (2014), we didn't
issue a warning about hash indexes in any case whatsoever; we relied
on people reading the documentation to find out about the limitations
of hash indexes.  They can still do that in any cases that the warning
doesn't adequately cover.  I really don't think it's worth kibitzing
the cases where this message is emitted in released branches, or the
text of the message, just as we didn't back-patch the message itself
into older releases that are still supported.  We need a compelling
reason to change things in stable branches, and the fact that a
warning message added in 2014 doesn't cover every limitation of a
pre-1996 hash index implementation is not an emergency.  Let's save
back-patching for actual bugs, or we'll forever be fiddling with
things in stable branches that would be better left alone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-09-21 Thread Kyotaro HORIGUCHI
At Thu, 21 Sep 2017 16:19:27 -0400, Robert Haas  wrote 
in <694cb417-ef2c-4760-863b-aec4530c2...@gmail.com>
> On Sep 21, 2017, at 8:59 AM, Amit Kapila  wrote:.
> > I think giving an error message like "hash indexes are not WAL-logged
> > and .." for unlogged tables doesn't seem like a good behavior.
> 
> +1. This seems like deliberate behavior, not a bug.

Though I don't see it as a bug and agree that the message is not
proper, currently we can create hash indexes without any warning on
unlogged tables, and that causes a problem with replication.

The point here is that if we leave the behavior (failure on the
standby) as-is on the grounds that a warning is shown at index
creation, a similar message ought to exist for unlogged tables.

Otherwise, our decision amounts to another option:

(4) Though we don't see a warning on hash index creation on unlogged
  tables, it has been a known problem and we don't mind.
 
regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] visual studio 2017 build support

2017-09-21 Thread Haribabu Kommi
On Fri, Sep 22, 2017 at 7:03 AM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
>
> On 08/25/2017 11:29 PM, Haribabu Kommi wrote:
> >
> >
> > On Fri, Aug 25, 2017 at 11:27 PM, Christian Ullrich
> > > wrote:
> >
> > * On 2017-06-21 02:06, Haribabu Kommi wrote:
> >
> > Thanks for the review. Here I attached an updated patch with
> > README update.
> >
> >
> > Hello,
> >
> >
> > Thanks for the review.
> >
> >
> > the most recent update to VS 2017, version 15.3, now identifies as
> > "14.11" rather than "14.10" in the output of nmake /?. Simply
> > adding this value to the two places that check for 14.10 in your
> > patch appears to work for me.
> >
> >
> > VS 2017 doesn't change the nmake version to 15, and it is updated
> > with every minor release, so I changed the check to accept
> > everything that is greater than 14.10 and up to 15, in case a
> > future VS 2017 update changes the version number.
> >
> >
> > In a newly created project, PlatformToolset is still "v141".
> > ToolsVersion is "15.0" whereas your patch uses "14.1".
> >
> > ISTM that the ToolsVersion has been like this in all versions of
> > VS 2017; in my collection of .vcxproj files the auto-generated
> > PostgreSQL projects are the only ones using "14.1".
> >
> >
> > Updated the ToolsVersion to 15.0 and kept the platform toolset as
> > v141, because the toolset version still points to v141 when I
> > create a sample project with VS 2017, and that is in line with the
> > nmake version as well.
> >
> >
> >
>
>
> I was about to commit this after a good bit of testing when I noticed this:
>
> +   Building with Visual Studio 2017 is
> supported
> +   down to Windows 7 SP1 and Windows
> Server 2012 R2.
>
> I was able to build on Windows Server 2008 without a problem, so I'm
> curious why we are saying it's not supported.
>

Thanks for the review.

From the Visual Studio system requirements [1], in the section on
supported operating systems, Windows 7 SP1 and Windows Server 2012 R2
are listed, and nothing is mentioned about 2008; for that reason I
wrote that it is supported down to those operating systems. As I don't
have a Windows Server 2008 system available, I couldn't verify it
myself.

Since the Visual Studio 2017 product itself is not documented as
supporting Windows Server 2008, can we go ahead and mention it in our
documentation?

[1] -
https://www.visualstudio.com/en-us/productinfo/vs2017-system-requirements-vs

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] compress method for spgist - 2

2017-09-21 Thread Nikita Glukhov

On 21.09.2017 02:27, Alexander Korotkov wrote:

On Thu, Sep 21, 2017 at 2:06 AM, Darafei "Komяpa" Praliaskouski 
> wrote:


It is possible for bbox->low.x to be NaN when circle->center.x
and circle->radius are both +Infinity.


What is the rationale behind such a circle?


I would rather prefer to forbid any geometries with infs and nans.
However, the upgrade process would then suffer. Users with such
geometries would get errors during dump/restore, and pg_upgraded
instances would still contain invalid values...


It seems to me that any circle with a radius of Infinity should
become a [-Infinity .. Infinity, -Infinity .. Infinity] box. Then
you won't have NaNs, and the index structure shouldn't be broken.


We probably should produce a [-Infinity .. Infinity, -Infinity ..
Infinity] box for any geometry containing inf or nan.  That MBR would
be found for any query, saying: "the index can't help you for this
kind of value, only recheck can deal with it".  Therefore, we would at
least guarantee that the results of sequential scan and index scan are
the same.




I have looked at the GiST KNN code and found the same errors for NaNs,
infinities and NULLs as in my SP-GiST KNN patch.

Attached two patches:

1. Fix NaN-inconsistencies in circles' bounding boxes computed in the
GiST compress method, which lead to the failed
Assert(box->low.x <= box->high.x) in computeDistance() from
src/backend/access/gist/gistproc.c, by transforming NaNs into
infinities (a corresponding test case is provided in the second patch).

2. Fix the ordering of NULL, NaN and infinite distances by GiST.  These
distance values could be mixed up because NULL distances were transformed 
into infinities, and there was no special processing for NaNs in the KNN 
queue's comparison function.

At first I tried just to set the recheck flag for NULL distances, but that
did not work for index-only scans because they do not support rechecking.
So I had to add a special flag for NULL distances.


Should I start a separate thread for this issue and add the patches to the
commitfest?


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From 52ab493cfe1ec6260578054b71f2c48e77d4850a Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Fri, 22 Sep 2017 02:06:48 +0300
Subject: [PATCH 1/2] Fix circle bounding box inconsistency in GiST compress
 method

---
 src/backend/access/gist/gistproc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index 08990f5..2699304 100644
--- a/src/backend/access/gist/gistproc.c
+++ b/src/backend/access/gist/gistproc.c
@@ -1149,6 +1149,16 @@ gist_circle_compress(PG_FUNCTION_ARGS)
 		r->high.y = in->center.y + in->radius;
 		r->low.y = in->center.y - in->radius;
 
+		/* avoid box inconsistency by transforming NaNs into infinities */
+		if (isnan(r->high.x))
+			r->high.x = get_float8_infinity();
+		if (isnan(r->high.y))
+			r->high.y = get_float8_infinity();
+		if (isnan(r->low.x))
+			r->low.x = -get_float8_infinity();
+		if (isnan(r->low.y))
+			r->low.y = -get_float8_infinity();
+
 		retval = (GISTENTRY *) palloc(sizeof(GISTENTRY));
 		gistentryinit(*retval, PointerGetDatum(r),
 	  entry->rel, entry->page,
-- 
2.7.4

From 066ad9104ec0e967e20e820977286f001e4055a4 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Thu, 21 Sep 2017 19:09:02 +0300
Subject: [PATCH 2/2] Fix GiST ordering by distance for NULLs and NaNs

---
 src/backend/access/gist/gistget.c  | 29 ++---
 src/backend/access/gist/gistscan.c | 36 +++--
 src/include/access/gist_private.h  | 13 ++--
 src/test/regress/expected/gist.out | 64 ++
 src/test/regress/sql/gist.sql  | 60 +++
 5 files changed, 184 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 760ea0c..7fed700 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -132,7 +132,7 @@ gistindex_keytest(IndexScanDesc scan,
 	GISTSTATE  *giststate = so->giststate;
 	ScanKey		key = scan->keyData;
 	int			keySize = scan->numberOfKeys;
-	double	   *distance_p;
+	GISTDistance *distance_p;
 	Relation	r = scan->indexRelation;
 
 	*recheck_p = false;
@@ -150,7 +150,10 @@ gistindex_keytest(IndexScanDesc scan,
 		if (GistPageIsLeaf(page))	/* shouldn't happen */
 			elog(ERROR, "invalid GiST tuple found on leaf page");
 		for (i = 0; i < scan->numberOfOrderBys; i++)
-			so->distances[i] = -get_float8_infinity();
+		{
+			so->distances[i].value = -get_float8_infinity();
+			so->distances[i].isnull = false;
+		}
 		return true;
 	}
 
@@ -248,7 +251,8 @@ gistindex_keytest(IndexScanDesc scan,
 		if ((key->sk_flags & SK_ISNULL) || isNull)
 		{
 			/* Assume distance computes as null and sorts to the end */

Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Michael Paquier
On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
 wrote:
> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada  
> wrote:
>> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
>>  wrote:
>>> +   Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>>> +   XLogCtl->Insert.nonExclusiveBackups--;
>>> Hm, no, I don't agree. I think that instead you should just leave
>>> do_pg_abort_backup() immediately if sessionBackupState is set to
>>> SESSION_BACKUP_NONE. This variable is the link between the global
>>> counters and the session stopping the backup so I don't think that we
>>> should touch this assertion of this counter. I think that this method
>>> would be safe as well for backup start as pg_start_backup_callback
>>> takes care of any cleanup. Also because the counters are incremented
>>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>>> sessionBackupState is updated just after leaving the block.
>>
>> I think that the assertion failure can still happen if the process
>> aborts after decrementing the counter and before setting
>> sessionBackupState to SESSION_BACKUP_NONE. Am I missing something?
>
> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
> the cleanup at this moment. So this happens when waiting for the
> archives to be done, and the session flag is set to NONE at this
> point.

And actually, with two non-exclusive backups taken in parallel, it is
still possible to fail on another assertion in do_pg_stop_backup()
even with your patch. Imagine the following:
1) Two backups are taken, counter for non-exclusive backups is set at 2.
2) One backup is stopped, then interrupted. This causes the counter to
be decremented twice, once in do_pg_stop_backup(), and once when
aborting. The counter is at 0, switching forcePageWrites to false as
well.
3) The second backup stops, and a new assertion failure is triggered.
Without assertions the counter would end up at -1.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-09-21 Thread Tom Lane
"Bossart, Nathan"  writes:
> [ 0001-error_on_duplicate_columns_in_analyze_v6.patch ]

I've pushed (and back-patched) the 0001 patch, with some significant
changes:

* The list_concat_unique implementation is O(N^2), and seems a bit obscure
as well.  Perhaps there will never be so many column list entries that
the speed is a big issue, but I'm not sure.  I replaced the list with a
bitmap to avoid the risk.

* I did not see the point of having "ANALYZE t (x,x,x,nosuchcolumn)"
complain about nosuchcolumn rather than the duplicated x entries,
especially not if it meant you couldn't say which column was duplicated.
So I folded the test into the primary loop and made the error look
more like what you get for a nonexistent column.

* I didn't like the test case at all: it was repetitive and it actually
failed to exhibit the originally problematic behavior with unpatched
code, which is generally a minimum expectation for a regression test.
(I think that's because you used an empty table, for which ANALYZE will
not try to make any pg_statistic entries.)  So I rewrote that, and made
it use an existing table rather than create a whole new one just for this.

I'll take a look at the updated 0002 tomorrow, but if anyone living in
a different timezone wants to review it meanwhile, feel free.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Commitfest 201709 update

2017-09-21 Thread Daniel Gustafsson
Two thirds of the commitfest has now passed, and while a lot of patches have
been closed, there is still lots to do.  As of now, 48% of the entries are
either in “Needs review” or “Waiting on Author”.  Let’s try to reduce that
number further!

A lot of you authors/reviewers have responded to my nagging/pleading emails
by picking up work, and we are all grateful for your efforts.  Those of you
who are not reviewers in this commitfest, please consider picking up a patch to
review.  PostgreSQL runs on the collective effort put in by the community, and
all contributions are welcome.

Thank you all for your contributions to PostgreSQL!

Sincerely

Daniel Gustafsson
CFM201709

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Fw: [HACKERS] HACKERS[PATCH] split ProcArrayLock into multiple parts -- follow-up

2017-09-21 Thread Jim Van Fleet
> On 2017-09-21 15:51:54 -0500, Jim Van Fleet wrote:
> > Not to beat on a dead horse, or anything, but this fix was frowned upon
> > because in one environment (one socket) it was 6% down and over 15% up in
> > the right environment (two sockets).
> 
> > So, why not add a configuration parameter which specifies the number of
> > parts? Default is 1, which would be "exactly" the same as no parts and
> > hence no degradation in the single-socket environment -- and with 2, you
> > get some positive performance.
> 
> Several reasons:
> 
> - You'd either add a bunch of branches into performance-critical
>   parts, or you'd add a compile-time flag, which most people would be
>   unable to toggle.
I agree, no compile-time flags -- and no extra testing in the main path -- 
it gets set at init and is not changed from there.
> - It'd be something hard to tune, because even on multi-socket machines
>   it'll be highly load-dependent. E.g. workloads that largely are
>   bottlenecked in a single backend / few backends will probably regress
>   as well.
Workloads are hard to tune -- with the default, you have what you have 
today. If you "know" the issue is ProcArrayLock, then you have an 
alternative to try.
> 
> FWIW, you started a new thread with this message, that doesn't seem
> helpful?

Sorry about that -- my mistake.

Jim



Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

2017-09-21 Thread Tom Lane
I wrote:
> Noah Misch  writes:
>> Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing
>> USE_WIDE_UPPER_LOWER?  Every buildfarm fossil has both.

> +1 ... if nothing else, there's the problem that untested code is likely
> to be broken.  You just proved it *is* broken, of course, but my point
> is that even if we repaired the immediate damage we could have little
> confidence in it staying fixed.

Further notes about that:

* The Single Unix Spec v2 (a/k/a POSIX 1997), which has been our minimum
portability spec for quite awhile, requires wcstombs() and towlower(),
and further requires the latter to be declared in .

* Surveying the buildfarm, I agree with your conclusion that every active
member has wcstombs() and towlower().  gaur/pademelon is the lone member
that lacks ; it declares towlower() in  instead.  It's
not so surprising that that system adheres to a pre-1997 idea of where to
put that, because its /usr/include files mostly date from 1996 ...

Meanwhile, I see that Peter has posted a fix for the immediate problem.
I propose that Peter should apply his fix in HEAD and v10, and then
I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visual studio 2017 build support

2017-09-21 Thread Andrew Dunstan


On 08/25/2017 11:29 PM, Haribabu Kommi wrote:
>
>
> On Fri, Aug 25, 2017 at 11:27 PM, Christian Ullrich
> > wrote:
>
> * On 2017-06-21 02:06, Haribabu Kommi wrote:
>
> Thanks for the review. Here I attached an updated patch with
> README update.
>
>
> Hello,
>
>
> Thanks for the review.
>  
>
> the most recent update to VS 2017, version 15.3, now identifies as
> "14.11" rather than "14.10" in the output of nmake /?. Simply
> adding this value to the two places that check for 14.10 in your
> patch appears to work for me.
>
>
> VS 2017 doesn't change the nmake version to 15, and it is updating
> with every minor version, so I changed the check to accept
> everything that is greater than 14.10 and eq 15, in case in future if
> VS 2017 changes the version number.
>  
>
> In a newly created project, PlatformToolset is still "v141".
> ToolsVersion is "15.0" whereas your patch uses "14.1".
>
> ISTM that the ToolsVersion has been like this in all versions of
> VS 2017; in my collection of .vcxproj files the auto-generated
> PostgreSQL projects are the only ones using "14.1".
>
>  
> Updated the ToolsVersion to 15.0 and kept the platform toolset as
> v141; this is because the toolset version still points
> to v141 when I create a sample project with VS 2017, and the version
> number is in line with the nmake version also.
>
>
>


I was about to commit this after a good bit of testing when I noticed this:

+   Building with Visual Studio 2017 is supported
+   down to Windows 7 SP1 and Windows Server 2012 R2.

I was able to build on Windows Server 2008 without a problem, so I'm
curious why we are saying it's not supported.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Fw: [HACKERS] HACKERS[PATCH] split ProcArrayLock into multiple parts -- follow-up

2017-09-21 Thread Andres Freund
On 2017-09-21 15:51:54 -0500, Jim Van Fleet wrote:
> Not to beat on a dead horse, or anything, but this fix was frowned upon 
> because in one environment (one socket) it was 6% down and over 15% up in 
> the right environment (two sockets).

> So, why not add a configuration parameter which specifies the number of 
> parts? Default is 1 which would be "exactly" the same as no parts and 
> hence no degradation in the single socket environment -- and with 2, you 
> get some positive performance.

Several reasons:

- You'd either add a bunch of branches into performance-critical
  parts, or you'd add a compile-time flag, which most people would be
  unable to toggle.
- It'd be something hard to tune, because even on multi-socket machines
  it'll be highly load-dependent. E.g. workloads that largely are
  bottlenecked in a single backend / few backends will probably regress
  as well.

FWIW, you started a new thread with this message, that doesn't seem
helpful?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-21 Thread Peter Geoghegan
On Thu, Sep 21, 2017 at 2:49 AM, Andreas Karlsson  wrote:
> If we are fine with supporting only ICU 4.2 and later (which I think we are
> given that ICU 4.2 was released in 2009) then using uloc_forLanguageTag()[1]
> to validate and canonicalize seems like the right solution. I had missed that
> this function even existed when I last read the documentation. Does it
> return a BCP 47 tag in modern versions of ICU?

The decision to support ICU >= 4.2 was already made (see commit
eccead9). I have no reason to think that that's a problem for us.

As I've said, we currently use uloc_toLanguageTag() on all supported
ICU versions, to at least create a collation name at initdb time, but
also to get our new collation's colcollate when ICU >= 54. If we now
put a uloc_forLanguageTag() in place before each ucol_open() call,
then we can safely store a BCP 47 format tag as colcollate on *every*
ICU version. We can reconstruct a traditional format locale string
*on-the-fly* within pg_newlocale_from_collation() and
get_collation_actual_version(), which would be what we'd pass to
ucol_open() (we'd never pass "raw colcollate").

To keep things simple, we could always convert colcollate to the
traditional format using uloc_forLanguageTag(), just in case we're on
a version of ICU where ucol_open() doesn't like locales that are
spelled using the BCP 47 format. It doesn't seem worth it to take
advantage of the leniency on ICU >= 54, since that would be a special
case (though we could if we wanted to).

> I strongly prefer if there, as much as possible, is only one format for
> inputting ICU locales.

I agree, but the bigger issue is that we're *half way* between
supporting only one format, and supporting two formats. AFAICT, there
is no reason that we can't simply support one format on all ICU
versions, and keep what ends up within pg_collation at initdb time
essentially the same across ICU versions (except for those that are
due to cultural/political developments).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Fw: [HACKERS] HACKERS[PATCH] split ProcArrayLock into multiple parts -- follow-up

2017-09-21 Thread Jim Van Fleet
Howdy --

Not to beat on a dead horse, or anything, but this fix was frowned upon 
because in one environment (one socket) it was 6% down and over 15% up in 
the right environment (two sockets).

So, why not add a configuration parameter which specifies the number of 
parts? Default is 1 which would be "exactly" the same as no parts and 
hence no degradation in the single socket environment -- and with 2, you 
get some positive performance.

Jim
- Forwarded by Jim Van Fleet/Austin/Contr/IBM on 09/21/2017 03:37 PM 
-

pgsql-hackers-ow...@postgresql.org wrote on 06/09/2017 01:39:35 PM:

> From: "Jim Van Fleet" 
> To: "Pgsql Hackers" 
> Date: 06/09/2017 01:41 PM
> Subject: [HACKERS] HACKERS[PATCH] split ProcArrayLock into multiple 
parts
> Sent by: pgsql-hackers-ow...@postgresql.org
> 
> I left out the retry in LWLockAcquire.
> 
> [attachment "ProcArrayLock_part.patch" deleted by Jim Van Fleet/
> Austin/Contr/IBM] 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-09-21 Thread Robert Haas
On Sep 21, 2017, at 8:59 AM, Amit Kapila  wrote:
> I think giving an error message like "hash indexes are not WAL-logged
> and .." for unlogged tables doesn't seem like a good behavior.

+1. This seems like deliberate behavior, not a bug.

...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch  wrote:
> This is currently a v10 open item, but I think it doesn't qualify for that
> treatment.  It's merely an opportunity for optimization, albeit an
> attractively-simple one.

I have withdrawn this as an open item. I'm still hopeful that it can
be worked through for v10 at Peter's discretion.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Arrays of domains

2017-09-21 Thread Tom Lane
I wrote:
> Here's a rebased-up-to-HEAD version of this patch set.  The only
> actual change is removal of a no-longer-needed hunk in pl_exec.c.

I see the patch tester is complaining that this broke, due to commit
4bd199465.  The fix is trivial (s/GETARG_ANY_ARRAY/GETARG_ANY_ARRAY_P/)
but for convenience here's an updated patch set.

regards, tom lane

diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 9d75e86..d7db32e 100644
*** a/src/backend/optimizer/prep/preptlist.c
--- b/src/backend/optimizer/prep/preptlist.c
*** expand_targetlist(List *tlist, int comma
*** 306,314 
  		new_expr = coerce_to_domain(new_expr,
  	InvalidOid, -1,
  	atttype,
  	COERCE_IMPLICIT_CAST,
  	-1,
- 	false,
  	false);
  	}
  	else
--- 306,314 
  		new_expr = coerce_to_domain(new_expr,
  	InvalidOid, -1,
  	atttype,
+ 	COERCION_IMPLICIT,
  	COERCE_IMPLICIT_CAST,
  	-1,
  	false);
  	}
  	else
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index e79ad26..5a241bd 100644
*** a/src/backend/parser/parse_coerce.c
--- b/src/backend/parser/parse_coerce.c
***
*** 34,48 
  
  static Node *coerce_type_typmod(Node *node,
     Oid targetTypeId, int32 targetTypMod,
!    CoercionForm cformat, int location,
!    bool isExplicit, bool hideInputCoercion);
  static void hide_coercion_node(Node *node);
  static Node *build_coercion_expression(Node *node,
  		  CoercionPathType pathtype,
  		  Oid funcId,
  		  Oid targetTypeId, int32 targetTypMod,
! 		  CoercionForm cformat, int location,
! 		  bool isExplicit);
  static Node *coerce_record_to_complex(ParseState *pstate, Node *node,
  		 Oid targetTypeId,
  		 CoercionContext ccontext,
--- 34,49 
  
  static Node *coerce_type_typmod(Node *node,
     Oid targetTypeId, int32 targetTypMod,
!    CoercionContext ccontext, CoercionForm cformat,
!    int location,
!    bool hideInputCoercion);
  static void hide_coercion_node(Node *node);
  static Node *build_coercion_expression(Node *node,
  		  CoercionPathType pathtype,
  		  Oid funcId,
  		  Oid targetTypeId, int32 targetTypMod,
! 		  CoercionContext ccontext, CoercionForm cformat,
! 		  int location);
  static Node *coerce_record_to_complex(ParseState *pstate, Node *node,
  		 Oid targetTypeId,
  		 CoercionContext ccontext,
*** coerce_to_target_type(ParseState *pstate
*** 110,117 
  	 */
  	result = coerce_type_typmod(result,
  targettype, targettypmod,
! cformat, location,
! (cformat != COERCE_IMPLICIT_CAST),
  (result != expr && !IsA(result, Const)));
  
  	if (expr != origexpr)
--- 111,117 
  	 */
  	result = coerce_type_typmod(result,
  targettype, targettypmod,
! ccontext, cformat, location,
  (result != expr && !IsA(result, Const)));
  
  	if (expr != origexpr)
*** coerce_type(ParseState *pstate, Node *no
*** 355,361 
  			result = coerce_to_domain(result,
  	  baseTypeId, baseTypeMod,
  	  targetTypeId,
! 	  cformat, location, false, false);
  
  		ReleaseSysCache(baseType);
  
--- 355,362 
  			result = coerce_to_domain(result,
  	  baseTypeId, baseTypeMod,
  	  targetTypeId,
! 	  ccontext, cformat, location,
! 	  false);
  
  		ReleaseSysCache(baseType);
  
*** coerce_type(ParseState *pstate, Node *no
*** 417,436 
  
  			result = build_coercion_expression(node, pathtype, funcId,
  			   baseTypeId, baseTypeMod,
! 			   cformat, location,
! 			   (cformat != COERCE_IMPLICIT_CAST));
  
  			/*
  			 * If domain, coerce to the domain type and relabel with domain
! 			 * type ID.  We can skip the internal length-coercion step if the
! 			 * selected coercion function was a type-and-length coercion.
  			 */
  			if (targetTypeId != baseTypeId)
  result = coerce_to_domain(result, baseTypeId, baseTypeMod,
  		  targetTypeId,
! 		  cformat, location, true,
! 		  exprIsLengthCoercion(result,
! 			   NULL));
  		}
  		else
  		{
--- 418,434 
  
  			result = build_coercion_expression(node, pathtype, funcId,
  			   baseTypeId, baseTypeMod,
! 			   ccontext, cformat, location);
  
  			/*
  			 * If domain, coerce to the domain type and relabel with domain
! 			 * type ID, hiding the previous coercion node.
  			 */
  			if (targetTypeId != baseTypeId)
  result = coerce_to_domain(result, baseTypeId, baseTypeMod,
  		  targetTypeId,
! 		  ccontext, cformat, location,
! 		  true);
  		}
  		else
  		{
*** coerce_type(ParseState *pstate, Node *no
*** 444,450 
  			 * 

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut
 wrote:
>> Attached patch shows what I'm getting at. This is untested, since I
>> don't use Windows. Proceed with caution.
>
> Your analysis makes sense, but the patch doesn't work, because "locale"
> is never set before reading it.

It was just for illustrative purposes. Seems fine to actually move the
WIN32 block down to just before the existing TRUST_STRXFRM test, since
the locale is going to be cached and then immediately reused in the case
where we return early (Windows libc) anyway. This would also be a win for
clarity, IMV.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Eisentraut
On 9/16/17 18:33, Peter Geoghegan wrote:
> In summary, we're currently attaching the use of SortSupport to the
> wrong thing. We're treating this UTF-16 business as something that
> implies a broad OS/platform restriction, when in fact it should be
> treated as implying a restriction for one particular collation
> provider only (a collation provider that happens to be built into
> Windows, but isn't really special to us).
> 
> Attached patch shows what I'm getting at. This is untested, since I
> don't use Windows. Proceed with caution.

Your analysis makes sense, but the patch doesn't work, because "locale"
is never set before reading it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix bool/int type confusion

2017-09-21 Thread Tom Lane
Peter Eisentraut  writes:
> On 9/21/17 11:46, Tom Lane wrote:
>> This also means that the portability problem is purely hypothetical;
>> given valid tz reference data the code wouldn't ever try to increment
>> "hit" to 2 anyway.  It's even more hypothetical for us, because we don't
>> use leap-second-aware zones.

> It's not so much that there is an actual portability problem.  The
> problem is that the compiler issues a warning for this with stdbool.

Understood.

>> Therefore, what I would like to do is revert this commit (0ec2e908b),
>> and then either leave the problem to be fixed by our next regular sync
>> with a released tzcode version, or else force a sync with their git tip
>> to absorb their fix now.  In either case we should keep all our live
>> branches in sync.  I'm willing to do the code-sync work either way.

> That makes sense.  There is no urgency on the stdbool patch set, so
> waiting for this to be resolved properly around November seems reasonable.

I've just been going through their git commit log to see what else has
changed since tzcode2017b, and I note that there are half a dozen other
portability-ish fixes.  I think that some of them affect only code we
don't use, but I'm not sure that that's the case for all.  So I'm a bit
inclined to go with plan B, that is sync with their current HEAD now.
As far as I've been able to tell, they don't have any special code QA
process that they apply before a release.  They push out releases based
on the need to update the tzdata files, and the tzcode just rides along
with that in whatever state it is in.  So there's not really any
reliability gain from waiting for an official code release --- it just
is a bit easier to document what we synced to.  That being the case,
absorbing their changes in smaller chunks rather than bigger ones seems
easier.  I don't have the sync process totally automated, but it's not
that painful as long as there are not too many changes at once.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

2017-09-21 Thread Peter Eisentraut
On 9/21/17 01:29, Noah Misch wrote:
> I checked !USE_WIDE_UPPER_LOWER by configuring v10 as follows:
> 
>   ./configure -C --prefix=$HOME/sw/nopath/pg10 --enable-debug \
>   --enable-cassert --enable-depend --enable-tap-tests --with-libxml \
>   --with-gssapi --with-openssl ac_cv_func_towlower=no
> 
> The result fails to compile:

Yeah, the placement of the ifdef blocks was pretty bogus.  This patch
fixes it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 9e51e4e52ae565503934c97c84ee5a7d9bcb2dcc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 21 Sep 2017 14:42:10 -0400
Subject: [PATCH] Fix build with !USE_WIDE_UPPER_LOWER

The placement of the ifdef blocks in formatting.c was pretty bogus, so
the code failed to compile if USE_WIDE_UPPER_LOWER was not defined.
---
 src/backend/utils/adt/formatting.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 46f45f6654..2bf484cda3 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1528,7 +1528,6 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
{
result = asc_tolower(buff, nbytes);
}
-#ifdef USE_WIDE_UPPER_LOWER
else
{
pg_locale_t mylocale = 0;
@@ -1566,6 +1565,7 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
else
 #endif
{
+#ifdef USE_WIDE_UPPER_LOWER
if (pg_database_encoding_max_length() > 1)
{
wchar_t*workspace;
@@ -1603,8 +1603,8 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
wchar2char(result, workspace, result_size, mylocale);
pfree(workspace);
}
-#endif			/* USE_WIDE_UPPER_LOWER */
else
+#endif			/* USE_WIDE_UPPER_LOWER */
{
char   *p;
 
@@ -1652,7 +1652,6 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
{
result = asc_toupper(buff, nbytes);
}
-#ifdef USE_WIDE_UPPER_LOWER
else
{
pg_locale_t mylocale = 0;
@@ -1690,6 +1689,7 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
else
 #endif
{
+#ifdef USE_WIDE_UPPER_LOWER
if (pg_database_encoding_max_length() > 1)
{
wchar_t*workspace;
@@ -1727,8 +1727,8 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
wchar2char(result, workspace, result_size, mylocale);
pfree(workspace);
}
-#endif			/* USE_WIDE_UPPER_LOWER */
else
+#endif			/* USE_WIDE_UPPER_LOWER */
{
char   *p;
 
@@ -1777,7 +1777,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
{
result = asc_initcap(buff, nbytes);
}
-#ifdef USE_WIDE_UPPER_LOWER
else
{
pg_locale_t mylocale = 0;
@@ -1815,6 +1814,7 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
else
 #endif
{
+#ifdef USE_WIDE_UPPER_LOWER
if (pg_database_encoding_max_length() > 1)
{
wchar_t*workspace;
@@ -1864,8 +1864,8 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
wchar2char(result, workspace, result_size, mylocale);
pfree(workspace);
}
-#endif			/* USE_WIDE_UPPER_LOWER */
else
+#endif			/* USE_WIDE_UPPER_LOWER */
{
char   *p;
 
-- 
2.14.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Pavel Stehule
2017-09-21 20:20 GMT+02:00 Peter Eisentraut <peter.eisentr...@2ndquadrant.com>:

> On 9/21/17 13:54, Pavel Stehule wrote:
> > I see where you are coming from, but there is no association in the
> > existing UI that equates "+" to the word "verbose".  I think just
> > removing the verbose prefix and applying the sorting behavior in all
> > cases should be easier to explain and implement.
> >
> > I thought about it -- but I am not sure one variable of this kind is
> > practical.
> >
> > If I don't need the size, then sorting by schema, name is ok (I have
> > never needed anything else). With only one variable of this kind, the
> > setting would be shared -- which is not practical.
>
> But you are proposing also to add a variable configuring the sort
> direction.  It would be weird that \dX+ observed the sort direction but
> \dX did not.
>

yes and no.

schema_name, name_schema or SORT_DIRECTION makes sense for both types of
commands.

A size sort makes sense only for the \dX+ command.

I am thinking about a solution, and the cleanest one I see is two distinct
variables:

SORT_COLUMNS and VERBOSE_SORT_COLUMNS

when VERBOSE_SORT_COLUMNS will be undefined, then SORT_COLUMNS is used for
\dX+ command too.

Is it acceptable?




> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] [COMMITTERS] pgsql: Fix bool/int type confusion

2017-09-21 Thread Peter Eisentraut
On 9/21/17 11:46, Tom Lane wrote:
> So the code in their git repo still has the variable as bool, but
> there's no ++ operator on it anymore.
> 
> This also means that the portability problem is purely hypothetical;
> given valid tz reference data the code wouldn't ever try to increment
> "hit" to 2 anyway.  It's even more hypothetical for us, because we don't
> use leap-second-aware zones.

It's not so much that there is an actual portability problem.  The
problem is that the compiler issues a warning for this with stdbool.

> Therefore, what I would like to do is revert this commit (0ec2e908b),
> and then either leave the problem to be fixed by our next regular sync
> with a released tzcode version, or else force a sync with their git tip
> to absorb their fix now.  In either case we should keep all our live
> branches in sync.  I'm willing to do the code-sync work either way.

That makes sense.  There is no urgency on the stdbool patch set, so
waiting for this to be resolved properly around November seems reasonable.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Peter Eisentraut
On 9/21/17 13:54, Pavel Stehule wrote:
> I see where you are coming from, but there is no association in the
> existing UI that equates "+" to the word "verbose".  I think just
> removing the verbose prefix and applying the sorting behavior in all
> cases should be easier to explain and implement.
> 
> I thought about it - but I am not sure that one kind of these variables is
> practical.
> 
> If I don't need the size, then sorting by schema, name is OK (I have never
> needed anything else). With only one kind of these variables, this setting
> is common to both - which is not practical.

But you are proposing also to add a variable configuring the sort
direction.  It would be weird that \dX+ observed the sort direction but
\dX did not.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-21 Thread Jeff Janes
On Thu, Sep 21, 2017 at 7:42 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/19/17 20:45, Peter Eisentraut wrote:
> > On 9/19/17 17:55, Jeff Janes wrote:
> >> I guess I'm late to the party, but I don't see why this is needed at
> >> all.  We encourage people to use any and all new features which are
> >> appropriate to them--that is why we implement new features.  Why does
> >> this feature need a special invitation?
> >
> > It's not clear to me how an average user would get from the press
> > release or release notes to upgrading their installation to use
> > SCRAM-based authentication and passwords.  A little bit more guidance
> > somewhere would be helpful.
>
> Here is a patch that expands the SCRAM documentation a bit, adds more
> explanation how the different options are related, and sets some better
> links.  I think now you can get from the release notes to the relevant
> documentation and have enough information on how to put the new features
> into use.
>


This looks good to me.  I might suggest adding client verification as a
specific step:

"To upgrade an existing installation from md5 to scram-sha-256, verify that
all client software supports it, set password_encryption = 'scram-sha-256'
in postgresql.conf..."

Cheers,

Jeff


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Pavel Stehule
2017-09-21 15:30 GMT+02:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 9/21/17 04:27, Pavel Stehule wrote:
> > yes. It was designed for + commands only. Can be enhanced to all
> > commands - then VERBOSE prefix should be removed - not sure if it is
> > necessary. For me interesting different order than default is only in
> > verbose mode.
>
> I see where you are coming from, but there is no association in the
> existing UI that equates "+" to the word "verbose".  I think just
> removing the verbose prefix and applying the sorting behavior in all
> cases should be easier to explain and implement.
>

I thought about it - but I am not sure that one kind of these variables is
practical.

If I don't need the size, then sorting by schema, name is OK (I have never
needed anything else). With only one kind of these variables, this setting
is common to both - which is not practical.

I need sorting by size in verbose mode (where the size is visible) 100% of
the time - so it will be saved to psqlrc. And when the size is invisible,
sorting by size is not practical, and can be confusing (because the size is
not shown).

So I don't think removing the VERBOSE prefix is a good idea - or we would
have to come up with a different design (I have no idea how).

Regards

Pavel


> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-09-21 Thread Dilip Kumar
On Thu, Sep 21, 2017 at 4:50 PM, Rafia Sabih
 wrote:
> On Sun, Sep 17, 2017 at 9:10 PM, Dilip Kumar  wrote:
>> On Wed, Sep 6, 2017 at 4:14 PM, Rafia Sabih
>>  wrote:
>>
>
> Please find the attached file for the revised version.

Thanks for the updated patch, I have some more comments.

+static shm_mq_handle *local_mq_attach(shm_mq_handle *mqh);
+static uint64 space_in_local(local_mq * lq, Size tuple_size);
+static bool read_local_queue(local_mq * lq, bool shm_mq_full);

local_mq * lq  -> local_mq *lq
same for other places as well.

---

+static uint64 space_in_shm(shm_mq *mq);
+
+static uint64 space_in_local(local_mq * lq, Size tuple_size);

We had better use Size here instead of uint64.

---

+ available = ringsize - used;
+
+ ringsize = lq->mq_ring_size;
+ writer_offset = lq->mq_bytes_written % ringsize;
+ reader_offset = lq->mq_bytes_read % ringsize;
+
+ if (writer_offset + tuple_size < ringsize && reader_offset < writer_offset)
+ available = (ringsize - writer_offset);

Even though there is space in the queue, if the tuple needs rotation
(wrap-around) then we are not allowing it to be written into the local
queue.  If there is some strong reason behind that, please add comments
to explain it.
---

+ if (shm_mq_full || (written - read) >= .05 * lq->mq_ring_size)
+ return true;
+
+ else
+ return true;
+}

Seems like you want to return 'false' in the else case.



+ read_offset = lq->mq_bytes_read % lq->mq_ring_size;
+ available = space_in_shm(mqh->mqh_queue);
+
+ /* always read data in the aligned form */
+ to_read = MAXALIGN_DOWN(Min(used, available));
+
+ /*
+ * if the amount of data to be send from local queue involves wrapping of
+ * local queue, then send only the data till the end of queue right now
+ * and rest later.
+ */
+ if (lq->mq_bytes_read % lq->mq_ring_size + to_read > lq->mq_ring_size)

You can directly use "read_offset" instead of recalculating
lq->mq_bytes_read % lq->mq_ring_size.


+ do
+ {
+ if (shm_mq_is_detached(mqh->mqh_queue))
+ return SHM_MQ_DETACHED;
+
+ shm_space = space_in_shm(mqh->mqh_queue);
+
+ /*
+ * cannot send data to shared queue, unless there is required
+ * space, so wait till we get some space, since we cannot
+ * write anymore in local queue as of now
+ */
+ WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_EVENT_MQ_SEND);
+
+ /* Reset the latch so we don't spin. */
+ ResetLatch(MyLatch);
+
+ /* An interrupt may have occurred while we were waiting. */
+ CHECK_FOR_INTERRUPTS();
+
+ shm_space = space_in_shm(mqh->mqh_queue);
+
+ if (read_local_queue(lq, true) && shm_space > 0)
+ copy_local_to_shared(lq, mqh, false);
+
+ local_space = space_in_local(lq, tuple_size);
+
+ } while (local_space <= tuple_size);
+

1. Just after getting shm_space, you are calling WaitLatch without even
checking whether that space is sufficient to send the tuple.
2. Every time the latch is set (maybe some space was freed in the shared
queue) you are calling copy_local_to_shared to send as much data as
possible from the local to the shared queue, but that doesn't even
guarantee that we will have sufficient space in the local queue to
accommodate the current tuple.

I think calling copy_local_to_shared multiple times (each of which will
internally acquire the mutex) is wasteful; after the latch is set you can
check the shared queue space and not attempt copy_local_to_shared unless
shm_space >= tuple_size.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch  wrote:
> This is currently a v10 open item, but I think it doesn't qualify for that
> treatment.  It's merely an opportunity for optimization, albeit an
> attractively-simple one.

Fair enough.

This is clearly an omission that was made in 41839b7ab, the commit
that added/fixed Windows support for ICU. Hopefully the omission can
be corrected for v10. I see this as a surprising behavior for users,
since ICU locales on all other platforms *will* have much faster
sorting than libc locales (often more than 3x faster). This is mostly
because ICU brings back abbreviated keys. 3x - 5x faster is more of a
qualitative difference than a quantitative one, IMHO.

That having been said, I'm certainly not going to insist on a v10 fix
for this issue.

-- 
Peter Geoghegan




Re: [HACKERS] PSA: don't be in a hurry to update to XCode 9.0

2017-09-21 Thread Dave Cramer
Too late I just stumbled over this as well!

Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 20 September 2017 at 14:34, Tom Lane  wrote:

> It seems to install some libraries that depend on
> /usr/lib/system/libsystem_darwin.dylib, which doesn't exist.
> Googling suggests it will be there in macOS 10.13,
> but on Sierra or earlier, you're outta luck.
>
> It looks like the only directly-affected PG dependency is
> libxml; if you leave out --with-libxml you can still build.
>
> I found this out the hard way on longfin's host, so I've
> temporarily removed --with-libxml from that animal's
> configuration to restore it to service.  I trust I'll be
> able to re-enable that after 10.13 comes out.
>
> regards, tom lane
>
>
>


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-21 Thread Claudio Freire
On Tue, Sep 19, 2017 at 8:55 PM, Peter Geoghegan  wrote:
> On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire  
> wrote:
>> Maybe this is looking at the problem from the wrong direction.
>>
>> Why can't the page be added to the FSM immediately and the check be
>> done at runtime when looking for a reusable page?
>>
>> Index FSMs currently store only 0 or 255, couldn't they store 128 for
>> half-recyclable pages and make the caller re-check reusability before
>> using it?
>
> No, because it's impossible for them to know whether the page that
> their index scan just landed on was recycled just a second ago, or
> was like this since before their xact began/snapshot was acquired.
>
> For your reference, this RecentGlobalXmin interlock stuff is what
> Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
> Nodes". Seems pretty hard to do it any other way.

I don't see the difference between a vacuum run and distributed
maintenance at _bt_getbuf time. In fact, the code seems to be in
place already.

_bt_page_recyclable seems to prevent old transactions from treating
those pages as recyclable already, and the description of the
technique in 2.5 doesn't seem to preclude doing the drain while doing
other operations. In fact, Lehman even considers the possibility of
multiple concurrent garbage collectors.

It's only a matter of making the page visible in the FSM in a way that
can be efficiently skipped if we want to go directly to a page that
actually CAN be recycled to avoid looping forever looking for a
recyclable page in _bt_getbuf. In fact, that's pretty much Lehman's
drain technique right there. FSM entries with 128 would be "the
queue", and FSM entries with 255 would be "the freelist". _bt_getbuf
can be the GC getting a buffer to try and recycle, give up after a few
tries, and get an actual recyclable buffer instead (or extend the
relation). In essence, microvacuum.

Unless I'm missing something and RecentGlobalXmin somehow doesn't
exclude all old transactions, I don't see a problem.

Lanin & Shasha use reference counting to do GC wait during the drain,
and most of the ordering of operations needed there is because of
that, but using the xmin seems to make all those considerations moot.
An xact earlier than RecentGlobalXmin cannot have active transactions
able to follow links to that page AFAIK.

TBH, I didn't read the whole papers, though I probably will.

But, in essence, what's the difference of vacuum doing

if (_bt_page_recyclable(page))
{
/* Okay to recycle this page */
RecordFreeIndexPage(rel, blkno);
vstate->totFreePages++;
stats->pages_deleted++;
}

VS doing it in _bt_getbuf?

What am I missing?




Re: [HACKERS] [COMMITTERS] pgsql: Fix bool/int type confusion

2017-09-21 Thread Tom Lane
Peter Eisentraut  writes:
> Fix bool/int type confusion
> Using ++ on a bool variable doesn't work well when stdbool.h is in use.
> The original BSD code appears to use int here, so use that instead.

I'm fairly unhappy with this approach to fixing this problem, because
localtime.c is not Postgres-maintained code; we try to keep it in sync
with the upstream copy maintained by the IANA timezone crew.  Patching it
like this will interfere with the next sync attempt, and patching it only
in master will cause back-patching of that sync patch to fail.

I checked the tz list archives and discovered that this problem was
already reported to them back in May:
http://mm.icann.org/pipermail/tz/2017-May/024995.html
Although indeed their previous coding had had the variable as "int",
their response was not to change it back to int, but to get rid
of the loop incrementing it, because that was intended to support
the possibility of 2 leap seconds on the same day, which according
to them is a widespread misunderstanding of the applicable standard.
So the code in their git repo still has the variable as bool, but
there's no ++ operator on it anymore.

This also means that the portability problem is purely hypothetical;
given valid tz reference data the code wouldn't ever try to increment
"hit" to 2 anyway.  It's even more hypothetical for us, because we don't
use leap-second-aware zones.

They've not made a new tzcode release since May, due to lack of political
activity in this sphere this year, although I gather that one is likely
by November or so.  However, we have precedent for sometimes syncing with
their git tip to absorb code bug fixes, cf commit 7afafe8af.

Therefore, what I would like to do is revert this commit (0ec2e908b),
and then either leave the problem to be fixed by our next regular sync
with a released tzcode version, or else force a sync with their git tip
to absorb their fix now.  In either case we should keep all our live
branches in sync.  I'm willing to do the code-sync work either way.

Comments?

regards, tom lane




Re: [HACKERS] [PATCH] Generic type subscripting

2017-09-21 Thread Dmitry Dolgov
> On 20 September 2017 at 17:19, Arthur Zakirov 
wrote:
> As a conclusion:
> * additional fields are needed in pg_type for *_fetch and *_assign
> functions to solve the dependency problem

One last thing that I need to clarify. Initially there was an idea to
minimize changes in `pg_type` - that's why I added only one column there,
containing the OID of the main subscripting function (everything else you
should find out inside it). But I have no objections to adding more columns
if everyone is OK with that. Basically, the pros and cons (marked as + and
-):

one new column in `pg_type`:

* less intrusive (+)
* it's necessary to make a dependency record between subscripting functions
  explicitly (-)

three new columns in `pg_type`:

* more intrusive (-)
* we can create a dependency record between subscripting functions
  simultaneously with a custom type creation (+)
* custom subscripting code does not need to resolve `fetch` and `assign`
  functions (+)


Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-21 Thread Joshua D. Drake

On 09/21/2017 07:51 AM, Peter Eisentraut wrote:

> On 9/20/17 15:52, Jeff Janes wrote:
>> I think that the addition of a link to
>> https://wiki.postgresql.org/wiki/List_of_drivers
>> would be appropriate.
>>
>> I don't have any expectation that that list will be kept up to date.
>>
>> I am not confident that it will be either, but what could we ever have
>> more confidence in being kept up-to-date than something anyone can
>> update which is hosted on a community asset?
>
> If we put such a list linked from the documentation, we have to keep it
> up to date for years, and no one is committing to do that.
>
> I don't think it's our job to maintain lists of which third-party
> products are ready to take advantage of new features in PostgreSQL.  I
> don't see a list of GUIs ready to work with the new partitioning or
> monitoring tools ready to work with the new xlog/wal naming.  If some
> folks want to maintain such lists, that's great, but it's not a release
> issue.


Peter is correct.

JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-21 Thread Peter Eisentraut
On 9/20/17 15:52, Jeff Janes wrote:
> I think that the addition of a link to
> > https://wiki.postgresql.org/wiki/List_of_drivers
>  would be appropriate.
> 
> I don't have any expectation that that list will be kept up to date.
> 
> I am not confident that it will be either, but what could we ever have
> more confidence in being kept up-to-date than something anyone can
> update which is hosted on a community asset?
If we put such a list linked from the documentation, we have to keep it
up to date for years, and no one is committing to do that.

I don't think it's our job to maintain lists of which third-party
products are ready to take advantage of new features in PostgreSQL.  I
don't see a list of GUIs ready to work with the new partitioning or
monitoring tools ready to work with the new xlog/wal naming.  If some
folks want to maintain such lists, that's great, but it's not a release
issue.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-21 Thread Peter Eisentraut
On 9/19/17 20:45, Peter Eisentraut wrote:
> On 9/19/17 17:55, Jeff Janes wrote:
>> I guess I'm late to the party, but I don't see why this is needed at
>> all.  We encourage people to use any and all new features which are
>> appropriate to them--that is why we implement new features.  Why does
>> this feature need a special invitation?
> 
> It's not clear to me how an average user would get from the press
> release or release notes to upgrading their installation to use
> SCRAM-based authentication and passwords.  A little bit more guidance
> somewhere would be helpful.

Here is a patch that expands the SCRAM documentation a bit, adds more
explanation how the different options are related, and sets some better
links.  I think now you can get from the release notes to the relevant
documentation and have enough information on how to put the new features
into use.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 2e7640d9a6d65664721fff8d4acdd3c9289027b0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 21 Sep 2017 10:33:09 -0400
Subject: [PATCH] doc: Expand user documentation on SCRAM

Explain more about how the different password authentication methods and
the password_encryption settings relate to each other, give some
upgrading advice, and set a better link from the release notes.
---
 doc/src/sgml/client-auth.sgml | 122 --
 doc/src/sgml/config.sgml  |   2 +-
 doc/src/sgml/release-10.sgml  |   2 +-
 3 files changed, 95 insertions(+), 31 deletions(-)

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 1b568683a4..f2f7527107 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -916,46 +916,78 @@ Password Authentication

 MD5

+   
+SCRAM
+   

 password
 authentication

 

-The password-based authentication methods are scram-sha-256,
-md5, and password. These methods operate
-similarly except for the way that the password is sent across the
+There are several password-based authentication methods.  These methods
+operate similarly but differ in how the users' passwords are stored on the
+server and how the password provided by a client is sent across the
 connection.

 
-   
-Plain password sends the password in clear-text, and is
-therefore vulnerable to password sniffing attacks. It should
-always be avoided if possible. If the connection is protected by SSL
-encryption then password can be used safely, though.
-(Though SSL certificate authentication might be a better choice if one
-is depending on using SSL).
-   
+   
+
+ scram-sha-256
+ 
+  
+   The method scram-sha-256 performs SCRAM-SHA-256
+   authentication, as described in
+   https://tools.ietf.org/html/rfc7677;>RFC 7677.  It
+   is a challenge-response scheme that prevents password sniffing on
+   untrusted connections and supports storing passwords on the server in a
+   cryptographically hashed form that is thought to be secure.
+  
 
+  
+   This is the most secure of the currently provided methods but might not
+   be supported by older client libraries.
+  
+ 
+
 
-   
-scram-sha-256 performs SCRAM-SHA-256 authentication, as
-described in
-https://tools.ietf.org/html/rfc7677;>RFC 7677. It
-is a challenge-response scheme, that prevents password sniffing on
-untrusted connections. It is more secure than the md5
-method, but might not be supported by older clients.
-   
+
+ md5
+ 
+  
+   The method md5 uses a custom less secure challenge-response
+   mechanism.  It prevents password sniffing and avoids storing passwords
+   on the server in plain text, but provides no protection if an attacker
+   manages to steal the password hash from the server.  Also, the MD5 hash
+   algorithm is nowadays no longer consider secure against determined
+   attacks.  The md5 method cannot be used with
+   the  feature.
+  
 
-   
-md5 allows falling back to a less secure challenge-response
-mechanism for those users with an MD5 hashed password.
-The fallback mechanism also prevents password sniffing, but provides no
-protection if an attacker manages to steal the password hash from the
-server, and it cannot be used with the  feature. For all other users,
-md5 works the same as scram-sha-256.
-   
+  
+   To ease transition from the md5 method to the newer
+   SCRAM method, if md5 is specified as a method
+   in pg_hba.conf but the user's password in the
+   server is encrypted for SCRAM (see below), then SCRAM-based
+   authentication will automatically be chosen instead.
+  
+ 
+
+
+
+ password
+ 
+  
+   The method password sends the 

Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type

2017-09-21 Thread Tom Lane
Tomas Vondra  writes:
> [ scalarineqsel may fall over when used by extension operators ]

I concur with your thought that we could have ineq_histogram_selectivity
fall back to a "mid bucket" default if it's working with a datatype it
is unable to convert_to_scalar.  But I think if we're going to touch this
at all, we ought to have higher ambition than that, and try to provide a
mechanism whereby an extension that's willing to work a bit harder could
get that additional increment of estimation accuracy.  I don't care for
this way to do that:

> * Make convert_numeric_to_scalar smarter, so that it checks if there is
> an implicit cast to numeric, and fail only if it does not find one.

because it's expensive, and it only works for numeric-category cases,
and it will fail outright for numbers outside the range of "double".
(Notice that convert_numeric_to_scalar is *not* calling the regular
cast function for numeric-to-double.)  Moreover, any operator ought to
know what types it can accept; we shouldn't have to do more catalog
lookups to figure out what to do.

Now that scalarltsel and friends are just trivial wrappers for a common
function, we could imagine exposing scalarineqsel_wrapper as a non-static
function, with more arguments (and perhaps a better-chosen name ;-)).
The idea would be for extensions that want to go this extra mile to
provide their own selectivity estimation functions, which again would
just be trivial wrappers for the core function, but would provide
additional knowledge through additional arguments.

The additional arguments I'm envisioning are a couple of C function
pointers, one function that knows how to convert the operator's
left-hand input type to scalar, and one function that knows how
to convert the right-hand type to scalar.  (Identical APIs of course.)
Passing a NULL would imply that the core code must fall back on its
own devices for that input.

Now the thing about convert_to_scalar is that there are several different
conversion conventions already (numeric, string, timestamp, ...), and
there probably could be more once extension types are coming to the party.
So I'm imagining that the API for these conversion functions could be
something like

bool convert(Datum value, Oid valuetypeid,
 int *conversion_convention, double *converted_value);

The conversion_convention output would make use of some agreed-on
constants, say 1 for numeric, 2 for string, etc etc.  In the core
code, if either converter fails (returns false) or if they don't
return the same conversion_convention code, we give up and use the
mid-bucket default.  If they both produce results with the same
conversion_convention, then we can treat the converted_values as
commensurable.

regards, tom lane




Re: [HACKERS] UPDATE of partition key

2017-09-21 Thread amul sul
On Wed, Sep 20, 2017 at 9:27 PM, Amit Khandekar 
wrote:

> On 20 September 2017 at 00:06, Robert Haas  wrote:
> > On Fri, Sep 15, 2017 at 7:25 AM, Amit Khandekar 
> wrote:
> >> [ new patch ]
>

  86 -   (event == TRIGGER_EVENT_UPDATE &&
!trigdesc->trig_update_after_row))
  87 +   (event == TRIGGER_EVENT_UPDATE &&
!trigdesc->trig_update_after_row) ||
  88 +   (event == TRIGGER_EVENT_UPDATE && (oldtup == NULL ||
newtup == NULL)))
  89 return;
  90 }


Exactly one of oldtup and newtup will be valid at a time.  Can we
improve this check accordingly?

E.g.:
(event == TRIGGER_EVENT_UPDATE && (HeapTupleIsValid(oldtup) ^
ItemPointerIsValid(newtup)))


 247
 248 +   /*
 249 +* EDB: In case this is part of update tuple routing, put this row
into the
 250 +* transition NEW TABLE if we are capturing transition tables. We
need to
 251 +* do this separately for DELETE and INSERT because they happen on
 252 +* different tables.
 253 +*/
 254 +   if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_
capture)
 255 +   ExecARUpdateTriggers(estate, resultRelInfo, NULL,
 256 +NULL,
 257 +tuple,
 258 +NULL,
 259 +mtstate->mt_transition_capture);
 260 +
 261 list_free(recheckIndexes);

 267
 268 +   /*
 269 +* EDB: In case this is part of update tuple routing, put this row
into the
 270 +* transition OLD TABLE if we are capturing transition tables. We
need to
 271 +* do this separately for DELETE and INSERT because they happen on
 272 +* different tables.
 273 +*/
 274 +   if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_
capture)
 275 +   ExecARUpdateTriggers(estate, resultRelInfo, tupleid,
 276 +oldtuple,
 277 +NULL,
 278 +NULL,
 279 +mtstate->mt_transition_capture);
 280 +

Initially, I wondered why we can't have the above code right after
ExecInsert() & ExecDelete() in ExecUpdate(), respectively.

We can do that for ExecDelete(), but not easily in the ExecInsert() case,
because ExecInsert() internally searches for the correct partition's
resultRelInfo for the insert, and before returning to ExecUpdate the
resultRelInfo is restored to the old one.  That's why the current logic
seems reasonable for now.  Is there anything better we can do?

Regards,
Amul


Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

2017-09-21 Thread Tom Lane
Noah Misch  writes:
> Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing
> USE_WIDE_UPPER_LOWER?  Every buildfarm fossil has both.

+1 ... if nothing else, there's the problem that untested code is likely
to be broken.  You just proved it *is* broken, of course, but my point
is that even if we repaired the immediate damage we could have little
confidence in it staying fixed.

I think the USE_WIDE_UPPER_LOWER split was originally my code, so I'm
willing to take care of removing it if there's consensus that that's
what to do.

I'm not sure that we need to treat this as a v10 open item, though.
The premise of removing !USE_WIDE_UPPER_LOWER is that nobody cares
anymore, therefore it shouldn't matter to users whether we remove it in
v10.  There's an argument that having only two states of the relevant
code, not three, in the live back branches is worth something for
maintenance --- but should that outweigh the risk of breaking something
post-rc1?

regards, tom lane




Re: [HACKERS] Windows warnings from VS 2017

2017-09-21 Thread Tom Lane
Andrew Dunstan  writes:
> The speed of memset is hardly going to be the dominating factor in a
> 'CREATE DATABASE' command, so we could certainly afford to change to
> plain memset calls here.

Another thought is that it may be time for our decennial debate about
whether MemSet is worth the electrons it's printed on.  I continue to
think that any modern compiler+libc ought to do an equivalent or better
optimization given just a plain memset().  If that seems to be true
for recent MSVC, we could consider putting an #if into c.h to define
MemSet as just memset for MSVC.  Maybe later that could be extended
to other compilers.

regards, tom lane




Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Peter Eisentraut
On 9/21/17 04:27, Pavel Stehule wrote:
> yes. It was designed for + commands only. It could be enhanced to cover
> all commands, in which case the VERBOSE prefix should be removed; I'm
> not sure that is necessary. For me, an order other than the default is
> only interesting in verbose mode.

I see where you are coming from, but there is no association in the
existing UI that equates "+" to the word "verbose".  I think just
removing the verbose prefix and applying the sorting behavior in all
cases should be easier to explain and implement.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-21 Thread Ashutosh Bapat
On Mon, Sep 18, 2017 at 10:18 AM, Rafia Sabih
 wrote:
>>
>
>  Limit  (cost=83341943.28..83341943.35 rows=1 width=92) (actual
> time=1556989.996..1556989.997 rows=1 loops=1)
>->  Finalize GroupAggregate  (cost=83341943.28..83342723.24
> rows=10064 width=92) (actual time=1556989.994..1556989.994 rows=1
> loops=1)
>  Group Key: n1.n_name, n2.n_name, (date_part('year'::text,
> (lineitem_001.l_shipdate)::timestamp without time zone))
>  ->  Sort  (cost=83341943.28..83342043.92 rows=40256 width=92)
> (actual time=1556989.910..1556989.911 rows=6 loops=1)
>Sort Key: n1.n_name, n2.n_name,
> (date_part('year'::text, (lineitem_001.l_shipdate)::timestamp without
> time zone))
>Sort Method: quicksort  Memory: 27kB
>->  Gather  (cost=83326804.81..83338864.31 rows=40256
> width=92) (actual time=1550598.855..1556989.760 rows=20 loops=1)
>  Workers Planned: 4
>  Workers Launched: 4
>
> AFAICU, the node above the sort is a group-aggregate, and above that
> there is a limit; the row count shown for the sort node in EXPLAIN
> ANALYZE is the number of rows actually returned. So what is happening
> here is that once one group is complete, it is aggregated and fetched by
> the limit; the sort then has no need to return any more rows, hence the
> result.

Thanks for your explanation. That makes sense. I forgot about the LIMIT node on top.

I debugged the plans today and performed some experiments. Here are my
observations.

The join order changes depending on whether partition-wise join is used.
Without partition-wise join it is
(lineitem, (suppliers, nation1)), (orders, (customer, nation2)). The
join (lineitem, (suppliers, nation1)) is executed by one gather node
and (orders, (customer, nation2)) by the other. Thus the plan has two
gather nodes, which feed the topmost join.
With partition-wise join the join order is ((lineitem, orders),
(supplier, nation1)), (customer, nation2). The join (lineitem, orders)
uses partition-wise join. This plan executes the whole join tree along
with partial group aggregation under a gather merge.

The rows estimated for various nodes under Gather/GatherMerge differ
from the actual rows, e.g.
->  Hash Join  (cost=113164.47..61031454.40 rows=10789501 width=46)
(actual time=3379.931..731987.943 rows=8744357 loops=5) (in
non-partition-wise join plan) OR
->  Append  (cost=179532.36..80681785.95 rows=134868761 width=24)
(actual time=9437.573..1360219.567 rows=109372134 loops=5) (in
partition-wise join plan).
I first thought that this was a real estimation error and spent some
time investigating it. But I eventually realised that this is just how
a parallel query plan is reported, when I saw that the Gather node
estimated the correct number of rows even though the nodes under it
showed this difference. Here's the explanation. There are 4 parallel
workers, so the leader's contribution is estimated to be 0 by
get_parallel_divisor(). These estimates are therefore per worker, and
the total estimated rows produced by any of the nodes is 4 times the
reported value. But when the query actually runs, the leader also
participates, so the number of loops is 5 and the actual rows reported
are (total actual rows) / (number of loops, i.e. the number of backends
that executed the query). The total estimated rows and total actual
rows are roughly equal, so there is no real estimation error, as I
thought earlier. Maybe we want to make EXPLAIN (ANALYZE) output easier
to understand.

When I tried the same query on my laptop at scale 20, I found that the
leader really contributes as much as the other workers. So the partial
paths were created based on an estimate which was 20% off. The cost
difference between the partition-wise join plan and the
non-partition-wise join plan is hardly 1.5%, so it's possible that if
we correct this estimation error, the partition-wise join plan won't be
chosen because it will have a higher cost. Remember that there are two
gather nodes in the non-partition-wise join plan and the partition-wise
join plan has one gather. So the non-partition-wise join path gets the
20% decreased estimate twice and the partition-wise join gets it only
once.

The EXPLAIN (ANALYZE, VERBOSE) output of a parallel node looks like this:
->  Parallel Seq Scan on public.lineitem_002  (cost=0.00..168752.99
rows=573464 width=24) (actual time=1.395..3075.485 rows=454464
loops=5)
 Filter: ((lineitem_002.l_shipdate >= '1995-01-01'::date) AND
 (lineitem_002.l_shipdate <= '1996-12-31'::date))
 Rows Removed by Filter: 1045065
 Worker 0: actual time=3.358..3131.426 rows=458267 loops=1
 Worker 1: actual time=0.860..3146.282 rows=447231 loops=1
 Worker 2: actual time=1.317..3123.646 rows=489960 loops=1

Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-09-21 Thread Amit Kapila
On Thu, Sep 21, 2017 at 4:14 PM, Kyotaro HORIGUCHI
 wrote:
>
> postgres=# create table test (id int primary key, v text);
> postgres=# create index on test using hash (id);
> WARNING:  hash indexes are not WAL-logged and their use is discouraged
>
> But not for unlogged tables.
>
> postgres=# create unlogged table test (id int primary key, v text);
> postgres=# create index on test using hash (id);
> postgres=# (no warning)
>
> And it fails on promotion in the same way.
>
> postgres=# select * from test;
> ERROR:  could not open file "base/13324/16446": No such file or directory
>
> indexcmds.c@965:503
>>   if (strcmp(accessMethodName, "hash") == 0 &&
>> RelationNeedsWAL(rel))
>> ereport(WARNING,
>> (errmsg("hash indexes are not WAL-logged and their use is 
>> discouraged")));
>
> Using !RelationUsesLocalBuffers instead fixes that, and the
> attached patch is for 9.6. I'm a bit unconfident about whether this
> matches the macro's intended meaning, but what it does fits here.
>


I think giving a message like "hash indexes are not WAL-logged
and ..." for unlogged tables is not good behavior.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-21 Thread Robert Haas
On Thu, Sep 21, 2017 at 8:21 AM, Ashutosh Bapat
 wrote:
> About your earlier comment on making build_joinrel_partition_info()
> simpler: right now, the code assumes that partexprs or
> nullable_partexpr can be NULL when either of them is not populated.
> That may save sizeof(pointer) * (number of keys) bytes of memory, but
> saving that much memory may not be worth the complexity of the code.
> So we could always allocate memory for those arrays and fill them with
> NIL values when there are no key expressions to populate them. That
> would simplify the code. I haven't made that change in this patchset;
> I was busy debugging the Q7 regression. Let me know your comments
> about that.

Hmm, I'm not sure that's the best approach, but let me look at it more
carefully before I express a firm opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Windows warnings from VS 2017

2017-09-21 Thread Andrew Dunstan


On 09/21/2017 02:53 AM, Haribabu Kommi wrote:
>
>
> On Thu, Sep 21, 2017 at 12:26 PM, Andrew Dunstan
>  > wrote:
>
>
>
> On 09/20/2017 08:18 PM, Andrew Dunstan wrote:
> >
> > On 09/20/2017 07:54 PM, Tom Lane wrote:
> >> Andrew Dunstan  > writes:
> >>> It's also warning that it will copy 16 bytes to a 13 byte
> structure at
> >>> lines 518, 1293 and 1294 of src/backend/commands/dbcommands.c.
> I haven't
> >>> seen any ill effects of this so far, but it seems to indicate that
> >>> something is possibly amiss on this compiler with the MemSet
> macros.
> >> That's weird.  Is it too stupid to figure out that the if() inside
> >> MemSet evaluates to constant false in these calls?  It seems
> hard to
> >> see how it would realize that the loop will write 16 bytes if
> it doesn't
> >> propagate the constant value forward.
> >>
> >> However ... on some other compilers, I've noticed that the
> compiler seems
> >> more likely to make "obvious" deductions of that sort if the
> variables in
> >> question are marked const.  Does it help if you do
> >>
> >> -            void   *_vstart = (void *) (start); \
> >> -            int             _val = (val); \
> >> -            Size    _len = (len); \
> >> +            void   * const _vstart = (void *) (start); \
> >> +            const int       _val = (val); \
> >> +            const Size      _len = (len); \
> >>
> >>
> >> I don't think there's any strong reason not to just do
> s/MemSet/memset/
> >> in these calls and nearby ones, but it would be good to
> understand just
> >> what's wrong here.  And why it's only showing up in that file;
> seems
> >> nearly certain that we have similar coding elsewhere.
> >>
> >>
> >
> > I'll test it.
> >
>
>
> Doesn't make a difference. I agree it would be good to understand
> what's
> going on.
>
>
> These warnings are not produced when the build is run in DEBUG mode.
> Because of this, I didn't see these warnings when I was working with
> VS 2017.
>
> This may be an issue with the VS 2017 compiler.
>


Yeah, it seems like it, particularly when identical code in another
function in the same file doesn't generate a complaint.

The speed of memset is hardly going to be the dominating factor in a
'CREATE DATABASE' command, so we could certainly afford to change to
plain memset calls here.

I'll see if I can find someone to report this to :-)

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-21 Thread Rajkumar Raghuwanshi
On Mon, Sep 18, 2017 at 12:37 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

>
> On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke <
> jeevan.cha...@enterprisedb.com> wrote:
>
>>
>>
>> On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
>> rajkumar.raghuwan...@enterprisedb.com> wrote:
>>
>>>
>>> Hi Jeevan,
>>>
>>> I have started testing partition-wise-aggregate and got one observation,
>>> please take a look.
>>> with the v2 patch, here if I change target list order, query is not
>>> picking full partition-wise-aggregate.
>>>
>>
>> Thanks Rajkumar for reporting this.
>>
>> I am looking into this issue and will post updated patch with the fix.
>>
>
> The logic for checking whether the partition keys lead the GROUP BY
> keys needs to be updated here. The GROUP BY expressions can appear in
> any order without affecting the final result, and thus requiring the
> partition keys to lead the GROUP BY keys for full aggregation is not
> necessary. Instead, we must ensure that the partition keys are part of
> the GROUP BY keys to compute full aggregation on a partition.
>
> Attached, revised patch-set with above fix.
>
> Also, in test-cases, I have removed DROP/ANALYZE commands on child
> relations and also removed VERBOSE from the EXPLAIN.
>
> Notes:
> HEAD: 8edacab209957520423770851351ab4013cb0167
> Partition-wise Join patch-set version: v32
>

Hi Jeevan,

While testing the latest v3 patches, I am getting a server crash when I
reset partition_wise_agg_cost_factor; please take a look.

CREATE TABLE lp (a TEXT, b FLOAT, c INT) PARTITION BY LIST(c);
CREATE TABLE lp1 PARTITION OF lp FOR VALUES IN (10,20);
CREATE TABLE lp2 PARTITION OF lp FOR VALUES IN (30,40);

INSERT INTO lp VALUES ('a1',800, 20);
INSERT INTO lp VALUES ('a2',1250,30);
INSERT INTO lp VALUES ('a3',2975,20);
INSERT INTO lp VALUES ('a3',2850,30);

postgres=# SET enable_partition_wise_agg TO true;
SET
postgres=# SET partition_wise_agg_cost_factor TO 0.5;
SET
postgres=#
postgres=# SELECT MAX(b), AVG(b) FROM lp GROUP BY a HAVING a = 'a3' ORDER
BY 1,2;
 max  |  avg
--+
 2975 | 2912.5
(1 row)

postgres=# RESET partition_wise_agg_cost_factor;
RESET
postgres=#
postgres=# SELECT MAX(b), AVG(b) FROM lp GROUP BY a HAVING a = 'a3' ORDER
BY 1,2;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-09-21 Thread Rafia Sabih
On Sun, Sep 17, 2017 at 9:10 PM, Dilip Kumar  wrote:
> On Wed, Sep 6, 2017 at 4:14 PM, Rafia Sabih
>  wrote:
>
>> I worked on this idea of using local queue as a temporary buffer to
>> write the tuples when master is busy and shared queue is full, and it
>> gives quite some improvement in the query performance.
>>
>
> I have done some initial review of this patch and I have some comments.
>
Thanks for the review.

> /* this is actual size for this tuple which will be written in queue */
> + tuple_size = MAXALIGN(sizeof(Size)) + MAXALIGN(iov.len);
> +
> + /* create and attach a local queue, if it is not yet created */
> + if (mqh->mqh_local_queue == NULL)
> + mqh = local_mq_attach(mqh);
>
> I think we can create the local queue the first time we need it. So
> basically you can move this code inside the else part, where we first
> identify that there is no space in the shared queue.
>
Done.
> --
> + /* write in local queue if there is enough space*/
> + if (local_space > tuple_size)
>
> I think the condition should be if (local_space >= tuple_size)
>
I did this to be on the safer side; anyhow, modified.
> --
> + while(shm_space <= 0)
> + {
> + if (shm_mq_is_detached(mqh->mqh_queue))
> + return SHM_MQ_DETACHED;
> +
> + shm_space = space_in_shm(mqh->mqh_queue);
> + }
>
> Instead of waiting in a CPU-intensive while loop, we should wait on
> some latch. Why can't we use the wait latch of the shared queue?
> Whenever some space frees up, the latch will be set; then we can
> recheck the space and, if it's enough, write to the shared queue,
> otherwise wait on the latch again.
>
> Check other similar occurrences.
Done.
> -
>
> + if (read_local_queue(lq, true) && shm_space > 0)
> + copy_local_to_shared(lq, mqh, false);
> +
>
> Instead of adding shm_space > 0 to the check, it can be an Assert or
> nothing, because the above loop will only break when shm_space > 0.
> 
Done
>
> + /*
> + * create a local queue, the size of this queue should be way higher
> + * than PARALLEL_TUPLE_QUEUE_SIZE
> + */
> + char *mq;
> + Size len;
> +
> + len = 6553600;
>
> Create some macro which is multiple of PARALLEL_TUPLE_QUEUE_SIZE,
>
Done.
> ---
>
> + /* this local queue is not required anymore, hence free the space. */
> + pfree(mqh->mqh_local_queue);
> + return;
> +}
>
> I think you can remove the return at the end of the void function.
> -
Done.
>
> In the empty_queue(shm_mq_handle *mqh) function, I see that only the
> last exit path frees the local queue, but not all of them, even though
> the local queue is created.
> 
>
Modified.
> Other cosmetic issues.
> -
> +/* check the space availability in local queue */
> +static uint64
> +space_in_local(local_mq *lq, Size tuple_size)
> +{
> + uint64 read, written, used, available, ringsize, writer_offset, 
> reader_offset;
> +
> + ringsize = lq->mq_ring_size;
> + read = lq->mq_bytes_read;
> + written = lq->mq_bytes_written;
> + used = written - read;
> + available = ringsize - used;
> +
> Alignment is not correct, I think you can run pgindent once.
> 
>
> + /* check is there is required space in shared queue */
> comment needs rewording. "check if there is required space in shared
> queue" ?
>
> -
>
> + /* write in local queue if there is enough space*/
> space missing before comment end.
>
> -
>
> +
> +/* Routines required for local queue */
> +
> +/*
> + * Initialize a new local message queue, this is kept quite similar
> to shm_mq_create.
> + */
>
> Header comments formatting is not in sync with other functions.
>
> -
>
> +}
> +/* routine to create and attach local_mq to the shm_mq_handle */
>
> one blank line between two functions.
>
> -
>
Ran pgindent for these issues.

> + bool detached;
> +
> + detached = false;
>
> a better way is bool detached = false;
>
> -
>
Done.
> Compilation warning
> 
> shm_mq.c: In function ‘write_in_local_queue’:
> shm_mq.c:1489:32: warning: variable ‘tuple_size’ set but not used
> [-Wunused-but-set-variable]
>   uint64 bytes_written, nbytes, tuple_size;
>
That might be the case when not configured with cassert; however, it is
removed in the current version anyway.

Please find the attached file for the revised version.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


faster_gather_v2.patch
Description: Binary data



Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-09-21 Thread Kyotaro HORIGUCHI
Hello,

Following a bit older thread.

At Tue, 18 Jul 2017 08:33:07 +0200, Michael Paquier  
wrote in 
> On Tue, Jul 18, 2017 at 4:18 AM, Amit Kapila  wrote:
> > Thanks.  Do you have any suggestion for back-branches?  As of now, it
> > fails badly with below kind of error:
> >
> > test=> SELECT * FROM t_u_hash;
> > ERROR:  could not open file "base/16384/16392": No such file or directory
> >
> > It is explained in another thread [3] where it has been found that the
> > reason for such an error is that hash indexes are not WAL logged prior
> > to 10.  Now, we can claim that we don't recommend hash indexes to be
> > used prior to 10 in production, so such an error is okay even if there
> > is no crash has happened in the system.
> 
> There are a couple of approaches:
> 1) Marking such indexes as invalid at recovery and log information
> about the switch done.
> 2) Error at creation of hash indexes on unlogged tables.
> 3) Leave it as-is, because there is already a WARNING at creation.
> I don't mind seeing 3) per the amount of work done lately to support
> WAL on hash indexes.

I overlooked that, but (3) is true only as long as the table is a
*logged* one.

postgres=# create table test (id int primary key, v text);
postgres=# create index on test using hash (id);
WARNING:  hash indexes are not WAL-logged and their use is discouraged

But not for unlogged tables.

postgres=# create unlogged table test (id int primary key, v text);
postgres=# create index on test using hash (id);
postgres=# (no warning)

And it fails on promotion in the same way.

postgres=# select * from test;
ERROR:  could not open file "base/13324/16446": No such file or directory

indexcmds.c@965:503
>   if (strcmp(accessMethodName, "hash") == 0 &&
> RelationNeedsWAL(rel))
> ereport(WARNING,
> (errmsg("hash indexes are not WAL-logged and their use is 
> discouraged")));

Using !RelationUsesLocalBuffers instead fixes that, and the
attached patch is for 9.6. I'm a bit unconfident about whether this
matches the macro's intended meaning, but what it does fits here.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***
*** 501,507  DefineIndex(Oid relationId,
  	amRoutine = GetIndexAmRoutine(accessMethodForm->amhandler);
  
  	if (strcmp(accessMethodName, "hash") == 0 &&
! 		RelationNeedsWAL(rel))
  		ereport(WARNING,
  (errmsg("hash indexes are not WAL-logged and their use is discouraged")));
  
--- 501,507 
  	amRoutine = GetIndexAmRoutine(accessMethodForm->amhandler);
  
  	if (strcmp(accessMethodName, "hash") == 0 &&
! 		!RelationUsesLocalBuffers(rel))
  		ereport(WARNING,
  (errmsg("hash indexes are not WAL-logged and their use is discouraged")));
  



Re: [HACKERS] coverage analysis improvements

2017-09-21 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> On 9/20/17 13:13, Dagfinn Ilmari Mannsåker wrote:
>> I have no opinion on the bulk of this patch set, but skimming it out of
>> curiosity I noticed that the plperl change seems to have lost the
>> dependency on plperl_helpers.h from the xsubpp targets:
>
> Those commands don't actually require plperl_helpers.h, do they?

No, but the .xs files #include it.  I guess it's SPI.o and Util.o that
should depend on the header.

- ilmari
-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-21 Thread Andreas Karlsson

On 09/21/2017 01:40 AM, Peter Geoghegan wrote:

> On Wed, Sep 20, 2017 at 4:08 PM, Peter Geoghegan  wrote:
>> pg_import_system_collations() takes care to use the non-BCP-47 style
>> for such versions, so I think this is working correctly.
>
> But CREATE COLLATION doesn't use pg_import_system_collations().
>
> And perhaps more to the point: it is highly confusing that we use one
> or the other of those 2 things ("langtag"/BCP 47 tag or "name"/legacy
> locale name) as "colcollate", depending on ICU version, thereby
> *behaving* as if ICU < 54 really didn't know anything about BCP 47
> tags. Because, obviously, earlier ICU versions know plenty about BCP
> 47, since 9 lines further down we use the "langtag"/BCP 47 tag as
> collname for CollationCreate() -- regardless of ICU version.
>
> How can you say "ICU < 54 doesn't even support the BCP 47 style", given
> all that? Those versions will still have locales named "*-x-icu" when
> users do "\dOS". Users will be highly confused when they quite
> reasonably try to generalize from the example in the docs and what
> "\dOS" shows, and get results that are wrong, often only in a very
> subtle way.

If we are fine with supporting only ICU 4.2 and later (which I think we
are, given that ICU 4.2 was released in 2009), then using
uloc_forLanguageTag()[1] to validate and canonicalize seems like the
right solution. I had missed that this function even existed when I last
read the documentation. Does it return a BCP 47 tag in modern versions
of ICU?

I strongly prefer that there is, as much as possible, only one format
for inputting ICU locales.


1. 
http://www.icu-project.org/apiref/icu4c/uloc_8h.html#aa45d6457f72867880f079e27a63c6fcb


Andreas





Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-09-21 Thread Julien Rouhaud
On Thu, Sep 21, 2017 at 10:52 AM, Ashutosh Bapat
 wrote:
> With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
> RelOptInfos of partitions in the RelOptInfo of partitioned table. For
> range partitioned table without default partition, they are arranged
> in the ascending order of partition bounds. This patch may avoid
> MergeAppend if the sort keys match partition keys by creating an
> Append path by picking sorted paths from partition RelOptInfos in
> order. You may use slightly modified version of
> add_paths_to_append_rel().


Yes, I just saw this commit this morning, and this is exactly what I
was missing, thanks for the pointer and the patch!




Re: [HACKERS] Should we cacheline align PGXACT?

2017-09-21 Thread Alexander Korotkov
On Mon, Sep 18, 2017 at 12:41 PM, Daniel Gustafsson  wrote:

> > On 16 Sep 2017, at 01:51, Alexander Korotkov 
> wrote:
> >
> > On Tue, Sep 5, 2017 at 2:47 PM, Daniel Gustafsson  > wrote:
> > > On 04 Apr 2017, at 14:58, David Steele > wrote:
> > >
> > > On 4/4/17 8:55 AM, Alexander Korotkov wrote:
> > >> On Mon, Apr 3, 2017 at 9:58 PM, Andres Freund  
> > >>
> > >>I'm inclined to push this to the next CF, it seems we need a lot
> more
> > >>benchmarking here.
> > >>
> > >> No objections.
> > >
> > > This submission has been moved to CF 2017-07.
> >
> > This CF has now started (well, 201709 but that’s what was meant in
> above), can
> > we reach closure on this patch in this CF?
> >
> > During the previous commitfest I came to doubt whether this patch is
> > really needed, when the same effect could be achieved by another
> > (perhaps random) change of alignment.  The thing I can do now is to
> > retry my benchmark on current master and check what effect this patch
> > has now.
>
> Considering this I’ll mark this as Waiting on Author, in case you come to
> conclusion that another patch is required then we’ll bump to a return
> status.
>

I've done some read-only benchmarking.  There is a clear win in this
case.  The only point where the median of master is higher than the
median of the patched version is 40 clients.

At this point the observations are as follows:
master:   982432 939483 932075
pgxact-align: 913151 921300 938132
So the groups of observations form overlapping ranges, and this anomaly
can be explained by statistical noise.

I'm going to make some experiments with read-write and mixed workloads too.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Michael Paquier
On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada  wrote:
> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
>  wrote:
>> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada  
>> wrote:
>>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
>>> backup has been introduced, so we should back-patch to the all
>>> supported versions.
>>
>> There is a typo here right? Non-exclusive backups have been introduced
>> in 9.6. Why would a back-patch further down be needed?
>
> I think the non-exclusive backup infrastructure was introduced in 9.1
> for pg_basebackup. I've not yet verified that the problem can be
> reproduced using pg_basebackup in 9.1, but from reading the code I
> thought it could happen.

Yep, but the deficiency is caused by the use of before_shmem_exit() in
the SQL-level functions present in 9.6~, which makes the cleanup happen
potentially twice. If you look at the code of basebackup.c, you will
notice that the cleanup of the counters only happens within the
PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
enough.

>> +   Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>> +   XLogCtl->Insert.nonExclusiveBackups--;
>> Hm, no, I don't agree. I think that instead you should just leave
>> do_pg_abort_backup() immediately if sessionBackupState is set to
>> SESSION_BACKUP_NONE. This variable is the link between the global
>> counters and the session stopping the backup so I don't think that we
>> should touch this assertion of this counter. I think that this method
>> would be safe as well for backup start as pg_start_backup_callback
>> takes care of any cleanup. Also because the counters are incremented
>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>> sessionBackupState is updated just after leaving the block.
>
> I think that the assertion failure can still happen if the process
> aborts after decrementing the counter and before setting it to
> SESSION_BACKUP_NONE. Am I missing something?

The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
the cleanup at this moment. So this happens when waiting for the
archives to be done, and the session flag is set to NONE at this
point.
-- 
Michael




Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-09-21 Thread Ashutosh Bapat
On Thu, Sep 21, 2017 at 3:30 AM, Julien Rouhaud  wrote:
> On Thu, Aug 31, 2017 at 5:30 AM, Peter Eisentraut
>  wrote:
>> On 3/20/17 11:03, Ronan Dunklau wrote:
 Great idea.  This is too late for v10 at this point, but please add it
 to the next CommitFest so we don't forget about it.
>>> I know it is too late, and thought that it was too early to add it to the
>>> commitfest properly since so many design decisions should be discussed. 
>>> Thanks
>>> for the feedback, I added it.
>>
>> This patch no longer applies.  Please send an update.
>
>
> Thanks for the reminder, and sorry for very very late answer.  PFA
> rebased patch on current head.
>
> I unfortunately haven't yet had time to read carefully the newly added
> infrastructure (30833ba154e0c1106d61e3270242dc5999a3e4f3 and
> following), so this patch is still using its own copy of the partitions
> list.  I hope I can work on it shortly, but I prefer to send the
> rebased version now, since it's still a POC with known problems and
> questions, and I'd more than welcome any guidance with it.
>
>

With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
RelOptInfos of partitions in the RelOptInfo of partitioned table. For
range partitioned table without default partition, they are arranged
in the ascending order of partition bounds. This patch may avoid
MergeAppend if the sort keys match partition keys by creating an
Append path by picking sorted paths from partition RelOptInfos in
order. You may use slightly modified version of
add_paths_to_append_rel().

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-21 Thread Kyotaro HORIGUCHI
Hi,

At Tue, 19 Sep 2017 16:55:38 -0700, Peter Geoghegan  wrote in 

> On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire  
> wrote:
> > Maybe this is looking at the problem from the wrong direction.
> >
> > Why can't the page be added to the FSM immediately and the check be
> > done at runtime when looking for a reusable page?
> >
> > Index FSMs currently store only 0 or 255, couldn't they store 128 for
> > half-recyclable pages and make the caller re-check reusability before
> > using it?
> 
> No, because it's impossible for them to know whether the page
> that their index scan just landed on was recycled just a second ago, or
> was like this since before their xact began/snapshot was acquired.
> 
> For your reference, this RecentGlobalXmin interlock stuff is what
> Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
> Nodes". Seems pretty hard to do it any other way.

Anyway (:p), the attached first patch is a PoC for the
cleanup-state-in-stats method; it works only for btree.  Some
LOG-level debugging messages are included in the patch to show how it
works.

The following steps make a not-recyclable page, but I'm not sure the
recipe is general enough, and I couldn't generate half-dead pages.
The pg_sleep() calls in the steps below are inserted in order to see
the updated values in the stats.


DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a int);
CREATE INDEX ON t1 (a);
INSERT INTO t1 (SELECT a FROM generate_series(0, 80) a);
DELETE FROM t1 WHERE a > 416700 AND a < 417250;
VACUUM t1;
DELETE FROM t1;
VACUUM t1;  -- 1 (or wait for autovacuum)
select pg_sleep(1);
VACUUM t1;  -- 2 (autovacuum doesn't work)
select pg_sleep(1);
VACUUM t1;  -- 3 (ditto)


The following logs are emitted while the three VACUUMs are issued.

# VACUUM t1;  -- 1 (or wait for autovacuum)
 LOG:  btvacuumscan(t1_a_idx) result: deleted = 2185, notrecyclable = 1, 
hafldead = 0, no_cleanup_needed = false
 LOG:  Vacuum cleanup of index t1_a_idx is NOT skipped
 LOG:  btvacuumcleanup on index t1_a_idx is skipped since bulkdelete has run 
just before.
# VACUUM t1;  -- 2
 LOG:  Vacuum cleanup of index t1_a_idx is NOT skipped
 LOG:  btvacuumscan(t1_a_idx) result: deleted = 2192, notrecyclable = 0, 
hafldead = 0, no_cleanup_needed = true
# VACUUM t1;  -- 3
 LOG:  Vacuum cleanup of index t1_a_idx is skipped


VACUUM #1 leaves an unrecyclable page and requests the next cleanup.
VACUUM #2 leaves no unrecyclable page and inhibits the next cleanup.
VACUUM #3 (and every run after it) executes no vacuum cleanup.

# I suppose it is a known issue that the cleanup cycles are not
# executed automatically unless new dead tuples are generated.

- Getting stats takes a very long time to fail during
  initdb.  Since I couldn't find the right way to cope with this,
  I added a tentative function pgstat_live(), which checks whether
  the backend has a valid stats socket.

- The patch calls pg_stat_get_vac_cleanup_needed using
  DirectFunctionCall.  It might be better to wrap it.


As a byproduct, this enables us to run extra autovacuum rounds for
index cleanup.  With the second attached patch, autovacuum works as
follows.

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a int);
CREATE INDEX ON t1 (a);
INSERT INTO t1 (SELECT a FROM generate_series(0, 80) a);
DELETE FROM t1 WHERE a > 416700 AND a < 417250;
(autovacuum on t1 runs)
> LOG:  btvacuumscan(t1_a_idx) result: deleted = 0, notrecyclable = 0, hafldead 
> = 0, no_cleanup_needed = true
> LOG:  Vacuum cleanup of index t1_a_idx is skipped
> LOG:  automatic vacuum of table "postgres.public.t1": index scans: 1
DELETE FROM t1;
(autovacuum on t1 runs)
> LOG:  btvacuumscan(t1_a_idx) result: deleted = 2185, notrecyclable = 1, 
> hafldead = 0, no_cleanup_needed = false
> LOG:  Vacuum cleanup of index t1_a_idx is NOT skipped
> LOG:  btvacuumcleanup on index t1_a_idx is skipped since bulkdelete has run 
> just before.
> LOG:  automatic vacuum of table "postgres.public.t1": index scans: 1
(cleanup vacuum runs for t1 in the next autovac timing)
> LOG:  Vacuum cleanup of index t1_a_idx is NOT skipped
> LOG:  btvacuumscan(t1_a_idx) result: deleted = 2192, notrecyclable = 0, 
> hafldead = 0, no_cleanup_needed = true
> LOG:  automatic vacuum of table "postgres.public.t1": index scans: 0


Any suggestions are welcome.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/access/nbtree/nbtpage.c
--- b/src/backend/access/nbtree/nbtpage.c
***
*** 1110,1116  _bt_pagedel(Relation rel, Buffer buf)
  {
  	int			ndeleted = 0;
  	BlockNumber rightsib;
! 	bool		rightsib_empty;
  	Page		page;
  	BTPageOpaque opaque;
  
--- 1110,1116 
  {
  	int			ndeleted = 0;
  	BlockNumber rightsib;
! 	bool		rightsib_empty = false;
  	Page		page;
  	BTPageOpaque opaque;
  
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
***
*** 63,68  typedef struct
--- 63,70 
  	BlockNumber 

Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-21 Thread Masahiko Sawada
On Sun, Sep 17, 2017 at 2:01 AM, Masahiko Sawada  wrote:
> On Sat, Sep 16, 2017 at 9:52 PM, Peter Eisentraut
>  wrote:
>> On 9/15/17 13:35, Arseny Sher wrote:
>>> Peter Eisentraut  writes:
>>>
 Here is a simple patch that fixes this, based on my original proposal
 point #4.
>>>
>>> I checked, it passes the tests and solves the problem.  However, isn't
>>> this
>>>
>>> + if (slotname || !subenabled)
>>>
>>> a truism?  Is it possible that a subscription has no slot but is still
>>> enabled?
>>
>> Yeah, we could just remove the _at_commit() branch entirely.  That would
>> effectively undo the change in 7e174fa793a2df89fe03d002a5087ef67abcdde8,
>> but I don't see any other choice for now.  And the practical impact
>> would be quite limited.
>>
>
> Yeah, we can remove only _at_commit() branch, but other part of the
> commit is still valid for ALTER SUBSCRIPTION DISABLE.
>
>>> Besides, we can avoid stopping the workers if subscription has no
>>> associated replication origin, though this probably means that
>>> subscription was broken by user and is not worth it.
>>
>> Right, it seems not worth addressing this case separately.
>>
>
> Once we get this patch, DROP SUBSCRIPTION is not transactional either
> when dropping a replication slot or when the subscription got disabled in
> a transaction block.  But we disallow DROP SUBSCRIPTION in a
> transaction block only in the former case; in the latter case, we
> adopted the non-transactional behaviour.  Since these behaviours could
> be confusing for users, I attached a documentation patch explaining them.
>

Hmm, isn't it necessary to mention this kind of
inconsistent behavior in the docs?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-21 Thread Masahiko Sawada
On Thu, Sep 21, 2017 at 5:23 PM, Fabien COELHO  wrote:
>
> Hello Masahiko-san,
>
>>> ISTM that you probably intended "\(.*\)" (actual parenthesis) instead of
>>> "(.*)" (memorization) in the data generation message check.
>>
>>
>> Thank you, fixed it.
>>
>>> Otherwise all is well for me.
>>>
>>
>> Attached the updated version patch.
>
>
> Applies, compiles, make check & tap test ok, doc is fine.
>
> All is well for me: the feature is useful, code is simple and clean, it is
> easy to invoke, easy to extend as well, which I'm planning to do once it
> gets in.
>
> I switched the patch to "Ready for Committers". No doubt they will have
> their own opinions about it. Let's wait and see.

Thank you for reviewing this patch!!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] compress method for spgist - 2

2017-09-21 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 3:14 AM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > On Thu, Sep 21, 2017 at 2:06 AM, Darafei "Komяpa" Praliaskouski <
> >> It seems to me that any circle with a radius of Infinity should
> >> become a [-Infinity .. Infinity, -Infinity .. Infinity] box.  Then
> >> you won't have NaNs, and the index structure shouldn't be broken.
>
> > We probably should produce [-Infinity .. Infinity, -Infinity .. Infinity]
> > box for any geometry containing inf or nan.
>
> Hm, we can do better in at least some cases, eg for a box ((0,1),(1,inf))
> there's no reason to give up our knowledge of finite bounds for the
> other three boundaries.  But certainly for a NaN circle radius
> what you suggest seems the most sensible thing to do.
>

OK.  I'll try to implement this for circles.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Pavel Stehule
2017-09-21 10:19 GMT+02:00 Alexander Korotkov :

> On Thu, Sep 21, 2017 at 1:53 AM, Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>
>> On 9/8/17 00:13, Pavel Stehule wrote:
>> > I am sending rebased patch
>> >
>> > rebased again + fix obsolete help
>>
>> Why are the variables called VERBOSE_SORT_* ?  What is verbose about them?
>
>
> I assume Pavel called them that because they work only for the "verbose"
> mode of a command, i.e. for \dt+ but not \dt.
> However, in \dt, 2 of the 3 sorting modes might work: schema_name and
> name_schema.  Thus, I think it's worth enabling these variables for the
> "non-verbose" mode of commands too.
>

Yes, it was designed for + commands only.  It can be enhanced to cover all
commands; the VERBOSE prefix should then be removed, though I am not sure
that is necessary.  For me, an order other than the default is only
interesting in verbose mode.

> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-21 Thread Fabien COELHO


Hello Masahiko-san,


ISTM that you probably intended "\(.*\)" (actual parenthesis) instead of
"(.*)" (memorization) in the data generation message check.


Thank you, fixed it.


Otherwise all is well for me.



Attached the updated version patch.


Applies, compiles, make check & tap test ok, doc is fine.

All is well for me: the feature is useful, code is simple and clean, it is 
easy to invoke, easy to extend as well, which I'm planning to do once it 
gets in.


I switched the patch to "Ready for Committers". No doubt they will have 
their own opinions about it. Let's wait and see.


Thanks,

--
Fabien.




Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 1:53 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/8/17 00:13, Pavel Stehule wrote:
> > I am sending rebased patch
> >
> > rebased again + fix obsolete help
>
> Why are the variables called VERBOSE_SORT_* ?  What is verbose about them?


I assume Pavel called them that because they work only for the "verbose"
mode of a command, i.e. for \dt+ but not \dt.
However, in \dt, 2 of the 3 sorting modes might work: schema_name and
name_schema.  Thus, I think it's worth enabling these variables for the
"non-verbose" mode of commands too.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 1:52 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/19/17 12:54, Pavel Stehule wrote:
> > However, the patch misses regression tests covering the added functionality.
> >
> > I am not sure if there are any tests related to the output of \dt+ commands
> > - their result is platform-dependent.
>
> How so?


\dt+ reports relation sizes, which are platform-dependent.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] coverage analysis improvements

2017-09-21 Thread Michael Paquier
On Thu, Sep 21, 2017 at 1:48 AM, Peter Eisentraut
 wrote:
> OK, I was not aware that people are using it that way.

At least one.

> So updated patch
> set there, which separates coverage and coverage-html into two
> independent targets.

Thanks for the new versions.

Patches 4 and 5 could be merged. They step on each other's code paths.
Some other things could be merged as well.

 coverage-html-stamp: lcov_base.info lcov_test.info
rm -rf coverage
-   $(GENHTML) $(GENHTML_FLAGS) -o coverage --title='$(GENHTML_TITLE)'
--num-spaces=4 --prefix='$(abs_top_srcdir)' $^
+   $(GENHTML) $(GENHTML_FLAGS) -o coverage --title='$(GENHTML_TITLE)'
--num-spaces=4 $^
touch $@
Actually this is very nice.  I have always been enforcing
abs_top_srcdir when using coverage stuff on things out of the main
tree.

-SPI.c: SPI.xs plperl_helpers.h
+%.c: %.xs
@if [ x"$(perl_privlibexp)" = x"" ]; then echo "configure switch
--with-perl was not specified."; exit 1; fi
-   $(PERL) $(XSUBPPDIR)/ExtUtils/xsubpp -typemap
$(perl_privlibexp)/ExtUtils/typemap $< >$@
Doing coverage on plperl with this patch applied, those do not seem
necessary.  But I don't know enough about this code to give a clear opinion.

Running coverage-html with all the patches, I am seeing the following
warnings with a fresh build on my macos laptop 10.11:
geninfo: WARNING: gcov did not create any files for
/Users/mpaquier/git/postgres/src/backend/access/transam/rmgr.gcda!
I don't think that this is normal.
-- 
Michael




Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Masahiko Sawada
On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
 wrote:
> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada  
> wrote:
>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
>> backup has been introduced, so we should back-patch to the all
>> supported versions.
>
> There is a typo here right? Non-exclusive backups have been introduced
> in 9.6. Why would a back-patch further down be needed?

I think the non-exclusive backup infrastructure was introduced
in 9.1 for pg_basebackup.  I've not yet checked that it can be
reproduced using pg_basebackup in PG 9.1, but as far as I checked the
code it seems possible.

>
>> I changed do_pg_abort_backup() so that it decrements
>> nonExclusiveBackups only if > 0. Feedback is very welcome.
>
> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
> +   XLogCtl->Insert.nonExclusiveBackups--;
> Hm, no, I don't agree. I think that instead you should just leave
> do_pg_abort_backup() immediately if sessionBackupState is set to
> SESSION_BACKUP_NONE. This variable is the link between the global
> counters and the session stopping the backup so I don't think that we
> should touch this assertion of this counter. I think that this method
> would be safe as well for backup start as pg_start_backup_callback
> takes care of any cleanup. Also because the counters are incremented
> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
> sessionBackupState is updated just after leaving the block.

I think the assertion failure can still happen if the process
aborts after decrementing the counter and before setting
sessionBackupState to SESSION_BACKUP_NONE.  Am I missing something?

>
> About the patch:
> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
> There is a typo on this line.
>
> Adding that to the next CF would be a good idea so as we don't forget
> about it. Feel free to add me as reviewer.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] close_ps, NULLs, and DirectFunctionCall

2017-09-21 Thread Emre Hasegeli
> Does this need fixing, and if so how?

My "improve geometric types" patch [1] fixes this issue by considering
NaNs larger than any non-NaN, the same as the float types do.  There are
many issues with the geometric types similar to this.  Let's combine our
efforts to get them into shape.

[1] 
https://www.postgresql.org/message-id/flat/CAE2gYzxF7-5djV6-cEvqQu-fNsnt=eqbourx7zdg+vv6zmt...@mail.gmail.com




Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-21 Thread Ashutosh Sharma
On Thu, Sep 21, 2017 at 9:30 AM, Robert Haas  wrote:
> On Wed, Sep 20, 2017 at 11:43 AM, Ashutosh Sharma  
> wrote:
>> Attached are the patches with above changes. Thanks.
>
> Thanks.  I think that the comments and README changes in 0003 need
> significantly more work.  In several places, they fail to note the
> unlogged vs. logged differences, and the header comment for
> hashbucketcleanup still says that scans depend on increasing-TID order
> (really, 0001 should change that text somehow).
>

I have added a note about the handling of logged and unlogged tables in
the README file and also corrected the header comment for
hashbucketcleanup().  Please find attached the 0003*.patch containing
these changes.  Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 485627178cfaf73a908498e79229e4db04a99648 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Wed, 20 Sep 2017 20:53:06 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma.
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 146 ++--
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 437 +++
 src/backend/access/hash/hashutil.c   |  67 +-
 src/include/access/hash.h|  55 -
 6 files changed, 553 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..3b1f719 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+	the pin on the primary bucket throughout the scan)
+	save all the matching tuples from current index page into an items array
+	release pin and content lock (but if it is primary bucket page retain
+	its pin till the end of the scan)
+	get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)
 
-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the buffer content lock on
-page.  Since deletion is not possible while a reader holds the pin on bucket,
-and we assume that heap tuple TIDs are unique, this can be implemented by
-searching for the same heap tuple TID previously returned.  Insertion does
-not move index entries across pages, so the previously-returned index entry
-should always be on the same page, at the same or higher offset number,
-as it was before.
+To minimize lock/unlock traffic, hash index scan always searches the entire
+hash page to identify all the matching items at once, copying their heap tuple
+IDs into backend-local storage. The heap tuple IDs are then processed while not
+holding any page lock within the index thereby, allowing concurrent insertion
+to happen on the same index page without any requirement of re-finding the
+current scan position for the reader. We do continue to hold a pin on the
+bucket page, to protect against concurrent deletions and bucket split.
 
 To allow for scans during a bucket split, if at the start of the scan, the
 bucket is marked as bucket-being-populated, it scan all the tuples in that
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index d89c192..8550218 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,21 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
-	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-	

Re: [HACKERS] Windows warnings from VS 2017

2017-09-21 Thread Haribabu Kommi
On Thu, Sep 21, 2017 at 12:26 PM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
>
> On 09/20/2017 08:18 PM, Andrew Dunstan wrote:
> >
> > On 09/20/2017 07:54 PM, Tom Lane wrote:
> >> Andrew Dunstan  writes:
> >>> It's also warning that it will copy 16 bytes to a 13 byte structure at
> >>> lines 518, 1293 and 1294 of src/backend/commands/dbcommands.c. I
> haven't
> >>> seen any ill effects of this so far, but it seems to indicate that
> >>> something is possibly amiss on this compiler with the MemSet macros.
> >> That's weird.  Is it too stupid to figure out that the if() inside
> >> MemSet evaluates to constant false in these calls?  It seems hard to
> >> see how it would realize that the loop will write 16 bytes if it doesn't
> >> propagate the constant value forward.
> >>
> >> However ... on some other compilers, I've noticed that the compiler
> seems
> >> more likely to make "obvious" deductions of that sort if the variables
> in
> >> question are marked const.  Does it help if you do
> >>
> >> -void   *_vstart = (void *) (start); \
> >> -int _val = (val); \
> >> -Size_len = (len); \
> >> +void   * const _vstart = (void *) (start); \
> >> +const int   _val = (val); \
> >> +const Size  _len = (len); \
> >>
> >>
> >> I don't think there's any strong reason not to just do s/MemSet/memset/
> >> in these calls and nearby ones, but it would be good to understand just
> >> what's wrong here.  And why it's only showing up in that file; seems
> >> nearly certain that we have similar coding elsewhere.
> >>
> >>
> >
> > I'll test it.
> >
>
>
> Doesn't make a difference. I agree it would be good to understand what's
> going on.


These warnings are not produced when the build is run in DEBUG mode.
Because of this, I didn't see them when I was working with VS 2017.

This may be an issue with the VS 2017 compiler.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] SERIALIZABLE with parallel query

2017-09-21 Thread Thomas Munro
On Tue, Sep 19, 2017 at 1:47 PM, Haribabu Kommi
 wrote:
> During testing of this patch, I found a behavior difference
> with the support of parallel query, while experimenting with the
> test case provided in the patch.
>
> I tested the V6 patch, and I don't think that this version contains
> any fixes other than a rebase.
>
> Test steps:
>
> CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
> INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
>
> Session -1:
>
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT balance FROM bank_account WHERE id = 'Y';
>
> Session -2:
>
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SET max_parallel_workers_per_gather = 2;
> SET force_parallel_mode = on;
> set parallel_setup_cost = 0;
> set parallel_tuple_cost = 0;
> set min_parallel_table_scan_size = 0;
> set enable_indexscan = off;
> set enable_bitmapscan = off;
>
> SELECT balance FROM bank_account WHERE id = 'X';
>
> Session -1:
>
> update bank_account set balance = 10 where id = 'X';
>
> Session -2:
>
> update bank_account set balance = 10 where id = 'Y';
> ERROR:  could not serialize access due to read/write dependencies among
> transactions
> DETAIL:  Reason code: Canceled on identification as a pivot, during write.
> HINT:  The transaction might succeed if retried.
>
> Without the parallel select query in session 2,
> the update statement in session 2 passes.

Hi Haribabu,

Thanks for looking at this!

Yeah.  The difference seems to be that session 2 chooses a Parallel
Seq Scan instead of an Index Scan when you flip all those GUCs into
parallelism-is-free mode.  Seq Scan takes a table-level predicate lock
(see heap_beginscan_internal()).  But if you continue your example in
non-parallel mode (patched or unpatched), you'll find that only one of
those transactions can commit successfully.

Using the fancy notation in the papers about this stuff where w1[x=42]
means "write by transaction 1 on object x with value 42", let's see if
there is an apparent sequential order of these transactions that makes
sense:

Actual order: r1[Y=0] r2[X=0] w1[X=10] w2[Y=10] ... some commit order ...
Apparent order A: r2[X=0] w2[Y=10] c2 r1[Y=0*] w1[X=10] c1 (*nonsense)
Apparent order B: r1[Y=0] w1[X=10] c1 r2[X=0*] w2[Y=10] c2 (*nonsense)

Both potential commit orders are nonsensical.  I think what happened
in your example was that a Seq Scan allowed the SSI algorithm to
reject a transaction sooner.  Instead of r2[X=0], the executor sees
r2[X=0,Y=0] (we scanned the whole table, as if we read all objects, in
this case X and Y, even though we only asked to read X).  Then the SSI
algorithm is able to detect a "dangerous structure" at w2[Y=10],
instead of later at commit time.

So I don't think this indicates a problem with the patch.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Noah Misch
On Sat, Sep 16, 2017 at 03:33:53PM -0700, Peter Geoghegan wrote:
> In summary, we're currently attaching the use of SortSupport to the
> wrong thing. We're treating this UTF-16 business as something that
> implies a broad OS/platform restriction, when in fact it should be
> treated as implying a restriction for one particular collation
> provider only (a collation provider that happens to be built into
> Windows, but isn't really special to us).
> 
> Attached patch shows what I'm getting at. This is untested, since I
> don't use Windows. Proceed with caution.

This is currently a v10 open item, but I think it doesn't qualify for that
treatment.  It's merely an opportunity for optimization, albeit an
attractively-simple one.

