Re: 2018-03 CFM

2018-03-06 Thread Andrey Borodin


> On 7 Mar 2018, at 12:00, Michael Paquier wrote:
> 
> On Wed, Mar 07, 2018 at 11:56:01AM +0500, Andrey Borodin wrote:
>> Actually, as for now, it is impossible to register patch at CF
>> 2018-09. At least I do not see the "new patch" button. May be CF
>> should be open if 2018-03 is in progress? 
> 
> OK, I was able to see the button but that may be due to my rights on the
> CF app.  I have switched the status to "Open".  Can you register your
> new patch now?
Yes, it works. Thanks! 
> (I would recommend helping with patch reviews instead of
> posting new things which are not bug fixes by the way as long as the CF
> is running.)
Sure, I'm reviewing some patches :)
Patch movement from 03 to 09 is in progress, and I decided to move my 
old things too. There's nothing new. :)

Best regards, Andrey Borodin.


Some message fixes

2018-03-06 Thread Kyotaro HORIGUCHI
Hello.

While translating message catalogs, I found that some messages
listing object names are missing partitioned tables/indexes.
Also, the GUC comment for autovacuum_work_mem has a typo.

1. some messages are missing partitioned table/index

  The first attachment addresses this. I'm not sure how the ordering
  ought to be, but I arranged the names mainly in their order of
  appearance in if() conditions, or in the order of the case labels in
  switch() constructs. One exception is ATExecChangeOwner, in which the
  order differs from that of the case labels just above.

  This is backpatchable to 10, but 10 doesn't have partitioned
  indexes, so it needs a different fix (second attachment).

  # But I'm not sure whether the list should be that long...

2. The GUC comment for autovacuum_work_mem has a typo: "max num of
   parallel workers *than* can be..". (Third attachment)

regards,  

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/comment.c b/src/backend/commands/comment.c
index 2f2e69b4a8..37b88b35ea 100644
--- a/src/backend/commands/comment.c
+++ b/src/backend/commands/comment.c
@@ -98,7 +98,7 @@ CommentObject(CommentStmt *stmt)
 relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 ereport(ERROR,
 		(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-		 errmsg("\"%s\" is not a table, view, materialized view, composite type, or foreign table",
+		 errmsg("\"%s\" is not a table, view, materialized view, composite type, foreign table, or partitioned table",
 RelationGetRelationName(relation))));
 			break;
 		default:
diff --git a/src/backend/commands/seclabel.c b/src/backend/commands/seclabel.c
index 5ee46905d8..06f99efaf9 100644
--- a/src/backend/commands/seclabel.c
+++ b/src/backend/commands/seclabel.c
@@ -114,7 +114,7 @@ ExecSecLabelStmt(SecLabelStmt *stmt)
 relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 ereport(ERROR,
 		(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-		 errmsg("\"%s\" is not a table, view, materialized view, composite type, or foreign table",
+		 errmsg("\"%s\" is not a table, view, materialized view, composite type, foreign table, or partitioned table",
 RelationGetRelationName(relation))));
 			break;
 		default:
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index ef3ca8c00b..932237477d 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1671,7 +1671,7 @@ process_owned_by(Relation seqrel, List *owned_by, bool for_identity)
 			  tablerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
 			ereport(ERROR,
 	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-	 errmsg("referenced relation \"%s\" is not a table or foreign table",
+	 errmsg("referenced relation \"%s\" is not a table, foreign table, view, or partitioned table",
 			RelationGetRelationName(tablerel))));
 
 		/* We insist on same owner and schema */
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index c4adfd569e..c254bd33fe 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -117,7 +117,7 @@ CreateStatistics(CreateStatsStmt *stmt)
 			rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 			ereport(ERROR,
 	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-	 errmsg("relation \"%s\" is not a table, foreign table, or materialized view",
+	 errmsg("relation \"%s\" is not a table, foreign table, materialized view, or partitioned table",
 			RelationGetRelationName(rel))));
 
 		/* You must own the relation to create stats on it */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 74e020bffc..fce9208319 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1889,7 +1889,7 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 			relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 			ereport(ERROR,
 	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-	 errmsg("inherited relation \"%s\" is not a table or foreign table",
+	 errmsg("inherited relation \"%s\" is not a table, foreign table, or partitioned table",
 			parent->relname)));
 		/* Permanent rels cannot inherit from temporary ones */
 		if (relpersistence != RELPERSISTENCE_TEMP &&
@@ -2596,7 +2596,7 @@ renameatt_check(Oid myrelid, Form_pg_class classform, bool recursing)
 		relkind != RELKIND_PARTITIONED_TABLE)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("\"%s\" is not a table, view, materialized view, composite type, index, or foreign table",
+ errmsg("\"%s\" is not a table, view, materialized view, composite type, index, partitioned index, foreign table, or partitioned table",
 		NameStr(classform->relname))));
 
 	/*
@@ -6285,7 +6285,7 @@ ATPrepSetStatistics(Relation rel, const char *colName, int16 colNum, Node *newVa
 		rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- 

RE: Speed up the removal of WAL files

2018-03-06 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> On Wed, Mar 07, 2018 at 12:55:43AM +0900, Fujii Masao wrote:
> > So, what about, as another approach, making the checkpointer instead
> > of the startup process call RemoveNonParentXlogFiles() when
> > end-of-recovery checkpoint is executed? ISTM that a recovery doesn't
> > need to wait for
> > RemoveNonParentXlogFiles() to end. Instead, RemoveNonParentXlogFiles()
> > seems to have to complete before the checkpointer calls
> > RemoveOldXlogFiles() and creates .ready files for the "garbage" WAL files
> on the old timeline.
> > So it seems natural to leave that WAL recycle task to the checkpointer.
> 
> Couldn't that impact the I/O performance at the end of recovery until the
> first post-recovery checkpoint is completed?  Let's not forget that since
> 9.3 the end-of-recovery checkpoint is not triggered immediately, so there
> could be a delay.  If WAL segments of the past timeline are recycled without
> waiting for this first checkpoint to happen then there is no need to create
> new, zero-filled segments post-recovery, which can count as well.

Good point.  I understood you referred to PreallocXlogFiles(), which may create 
one new WAL file if RemoveNonParentXlogFiles() is not called or does not 
recycle WAL files in the old timeline.

The best hack (or a compromise/kludge?) seems to be:

1. Modify the durable_xx() functions so that they don't fsync directory changes 
when enableFsync is false.

2. Have RemoveNonParentXlogFiles() set enableFsync to false before the while 
loop, restore its original value after the loop, and fsync pg_wal/ just once.
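
To make the idea concrete, here is a rough sketch of point 2 (not a real 
patch: the existing loop body is elided; enableFsync, fsync_fname(), and 
XLOGDIR are the existing global, helper, and macro):

    void
    RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
    {
        bool        save_fsync = enableFsync;

        /* suppress the per-file directory fsyncs inside the durable_xx() calls */
        enableFsync = false;

        /* ... existing while loop deleting/recycling old-timeline segments ... */

        enableFsync = save_fsync;

        /* flush the pg_wal/ directory changes just once at the end */
        fsync_fname(XLOGDIR, true);
    }
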
What do you think?


Regards
Takayuki Tsunakawa






Re: [WIP PATCH] Index scan offset optimisation using visibility map

2018-03-06 Thread Andrey Borodin


> On 7 Mar 2018, at 3:25, Tels wrote:
> 
> It could be named "SkipTuples" (e.g. this is the number of tuples we need
> to skip, not the number we have skipped), and the other one then
> "iss_SkipTuplesRemaining" so they are consistent with each other.

I agree that the name sounds strange (even to my globish ear).

I'm not sure, but maybe this
!   Assert(!(scandesc->numberOfOrderBys > 0 && 
scandesc->xs_recheckorderby));
should be if() elog(ERROR, ...); instead?
Also, I think this check could be hoisted out of the loop. We do not expect 
its state to change during execution, do we?
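
Something like this minimal sketch, placed once before the loop (the error 
message text is just illustrative):

    if (scandesc->numberOfOrderBys > 0 && scandesc->xs_recheckorderby)
        elog(ERROR, "cannot skip tuples when ordering operators may require rechecking");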

Besides this, I think the patch is ready for committer.

Best regards, Andrey Borodin.


Missing break statement after transformCallStmt in transformStmt

2018-03-06 Thread Ashutosh Bapat
HI,
Commit 76b6aa41f41db66004b1c430f17a546d4102fbe7 added a new case for
CallStmt in transformStmt() but forgot to add a break statement at the
end of the case. This doesn't create any problems right now, since the
default case it falls through to doesn't change the result. But it is
going to cause problems in the future when somebody adds a case after
CallStmt.

Here's a patch with the fix.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c3a9617..cf1a34e 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -325,6 +325,7 @@ transformStmt(ParseState *pstate, Node *parseTree)
 		case T_CallStmt:
 			result = transformCallStmt(pstate,
 	   (CallStmt *) parseTree);
+			break;
 
 		default:
 


Re: Comment on top of RangeVarGetRelidExtended in namespace.c mentioning RangeVarGetRelid

2018-03-06 Thread Michael Paquier
On Tue, Mar 06, 2018 at 11:37:15PM -0500, Stephen Frost wrote:
> Thanks, fix pushed.

Thanks, Stephen.
--
Michael


signature.asc
Description: PGP signature


Re: BUG #14941: Vacuum crashes

2018-03-06 Thread Michael Paquier
On Mon, Mar 05, 2018 at 09:55:13PM +, Bossart, Nathan wrote:
> On 3/3/18, 6:13 PM, "Andres Freund"  wrote:
>> I was working on committing 0002 and 0003, when I noticed that the
>> second patch doesn't actually fully work.  NOWAIT does what it says on
>> the tin iff the table is locked with a lower lock level than access
>> exclusive.  But if AEL is used, the command is blocked in
>>
>> static List *
>> expand_vacuum_rel(VacuumRelation *vrel)
>> ...
>>  /*
>>   * We transiently take AccessShareLock to protect the syscache 
>> lookup
>>   * below, as well as find_all_inheritors's expectation that the 
>> caller
>>   * holds some lock on the starting relation.
>>   */
>>  relid = RangeVarGetRelid(vrel->relation, AccessShareLock, 
>> false);
>>
>> ISTM this was added after the patches were initially proposed. See
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=19de0ab23ccba12567c18640f00b49f01471018d
>>
>> I'm a bit disappointed that the tests didn't catch this, they're clearly
>> not fully there. They definitely should test the AEL case, as
>> demonstrated here.
> 
> Sorry about that.  I've added logic to handle ACCESS EXCLUSIVE in v6 of 0002,
> and I've extended the tests to cover that case.

Hm, yes.  I have also let those patches rot a bit myself.  Sorry for
that.

>> Independent of that, I'm also concerned that NOWAIT isn't implemented
>> consistently with other commands. Aren't we erroring out in other uses
>> of NOWAIT?  ISTM a more appropriate keyword would have been SKIP
>> LOCKED.  I think the behaviour makes sense, but I'd rename the internal
>> flag and the grammar to use SKIP LOCKED.
> 
> Agreed.  I've updated 0002 to use SKIP LOCKED instead of NOWAIT.

So, NOWAIT means "please try to take a lock, don't wait for it, and issue
an error immediately if the lock cannot be taken".  SKIP_LOCKED means
"please try to take a lock, don't wait for it, but do not issue an error
if the lock cannot be taken; just move on to the next steps".  I agree that
this is more consistent.

+   if (!(options & VACOPT_SKIP_LOCKED))
+   relid = RangeVarGetRelid(vrel->relation, AccessShareLock,
false);
+   else
+   {
+   relid = RangeVarGetRelid(vrel->relation, NoLock, false);
Yeah, I agree with Andres that getting all this logic done in
RangeVarGetRelidExtended would be cleaner.  Let's see where the other
thread leads us to:
https://www.postgresql.org/message-id/20180306005349.b65whmvj7z6hbe2y%40alap3.anarazel.de
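
For illustration only, a flags-based interface along those lines could make
this call site look roughly like the following (the RVR_SKIP_LOCKED flag name
and the InvalidOid return convention are hypothetical until that thread
settles):

    /* request AccessShareLock, but neither wait for it nor raise an error */
    relid = RangeVarGetRelidExtended(vrel->relation, AccessShareLock,
                                     RVR_SKIP_LOCKED, NULL, NULL);
    if (!OidIsValid(relid))
        continue;       /* a conflicting lock is held; skip this relation */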

+   /*
+* We must downcase the statement type for log message
consistency
+* between expand_vacuum_rel(), vacuum_rel(), and analyze_rel().
+*/
+   stmttype_lower = asc_tolower(stmttype, strlen(stmttype));
This blows up on multi-byte characters and may report an incorrect relation
name if double quotes are used, no?

+   /*
+* Since autovacuum workers supply OIDs when calling vacuum(), no
+* autovacuum worker should reach this code, and we can make
+* assumptions about the logging levels we should use in the
checks
+* below.
+*/
+   Assert(!IsAutoVacuumWorkerProcess());
This is a good idea, should be a separate patch as other folks tend to
be confused with relid handling in expand_vacuum_rel().

+  Specifies that VACUUM should not wait for any
+  conflicting locks to be released: if a relation cannot be locked
+  immediately without waiting, the relation is skipped
Should this mention as well that no errors are raised, allowing the
process to move on with the next relations?
--
Michael


signature.asc
Description: PGP signature


Re: Comment on top of RangeVarGetRelidExtended in namespace.c mentioning RangeVarGetRelid

2018-03-06 Thread Stephen Frost
Greetings Michael,

* Michael Paquier (mich...@paquier.xyz) wrote:
> While reviewing the NOWAIT patch for VACUUM (well, SKIP_LOCKED per these
> day's trends), I bumped into $subject.
> 
> Attached is a one-liner.

Thanks, fix pushed.

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-06 Thread Ashutosh Bapat
On Tue, Mar 6, 2018 at 7:52 PM, Jeevan Chalke
 wrote:

>
>
> Changes look good to me and refactoring will be useful for partitionwise
> patches.
>
> However, will it be good if we add agg_costs into the GroupPathExtraData
> too?
> Also can we pass this to the add_partial_paths_to_grouping_rel() and
> add_paths_to_grouping_rel() to avoid passing can_sort, can_hash and costs
> related details individually to them?

I think so too.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] path toward faster partition pruning

2018-03-06 Thread Robert Haas
On Tue, Mar 6, 2018 at 8:34 PM, David Rowley
 wrote:
> One thing I've learned in my time working with PostgreSQL is that, if
> there's a known hole, someone's probably going to fall down it
> eventually.  I like working with PostgreSQL because we're pretty
> careful to not make holes that people can fall down, or if there is
> some hole that cannot be filled in, we try to put a fence around it
> with a sign (e.g. rename pg_xlog to pg_wal).  I'm not strongly opposed
> to your ideas, I probably don't have a complete understanding of the
> idea anyway. But from what I understand it looks like you want to take
> something that works quite well and make it work less well, and there
> appears not to be a good reason provided of why you want to do that.
>
> Is it because you want to simplify the patch due to concerns about it
> being too much logic to get right for PG11?

My understanding is that the patch as submitted is fundamentally
broken in multiple ways.

As Amit said a few emails upthread, "So while the patch's previous
approach to convert the query's constant value to the desired type was
wrong, this is wronger. :-("  I agree with that analysis.  As I tried
to explain in my last email, if you've got something like foo >
'blarfle'::type1 and foo > 'hoge'::type2, there may actually be no way
at all to determine which of those clauses is more restrictive.  The
fact that > was used in the query to compare foo with a value of type1
and, separately, with a value of type2 means that those operators
exist, but it does not follow that the opfamily provides an operator
which can compare type1 to type2.  As far as I can see, what this
means is that, in general, the approach the patch takes to eliminating
redundant clauses just doesn't work; and in the general case I don't
think there's much hope of saving it.  The question of whether the
patch does too much work at execution time or not is maybe arguable --
my position is that it does -- but working properly in the face of
cross-type comparisons is non-negotiable.
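
To illustrate the point, proving one such clause redundant requires the
opfamily to supply a cross-type comparison operator, and the planner would
have to probe for it explicitly; a sketch of that test (the opfamily/type
variables are placeholders, not the patch's code):

    /* is there a "<" operator in this opfamily comparing type1 to type2? */
    Oid         cmpop = get_opfamily_member(opfamily, type1_oid, type2_oid,
                                            BTLessStrategyNumber);

    if (!OidIsValid(cmpop))
    {
        /* no way to rank the two constants; both quals must be kept */
    }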

The use of evaluate_expr() is also completely wrong and has got to be
fixed.  I already wrote about that upthread so I won't repeat it here.
I'm pretty sure that the current design, if allowed to stand, would
have lots of bad consequences.

As I understand it, Amit is currently hacking on the patch to try to
fix these issues.  If he comes up with something that works properly
with cross-type comparisons and doesn't abuse evaluate_expr() but
still does more work than I'd ideally prefer at execution time, I'll
consider holding my nose and consider it anyway.  But considering the
amount of rework that I think is needed, I don't really see why we
wouldn't adopt a design that minimizes execution time work, too.

In short, I don't think I'm trying to make something that works quite
well work less well, because I don't think the patch as it stands can
be correctly described as working quite well.

> Let's say there was some view like:
>
> CREATE VIEW vw_ledger_2018 AS SELECT * FROM ledger WHERE postdate
> BETWEEN '2018-01-01' AND '2018-12-31';
>
> And a user comes along and does:
>
> SELECT * FROM vw_ledger_2018 WHERE postdate BETWEEN '2018-03-01' AND
> '2018-03-31'
>
> If we just take the first from each op strategy then we'll not have
> managed to narrow the case down to just the March partition. You might
> argue that this should be resolved at some higher level in the
> planner, but that does nothing for the run-time pruning case.

Yeah, that's a good example of how this could happen in real life.

So far I see three ways forward here:

1. If we've got multiple redundant quals, ignore all but one of them
for purposes of partition pruning.  Hope users don't get mad.

2. If we've got multiple redundant quals, do multiple checks.  Hope
this isn't annoyingly slow (or too much ugly code).

3. If we've got multiple redundant quals but no cross-type operators
are in use, evaluate all of the expressions and pick the highest or
lowest value as appropriate.  Otherwise fall back to #1 or #2.  For
bonus points, do this when cross-type operators ARE in use and the
additional cross-type operators that we need to figure out the highest
or lowest value, as appropriate, is also available.

I'm OK with any of those approaches; that is, I will not seek to block
a patch merely on the basis of which of those options it chooses.  I
think they are all defensible.  Options that are not OK include (a)
trying to cast a value of one type to another type, because that could
turn a query that would have simply returned no rows into an error
case or (b) supposing that all types in an opfamily are binary
coercible to each other, because that's just plain wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GSOC 2018 ideas

2018-03-06 Thread Charles Cui
2018-03-05 1:42 GMT-08:00 Aleksander Alekseev :

> Hello Charles,
>
> >Went through the documents listed by you, and they are helpful!
> > It seems the main purpose of extension pg_protobuf is to parse
> > a protobuf struct and return the decoded field. May I ask how these kinds
> > of extensions are used in PostgreSQL (or in other words, the scenarios to
> > use these plugins)?
>
> There are a few ideas behind all of this.
>
> 1) Sometimes people are not quite happy with strict relational schema by
> various reasons and prefer something more agile, like XML or JSON. These
> formats are indeed more convenient under certain circumstances, for
> instance in terms of ease of changing and migrating the schema.
>
> 2) One drawback of JSON is redundancy. For instance, you have to store
> the names of all document fields. These names don't carry much
> information but consume disk space and RAM thus affecting the overall
> performance. ZSON extension [1] partially solved this issue. However I
> wouldn't call it particularly convenient and the whole approach of
> compressing JSON seems to me more like a dirty hack, not a solution. The
> problem appeared because of using the wrong data format in the first
> place.
>
> 3) Unlike JSON, formats like Protobuf or Thrift are binary formats and
> most importantly don't store any field names. Thus they don't create a
> problem described above. However, PostgreSQL is not able to access
> Protobuf fields out-of-the-box, for instance to index these fields. This
> is what pg_protobuf is for.
>
The idea of using a flexible schema and building indexes on top of it is
awesome! I will definitely submit a proposal and focus on this if I get
selected. Thanks for answering my questions.


> Hopefully this answers your question. If you have other questions please
> don't hesitate to ask!
>
> [1]: https://github.com/postgrespro/zson
>
>
> --
> Best regards,
> Aleksander Alekseev
>


Re: Server won't start with fallback setting by initdb.

2018-03-06 Thread Stephen Frost
Greetings Tom, Peter, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Peter Eisentraut  writes:
> > On 3/4/18 15:31, Tom Lane wrote:
> >> Then, seeing that the factory defaults are ReservedBackends = 3 and
> >> max_wal_senders = 10, something's got to give; there's no way that
> >> max_connections = 10 can work with those.  But what I would argue is that
> >> of those three choices, the least defensible one is max_wal_senders = 10.
> >> Where did that come from?
> 
> > Let's see.  A typical installation might need:
> 
> > 1 for pg_receivewal for continuous backup
> > 2 for pg_basebackup
> > 2 for if pg_basebackup gets interrupted and it takes 2 hours to free the
> > TCP/IP connections
> > 1 for a standby connection
> > 1 for a second standby connection, for making infrastructure changes
> 
> That's "typical"?  It sounds like a major installation to me, one that
> would certainly have had to fool with more settings than just
> max_wal_senders.  Two concurrent pg_basebackups running at all times
> seems particularly dubious.
> 
> If we drop the assumption of 2 concurrent pg_basebackups, then your
> math would lead to a value of 5, which I'd be OK with.

Changing the defaults to go back down strikes me as an entirely wrong
approach after we've had a release with the higher defaults without
seriously compelling arguments against, and I don't agree that we've had
such a case made here.  If this discussion had happened before v10 was
released, I'd be much more open to going with the suggestion of '5', but
forcing users to update their configs for working environments because
we've decided that the default of 10 was too high is just pedantry, in
my opinion.

The original patch proposed strikes me as entirely reasonable, though
given the rarity of seeing a max_connections below 100 in the wild,
unlikely to have any impact on real users.  On the other hand, changing
this default setting will actively *break* user environments which are
working today, for very questionable benefit, and in a manner that is
difficult to detect.  A user could pg_upgrade and not have any immediate
issues until a weekly cronjob fails or similar.  The current value is in
the wild and we've not had reports of performance issues, as best as I
can recall, and in reviewing the various places where max_wal_senders is
used, it seems unlikely that a value of 10 is going to be seriously
worse than a value of 5.  If such a case arises, which seems very likely
to be the exception rather than the rule, encouraging them to reduce
max_wal_senders is a straight-forward answer.

Thanks!

Stephen


signature.asc
Description: PGP signature


Comment on top of RangeVarGetRelidExtended in namespace.c mentioning RangeVarGetRelid

2018-03-06 Thread Michael Paquier
Hi all,

While reviewing the NOWAIT patch for VACUUM (well, SKIP_LOCKED per these
day's trends), I bumped into $subject.

Attached is a one-liner.
Thanks,
--
Michael
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 65e271a8d1..52dd400b96 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -202,7 +202,7 @@ static bool MatchNamedCall(HeapTuple proctup, int nargs, 
List *argnames,
 
 
 /*
- * RangeVarGetRelid
+ * RangeVarGetRelidExtended
  * Given a RangeVar describing an existing relation,
  * select the proper namespace and look up the relation OID.
  *


signature.asc
Description: PGP signature


Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2018-03-06 Thread Stephen Frost
Greetings Robert, Ashutosh, Arthur, Etsuro, all,

* Arthur Zakirov (a.zaki...@postgrespro.ru) wrote:
> On Tue, Mar 06, 2018 at 08:09:50PM +0900, Etsuro Fujita wrote:
> > Agreed.  I added a comment to that function.  I think that that comment in
> > combination with changes to the FDW docs in the patch would help FDW authors
> > understand why that is needed.  Please find attached an updated version of
> > the patch.
> 
> Thank you.
> 
> All tests pass, the documentation builds. There was the suggestion [1]
> of a different approach, but this patch fixes the issue in a much simpler
> way.
> 
> Marked as "Ready for Commiter".
>
> 1 - 
> https://www.postgresql.org/message-id/20171005.200631.134118679.horiguchi.kyotaro%40lab.ntt.co.jp

Thanks,  I've looked through this patch and thread again and continue to
feel that this is both a good and sensible improvement and that the patch
is in pretty good shape.

The remaining question is if the subsequent discussion has swayed the
opinion of Robert and Ashutosh.  If we can get agreement that these
semantics are acceptable and an improvement over the status quo then I'm
happy to try and drive this patch to commit.

Robert, Ashutosh?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: PATCH: Configurable file mode mask

2018-03-06 Thread Michael Paquier
On Tue, Mar 06, 2018 at 01:32:49PM -0500, David Steele wrote:
> On 3/5/18 10:46 PM, Michael Paquier wrote:
> Ah, I see what you mean now.  Fixed using follow_fast.  This variant
> claims to be faster and it doesn't matter if files are occasionally
> checked twice.

Fine for me.  I can see this option present in perl 5.8.8 as well, so we
should be good.

>> Those two are separate issues.  Could you begin a new thread on the
>> matter?  This will attract more attention.
> 
> OK, I'll move it back for now, and make a note to raise the position as
> a separate issue.  I'd rather not do it in this fest, though.

Seems like you forgot to re-add the chmod calls in initdb.c.

>> -   if (chmod(location, S_IRWXU) != 0)
>> +   current_umask = umask(0);
>> +   umask(current_umask);
>> [...]
>> -   if (chmod(pg_data, S_IRWXU) != 0)
>> +   if (chmod(pg_data, PG_DIR_MODE_DEFAULT & ~file_mode_mask) != 0)
>> Such modifications should be part of patch 3, not 2, where the group
>> features really apply.
> 
> The goal of patch 2 is to refactor the way permissions are set by
> replacing locally hard-coded values with centralized values, so I think
> this makes sense here.

Hm.  I would have left that out of this first version, but I am fine with
deferring that to a committer's opinion as well.

>> my $test_mode = shift;
>> 
>> +   umask(0077);
> 
> Added. (Ensure that all files are created with default data dir
> permissions).

+   # Ensure that all files are created with default data dir permissions
+   umask(0077);
I stared at this comment for two minutes before finally understanding that
it refers to the extra files which are part of this test.  It may be
better to be a bit more precise here.  At first I thought that it
referred to the setup_cluster calls as well...

>> Patch 2 begins to look fine for me.
> 
> I also made chmod_recursive() generic.

Likely this name is not worth worrying about regarding conflicts with CPAN stuff ;)

+# Ensure all permissions in the pg_data directory are
correct. Permissions
+# should be dir = 0700, file = 0600.
+sub check_pg_data_perm
+{
check_permission_recursive() would be a more suitable name.  Stuff in
TestLib.pm is not necessarily linked to data folders.

sub clean_rewind_test
{
+   ok (check_pg_data_perm($node_master->data_dir(), 0));
+
$node_master->teardown_node  if defined $node_master;
I also have a pending patch for pg_rewind which adds read-only files to
the data folders of a new test, so this would cause this part to blow
up.  Testing that for all the test sets does not bring additional value
either, and doing it in the cleanup phase is also confusing.  So could you
move that check into only one test's script?  You could remove the umask
call in 003_extrafiles.pl as well this way, and reduce the patch diffs.

+ if ($file_mode != 0600)
+ {
+ print(*STDERR, "$File::Find::name mode must be 0600\n");
+
+ $result = 0;
+ return
+ }
0600 should be replaced by $expected_file_perm, and isn't a ';' missing
for each return "clause" (how does this even work..)?

Patch 2 is getting close to ready for a committer's look, I think; I still
need to look at patch 3 in detail.  And from patch 3:
# Expected permission
-   my $expected_file_perm = 0600;
-   my $expected_dir_perm = 0700;
+   my $expected_file_perm = $allow_group ? 0640 : 0600;
+   my $expected_dir_perm = $allow_group ? 0750 : 0700;
Or $expected_file_perm and $expected_dir_perm could just be used as
arguments.
--
Michael


signature.asc
Description: PGP signature


Re: Suspicious call of initial_cost_hashjoin()

2018-03-06 Thread Peter Eisentraut
On 3/2/18 05:01, Thomas Munro wrote:
> On Fri, Mar 2, 2018 at 9:06 PM, Antonin Houska  wrote:
>> David Steele  wrote:
>>> Does this look right to you?
>>
>> Yes, this is what I meant. The patch applies cleanly and the code compiles
>> well.
>>
>>> If so, can you sign up as a reviewer and mark it Ready for Committer?
>>
>> Done.
> 
> Thanks.
> 
>> Actually I think it'd be nice if the "parallel_hash" argument was mentioned 
>> in
>> the header comment of initial_cost_hashjoin() function, but not sure this is
>> worth returning the patch to the author.
> 
> Done.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: JIT compiling with LLVM v11

2018-03-06 Thread Thomas Munro
On Tue, Mar 6, 2018 at 10:39 PM, Andres Freund  wrote:
> [more commits]

+* OSX prefixes all object level symbols with an underscore. But neither

"macOS" (see commit da6c4f6c and all mentions since).

make check at today's HEAD of your jit branch crashes on my FreeBSD
box.  The first thing to crash is this query from point.sql:

LOG:  server process (PID 87060) was terminated by signal 4: Illegal instruction
DETAIL:  Failed process was running: SELECT '' AS thirtysix, p1.f1 AS
point1, p2.f1 AS point2, p1.f1 <-> p2.f1 AS dist
   FROM POINT_TBL p1, POINT_TBL p2
   ORDER BY dist, p1.f1[0], p2.f1[0];

Unfortunately when I tried to load the core file into lldb, the stack
is like this:

* thread #1, name = 'postgres', stop reason = signal SIGILL
  * frame #0: 0x000800e7c1ea

Apparently the generated code is nuking the stack and executing
garbage?  I don't have time to investigate right now, and this may
indicate something busted in my environment, but I thought this might
tell you something.

These variants of that query don't crash (even though I set
jit_above_cost = 0 and checked that it's actually JIT-ing), which
might be clues:

  -- no p1.f1 <-> p2.f1
  SELECT p1.f1 AS point1, p2.f1 AS point2
   FROM POINT_TBL p1, POINT_TBL p2
   ORDER BY p1.f1[0], p2.f1[0];

  -- no join
  SELECT p1.f1 <-> p1.f1 AS dist
   FROM POINT_TBL p1
   ORDER BY 1;

These variants do crash:

  -- p1.f1 <-> p2.f1 in order by, but not select list
  SELECT p1.f1 AS point1, p2.f1 AS point2
   FROM POINT_TBL p1, POINT_TBL p2
   ORDER BY p1.f1 <-> p2.f1, p1.f1[0], p2.f1[0];

  -- p1.f1 <-> p2.f1 in select list, but not in order by
  SELECT p1.f1 AS point1, p2.f1 AS point2, p1.f1 <-> p2.f1 AS dist
   FROM POINT_TBL p1, POINT_TBL p2
   ORDER BY p1.f1[0], p2.f1[0];

  -- simple, with a join
  SELECT p1.f1 <-> p1.f1 AS dist
   FROM POINT_TBL p1, POINT_TBL p2
   ORDER BY 1;

I build it like this:

./configure \
  --prefix=$HOME/install/ \
  --enable-tap-tests \
  --enable-cassert \
  --enable-debug \
  --enable-depend \
  --with-llvm \
  CC="ccache cc" CFLAGS="-O0" CXX="ccache c++" CXXFLAGS="-std=c++11" \
  CLANG=/usr/local/llvm50/bin/clang \
  LLVM_CONFIG=/usr/local/llvm50/bin/llvm-config \
  --with-libraries="/usr/local/lib" \
  --with-includes="/usr/local/include"

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [doc fix] Correct calculation of vm.nr_hugepages

2018-03-06 Thread Peter Eisentraut
On 2/26/18 08:25, Vasundhar Boddapati wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:tested, passed
> 
> It works. Can be Merged.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: public schema default ACL

2018-03-06 Thread Stephen Frost
Greetings Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > On Sat, Mar 3, 2018 at 4:56 AM, Noah Misch  wrote:
> >> I propose, for v11, switching to "GRANT USAGE ON SCHEMA
> >> public TO PUBLIC" (omit CREATE).  Concerns?  An alternative is to change
> >> the default search_path to "$user"; that would break more applications,
> >> and I don't see an advantage to compensate for that.
> 
> > Isn't this going to cause widespread breakage?  Unprivileged users
> > will suddenly find that they can no longer create tables, because
> > $user doesn't exist and they don't have permission on public.  That
> > seems quite unfriendly.
> 
> Well, the fundamental problem here is that the arrangements around schema
> public were set up to allow a smooth transition from the pre-7.3
> no-schemas world, not to provide any kind of security.  If we want to use
> schemas for security then we're going to have to do *something* that's not
> compatible.  Or we can continue to ship an insecure default configuration,
> but I recall many people arguing against that sort of choice in the past.

I concur that this is the fundamental issue and that the privilege
system around schemas wasn't considered due to the desire to provide a
smooth transition, but we are quite a long way from 7.3 and there's
abundant evidence that the current defaults are insecure.

I'll point out that a number of our *other* defaults are also insecure
(pg_hba.conf entries with 'trust' being particularly bad).  Those have
been worked around by packagers, but that really isn't ideal.  I'd love
to see us ship an actually secure (or even just reasonable, frankly...)
default configuration.

> I wonder whether it'd be sensible for CREATE USER --- or at least the
> createuser script --- to automatically make a matching schema.  Or we
> could just recommend that DBAs do so.  Either way, we'd be pushing people
> towards the design where "$user" does exist for most/all users.  Our docs
> comment (section 5.8.7) that "the concepts of schema and user are nearly
> equivalent in a database system that implements only the basic schema
> support specified in the standard", so the idea of automatically making
> a schema per user doesn't seem ridiculous on its face.  (Now, where'd I
> put my flameproof long johns ...)

You are not the first to think of this in recent days, and I'm hopeful
to see others comment in support of this idea.  For my 2c, I'd suggest
that what we actually do is have a new role attribute which is "when
this user connects to a database, if they don't have a schema named
after their role, then create one."  Creating the schema at CREATE ROLE
time would only work for the current database, after all (barring some
other magic that allows us to create schemas in all current and future
databases...).

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] make async slave to wait for lsn to be replayed

2018-03-06 Thread Michael Paquier
On Tue, Mar 06, 2018 at 02:24:24PM +0300, Ivan Kartyshov wrote:
> Hello, I am now preparing the patch based on the syntax that Simon offered,
> and in a few days I will update the patch.
> Thank you for your interest in this thread.

It has been more than one month since a patch update has been requested,
and time is growing short.  This refactored patch introduces a whole new
concept as well, so my recommendation would be to mark this patch as
returned with feedback, and then review it freshly for v12 if this
concept is still alive and around.
--
Michael


signature.asc
Description: PGP signature


Re: Speed up the removal of WAL files

2018-03-06 Thread Michael Paquier
On Wed, Mar 07, 2018 at 12:55:43AM +0900, Fujii Masao wrote:
> So, what about, as another approach, making the checkpointer instead of
> the startup process call RemoveNonParentXlogFiles() when end-of-recovery
> checkpoint is executed? ISTM that a recovery doesn't need to wait for
> RemoveNonParentXlogFiles() to end. Instead, RemoveNonParentXlogFiles()
> seems to have to complete before the checkpointer calls RemoveOldXlogFiles()
> and creates .ready files for the "garbage" WAL files on the old timeline.
> So it seems natural to leave that WAL recycle task to the checkpointer.

Couldn't that impact the I/O performance at the end of recovery until
the first post-recovery checkpoint is completed?  Let's not forget that
since 9.3 the end-of-recovery checkpoint is not triggered immediately,
so there could be a delay.  If WAL segments of the past timeline are
recycled without waiting for this first checkpoint to happen then there
is no need to create new, zero-filled segments post-recovery, which
can count as well.
--
Michael


signature.asc
Description: PGP signature


Re: check error messages in SSL tests

2018-03-06 Thread Michael Paquier
On Tue, Mar 06, 2018 at 02:54:16PM -0500, Peter Eisentraut wrote:
> This contains a slight problem:  The tests contain two different
> branches, depending on whether tls-server-end-point is supported.  But
> these branches run a different number of tests, so the test count at the
> top of the test file might be wrong.  Here is a patch that restructures
> this to count the tests more dynamically.

Not sure how I missed that.  The error is obvious as command_ok triggers
one test and command_fails_like does two of them.  Test::More also
documents what you are using here, so the patch looks fine to me.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] path toward faster partition pruning

2018-03-06 Thread David Rowley
On 7 March 2018 at 10:15, Robert Haas  wrote:
> On Fri, Mar 2, 2018 at 7:32 PM, David Rowley
>  wrote:
>> It appears to me, for your method to work we'd need to try every
>> combination of the clauses matching each partition key, which in this
>> case is 3 * 3 * 3 searches. Amit's current method is 1 search, after
>> the clause reduction which is 3 + 3 + 3 (O(N) per partition key)
> [...]
>> With that considered, is it still a good idea to do it this way?
>
> I dunno.  What do you think?
>
> That case is indeed pretty unfortunate, but it's also pretty
> artificial.  It's not obvious to me that we shouldn't care about it,
> but it's also not obvious to me that we should.  If we have some
> bizarre cases that slip through the cracks or don't perform terribly
> well, maybe nobody would ever notice or care.  On the other hand,
> maybe they would.

One thing I've learned in my time working with PostgreSQL is that, if
there's a known hole, someone's probably going to fall down it
eventually.  I like working with PostgreSQL because we're pretty
careful to not make holes that people can fall down, or if there is
some hole that cannot be filled in, we try to put a fence around it
with a sign (e.g. rename pg_xlog to pg_wal).  I'm not strongly opposed
to your ideas, I probably don't have a complete understanding of the
idea anyway. But from what I understand it looks like you want to take
something that works quite well and make it work less well, and there
appears not to be a good reason provided of why you want to do that.

Is it because you want to simplify the patch due to concerns about it
being too much logic to get right for PG11?

> One thing that we could do is just only accept one clause for each
> column-strategy pairing, presumably either the first one or the last
> one.

The problem with that is it can cause surprising behaviour. We reorder
clauses and clauses get pushed down from upper parts of the query.

Let's say there was some view like:

CREATE VIEW vw_ledger_2018 AS SELECT * FROM ledger WHERE postdate
BETWEEN '2018-01-01' AND '2018-12-31';

And a user comes along and does:

SELECT * FROM vw_ledger_2018 WHERE postdate BETWEEN '2018-03-01' AND
'2018-03-31'

We're going to end up with base quals something like: postdate >=
'2018-01-01' AND postdate <= '2018-12-31' AND postdate >= '2018-03-01'
AND postdate <= '2018-03-31'

If we just take the first from each op strategy then we'll not have
managed to narrow the case down to just the March partition. You might
argue that this should be resolved at some higher level in the
planner, but that does nothing for the run-time pruning case.

I don't really want to do or say anything that jeopardises this patch
from getting into PG11, so if the path of least resistance is to go
with the option you've proposed then I'd much rather that than this
getting pushed out to PG12.  I really just want to try to make sure
we've thought of everything before we create too many surprises for
users.

Perhaps a compromise would be to check all quals from the first
partition key and only the first or last one from the remaining keys.
I imagine most cases will have just 1 key anyway. This would
significantly reduce the number of possible combinations of quals to
try, but unfortunately, it still does have that element of surprise.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [Patch] Log SSL certificate verification errors

2018-03-06 Thread Michael Paquier
On Tue, Mar 06, 2018 at 02:56:41PM -0500, Peter Eisentraut wrote:
> But this patch was quite useful while debugging some of the other
> ongoing SSL work.  So if it were revived at some point in the future, it
> would be welcome.

+1!
--
Michael


signature.asc
Description: PGP signature


Re: Index-only scan returns incorrect results when using a composite GIST index with a gist_trgm_ops column.

2018-03-06 Thread Kyotaro HORIGUCHI
At Thu, 01 Mar 2018 15:39:18 -0500, Tom Lane  wrote in 
<22609.1519936...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI  writes:
> > At Wed, 24 Jan 2018 00:13:51 +0300, Sergei Kornilov  wrote 
> > in <348951516742...@web54j.yandex.ru>
> >> Should we also make backport to older versions? I test on REL_10_STABLE - 
> >> patch builds and works ok, but "make check" fails on new testcase with 
> >> error:
> > CREATE INDEX ON t USING gist (a test_inet_ops, a inet_ops);
> > + ERROR:  missing support function 4 for attribute 1 of index "t_a_a1_idx"
> >> and with different explain results.
> 
> > Thank you for checking that. d3a4f89 allowed that and
> > inet_gist_decompress was removed at the same time. Unfortunately I
> > didn't find a type on master that has both decompress and fetch
> > methods, so I preferred to split the regression patch among target
> > versions.
> 
> I pushed this fix with minor adjustments.  I did not like the proposed

Thank you.

> regression test at all: it was overcomplicated and the need for different
> versions for different back branches wasn't fun either.  After some poking
> around I found that the bug could be exhibited using just btree_gist's
> gist_inet_ops, since the core inet_ops class indexes the same datatype and
> it does have a fetch function.  So I added a test case in btree_gist.

Ah, it hadn't occurred to me to test core behavior from contrib. Thanks for
improving it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Hash Joins vs. Bloom Filters / take 2

2018-03-06 Thread Patrick Krecker
On Thu, Mar 1, 2018 at 4:04 PM, David Steele  wrote:
> On 3/1/18 6:52 PM, Tomas Vondra wrote:
>>
>> On 03/02/2018 12:31 AM, Andres Freund wrote:
>>>
>>>
>>>
>>> On March 1, 2018 3:22:44 PM PST, Tomas Vondra
>>>  wrote:



 On 03/01/2018 11:01 PM, Andres Freund wrote:
>
> Hi,
>
> On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote:
>>
>> So I've decided to revive the old patch, rebase it to current

 master,
>>
>> and see if we can resolve the issues that killed it in 2016.
>
>
> There seems to be some good discussion in the thread. But the patch
> arrived just before the last commitfest and certainly isn't a trivial
> cleanup patch. Therefore I think it should be moved to the next CF?
>

 It isn't a massive invasive patch either, though, so I object to moving
 it to 2018-09 right away.
>>>
>>>
>>> Why do we have rules around not submitting large stuff to the last
>>> cf, if we just not follow through? We're neck deep in patches that
>>> are older. And you've already gotten a fair bit of feedback..
>>>
>>
>> It was not my intention to break (or even bend) the CF rules, of course.
>> I haven't considered the patch to be "large stuff", while you do. I see
>> Peter Geoghegan agrees with your conclusion on another thread, so go
>> ahead and move it to 2018-09.
>
>
> After reviewing the thread I also agree that this should be pushed to
> 2018-09, so I have done so.
>
> I'm very excited by this patch, though.  In general I agree with Peter that
> a higher rate of false positives is acceptable to save memory.  I also don't
> see any reason why this can't be tuned with a parameter. Start with a
> conservative default and allow the user to adjust as desired.
>
> --
> -David
> da...@pgmasters.net
>

Hi All --

I'm curious what has to be done to move this patch along. I looked
through the patch and I noticed that the work for deciding whether to
instantiate the bloom filter in single-batch mode is not complete yet
(or if it's in this change, I can't find it), contrary to this
comment:

+* When starting in a single-batch mode, we do nothing initially.
+* If the whole hash table fits into a single batch, we can get
+* sufficiently accurate ndistinct estimate by simply counting
+* occupied buckets (thanks to shooting for NTUP_PER_BUCKET=1),
+* or perhaps we could use something more elaborate (e.g. HLL).
+* But we only build the bloom filter if the hash table is large
+* enough to exceed on-CPU caches (e.g. 4MB).

I did some basic benchmarking with two tables and a simple where
clause filtering 90% of rows and it does yield about a 1.2x speedup in
my tests. In the pathological case where two tables are joined on a
pk/fk relationship with no filtering, the penalty seems to be about
1-2%.
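
For anyone skimming, the mechanism being benchmarked reduces to something
like the following self-contained sketch (sizing and hashing choices are
illustrative, not taken from the patch): while building the hash table, each
inner tuple's hash is also added to a bitset; at probe time, outer tuples
whose hashes are definitely absent can skip the bucket lookup entirely, and
false positives merely cost an ordinary probe.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOOM_NBITS (1u << 20)          /* 1M bits = 128 kB bitset */

    typedef struct BloomFilter
    {
        uint8_t     bits[BLOOM_NBITS / 8];
    } BloomFilter;

    static void
    bloom_init(BloomFilter *bf)
    {
        memset(bf->bits, 0, sizeof(bf->bits));
    }

    static void
    bloom_add(BloomFilter *bf, uint32_t hash)
    {
        /* derive two probe positions from the single hash value */
        uint32_t    h1 = hash % BLOOM_NBITS;
        uint32_t    h2 = ((hash >> 16) | (hash << 16)) % BLOOM_NBITS;

        bf->bits[h1 / 8] |= (uint8_t) (1u << (h1 % 8));
        bf->bits[h2 / 8] |= (uint8_t) (1u << (h2 % 8));
    }

    static bool
    bloom_maybe_contains(const BloomFilter *bf, uint32_t hash)
    {
        uint32_t    h1 = hash % BLOOM_NBITS;
        uint32_t    h2 = ((hash >> 16) | (hash << 16)) % BLOOM_NBITS;

        /* false positives are possible, false negatives are not */
        return (bf->bits[h1 / 8] & (1u << (h1 % 8))) != 0 &&
               (bf->bits[h2 / 8] & (1u << (h2 % 8))) != 0;
    }

Deriving both bit positions from one 32-bit hash keeps the per-probe cost to
a few cache-resident instructions, which is exactly why the filter only pays
off once the hash table itself no longer fits in CPU caches.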

Patrick



Re: TAP test module - PostgresClient

2018-03-06 Thread Kyotaro HORIGUCHI
At Sat, 3 Mar 2018 09:46:11 -0500, Peter Eisentraut 
 wrote in 
<7f1e5f2f-4902-2c29-de82-381de8cc6...@2ndquadrant.com>
> On 3/1/18 23:39, Michael Paquier wrote:
> > On Thu, Mar 01, 2018 at 02:27:13AM -0800, Andres Freund wrote:
> >> If I understand correctly there's been no progress on this since, and
> >> there'd definitely need to be major work to get something we can agree
> >> upon. Doesn't seem v11 material. I think we should mark this as returned
> >> with feedback.  Arguments against?
> > 
> > Agreed with your position.  The TAP tests rely on IPC::Run as a pillar
> > of its infrastructure.  I think that if we need a base API to do such
> > capabilities we ought to prioritize what we can do with it first instead
> > of trying to reinvent the wheel as this patch proposes in such a
> > complicated way.
> 
> I haven't seen any explanation for a problem this is solving.  The
> original submission contained a sample test case, by I don't see why
> that couldn't be done with the existing infrastructure.
> 
> Patch closed for now.

Agreed. This is not a v11 matter. Thanks.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: line_perp() (?-|) is broken.

2018-03-06 Thread Kyotaro HORIGUCHI
Hello, I'm returning to this soon.

I regard Emre's work as aiming to refactor the code (not improve its
functionality); this one, on the other hand, is a separate bug fix
which should also be backpatched.

But honestly, I'm not sure such a small fix would improve the
current situation of the geometric operators at any rate.

At Sat, 03 Mar 2018 10:43:26 -0500, Tom Lane  wrote in 
<31399.1520091...@sss.pgh.pa.us>
> Emre Hasegeli  writes:
> >> Possibly this objection is pointless, because I'm not at all sure that
> >> the existing code is careful about how it uses FPeq/FPzero; perhaps
> >> we're applying EPSILON to all manner of numbers that don't have units
> >> of the coordinate space.  But this won't help make it better.

Agreed.

> > The existing code was probably paying attention to this particular
> > one, but it fails to apply EPSILON meaningfully to many other places.
> > For example lseg_parallel() and lseg_perp() applies it 2 times to the
> > same input.  First point_sl() compares x coordinates with FPeq(), and
> > then the returned slopes are compared again with FPeq().
> 
> Yeah, comparing a slope to EPSILON sure feels wrong :-(

It is quite a significant problem to fix and would be
controversial in its details. But anyway, we are focusing on other
points that are less controversial first; the EPSILON can be cooked in
the next stage.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-03-06 Thread Tom Lane
Robert Haas  writes:
> This is really two separate changes:

> 1. Teach backends to remove any leftover pg_temp_%d schema on startup.

> 2. Teach autovacuum to remove any leftover tables in a pg_temp_%d
> schema if the backend is active but in some other database (rather
> than only when the backend is not active at all).

> Both of those changes seem broadly reasonable to me, but what do other
> people think?

I just read through this thread for the first time; sorry for not
paying attention sooner.

I'm uncomfortable with all the discussion of changing the autovacuum
launcher's algorithm; all of that seems complicated and unsure of making
anyone's life better.  It definitely does nothing for the problem
originally stated, that autovacuum is skipping an orphaned temp table
altogether.  That's not relevant to the actually proposed patch;
I'm just saying that I'm not sure anything useful would come of that.

Now as for the problem originally stated, step 1 alone doesn't fix it,
and there's reason not to like that change much.  Forcing backends to
clear their temp schemas immediately on connection will slow down
connection times, and for applications that never make any temp tables,
that's just a dead loss (though admittedly it might not be too expensive
in that case).  But the problem here is that it doesn't fix the issue.
The reason that a temp table might stay orphaned for a long time, if we're
speaking of an app that does use temp tables, is that it's in a very
high-numbered temp schema and only once in a blue moon does the database
have enough connections for that temp schema to be used at all.  Forcing
on-connection cleaning doesn't fix that.

So the correct fix is to improve autovacuum's check to discover whether
an old temp table is orphaned, so that it isn't fooled by putative
owner processes that are connected to some other DB.  Step 2 of the
proposed patch tries to do that, but it's only half right: we need a
change near line 2264 not only line 2090.  (Evidently, we need either
a comment that the logic is repeated, or else refactor so that there's
only one copy.)

Now, you can argue that autovacuum's check can be fooled by an "owner"
backend that is connected to the current DB but hasn't actually taken
possession of its assigned temp schema (and hence the table in question
really is orphaned after all).  This edge case could be taken care of by
having backends clear their temp schema immediately, as in step 1 of the
patch.  But I still think that that is an expensive way to catch what
would be a really infrequent case.  Perhaps a better idea is to have
backends advertise in PGPROC whether they have taken possession of their
assigned temp schema, and then autovacuum could ignore putative owners
that aren't showing that flag.  Or we could just do nothing about that,
on the grounds that nobody has shown the case to be worth worrying about.
Temp schemas that are sufficiently low-numbered to be likely to have
an apparent owner are also likely to get cleaned out by actual temp
table creation reasonably often, so I'm not at all convinced that we
need a mechanism that covers this case.  We do need something for the
cross-DB case though, because even a very low-numbered temp schema
can be problematic if it's in a seldom-used DB, which seems to be the
case that was complained of originally.
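
For concreteness, the PGPROC idea could be as small as the following (the
field name and check are hypothetical, not a worked-out design):

    /* in struct PGPROC */
    bool        tempNamespaceOK;    /* backend has taken possession of its
                                     * assigned temp schema */

    /* in autovacuum's orphan-table check */
    if (proc != NULL && proc->tempNamespaceOK)
        continue;           /* schema has a live owner; not orphaned */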

On the whole, my vote is to fix and apply step 2, and leave it at that.

regards, tom lane



Re: Server won't start with fallback setting by initdb.

2018-03-06 Thread Tom Lane
Peter Eisentraut  writes:
> On 3/4/18 15:31, Tom Lane wrote:
>> Then, seeing that the factory defaults are ReservedBackends = 3 and
>> max_wal_senders = 10, something's got to give; there's no way that
>> max_connections = 10 can work with those.  But what I would argue is that
>> of those three choices, the least defensible one is max_wal_senders = 10.
>> Where did that come from?

> Let's see.  A typical installation might need:

> 1 for pg_receivewal for continuous backup
> 2 for pg_basebackup
> 2 for if pg_basebackup gets interrupted and it takes 2 hours to free the
> TCP/IP connections
> 1 for a standby connection
> 1 for a second standby connection, for making infrastructure changes

That's "typical"?  It sounds like a major installation to me, one that
would certainly have had to fool with more settings than just
max_wal_senders.  Two concurrent pg_basebackups running at all times
seems particularly dubious.

If we drop the assumption of 2 concurrent pg_basebackups, then your
math would lead to a value of 5, which I'd be OK with.

regards, tom lane



Re: Server won't start with fallback setting by initdb.

2018-03-06 Thread Tom Lane
I wrote:
> Therefore, the condition that actually ought to be getting enforced here
> is either "ReservedBackends + max_wal_senders < MaxConnections", or
> "ReservedBackends + max_wal_senders <= MaxConnections", depending on
> whether you think it's appropriate to require at least one not-reserved-
> for-superusers connection slot to remain available if the walsenders
> slots are fully populated.

I propose the first attached patch to do that.  (I failed to resist the
temptation to copy-edit some nearby docs and comments, too.)

> My proposal is to default max_wal_senders to perhaps 3, and leave
> initdb's logic alone.

... and then the second attached patch to do that.

I noticed that a lot of our TAP tests are setting max_wal_senders
and max_replication_slots to random values around 4 or 5.  Probably
we could drop all that now, and let them just use the defaults.
I've not done that here, except for adjusting 010_pg_basebackup.pl
which would fail for no very good reason with minimal max_connections.

regards, tom lane

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 259a2d8..3a8fc7d 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include_dir 'conf.d'
*** 696,703 
  
 
  The default value is three connections. The value must be less
! than the value of max_connections. This
! parameter can only be set at server start.
 

   
--- 696,704 
  
 
  The default value is three connections. The value must be less
! than max_connections minus
! .
! This parameter can only be set at server start.
 

   
*** include_dir 'conf.d'
*** 2982,2994 
  maximum number of simultaneously running WAL sender
  processes). The default is 10. The value 0 means replication is
  disabled. WAL sender processes count towards the total number
! of connections, so the parameter cannot be set higher than
! .  Abrupt streaming client
! disconnection might cause an orphaned connection slot until
  a timeout is reached, so this parameter should be set slightly
  higher than the maximum number of expected clients so disconnected
  clients can immediately reconnect.  This parameter can only
! be set at server start. wal_level must be set to
  replica or higher to allow connections from standby
  servers.
 
--- 2983,2998 
  maximum number of simultaneously running WAL sender
  processes). The default is 10. The value 0 means replication is
  disabled. WAL sender processes count towards the total number
! of connections, so this parameter's value must be less than
!  minus
! .
! Abrupt streaming client disconnection might leave an orphaned
! connection slot behind until
  a timeout is reached, so this parameter should be set slightly
  higher than the maximum number of expected clients so disconnected
  clients can immediately reconnect.  This parameter can only
! be set at server start.
! Also, wal_level must be set to
  replica or higher to allow connections from standby
  servers.
 
*** include_dir 'conf.d'
*** 3007,3016 
   (see ) that the server
   can support. The default is 10.  This parameter can only be set at
   server start.
!  wal_level must be set
!  to replica or higher to allow replication slots to
!  be used. Setting it to a lower value than the number of currently
   existing replication slots will prevent the server from starting.
  
 

--- 3011,3021 
   (see ) that the server
   can support. The default is 10.  This parameter can only be set at
   server start.
!  Setting it to a lower value than the number of currently
   existing replication slots will prevent the server from starting.
+  Also, wal_level must be set
+  to replica or higher to allow replication slots to
+  be used.
  
 

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index f3ddf82..660f318 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** char	   *ListenAddresses;
*** 202,210 
  
  /*
   * ReservedBackends is the number of backends reserved for superuser use.
!  * This number is taken out of the pool size given by MaxBackends so
   * number of backend slots available to non-superusers is
!  * (MaxBackends - ReservedBackends).  Note what this really means is
   * "if there are <= ReservedBackends connections available, only superusers
   * can make new connections" --- pre-existing superuser connections don't
   * count 

Re: Server won't start with fallback setting by initdb.

2018-03-06 Thread Peter Eisentraut
On 3/4/18 15:31, Tom Lane wrote:
> Then, seeing that the factory defaults are ReservedBackends = 3 and
> max_wal_senders = 10, something's got to give; there's no way that
> max_connections = 10 can work with those.  But what I would argue is that
> of those three choices, the least defensible one is max_wal_senders = 10.
> Where did that come from?

Let's see.  A typical installation might need:

1 for pg_receivewal for continuous backup
2 for pg_basebackup
2 in case pg_basebackup gets interrupted and it takes 2 hours to free the
TCP/IP connections
1 for a standby connection
1 for a second standby connection, for making infrastructure changes

The purpose of raising the defaults to 10 was so that most users
wouldn't need to make changes, making it easier to access "proper"
backups and HA.  By my count (1 + 2 + 2 + 1 + 1 = 7), if the default were
less than 7, the setting would be insufficient to satisfy that goal.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [WIP PATCH] Index scan offset optimisation using visibility map

2018-03-06 Thread Tels
Hello Michail,

On Tue, March 6, 2018 4:03 pm, Michail Nikolaev wrote:
> Hello, Andrey.
>
> Thanks for review.
>
> I have updated the comments according to your review and also renamed some
> fields for consistency.
> Additionally, some notes were added to the documentation.
>
> Updated patch attached; github updated too.

That is a cool idea, and can come in very handy if you regularly need large
offsets. I cannot comment on the code, but there is at least one recurring
typo:

+   * Decrement counter for remaning skipped tuples.
+   * If last tuple skipped - release the buffer.
+   */
+   if (node->iss_TuplesSkippedRemaning > 0)
+   node->iss_TuplesSkippedRemaning--;

The English word is "remaining", with "ai" instead of "a" :)

Also, the variable name "iss_TuplesSkipped" has its grammar at odds with
the purpose the code seems to serve.

It could be named "SkipTuples" (e.g. this is the number of tuples we need
to skip, not the number we have skipped), and the other one then
"iss_SkipTuplesRemaining" so they are consistent with each other.

Others might have a better name for these two, of course.

Best wishes,

Tels



Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2018-03-06 Thread Daniel Gustafsson
> On 06 Mar 2018, at 22:08, Tom Lane  wrote:
> 
> Peter Eisentraut  writes:
>> On 3/4/18 17:15, Daniel Gustafsson wrote:
>>> Do I think this patch is realistic to target for v11?  Well.  Given where we
>>> are in the cycle, I don’t think any new TLS implementation going in is
>>> realistic at this point since none of the proposed ones have had enough tyre
>>> kicking done.  That might change should there be lots of interest and work
>>> started soon, but as has been discussed elsewhere recently the project has
>>> limited resources.  I have time to work on this, and support reviewers of
>>> it, but that’s only one piece of the puzzle.
> 
>> I think it would be best if both this patch and the GnuTLS patch are
>> moved to the next CF and are attacked early in the PG12 cycle.
> 
> +1.  I think it's fairly important that those two get reviewed/committed
> in the same cycle, in case we need to adjust the relevant APIs.

I completely agree with all of the above.

cheers ./daniel


Re: Re: BUGFIX: standby disconnect can corrupt serialized reorder buffers

2018-03-06 Thread Alvaro Herrera
Craig Ringer wrote:

> Revised patch attached.
> 
> I have _not_ rewritten to use sscanf yet. I'll do that next, so you can
> choose the fewer-changes patch for backpatching if desired.

Pushed, with a further change: it seems more sensible to centralize the
whole operation of building the path, rather than just the format
string, so I created a new function to do that.  The code looks cleaner
this way IMO.  All tests pass in all branches.

BTW the way the XLogSegNoOffsetToRecPtr() et al macros were modified to
accept wal_segment_size at the end of the argument list, *after* the
output argument, seems pretty bad style.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: public schema default ACL

2018-03-06 Thread Tom Lane
Robert Haas  writes:
> On Sat, Mar 3, 2018 at 4:56 AM, Noah Misch  wrote:
>> I propose, for v11, switching to "GRANT USAGE ON SCHEMA
>> public TO PUBLIC" (omit CREATE).  Concerns?  An alternative is to change the
>> default search_path to "$user"; that would be break more applications, and I
>> don't see an advantage to compensate for that.

> Isn't this going to cause widespread breakage?  Unprivileged users
> will suddenly find that they can no longer create tables, because
> $user doesn't exist and they don't have permission on public.  That
> seems quite unfriendly.

Well, the fundamental problem here is that the arrangements around schema
public were set up to allow a smooth transition from the pre-7.3
no-schemas world, not to provide any kind of security.  If we want to use
schemas for security then we're going to have to do *something* that's not
compatible.  Or we can continue to ship an insecure default configuration,
but I recall many people arguing against that sort of choice in the past.

I wonder whether it'd be sensible for CREATE USER --- or at least the
createuser script --- to automatically make a matching schema.  Or we
could just recommend that DBAs do so.  Either way, we'd be pushing people
towards the design where "$user" does exist for most/all users.  Our docs
comment (section 5.8.7) that "the concepts of schema and user are nearly
equivalent in a database system that implements only the basic schema
support specified in the standard", so the idea of automatically making
a schema per user doesn't seem ridiculous on its face.  (Now, where'd I
put my flameproof long johns ...)
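
A minimal sketch of that arrangement (role and schema names are
illustrative):

    CREATE ROLE alice LOGIN;
    CREATE SCHEMA alice AUTHORIZATION alice;
    -- with the default search_path of "$user", public, alice's
    -- unqualified objects now land in her own schema:
    SET ROLE alice;
    CREATE TABLE notes (n text);   -- creates alice.notes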

regards, tom lane



Re: JIT compiling with LLVM v11

2018-03-06 Thread Tom Lane
Andres Freund  writes:
> I'm not too worried about that scenario. If, for a cheap plan, the
> planner ends up with a seqscan despite it being disabled, you're pretty
> close to randomly choosing plans already, as the pruning doesn't work
> well anymore (as the 1% fuzz factor in
> compare_path_costs_fuzzily() swamps the actual plan costs).

Something I've wanted to do for awhile is to get rid of disable_cost
in favor of pruning disabled plans through logic rather than costing.
I've looked at this once or twice, and it seems doable but not entirely
trivial --- the sticky bits are places where you do need to allow a
disabled plan type because there's no other alternative.  But if we
could get that done, it'd help with this sort of problem.

regards, tom lane



Re: [HACKERS] path toward faster partition pruning

2018-03-06 Thread Robert Haas
On Fri, Mar 2, 2018 at 7:32 PM, David Rowley
 wrote:
> Let's look at the following perhaps unlikely case. (I picked an
> extreme case to demonstrate why this may be an inferior method)
>
> Given the table abc (...) partition by range (a,b,c), with the query:
>
> select * from abc where a >= 1 and a >= 2 and a >= 3 and b >= 1 and
> b >= 2 and b >= 3 and c >= 1 and c >= 2 and c >= 3;
>
> We would likely still be parsing those clauses into some struct like
> PartitionClauseInfo and would end up with some arrays or Lists with
> the clauses segmented by partition key.
>
> It appears to me, for your method to work we'd need to try every
> combination of the clauses matching each partition key, which in this
> case is 3 * 3 * 3 searches. Amit's current method is 1 search, after
> the clause reduction which is 3 + 3 + 3 (O(N) per partition key)
[...]
> With that considered, is it still a good idea to do it this way?

I dunno.  What do you think?

That case is indeed pretty unfortunate, but it's also pretty
artificial.  It's not obvious to me that we shouldn't care about it,
but it's also not obvious to me that we should.  If we have some
bizarre cases that slip through the cracks or don't perform terribly
well, maybe nobody would ever notice or care.  On the other hand,
maybe they would.

I suppose in my ideal world, this could be handled by building a
GREATEST or LEAST expression.  In other words, if someone says foo >=
1 AND foo >= 2, instead of doing separate pruning steps, we'd just
prune once based on foo >= GREATEST(1,2).  But that doesn't really
work, because there's no provision to tell MinMaxExpr from which
opfamily we wish to draw the operator used to compare 1 and 2 and no
guarantee that such an operator exists for the actual data types of 1
and 2.  (Imagine that 1 and 2 are of different data types; the relevant
opfamily might have an operator that can compare a value of the same
type as foo to 1 and similarly for 2, but no operator that can compare
1 and 2 to each other.)

One thing that we could do is just only accept one clause for each
column-strategy pairing, presumably either the first one or the last
one.  So in your example either a >= 1 or a >= 3 would be accepted and
the others would be discarded for purposes of partition pruning.  If a
user complains, we could tell THEM to manually do the rewrite
suggested in the previous step, and just write a >= GREATEST(1,2,3).
(And of course if it's that simple, they might want to then
pre-simplify to a >= 3!)
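
Concretely, that manual rewrite (a sketch, reusing the abc table quoted
above) would be:

    -- original; under this proposal only one clause would be used:
    SELECT * FROM abc WHERE a >= 1 AND a >= 2 AND a >= 3;
    -- suggested manual rewrite:
    SELECT * FROM abc WHERE a >= GREATEST(1, 2, 3);
    -- or, pre-simplified:
    SELECT * FROM abc WHERE a >= 3;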

Another alternative is to include some kind of additional type of step
in the "pruning program" which can do this GREATEST/LEAST operation
... but that's adding quite a bit of complexity for what seems like
it's pretty much a corner case, and as noted above, there's no
guarantee that we even have the correct operator available.  It should
be fine if the partitioning column/expression and all of the constants
being compared are of the same type, and in practice *most* of the
time even when they're not, but we're going to have to have some
handling for the strange cases -- and I think the only real choices
are "try every combination and maybe be slow", "try 1 combination and
maybe fail to prune everything that could have been pruned", and some
intermediate possibilities along the same lines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rewriting the test of pg_upgrade as a TAP test - take two

2018-03-06 Thread Tom Lane
Peter Eisentraut  writes:
> On 3/4/18 16:09, Andrew Dunstan wrote:
>> If you want to do this soon I can put out a Buildfarm Client release
>> fairly quickly.

> I think the dependency is mostly the other way around.  How quickly
> would build farm owners install the upgrade?

IIUC, the buildfarm script patch is only needed to avoid duplicate
tests.  So owners need only install it if they want to reduce wasted
cycles on their machines.  That being the case, it's only urgent to
the extent that the individual owner perceives it to be.  Some might
think it is so, so I'd like to see the BF release available before
we push the TAP test ... but we don't have to wait very long between.

regards, tom lane



Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2018-03-06 Thread Tom Lane
Peter Eisentraut  writes:
> On 3/4/18 17:15, Daniel Gustafsson wrote:
>> Do I think this patch is realistic to target for v11?  Well.  Given where we
>> are in the cycle, I don’t think any new TLS implementation going in is
>> realistic at this point since none of the proposed ones have had enough tyre
>> kicking done.  That might change should there be lots of interest and work
>> started soon, but as has been discussed elsewhere recently the project has
>> limited resources.  I have time to work on this, and support reviewers of it,
>> but that’s only one piece of the puzzle.

> I think it would be best if both this patch and the GnuTLS patch are
> moved to the next CF and are attacked early in the PG12 cycle.

+1.  I think it's fairly important that those two get reviewed/committed
in the same cycle, in case we need to adjust the relevant APIs.  It
seems unlikely that we can muster the effort to get both done for v11.

regards, tom lane



Re: [WIP PATCH] Index scan offset optimisation using visibility map

2018-03-06 Thread Michail Nikolaev
Hello, Andrey.

Thanks for review.

I have updated the comments according to your review and also renamed some
fields for consistency.
Additionally, some notes were added to the documentation.

Updated patch attached; github updated too.


offset_index_only_v3.patch
Description: Binary data


Re: JIT compiling with LLVM v11

2018-03-06 Thread Andres Freund
On 2018-03-06 12:16:01 -0800, Andres Freund wrote:
> > I ran some performance assessments:
> >
> > merge base (0b1d1a038babff4aadf0862c28e7b667f1b12a30)
> >
> > make installcheck  3.14s user 3.34s system 17% cpu 37.954 total
> >
> > jit branch default settings
> >
> > make installcheck  3.17s user 3.30s system 13% cpu 46.596 total
> >
> > jit_above_cost=0
> >
> > make installcheck  3.30s user 3.53s system 5% cpu 1:59.89 total
> >
> > jit_optimize_above_cost=0 (and jit_above_cost=0)
> >
> > make installcheck  3.44s user 3.76s system 1% cpu 8:12.42 total
> >
> > jit_inline_above_cost=0 (and jit_above_cost=0)
> >
> > make installcheck  3.32s user 3.62s system 2% cpu 5:35.58 total
> >
> > One can see the CPU savings quite nicely.
> 
> I'm not quite sure what you mean by that.
> 
> 
> > One obvious problem is that with the default settings, the test suite
> > run gets about 15% slower.  (These figures are reproducible over several
> > runs.)  Is there some debugging stuff turned on that would explain this?
> >  Or would just loading the jit module in each session cause this?
> 
> I suspect it's loading the module.

There's also another issue: for a lot of queries in the tests the stats are
way, way off because the relevant tables have never been analyzed.
There are a few cases where costs are off by something like 5-7 orders of
magnitude...

Greetings,

Andres Freund



Re: public schema default ACL

2018-03-06 Thread Robert Haas
On Sat, Mar 3, 2018 at 4:56 AM, Noah Misch  wrote:
> Commit 5770172 ("Document security implications of search_path and the public
> schema.") is largely a workaround for the fact that the boot_val of
> search_path contains "public" while template0 gets "GRANT CREATE, USAGE ON
> SCHEMA public TO PUBLIC".  It's like having world-writable /usr/bin.  The
> security team opted not to change that in released branches, but we thought to
> revisit it later.  I propose, for v11, switching to "GRANT USAGE ON SCHEMA
> public TO PUBLIC" (omit CREATE).  Concerns?  An alternative is to change the
> default search_path to "$user"; that would be break more applications, and I
> don't see an advantage to compensate for that.

Isn't this going to cause widespread breakage?  Unprivileged users
will suddenly find that they can no longer create tables, because
$user doesn't exist and they don't have permission on public.  That
seems quite unfriendly.
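
For reference, on an existing database the proposed default corresponds
roughly to this (a sketch, assuming nothing relies on PUBLIC's CREATE
right):

    REVOKE CREATE ON SCHEMA public FROM PUBLIC;
    -- leaving only:
    GRANT USAGE ON SCHEMA public TO PUBLIC;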

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compiling with LLVM v11

2018-03-06 Thread Andres Freund
Hi,

On 2018-03-06 10:29:47 -0500, Peter Eisentraut wrote:
> I think taking the total cost as the triggering threshold is probably
> good enough for a start.  The cost modeling can be refined over time.

Cool.


> We should document that both jit_optimize_above_cost and
> jit_inline_above_cost require jit_above_cost to be set, or otherwise
> nothing happens.

Yea, that's a good plan. We could also change it so it would, but I
don't think there's much point?


> One problem I see is that if someone sets things like
> enable_seqscan=off, the artificial cost increase created by those
> settings would quite likely bump the query over the jit threshold, which
> would alter the query performance characteristics in a way that the user
> would not have intended.  I don't have an idea how to address this right
> now.

I'm not too worried about that scenario. If, for a cheap plan, the
planner ends up with a seqscan despite it being disabled, you're pretty
close to randomly choosing plans already, as the pruning doesn't work
well anymore (as the 1% fuzz factor in
compare_path_costs_fuzzily() swamps the actual plan costs).


> I ran some performance assessments:
>
> merge base (0b1d1a038babff4aadf0862c28e7b667f1b12a30)
>
> make installcheck  3.14s user 3.34s system 17% cpu 37.954 total
>
> jit branch default settings
>
> make installcheck  3.17s user 3.30s system 13% cpu 46.596 total
>
> jit_above_cost=0
>
> make installcheck  3.30s user 3.53s system 5% cpu 1:59.89 total
>
> jit_optimize_above_cost=0 (and jit_above_cost=0)
>
> make installcheck  3.44s user 3.76s system 1% cpu 8:12.42 total
>
> jit_inline_above_cost=0 (and jit_above_cost=0)
>
> make installcheck  3.32s user 3.62s system 2% cpu 5:35.58 total
>
> One can see the CPU savings quite nicely.

I'm not quite sure what you mean by that.


> One obvious problem is that with the default settings, the test suite
> run gets about 15% slower.  (These figures are reproducible over several
> runs.)  Is there some debugging stuff turned on that would explain this?
>  Or would just loading the jit module in each session cause this?

I suspect it's loading the module.  There's two pretty easy avenues to
improve this:

1) Attempt to load the JIT provider in postmaster, thereby avoiding a
   lot of redundant dynamic linker work if already installed. That's
   ~5-10 lines or such.  I basically refrained from that because it's
   convenient to not have to restart the server during development (one
   can just reconnect and get a newer jit plugin).

2) Don't load the JIT provider until fully needed. Right now
   jit_compile_expr() will load the jit provider even if not really
   needed. We should probably move the first two return blocks in
   llvm_compile_expr() into jit_compile_expr(), to avoid that.


> From the other results, we can see that one clearly needs quite a big
> database to see a solid benefit from this.

Right, until we've got caching this'll only be beneficial for ~1s+
analytics queries. Unfortunately caching requires some larger planner &
executor surgery, so I don't want to go there at the same time (I'm
already insane enough).


> Do you have any information gathered about this so far?  Any scripts
> to create test databases and test queries?

Yes. I've used TPC-H. Not because it's the greatest, but because it's
semi-conveniently available and a lot of others have experience with it
already.  Do you mean whether I've run a couple of benchmarks? If so, yes.
I'll schedule some more later - am on battery power right now.

Greetings,

Andres Freund



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-06 Thread Robert Haas
On Tue, Mar 6, 2018 at 5:31 AM, Jeevan Chalke
 wrote:
> This is in line with the enable_hashagg GUC. Do you think
> enable_partitionwise_aggregate seems better? But then it will not be
> consistent with other GUC names like enable_hashagg.

Well, if I had my way, enable_hashagg would be spelled
enable_hash_aggregate, too, but I wasn't involved in the project that
long ago.  100% consistency is hard to achieve here; the perfect
parallel of enable_hashagg would be enable_partitionwiseagg, but then
it would be inconsistent with enable_partitionwise_join unless we
renamed it to enable_partitionwisejoin, which I would rather not do.
I think the way the enable_blahfoo names were done was kinda
shortsighted -- it works OK as long as blahfoo is pretty short, like
mergejoin or hashagg or whatever, but if you have more or longer words
then I think it's hard to see where the word boundaries are without
any punctuation.  And if you start abbreviating then you end up with
things like enable_pwagg which are not very easy to understand.  So I
favor spelling everything out and punctuating it.

> So the code for doing partially aggregated partial paths and partially
> aggregated non-partial paths is the same, except that partial paths go into
> partial_pathlist whereas non-partial paths go into the pathlist of
> partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
> when isPartialAgg = true seems correct. Also, as we have decided, this
> function is responsible for creating all partially aggregated paths,
> including both partial and non-partial ones.
>
> Am I missing something?

Hmm.  I guess not.  I think I didn't read this code well enough
previously.  Please find attached proposed incremental patches (0001
and 0002) which hopefully make the code in this area a bit clearer.

>> +   /*
>> +* If there are any fully aggregated partial paths present,
>> may be because
>> +* of parallel Append over partitionwise aggregates, we must stick
>> a
>> +* Gather or Gather Merge path atop the cheapest partial path.
>> +*/
>> +   if (grouped_rel->partial_pathlist)
>>
>> This comment is copied from someplace where the code does what the
>> comment says, but here it doesn't do any such thing.
>
> Well, these comments are not present anywhere other than this place. With
> Parallel Append and partitionwise aggregates, it is now possible to have
> fully aggregated partial paths. And thus we need to stick a Gather
> and/or Gather Merge atop the cheapest partial path. And I believe the code
> does the same. Am I missing something?

I misread the code.  Sigh.  I should have waited until today to send
that email and taken time to study it more carefully.  But I still
don't think it's completely correct.  It will not consider using a
pre-sorted path; the only strategies it can consider are cheapest path
+ Gather and cheapest path + explicit Sort (even if the cheapest path
is already correctly sorted!) + Gather Merge.  It should really do
something similar to what add_paths_to_partial_grouping_rel() already
does: first call generate_gather_paths() and then, if the cheapest
partial path is not already correctly sorted, also try an explicit
Sort + Gather Merge.  In fact, it looks like we can actually reuse
that logic exactly.  See attached 0003 incremental patch; this changes
the outputs of one of your regression tests, but the new plan looks
better.

Some other notes:

There's a difference between performing partial aggregation in the
same process and performing it in a different process.  hasNonPartial
tells us that we can't perform partial aggregation *at all*;
hasNonSerial only tells us that partial and final aggregation must
happen in the same process.  This patch could possibly take advantage
of partial aggregation even when hasNonSerial is set.  Finalize
Aggregate -> Append -> N copies of { Partial Aggregate -> Whatever }
is OK with hasNonSerial = true as long as hasNonPartial = false.  Now,
the bad news is that for this to actually work we'd need to define new
values of AggSplit, like AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL and
AGGSPLIT_FINAL = AGGSPLITOP_COMBINE, and I'm not sure how much
complexity that adds.  However, if we're not going to do that, I think
we'd better at least add some comments about it suggesting that someone
might want to do something about it in the future.

I think that, in general, it's a good idea to keep the number of times
that create_grouping_paths() does something which is conditional on
whether child_data is NULL to a minimum.  I haven't looked at what
Ashutosh tried to do there so I don't know whether it's good or bad,
but I like the idea, if we can do it cleanly.

It strikes me that we might want to consider refactoring things so
that create_grouping_paths() takes the grouping_rel and
partial_grouping_rel as input arguments.  Right now, the
initialization of the child grouping and partial-grouping rels is
partly in try_partitionwise_aggregate(), 

Re: Rewriting the test of pg_upgrade as a TAP test - take two

2018-03-06 Thread Peter Eisentraut
On 3/4/18 16:09, Andrew Dunstan wrote:
>>> AFAICT, this still has the same problem as the previous take, namely
>>> that adding a TAP test suite to the pg_upgrade subdirectory will end up
>>> with the build farm client running the pg_upgrade tests twice.  What we
>>> likely need here is an update to the build farm client in conjunction
>>> with this.

> Pushed with a bug fix. See
> 
> 
> If you want to do this soon I can put out a Buildfarm Client release
> fairly quickly.

I think the dependency is mostly the other way around.  How quickly
would build farm owners install the upgrade?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2018-03-06 Thread Peter Eisentraut
On 3/4/18 17:15, Daniel Gustafsson wrote:
> Do I think this patch is realistic to target for v11?  Well.  Given where we
> are in the cycle, I don’t think any new TLS implementation going in is
> realistic at this point since none of the proposed ones have had enough tyre
> kicking done.  That might change should there be lots of interest and work
> started soon, but as has been discussed elsewhere recently the project has
> limited resources.  I have time to work on this, and support reviewers of it,
> but that’s only one piece of the puzzle.

I think it would be best if both this patch and the GnuTLS patch are
moved to the next CF and are attacked early in the PG12 cycle.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: check error messages in SSL tests

2018-03-06 Thread Peter Eisentraut
On 2/24/18 10:12, Peter Eisentraut wrote:
> On 2/24/18 07:37, Michael Paquier wrote:
>> On Fri, Feb 23, 2018 at 01:57:44PM -0500, Peter Eisentraut wrote:
>>> Oh.  I actually had that file as 0600 in my checked-out tree, probably
>>> from earlier experiments.  Fixed that.  And I also changed it to make
>>> another temp file that is explicitly 0644, because we can't rely on that
>>> being the default either.
>>
>> Thanks for fixing the first one.  I also like the second change.  This
>> patch looks fine to me.
> 
> committed

This contains a slight problem:  The tests contain two different
branches, depending on whether tls-server-end-point is supported.  But
these branches run a different number of tests, so the test count of the
top of the test file might be wrong.  Here is a patch that restructures
this to count the tests more dynamically.

Thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From c96f5ce5cabae3bedb8d07164f2a7d3bba82f4bc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 6 Mar 2018 14:49:07 -0500
Subject: [PATCH] Fix test counting in SSL tests

The branch that does not support tls-server-end-point runs more tests,
so we need to structure the test counting dynamically.
---
 src/test/ssl/t/002_scram.pl | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/test/ssl/t/002_scram.pl b/src/test/ssl/t/002_scram.pl
index 3d7f9abfbe..a805a3196b 100644
--- a/src/test/ssl/t/002_scram.pl
+++ b/src/test/ssl/t/002_scram.pl
@@ -8,15 +8,13 @@
 use ServerSetup;
 use File::Copy;
 
-if ($ENV{with_openssl} eq 'yes')
-{
-   plan tests => 6;
-}
-else
+if ($ENV{with_openssl} ne 'yes')
 {
plan skip_all => 'SSL not supported by this build';
 }
 
+my $number_of_tests = 6;
+
 # This is the hostname used to connect to the server.
 my $SERVERHOSTADDR = '127.0.0.1';
 
@@ -70,8 +68,11 @@

"scram_channel_binding=tls-server-end-point",
qr/channel binding type 
"tls-server-end-point" is not supported by this build/,
"SCRAM authentication with 
tls-server-end-point as channel binding");
+   $number_of_tests++;
 }
 test_connect_fails($common_connstr,
"scram_channel_binding=not-exists",
qr/unsupported SCRAM channel-binding type/,
"SCRAM authentication with invalid channel binding");
+
+done_testing($number_of_tests);

base-commit: 286c0ab257f8dde8e5494426b86c38f3877ae5c2
-- 
2.16.2



Re: [COMMITTERS] pgsql: Add much-more-extensive TAP tests for pgbench.

2018-03-06 Thread Peter Eisentraut
On 3/2/18 20:19, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 9/8/17 09:32, Tom Lane wrote:
>>> Add much-more-extensive TAP tests for pgbench.
> 
>> The use of done_testing() raises the requirement of Test::More from 0.82
>> to 0.87.  That seems OK, but we should update the version requirement in
>> TestLib.pm.  Any concerns?
> 
> Probably not, given that none of the buildfarm machines have whined
> (although I'm not sure if all the older ones run the TAP tests?).

done

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: constraint exclusion and nulls in IN (..) clause

2018-03-06 Thread Tom Lane
Amit Langote  writes:
> [ v4-0001-Disregard-nulls-in-SAOP-rightarg-array-list-durin.patch ]

This patch seems pretty wrong to me.  The proposed proof rule is wrong
for !useOr expressions (that is, "scalar op ALL (array)"); you can't
just ignore null items in that case.  It's also wrong when we need strong
refutation.  I was about to point to the comment for predicate_refuted_by,

 * An important fine point is that truth of the clauses must imply that
 * the predicate returns FALSE, not that it does not return TRUE.  This
 * is normally used to try to refute CHECK constraints, and the only
 * thing we can assume about a CHECK constraint is that it didn't return
 * FALSE --- a NULL result isn't a violation per the SQL spec.  (Someday

and reject the patch as unfixably wrong, when I noticed that in
fact somebody had added support for weak refutation to this code.
Now I'm just desperately unhappy about the lack of comment-updating
work in that commit (b08df9cab, looks like).  It's not acceptable
to falsify comments and not update them.  I have a nasty feeling that
there are outright bugs in the logic, too.

I'm inclined to think that you've attacked the problem at the wrong place
anyway.  The notion that one arm of an OR reducing to NULL might not
prevent proving what we want is by no means specific to ScalarArrayOp.
I think it'd make more sense to see about incorporating that idea in
predicate_implied_by_simple_clause/predicate_refuted_by_simple_clause.
It might be necessary to change the boolean results in the recursive
logic to three-way, not sure.
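
A minimal example of the ALL hazard mentioned above: ignoring the null
array element would wrongly prove the second expression true.

    SELECT 1 <> ALL (ARRAY[2, 3]);      -- true
    SELECT 1 <> ALL (ARRAY[2, NULL]);   -- null, not true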

I'm setting this patch back to Waiting On Author pending fixes
for the above issues.  In the meantime I see a separate work item
here to do code review for b08df9cab.

regards, tom lane



Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-06 Thread Sergei Kornilov

Hello

> Do you actually need test output proving that this code path was taken
> rather than the default one? Seems like looking at the code coverage
> report might be enough.
I don't know. In v4 I used a DEBUG1 message and did not check the code path in
the tests at all (whether by full table scan or by constraint); I tested only
the command result, so as not to break behavior.
Today Ildar Musin proposed to test the code path through the
NotNullImpliedByRelConstraints function. This is my first patch and I do not
have the confidence to write a test. So I looked more closely at the ALTER
TABLE tests, found several INFO messages in the ATTACH PARTITION ones, and
updated my patch.

> I did not see any INFO messages in a quick test of ALTER TABLE ATTACH
> PARTITION, but if there are any lurking in there, they probably need
> to be downgraded.
In src/test/regress/expected/alter_table.out I found 7 tests with
> INFO: partition constraint for table "..." is implied by existing constraints
and 5 with
> INFO:  updated partition constraint for default partition "..." is implied by 
> existing constraints
The ereport calls are in the ValidatePartitionConstraints function in
src/backend/commands/tablecmds.c.

regards, Sergei



Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-06 Thread Tom Lane
Sergei Kornilov  writes:
>> ISTM that depending on DEBUG messages is bad because debugging lines
>> added elsewhere will make your tests fail;

> I agree, and this is the reason why I did not use DEBUG messages in the
> tests as was proposed. I found INFO messages in existing tests and decided
> that this was an acceptable option a year ago.

INFO is for user-facing messages, such as output from VACUUM VERBOSE,
where the user has specifically requested the output.  I do not think
this falls into that category.  Even if you persuade some committer
to commit it like that, there will be user pushback asking us to get
this noise out of their faces ... and then we'll have to find some
other way to test it.

Do you actually need test output proving that this code path was taken
rather than the default one?  Seems like looking at the code coverage
report might be enough.

> In this case we need to remove INFO from the ALTER TABLE ATTACH PARTITION
> code and tests. I think it is a bad idea to use different rules for the
> same stuff, right? Probably I can do this work too.

I did not see any INFO messages in a quick test of ALTER TABLE ATTACH
PARTITION, but if there are any lurking in there, they probably need
to be downgraded.

regards, tom lane



Re: All Taxi Services need Index Clustered Heap Append

2018-03-06 Thread Evgeniy Shishkin


> On Mar 6, 2018, at 04:57, Craig Ringer  wrote:
> 
> On 3 March 2018 at 00:30, Darafei "Komяpa" Praliaskouski wrote:
>  
> I gave this all some thought, and it looks like it all could have been
> avoided if Postgres were able to cluster heap insertions by an (id, ts)
> index. We're ok with synchronous_commit=off, so the amplified write won't
> immediately hit disk and can get cooled down in progress. Clustering
> doesn't require perfect sorting: we need to minimize the number of pages
> fetched; it's ok if the pages are not consecutive on disk.
> 
> I'm surprised nobody has mentioned BRIN yet.
> 
> Ever since BRIN was introduced, I've thought it would be very interesting to 
> use it + the freespace map for coarse-grained tuple routing in heap inserts. 
> Try to find the best-match range with BRIN and look for free space there. I 
> think there was discussion of this early on, so you may want to look up the 
> BRIN threads.
> 
> The main challenge probably comes when a range is exhausted. Your writes 
> would start spilling over into new heap pages and get intermixed again. Some 
> support for pre-allocating new ranges would be needed.


Basically, a Poor Man's Clustering solution consists of two parts:
1. Find eligible pages to insert into
2. Make sure you don't screw up a new page

First we want to define how we want to cluster our data; it can be a mix of
equality, hash, range, and geo operators.
In Darafei's case appropriate clustering would be: CLUSTER BY (hash(id), ts
asc) or (eq(id), range(ts, '1 day interval')).
To support fast search of pages to insert into, there should be a Bloom index
on id and a BRIN or btree index on ts for the first example, and a btree or
hash index on id and a btree, BRIN, or GiST index on ts for the second.
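
A sketch of supporting indexes for the first variant (table and column
names are illustrative; bloom is the contrib module, whose default
opclasses cover int4 and text):

    CREATE EXTENSION IF NOT EXISTS bloom;
    CREATE TABLE track (id int, ts timestamptz);   -- hypothetical
    CREATE INDEX track_id_bloom ON track USING bloom (id);
    CREATE INDEX track_ts_brin  ON track USING brin (ts);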
 

So, we can bitmap scan all the needed indexes to find pages containing
tuples with already-present cluster keys, AND those bitmaps together,
consult the FSM, and proceed to insert into those pages if space allows.

Now, if we have not found pages with the same cluster keys, or those pages
are full, we need to find a free page without screwing it up.
To do so, we must estimate the ideal clustering of a page and either insert
there or start a completely new page.

In order not to intermix a page under hash or equality clustering, we should
examine the keys already present and consult the statistics for them.
If the key is in the MCV list, it's better to find a new page; otherwise we
need to calculate how many tuples per page we have for the key on average,
and compare that with the number of tuples on the page given our new key.
I.e., if the page we are about to insert into has 30 tuples with 3 cluster
keys, and we conclude that on average we have 20 tuples per key, this means
there would eventually be 30 more tuples for those keys and 20 for our key.
If the page can hold those 80 tuples, we insert there.  I believe the
existing statistics are enough to make this work.

For range-based clustering we should estimate how many tuples there would be
on a page and what its interval would be, and check whether we would break
that interval. For example, say we have plain ordering clustering and we
insert the value 130; the page has a min value of 100 and holds 20 tuples.
We estimate that on average a page's max value minus its min value is 100
and its number of tuples is 30, so each tuple increments the range by about
3. Our page with a min value of 100 and 20 tuples thus has a projected
closing max value of 200; since our value is 130, we proceed to insert.
I don't know whether there are enough stats for this already, or whether
BRIN can help here. But it is doable.

Now for the geo part. This is the least thought through. We definitely need
an index to consult the MBR, and we can use GiST penalty functions to decide.

Do you think the above rough design is doable and desirable?  

Re: PATCH: Configurable file mode mask

2018-03-06 Thread David Steele
On 3/5/18 10:46 PM, Michael Paquier wrote:
> On Mon, Mar 05, 2018 at 03:07:20PM -0500, David Steele wrote:
>> On 2/28/18 2:28 AM, Michael Paquier wrote:
>>> On Tue, Feb 27, 2018 at 03:52:32PM -0500, David Steele wrote:
>>> I don't quite understand here.  I have no objection into extending
>>> setup_cluster() with a group_access argument so as the tests run both
>>> the group access and without it, but I'd rather keep the low-level API
>>> clean of anything fancy if we can use the facility which already
>>> exists.
>>
>> I'm not sure what you mean by, "facility that already exists".  I think
>> I implemented this the same as the allows_streaming and has_archiving
>> flags.  The only difference is that this option must be set at initdb
>> time rather than in postgresql.conf.
>>
>> Could you be more specific?
> 
> Let's remove has_group_access from PostgresNode::init and instead use
> existing parameter called "extra".  In your patch, this setup:
> has_group_access => 1
> is equivalent to that:
> extra => [ -g ]
> You can also guess the value of has_group_access by parsing the
> arguments from the array "extra".

OK.  extra is used to set -g and the group_access() function checks
pgdata directly since this can change after the cluster is created.

>>> check_pg_data_perm() looks useful even without $allow_group even if the
>>> grouping facility is reverted at the end.  S_ISLNK should be considered
>>> as well for tablespaces or symlinks of pg_wal?  We have such cases in
>>> the regression tests.
>>
>> Hmmm, the link is just pointing to a directory whose permissions have
>> been changed.  Why do we need to worry about the link?
> 
> Oh, perhaps I misread your code here, but this ignores symlinks, for
> example take an instance where pg_wal is symlinked, we'd likely want to
> make sure that at least archive_status is using a correct set of
> permissions, no?  There is a "follow" option in File::Find which could
> help.

Ah, I see what you mean now.  Fixed using follow_fast.  This variant
claims to be faster and it doesn't matter if files are occasionally
checked twice.

>>
>>> setup_signals();
>>>
>>> -   umask(S_IRWXG | S_IRWXO);
>>> -
>>> create_data_directory();
>>> This should not be moved around.
>>
>> Hmmm - I moved it much earlier in the process which seems like a good
>> thing.  Consider if there was a option to fixup permissions, like there
>> is to fsync.  Isn't it best to set the mode as soon as possible to
>> prevent code from being inserted before it?
> 
> Those two are separate issues.  Could you begin a new thread on the
> matter?  This will attract more attention.

OK, I'll move it back for now, and make a note to raise the position as
a separate issue.  I'd rather not do it in this fest, though.

> The indentation in RewindTest.pm is a bit wrong.

Fixed.

> -   if (chmod(location, S_IRWXU) != 0)
> +   current_umask = umask(0);
> +   umask(current_umask);
> [...]
> -   if (chmod(pg_data, S_IRWXU) != 0)
> +   if (chmod(pg_data, PG_DIR_MODE_DEFAULT & ~file_mode_mask) != 0)
> Such modifications should be part of patch 3, not 2, where the group
> features really apply.

The goal of patch 2 is to refactor the way permissions are set by
replacing locally hard-coded values with centralized values, so I think
this makes sense here.

> my $test_mode = shift;
> 
> +   umask(0077);

Added. (Ensure that all files are created with default data dir
permissions).

> RewindTest::setup_cluster($test_mode);
> What's that for?  A comment would be welcome.

Added. (Used to differentiate clusters).

> Perhaps the tests should be cut into a separate patch?  

I prefer tests to be in the same patch as the code they test.

> Those are not
> directly related to the refactoring done in patch 2.

The point of the tests is to be sure there are no regressions in the
refactor, so they seem directly related to me.  Also, the tests
themselves were not too good about keeping permissions consistent.

> Patch 2 begins to look fine for me.

I also made chmod_recursive() generic.

New patches are attached.

Thanks!
-- 
-David
da...@pgmasters.net
From 2caecfef0cb0b029c439786fe462d69b5caaf67b Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Mon, 5 Mar 2018 14:33:10 -0500
Subject: [PATCH 1/3] pg_resetwal tests.

Adds a very basic test suite for pg_resetwal.
---
 src/bin/pg_resetwal/.gitignore |  1 +
 src/bin/pg_resetwal/Makefile   |  6 +
 src/bin/pg_resetwal/t/001_basic.pl | 53 ++
 3 files changed, 60 insertions(+)
 create mode 100644 src/bin/pg_resetwal/t/001_basic.pl

diff --git a/src/bin/pg_resetwal/.gitignore b/src/bin/pg_resetwal/.gitignore
index 236abb4323..a950255fd7 100644
--- a/src/bin/pg_resetwal/.gitignore
+++ b/src/bin/pg_resetwal/.gitignore
@@ -1 +1,2 @@
 /pg_resetwal
+/tmp_check
diff --git a/src/bin/pg_resetwal/Makefile b/src/bin/pg_resetwal/Makefile
index 5ab7ad33e0..13c9ca6279 100644
--- a/src/bin/pg_resetwal/Makefile
+++ 

Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-06 Thread Sergei Kornilov
Hello

> You should be able to use an event trigger that raises a message when
> table_rewrite is hit, to notify the test driver that a rewrite happens.
There is no table rewrite here, only verification. So EventTriggerTableRewrite
is not called. Or have I missed something?

> ISTM that depending on DEBUG messages is bad because debugging lines
> added elsewhere will make your tests fail;
I agree, and this is the reason why I did not use DEBUG messages in the tests
as was proposed. I found INFO messages in existing tests and decided that this
was an acceptable option a year ago.

> and INFO is generally frowned upon (I, for one, would frown upon an INFO 
> message raised by ALTER
> TABLE, for sure.) so let's not do that either.
In this case we need to remove INFO from the ALTER TABLE ATTACH PARTITION
code and tests. I think it is a bad idea to use different rules for the same
stuff, right? Probably I can do this work too.

But it is necessary to decide how the test should look.

regards, Sergei



Re: User defined data types in Logical Replication

2018-03-06 Thread Alvaro Herrera
Masahiko Sawada wrote:
> On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera  
> wrote:

> > Therefore, I'm inclined to make this function raise a warning, then
> > return a substitute value (something like "unrecognized type XYZ").
> > [...]
> 
> I agree with you about not hiding the actual reason for the error, but
> if we raise a warning in logicalrep_typmap_gettypname, don't we call
> slot_store_error_callback recursively?

Hmm, now that you mention it, I don't really know.  I think it's
supposed not to happen, since calling ereport() again opens a new
recursion level, but then maybe errcontext doesn't depend on the
recursion level ... I haven't checked.  This is why the TAP test would
be handy :-)

> Agreed. Will add a TAP test.

Great.  This patch waits on that, then.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-06 Thread Tom Lane
David Gould  writes:
> On Thu, 01 Mar 2018 18:49:20 -0500
> Tom Lane  wrote:
>> The sticking point in my mind right now is, if we do that, what to do with
>> VACUUM's estimates.

> For what it's worth, I think the current estimate formula for VACUUM is
> pretty reasonable. Consider a table T with N rows and P pages clustered
> on serial key k. Assume reltuples is initially correct.

If the starting condition involves uniform tuple density throughout the
table, with matching reltuples/relpages ratio, then any set of changes
followed by one VACUUM will produce the right reltuples (to within
roundoff error) at the end.  This can be seen by recognizing that VACUUM
will visit every changed page, and the existing calculation is equivalent
to "assume the old tuple density is correct for the unvisited pages, and
then add on the measured tuple count within the visited pages".  I'm a bit
inclined to reformulate and redocument the calculation that way, in hopes
that people would find it more convincing.
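
Spelled out, that equivalent form of the existing VACUUM calculation is

    new_reltuples = scanned_live_tuples
                    + (old_reltuples / old_relpages)
                      * (total_pages - scanned_pages)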

However, things get less good if the initial state is nonuniform and
we do a set of updates that line up with the nonuniformity.  For
example, start with a uniformly full table, and update 50% of the
rows lying within the first 20% of the pages.  Now those 20% are
only half full of live tuples, and the table has grown by 10%, with
all those added pages full.  Do a VACUUM.  It will process the first
20% and the new 10% of pages, and arrive at a correct reltuples count
per the above argument.  But now, reltuples/relpages reflects an average
tuple density that's only about 90% of maximum.  Next, delete the
surviving tuples in the first 20% of pages, and again VACUUM.  VACUUM
will examine only the first 20% of pages, and find that they're devoid
of live tuples.  It will then update reltuples using the 90% density
figure as the estimate of what's in the remaining pages, and that's
too small, so that reltuples will drop to about 90% of the correct
value.

Lacking an oracle (small "o"), I do not think there's much we can do
about this, without resorting to very expensive measures such as
scanning the whole table.  (It's somewhat interesting to speculate
about whether scanning the table's FSM could yield useful data, but
I'm unsure that I'd trust the results much.)  The best we can do is
hope that correlated update patterns like this are uncommon.

Maybe this type of situation is an argument for trusting an ANALYZE-based
estimate more than the VACUUM-based estimate.  I remain uncomfortable with
that in cases where VACUUM looked at much more of the table than ANALYZE
did, though.  Maybe we need some heuristic based on the number of pages
actually visited by each pass?

regards, tom lane



Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-06 Thread Alvaro Herrera
Sergei Kornilov wrote:
> Hello, Ildar
> Thanks. I looked ATTACH PARTITION tests and found such checks. If no one is 
> against i will just use in my patch INFO level instead DEBUG1, similarly 
> ATTACH PARTITION code. Updated patch attached.
> 
> Or i can rewrite tests to use DEBUG1 level.

You should be able to use an event trigger that raises a message when
table_rewrite is hit, to notify the test driver that a rewrite happens.

(If any DDL that causes a table rewrite fails to trigger the
table_rewrite event correctly, that is a bug.)
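
A minimal sketch of such an event trigger (names are illustrative):

    CREATE FUNCTION notice_rewrite() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        RAISE NOTICE 'rewriting table %',
            pg_event_trigger_table_rewrite_oid()::regclass;
    END $$;

    CREATE EVENT TRIGGER watch_rewrite
        ON table_rewrite EXECUTE PROCEDURE notice_rewrite();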

ISTM that depending on DEBUG messages is bad because debugging lines
added elsewhere will make your tests fail; and INFO is generally frowned
upon (I, for one, would frown upon an INFO message raised by ALTER
TABLE, for sure.) so let's not do that either.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rewrite of pg_dump TAP tests

2018-03-06 Thread Alvaro Herrera
Stephen Frost wrote:
> Greetings,
> 
> * Stephen Frost (sfr...@snowman.net) wrote:
> > Attached is a patch (which applies cleaning against a2a2205, but not so
> > much anymore, obviously, but I will fix after the releases) which
> > greatly improves the big pg_dump TAP tests.  There's probably more which
> > can be done, but I expect people will be much happier with this.  The
> > cliff-notes are:
> 
> Attached is an updated patch which applies cleanly against current
> master.  I've not yet looked into back-patching these changes, but will
> if this can get some review and feedback, and folks like this better and
> are ok with the same being done in the back-branches.

Just looking at the commit message (sql script not dumped to screen),
and the fact that this removes the bulk of the like/unlike arrays, I'd
say this is a definite improvement over the existing tests.

Perhaps we'd like to tweak more later, but this is a good step forward
IMO.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2018-03-06 Thread David Steele
On 3/6/18 7:04 AM, Teodor Sigaev wrote:
>> I agree with Teodor (upthread, not quoted here) that the documentation
>> could use some editing.
>>
>> I started to do it myself, but quickly realized I have no knowledge of
>> the content.  I'm afraid I would destroy the meaning while updating the
>> grammar.
>>
>> Anyone understand the subject matter well enough to review the
>> documentation?
> 
> Liudmila tried to improve docs in Alexander's patchset.
> 
> https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752c...@postgrespro.ru

This looks good to me with a few minor exceptions:

+   word_similarity(text, text) requires further
+   explanation. Consider the following example:

Maybe too verbose?  I think "word_similarity(text,
text) requires further explanation." can be removed entirely.

+   string.  However, this function does not add paddings to the

"add padding"

> BTW, adding Liudmila's message to commitfest task
> (https://commitfest.postgresql.org/17/1403/) doesn't work

Doesn't work for me either.

Alexander, can you post the final patches to the thread so they show up
in the CF app?

Thanks,
-- 
-David
da...@pgmasters.net



Re: MCV lists for highly skewed distributions

2018-03-06 Thread Dean Rasheed
On 6 March 2018 at 08:51, John Naylor  wrote:
> On 3/5/18, Dean Rasheed  wrote:
>> Attached is an updated patch.
> Nice. The results look good.

Thanks for the review.


> I agree it should be in a separate function. As for that large
> comment, I spent some time pouring over it to verify the math and make
> sure I understand it. I see a couple points where it might be a bit
> confusing to someone reading this code for the first time:
>
> +* This bound is at most 25, and approaches 0 as n approaches 0 or N.  The
> +* case where n approaches 0 cannot happen in practice, since the sample
> +* size is at least 300.
>
> I think the implication is that the bound cannot dip below 10 (the
> stated minimum to justify using techniques intended for a normal
> distributed sample), so there's no need to code a separate clamp to
> ensure 10 values. That might be worth spelling out explicitly in the
> comment.
>

Perhaps I should really say that n can't be less than 300 unless N is
too, in which case n=N. So the only case where we need to worry about
the bound approaching zero is when n --> N, and we're sampling the
entire table, or almost all of it. I'll see if I can re-word that to
be clearer.


> +* size is at least 300.  The case where n approaches N corresponds to
> +* sampling the whole table, in which case it is reasonable to keep
> +* the whole MCV list (have no lower bound), so it makes sense to apply
> +* this formula for all inputs, even though the above derivation is
> +* technically only valid when the right hand side is at least around 10.
>
> It occurred to me that if the table cardinality is just barely above
> the sample size, we could get a case where a value sampled only once
> could slip into the MCV list. With the default stat target, this would
> be tables with between 30,000 and 31,500 rows.

(actually I think that's 31250 rows, or more generally when we sample
more than around 96% of the table)

> I couldn't induce this
> behavior, so I looked at the logic that identifies MCV candidates, and
> saw a test for
>
> if (dups_cnt > 1)
>
> If I'm reading that block correctly, than a value sampled once will
> never even be considered for the MCV list, so it seems that the
> defacto lower bound for these tables is two. It might be worth
> mentioning that in the comment.
>
> It also means that this clamp on HEAD
>
> -   if (mincount < 2)
> -   mincount = 2;
>
> is dead code.
>

Yes, in compute_scalar_stats(), each value is guaranteed to have been
seen at least twice. However, looking at compute_distinct_stats(),
that's not the case. I'm not sure if that matters.

When we have sampled 96% of the table, any estimate we produce based
on the number of times a value has been seen in the sample is likely
to be almost exact, even if it has only been seen once or twice. So
the estimates from the MCV list will all be good, but I suspect in
this case the estimates for all other values that didn't fit in the
MCV list will also be good, so it may not matter whether or not we
keep those MCV items. My instinct is to keep them, on the grounds that
the more information the planner has, the better.


>>   mcv_cutoff - The relative standard error cutoff percentage used (10,
>> 15, 20, ... 50), or -10, -15, ... -50 for the corresponding tests
>> against HEAD.
>
> I'm not quite following the negative numbers for HEAD. They're all the
> equivalent, right? Just a label for convenience to make sure you ran
> the same number of tests?
>

Yes, they should all be equivalent. They just reflect the way I ran
the tests -- HEAD vs the patch with a cutoff of 10%, HEAD vs the patch
with a cutoff of 15%, and so on. So I ended up running it against HEAD
9 times, and I didn't want to throw any of that data away. It's useful
for getting a feel for the scope of random variations between test
runs.

Regards,
Dean



Re: 2018-03 Commitfest Summary (Andres #1)

2018-03-06 Thread Fabien COELHO


Hello,


>> Please check your numbers before criticising someone unduly.
>
> I did.  I filtered emails by threads, and counted the number of
> messages.

I do not see how this is related to the number of patch submissions or the
number of reviews posted, but it is certainly counting something.

The CF reviewer data are mostly accurate for me, and show that I do more
reviews than submissions.

Now I'm not paid for this, and I only want to help. If the project thinks
that I can help more by not contributing, it just has to ask.

>>> because pgbench isn't overflow safe. I reported that, but you didn't
>>> follow up with fixes.
>>
>> Indeed. AFAICR you did it before; I think that I reviewed it, it was not
>> a period in which I had a lot of available time, and I did not feel it
>> was that urgent to fix because there was no practical impact. I would
>> have done it later, probably.
>
> It's still not fixed.

Then I apologise: I definitely missed something. I'll look into it,
although it may be yet another patch submission.


--
Fabien.



Re: Kerberos test suite

2018-03-06 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 3/5/18 21:08, Michael Paquier wrote:
> > +my $kdc_port = int(rand() * 16384) + 49152;
> > That may not be worth worrying as that's an ephemeral port range but it
> > can make the test a bit unstable...
> 
> This is what we've been using in the other tests as well.  It's clearly
> not optimal, but making it more robust in a platform-agnostic way seems
> tricky.

We have port-testing code in PostgresNode.pm::get_new_node; maybe that
could be taken out of there into TestLib.pm?
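
Just to illustrate the idea, a rough sketch of what such a shared helper
might look like (hypothetical name and structure, not the actual
PostgresNode.pm code):

use IO::Socket::INET;

# Probe for a free TCP port in the ephemeral range by actually trying
# to bind and listen on it, rather than trusting a bare rand().
sub get_free_port
{
	foreach my $try (1 .. 100)
	{
		my $port = int(rand() * 16384) + 49152;
		my $sock = IO::Socket::INET->new(
			LocalAddr => '127.0.0.1',
			LocalPort => $port,
			Proto     => 'tcp',
			Listen    => 1);
		if (defined $sock)
		{
			close($sock);
			return $port;
		}
	}
	die "could not find a free port";
}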

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Prefix operator for text and spgist support

2018-03-06 Thread Arthur Zakirov
On Mon, Feb 19, 2018 at 05:19:15PM +0300, Ildus Kurbangaliev wrote:
> At a brief look, it seems better to move this block into the
> pattern_fixed_prefix function. But there is also the `vartype` variable,
> which is used in prefix construction, and it would be necessary to pass
> this variable too. And since pattern_fixed_prefix is called in 10 other
> places and vartype is required only for this ptype, it seems better to
> just keep this block outside of this function.

Understood.

> I've added documentation in current version of the patch.

Thank you.

Can you rebase the patch due to changes within pg_proc.h?

Also here

+   
+There is also the prefix operator ^@ and corresponding
+text_startswith function which covers cases when only
+searching by beginning of the string is needed.
+   

I think text_startswith should be enclosed in the <function> tag. I'm
not sure, but I think <literal> is used for operators, keywords, etc. I
haven't found a manual which describes how to use the tags, but after
looking at the documentation where <function> is used, I think that
<function> should be used for functions.
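
For concreteness, the quoted paragraph with that markup applied would look
roughly like this (a sketch; indentation as in the surrounding docs):

   <para>
    There is also the prefix operator <literal>^@</literal> and the
    corresponding <function>text_startswith</function> function, which
    covers cases when only searching by the beginning of the string is
    needed.
   </para>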

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2018-03-06 Thread Tom Lane
Alvaro Herrera  writes:
> David Steele wrote:
>> Based on Tom's feedback, and hearing no opinions to the contrary, I have
>> marked this patch Rejected.

> I think I opine contrarywise, but I haven't made time to review the
> status of this in detail.  I'm fine with keeping it rejected for now,
> but I reserve the option to revive it in the future.

I'm fine with reviving it if someone can find a way around the new-
reserved-word problem.  But that's gonna be a bit hard given that
the patch wants to do 

database_name:
ColId
| CURRENT_DATABASE

You might be able to preserve the accessibility of the current_database()
function by making CURRENT_DATABASE into a type_func_name_keyword instead
of a fully-reserved word.  But that's ugly (I think you'd need a single-
purpose production for it to be used as a function), and you've still
broken any SQL code using "current_database" as a table or column name.
I'm dubious that the remaining use-case for the feature is worth it.
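
For illustration only, a hypothetical sketch of such a single-purpose
production; the names follow gram.y conventions and are not from the
patch:

func_expr_common_subexpr:
	...
	| CURRENT_DATABASE
		{
			$$ = (Node *) makeFuncCall(SystemFuncName("current_database"),
									   NIL, @1);
		}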

regards, tom lane



Re: Kerberos test suite

2018-03-06 Thread Robbie Harwood
Peter Eisentraut  writes:

> On 3/5/18 21:08, Michael Paquier wrote:
>
>> Perhaps the tests should be skipped on Windows or just produce an error?
>> Like LDAP tests, libraries are supported on Windows but the hardcoded
>> paths make things harder to handle there.
>
> Hmm, why couldn't someone install MIT krb5 on Windows and give it a try?
>  We don't need to block the platform outright.

krb5kdc doesn't support windows; only the client portion works there.

Thanks,
--Robbie




Re: pgsql: Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)

2018-03-06 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Thomas Munro wrote:
> > On Tue, Mar 6, 2018 at 11:39 AM, Alvaro Herrera  
> > wrote:
> > > Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)
> > 
> > Is this commit responsible for this valgrind error?
> > 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-03-05%2023%3A03%3A01

> Uh ... it probably is.  I'll take a look tomorrow, but my guess is that
> my last minute removal of the conversion to Name is to blame.

I was right about the cause.  Fix pushed.  Thanks for notifying.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-06 Thread Alvaro Herrera
David Rowley wrote:
> On 6 March 2018 at 11:43, Alvaro Herrera  wrote:

> > 4. See elsewhere in the thread about list_copy vs. list_concat :-)
> 
> I saw that. Thanks for fixing. The only weird thing I see in the
> changes is that the comment here claims it makes a copy, but it does
> not.
> 
> + * Right now, there's nothing to do here, so we just copy the list.

Agreed, changed.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PROPOSAL] Shared Ispell dictionaries

2018-03-06 Thread Arthur Zakirov
On Wed, Feb 07, 2018 at 07:28:29PM +0300, Arthur Zakirov wrote:
> Here is a rebased version of the patch due to changes in dict_ispell.c.
> The patch itself wasn't changed.

Here is a rebased version of the patch due to changes within pg_proc.h.
I haven't implemented an mmap prototype yet, though.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char 
*flag)
Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + 
strlen(word) + 1);
strcpy(Conf->Spell[Conf->nspell]->word, word);
Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-   ? cpstrdup(Conf, flag) : VoidString;
+   ? tmpstrdup(flag) : VoidString;
Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, 
CompoundAffixFlag *entry,
entry->flag.i = i;
}
else
-   entry->flag.s = cpstrdup(Conf, s);
+   entry->flag.s = tmpstrdup(s);
 
entry->flagMode = Conf->flagMode;
entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
return;
 
 isnewformat:
+   pfree(recoded);
+   pfree(pstr);
+
if (oldformat)
ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 259a2d83b4..439d2cdf87 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1364,6 +1364,35 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_shared_dictionaries_size 
(integer)
+  
+   max_shared_dictionaries_size configuration 
parameter
+  
+  
+  
+   
+Sets the maximum size of all text search dictionaries loaded into 
shared
+memory.  The default is 100 megabytes (100MB).  This
+parameter can only be set at server start.
+   
+
+   
+Currently controls only loading of Ispell
+dictionaries (see ).
+After compiling the dictionary it will be copied into shared memory.
+Other backends, on first use of the dictionary, will use it from shared
+memory, so they don't need to compile the dictionary a second time.
+   
+
+   
+If the total size of simultaneously loaded dictionaries reaches the maximum
+allowed size, then a new dictionary will be loaded into the local memory of
+a backend.
+   
+  
+ 
+
  
   huge_pages (enum)
   
diff --git a/src/backend/commands/tsearchcmds.c 
b/src/backend/commands/tsearchcmds.c
index 3a843512d1..b6aeae449b 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -39,6 +39,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "tsearch/ts_cache.h"
+#include "tsearch/ts_shared.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions)
 * Call the init method and see if it complains.  We don't 
worry about
 * it leaking memory, since our command will soon be over 
anyway.
 */
-   (void) OidFunctionCall1(initmethod, 
PointerGetDatum(dictoptions));
+   (void) OidFunctionCall2(initmethod, 
PointerGetDatum(dictoptions),
+   
ObjectIdGetDatum(InvalidOid));
}
 
ReleaseSysCache(tup);
@@ -513,6 +515,8 @@ RemoveTSDictionaryById(Oid dictId)
 
	CatalogTupleDelete(relation, &tup->t_self);
 
+   ts_dict_shmem_release(dictId);
+
ReleaseSysCache(tup);
 
heap_close(relation, RowExclusiveLock);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..c7dce8cac5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -44,6 +44,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "tsearch/ts_shared.h"
 #include "utils/backend_random.h"
 #include "utils/snapmgr.h"
 
@@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
size = add_size(size, SyncScanShmemSize());
size = add_size(size, AsyncShmemSize());
size = add_size(size, BackendRandomShmemSize());
+   size = add_size(size, TsearchShmemSize());
 #ifdef EXEC_BACKEND
size = 

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-03-06 Thread Tom Lane
David Gould  writes:
> On Sun, 4 Mar 2018 07:49:46 -0800
> Jeff Janes  wrote:
>> I don't see how it could have caused the problem in the first place.  In
>> your demonstration case, you had to turn off autovac in order to get it to
>> happen, and then when autovac is turned back on, it is all primed for an
>> autovac to launch, go through, touch almost all of the pages, and fix it
>> for you.  How did your original table get into a state where this wouldn't
>> happen?

> One more way for this to happen, vacuum was including the dead tuples in the
> estimate in addition to the live tuples.

FWIW, I've been continuing to think about this and poke at your example,
and I am having the same difficulty as Jeff.  While it's clear that if you
managed to get into a state with wildly inflated reltuples, ANALYZE would
fail to get out of it, it's not clear how you could get to such a state
without additional contributing factors.  This ANALYZE behavior only seems
to result in an incremental increase in reltuples per run, and so that
shouldn't prevent autovacuum from happening and fixing the estimate ---
maybe not as quickly as it should happen, but it'd happen.

The reasons I'm harping on this are (a) if there are additional bugs
contributing to the problem, we need to identify them and fix them;
(b) we need to understand what the triggering conditions are in some
detail, so that we can decide whether this bug is bad enough to justify
back-patching a behavioral change.  I remain concerned that the proposed
fix is too simplistic and will have some unforeseen side-effects, so
I'd really rather just put it in HEAD and let it go through a beta test
cycle before it gets loosed on the world.

Another issue I have after thinking more is that we need to consider
what should happen during a combined VACUUM+ANALYZE.  In this situation,
with the proposed patch, we'd overwrite VACUUM's result with an estimate
derived from ANALYZE's sample ... even though VACUUM's result might've
come from a full-table scan and hence be exact.  In the existing code
a small ANALYZE sample can't do a lot of damage to VACUUM's result, but
that would no longer be true with this.  I'm inclined to feel that we
should trust VACUUM's result for reltuples more than ANALYZE's, on the
grounds that if there actually was any large change in reltuples, VACUUM
would have looked at most of the pages and hence have a more reliable
number.  Hence I think we should skip the pg_class update for ANALYZE if
it's in a combined VACUUM+ANALYZE, at least unless ANALYZE looked at all
(most of?) the pages.  That could be mechanized with something like

-   if (!inh)
+   if (!inh && !(options & VACOPT_VACUUM))

controlling do_analyze_rel's call to vac_update_relstats, maybe with a
check on scanned_pages vs total_pages.  Perhaps the call to
pgstat_report_analyze needs to be filtered similarly (or maybe we still
want to report that an analyze happened, but somehow tell the stats
collector not to change its counts?)

Also, as a stylistic matter, I'd be inclined not to touch
vac_estimate_reltuples' behavior.  The place where the rubber is meeting
the road is

    *totalrows = vac_estimate_reltuples(onerel, true,
                                        totalblocks,
                                        bs.m,
                                        liverows);
    if (bs.m > 0)
        *totaldeadrows = floor((deadrows / bs.m) * totalblocks + 0.5);
    else
        *totaldeadrows = 0.0;

and it seems to me it'd make more sense to abandon the use of
vac_estimate_reltuples entirely, and just calculate totalrows in a fashion
exactly parallel to totaldeadrows.  (I think that's how the code used to
look here ...)
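
Concretely, that parallel calculation might look something like this
(an untested sketch):

    if (bs.m > 0)
    {
        *totalrows = floor((liverows / bs.m) * totalblocks + 0.5);
        *totaldeadrows = floor((deadrows / bs.m) * totalblocks + 0.5);
    }
    else
    {
        *totalrows = 0.0;
        *totaldeadrows = 0.0;
    }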

In HEAD, we could then drop vac_estimate_reltuples' is_analyze argument
altogether, though that would be unwise in the back branches (if we
back-patch) since we have no idea whether any third party code is calling
this function.

regards, tom lane



Re: Kerberos test suite

2018-03-06 Thread Peter Eisentraut
On 3/5/18 21:08, Michael Paquier wrote:
> +my $kdc_port = int(rand() * 16384) + 49152;
> That may not be worth worrying as that's an ephemeral port range but it
> can make the test a bit unstable...

This is what we've been using in the other tests as well.  It's clearly
not optimal, but making it more robust in a platform-agnostic way seems
tricky.

> Perhaps the tests should be skipped on Windows or just produce an error?
> Like LDAP tests, libraries are supported on Windows but the hardcoded
> paths make things harder to handle there.

Hmm, why couldn't someone install MIT krb5 on Windows and give it a try?
 We don't need to block the platform outright.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Kerberos test suite

2018-03-06 Thread Peter Eisentraut
On 3/5/18 16:34, Thomas Munro wrote:
> On Tue, Mar 6, 2018 at 8:45 AM, Peter Eisentraut
>  wrote:
>> New patch attached.
> 
> Passes here.  LGTM.

committed

> Only complaint is your assumption that 'darwin' implies HomeBrew
> installation paths, but you already did that in other tests before
> this one.  Perhaps we can sort that out globally in some future patch.

right

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Speed up the removal of WAL files

2018-03-06 Thread Fujii Masao
On Wed, Feb 21, 2018 at 5:27 PM, Tsunakawa, Takayuki
 wrote:
> From: Michael Paquier [mailto:mich...@paquier.xyz]
>> It seems to me that you would reintroduce partially the problems that
>> 1d4a0ab1 has fixed.  In short, if a crash happens in the code paths calling
>> RemoveXlogFile with durable = false before fsync'ing pg_wal, then a rename
>> has no guarantee to be durable, so you could finish again with a file that
>> as an old name, but new contents.  A crucial thing which matters for a rename
>
> Hmm, you're right.  Even during recovery, RemoveOldXlogFiles() can't skip 
> fsyncing pg_wal/ because new WAL records streamed from the master are written 
> to recycled WAL files.
>
> After all, it seems to me that we have to stand with the current patch which 
> only handles RemoveNonParentXlogFiles().

But the approach that the patch uses would cause the performance problem
as Horiguchi-san pointed out upthread.

So, what about, as another approach, making the checkpointer instead of
the startup process call RemoveNonParentXlogFiles() when end-of-recovery
checkpoint is executed? ISTM that a recovery doesn't need to wait for
RemoveNonParentXlogFiles() to end. Instead, RemoveNonParentXlogFiles()
seems to have to complete before the checkpointer calls RemoveOldXlogFiles()
and creates .ready files for the "garbage" WAL files on the old timeline.
So it seems natural to leave that WAL recycling task to the checkpointer.

Regards,

-- 
Fujii Masao



Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-06 Thread Sergei Kornilov
Hello, Ildar
Thanks. I looked at the ATTACH PARTITION tests and found such checks. If no
one is against it, I will just use the INFO level in my patch instead of
DEBUG1, similarly to the ATTACH PARTITION code. Updated patch attached.

Or I can rewrite the tests to use the DEBUG1 level.

regards, Sergei

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index afe2139..783a0ef 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -215,8 +215,15 @@ WITH ( MODULUS numeric_literal, REM
 
  
   These forms change whether a column is marked to allow null
-  values or to reject null values.  You can only use SET
-  NOT NULL when the column contains no null values.
+  values or to reject null values.
+ 
+
+ 
+  You can only use SET NOT NULL when the column 
+  contains no null values. A full table scan is performed to check 
+  this condition unless there are valid CHECK 
+  constraints that prohibit NULL values for the specified column.
+  In the latter case the table scan is skipped.
  
 
  
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index fcf7655..4fec36d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1258,7 +1258,7 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 	 * not contain any row that would belong to the new partition, we can
 	 * avoid scanning the default partition.
 	 */
-	if (PartConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
+	if (ConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
 	{
 		ereport(INFO,
 (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
@@ -1302,8 +1302,8 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 			 * that it will not contain any row that would belong to the new
 			 * partition, we can avoid scanning the child table.
 			 */
-			if (PartConstraintImpliedByRelConstraint(part_rel,
-	 def_part_constraints))
+			if (ConstraintImpliedByRelConstraint(part_rel,
+ def_part_constraints))
 			{
 ereport(INFO,
 		(errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 74e020b..c35539a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -162,7 +162,7 @@ typedef struct AlteredTableInfo
 	/* Information saved by Phases 1/2 for Phase 3: */
 	List	   *constraints;	/* List of NewConstraint */
 	List	   *newvals;		/* List of NewColumnValue */
-	bool		new_notnull;	/* T if we added new NOT NULL constraints */
+	bool		verify_new_notnull; /* T if we should recheck NOT NULL */
 	int			rewrite;		/* Reason for forced rewrite, if any */
 	Oid			newTableSpace;	/* new tablespace; 0 means no change */
 	bool		chgPersistence; /* T if SET LOGGED/UNLOGGED is used */
@@ -377,6 +377,7 @@ static ObjectAddress ATExecDropNotNull(Relation rel, const char *colName, LOCKMO
 static void ATPrepSetNotNull(Relation rel, bool recurse, bool recursing);
 static ObjectAddress ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
  const char *colName, LOCKMODE lockmode);
+static bool NotNullImpliedByRelConstraints(Relation rel, Form_pg_attribute attr);
 static ObjectAddress ATExecColumnDefault(Relation rel, const char *colName,
 	Node *newDefault, LOCKMODE lockmode);
 static ObjectAddress ATExecAddIdentity(Relation rel, const char *colName,
@@ -4371,7 +4372,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
 			 * Test the current data within the table against new constraints
 			 * generated by ALTER TABLE commands, but don't rebuild data.
 			 */
-			if (tab->constraints != NIL || tab->new_notnull ||
+			if (tab->constraints != NIL || tab->verify_new_notnull ||
 tab->partition_constraint != NULL)
 ATRewriteTable(tab, InvalidOid, lockmode);
 
@@ -4532,7 +4533,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	}
 
 	notnull_attrs = NIL;
-	if (newrel || tab->new_notnull)
+	if (newrel || tab->verify_new_notnull)
 	{
 		/*
 		 * If we are rebuilding the tuples OR if we added any new NOT NULL
@@ -5545,7 +5546,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		 * null, but since it's filled specially, there's no need to test
 		 * anything.)
 		 */
-		tab->new_notnull |= colDef->is_not_null;
+		tab->verify_new_notnull |= colDef->is_not_null;
 	}
 
 	/*
@@ -5948,8 +5949,17 @@ ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
 
		CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple);
 
-		/* Tell Phase 3 it needs to test the constraint */
-		tab->new_notnull = true;
+		/*
+		 * Verifying table data can be done in Phase 3 with a single table scan
+		 * for all constraints (both existing and new).
+		 * We can skip this check only if every new 

Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2018-03-06 Thread David Steele
Hi Álvaro,

On 3/6/18 10:25 AM, Alvaro Herrera wrote:
> David Steele wrote:
> 
>> On 3/1/18 2:09 PM, Tom Lane wrote:
>>
>>> TBH, I think we should reject this patch.  While it's not huge,
>>> it's not trivial either, and I find the grammar changes rather ugly.
>>> The argument for using the feature to fix pg_dump issues has evaporated,
>>> but I don't see anything in the discussion suggesting that people see
>>> a need for it beyond that.
> 
>> Based on Tom's feedback, and hearing no opinions to the contrary, I have
>> marked this patch Rejected.
> 
> I think I opine contrarywise, but I haven't made time to review the
> status of this in detail.  I'm fine with keeping it rejected for now,
> but I reserve the option to revive it in the future.

Absolutely.

From my perspective, reviving a patch is pretty much always an option.
I'm attempting to update patches based on what I see as the current
status, but my decision is certainly not final and I do make mistakes.

Regards,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] pg_upgrade to clusters with a different WAL segment size

2018-03-06 Thread Bossart, Nathan
Thanks for taking a look.

On 3/3/18, 12:22 PM, "Peter Eisentraut"  
wrote:
> On 2/7/18 13:34, Bossart, Nathan wrote:
>> Here is a patch to allow users to change the WAL segment size of a cluster 
>> with
>> pg_resetwal.  Like the '--wal-segize' option in initdb, the new '-s' option
>> accepts segment sizes in megabytes.
>
> Or how about we call the new option, um, --wal-segsize?

It doesn't look like pg_resetwal takes any long options except for
--help and --version, although I suppose those could be added as part
of this effort.

>> The first is a division-by-zero error caused when the control data must be
>> guessed.  If the control data is damaged, WalSegSz will not be set to the
>> guessed WAL segment size, resulting in an error during XLogFromFileName().  
>> The
>> attached patch resolves this issue by ensuring WalSegSz is set to a valid 
>> value
>> after either reading or guessing the control values.
>
> I noticed this issue in pg_controldata too.  Maybe that should be
> combined in a separate patch.

IIUC you are talking about the following code in pg_controldata:

    /*
     * Calculate name of the WAL file containing the latest checkpoint's
     * REDO start point.
     */
    XLByteToSeg(ControlFile->checkPointCopy.redo, segno, WalSegSz);
    XLogFileName(xlogfilename, ControlFile->checkPointCopy.ThisTimeLineID,
                 segno, WalSegSz);

If the control file says that the WAL segment size is 0 bytes, then
XLByteToSeg() will indeed fail.  I think a similar issue is still
present for XLogFromFileName() in pg_resetwal even with this patch.
For pg_controldata, perhaps the affected output ("Latest checkpoint's
REDO WAL file") should be marked unknown if the control file specifies
a 0-byte WAL segment size.  For pg_resetwal, perhaps the WAL segment
size should be "guessed" in this case.
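
Something along these lines, perhaps (a hypothetical sketch; the validity
macro and the fallback constant are my assumptions, not code from the
patch):

    /* after reading or guessing the control values ... */
    if (guessed || !IsValidWalSegSize(ControlFile.xlog_seg_size))
        WalSegSz = DEFAULT_XLOG_SEG_SIZE;   /* fall back to a guess */
    else
        WalSegSz = ControlFile.xlog_seg_size;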

> The patch "Configurable file mode mask" contains the beginning of a test
> suite for pg_resetwal.  It would be great if we could get a test case
> for this new functionality implemented.

I agree.  I will take a look at that patch set.

Nathan



Re: JIT compiling with LLVM v11

2018-03-06 Thread Peter Eisentraut
With the build issues in check, I'm looking at the configuration settings.

I think taking the total cost as the triggering threshold is probably
good enough for a start.  The cost modeling can be refined over time.

We should document that both jit_optimize_above_cost and
jit_inline_above_cost require jit_above_cost to be set, or otherwise
nothing happens.
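
For reference, the interplay is roughly (session-level settings; the cost
values are arbitrary examples, not recommendations):

    SET jit_above_cost = 100000;           -- gate: below this, no JIT at all
    SET jit_optimize_above_cost = 300000;  -- extra optimization passes above this
    SET jit_inline_above_cost = 500000;    -- function inlining above this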

One problem I see is that if someone sets things like
enable_seqscan=off, the artificial cost increase created by those
settings would quite likely bump the query over the jit threshold, which
would alter the query performance characteristics in a way that the user
would not have intended.  I don't have an idea how to address this right
now.

I ran some performance assessments:

merge base (0b1d1a038babff4aadf0862c28e7b667f1b12a30)

make installcheck  3.14s user 3.34s system 17% cpu 37.954 total

jit branch default settings

make installcheck  3.17s user 3.30s system 13% cpu 46.596 total

jit_above_cost=0

make installcheck  3.30s user 3.53s system 5% cpu 1:59.89 total

jit_optimize_above_cost=0 (and jit_above_cost=0)

make installcheck  3.44s user 3.76s system 1% cpu 8:12.42 total

jit_inline_above_cost=0 (and jit_above_cost=0)

make installcheck  3.32s user 3.62s system 2% cpu 5:35.58 total

One can see the CPU savings quite nicely.

One obvious problem is that with the default settings, the test suite
run gets about 15% slower.  (These figures are reproducible over several
runs.)  Is there some debugging stuff turned on that would explain this?
Or would just loading the jit module in each session cause this?

From the other results, we can see that one clearly needs quite a big
database to see a solid benefit from this.  Do you have any information
gathered about this so far?  Any scripts to create test databases and
test queries?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2018-03-06 Thread Alvaro Herrera
David Steele wrote:

> On 3/1/18 2:09 PM, Tom Lane wrote:
>
> > TBH, I think we should reject this patch.  While it's not huge,
> > it's not trivial either, and I find the grammar changes rather ugly.
> > The argument for using the feature to fix pg_dump issues has evaporated,
> > but I don't see anything in the discussion suggesting that people see
> > a need for it beyond that.

> Based on Tom's feedback, and hearing no opinions to the contrary, I have
> marked this patch Rejected.

I think I opine contrarywise, but I haven't made time to review the
status of this in detail.  I'm fine with keeping it rejected for now,
but I reserve the option to revive it in the future.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: Re: [HACKERS] Cached plans and statement generalization

2018-03-06 Thread David Steele
On 3/2/18 9:26 AM, David Steele wrote:
> On 1/12/18 7:53 AM, Konstantin Knizhnik wrote:
>>
>>
>> On 12.01.2018 03:40, Thomas Munro wrote:
>>> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost 
>>> wrote:
 * Konstantin Knizhnik (k.knizh...@postgrespro.ru) wrote:
> Updated version of the patch is attached.
 This patch appears to apply with just a bit of fuzz and make check
 passes, so I'm not sure why this is currently marked as 'Waiting for
 author'.

 I've updated it to be 'Needs review'.  If that's incorrect, feel free to
 change it back with an explanation.
>>> Hi Konstantin,
>>>
>>> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
>>>
>>> undefined reference to `PortalGetHeapMemory'
>>>
>>> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
>>> PortalGetHeapMemory.  Looks like it needs to be replaced with
>>> portal->portalContext.
>>>
>> Hi Thomas,
>>
>> Thank you very much for reporting the problem.
>> Rebased version of the patch is attached.
> 
> This patch has received no review or comments since last May and appears
> too complex and invasive for the final CF of PG11.
> 
> I don't think it makes sense to keep pushing a patch through CFs when it
> is not getting reviewed.  I'm planning to mark this as Returned with
> Feedback unless there are solid arguments to the contrary.

Marked as Returned with Feedback.

Regards,
-- 
-David
da...@pgmasters.net



Re: WIP Patch: Pgbench Serialization and deadlock errors

2018-03-06 Thread David Steele
Hi Marina,

On 3/6/18 4:45 AM, Marina Polyakova wrote:
> On 05-03-2018 18:21, David Steele wrote:
>> Hello Marina,
> 
> Hello, David!
> 
>> On 1/12/18 12:01 PM, Marina Polyakova wrote:
> ...
>>
>> This patch was marked Waiting on Author on Jan 8 and no new patch was
>> submitted before this commitfest.
>>
>> I think we should mark this patch as Returned with Feedback.
> 
> I'm sorry, I was very busy with the patch to precalculate stable
> functions. I'm working on a new version of this patch for pgbench
> unless, of course, it's too late for v11.

Understood, but in fairness to other authors who got their patches
pushed for not providing a new patch before the CF, we should push this
one as well.

Since this is in Waiting on Author state I have marked it Returned with
Feedback.  Please submit to the next CF when you have a new patch.

Regards,
-- 
-David
da...@pgmasters.net



Re: Add default role 'pg_access_server_files'

2018-03-06 Thread Stephen Frost
Magnus, all,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Tue, Jan 2, 2018 at 1:08 PM, Stephen Frost  wrote:
> > Suggestions on a name for this..?  pg_server_copy_program?
> 
> Presumably it would also be used in postgres_fdw, so that seems like a bad
> name. Maybe pg_exec_server_command?

I went with 'pg_execute_server_program', since 'program' is what we use
in the COPY syntax, et al.

Attached is an updated patch which splits up the permissions as
suggested up-thread by Magnus:

The default roles added are:

* pg_read_server_files
* pg_write_server_files
* pg_execute_server_program
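
As a usage illustration (the 'etl' role, table, and file path are made up):

    -- Allow an ordinary role to use server-side COPY ... FROM a file:
    GRANT pg_read_server_files TO etl;

    -- Then, connected as 'etl':
    COPY staging FROM '/srv/data/input.csv' WITH (FORMAT csv);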

Reviews certainly welcome.

Thanks!

Stephen
From 02c13fcb8a41f320f178fad29e9949f3846420ce Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Sun, 31 Dec 2017 14:01:12 -0500
Subject: [PATCH] Add default roles for file/program access

This patch adds new default roles named 'pg_read_server_files',
'pg_write_server_files', 'pg_execute_server_program' which
allow an administrator to GRANT to a non-superuser role the ability to
access server-side files or run programs through PostgreSQL (as the user
the database is running as).  Having one of these roles allows a
non-superuser to use server-side COPY to read, write, or run a program,
and to use file_fdw (if installed by a superuser and GRANT'd USAGE on
it) to read from files or run a program.

Further, this patch moves the privilege check for the remaining misc
file functions from explicit superuser checks to the GRANT system,
similar to what's done for pg_ls_logdir() and others.  Lastly, these
functions are changed to allow a user with the 'pg_read_server_files'
role to be able to access files outside of the PG data directory.
---
 contrib/file_fdw/file_fdw.c  | 51 +++-
 doc/src/sgml/file-fdw.sgml   |  8 +++---
 doc/src/sgml/func.sgml   | 17 ++--
 doc/src/sgml/ref/copy.sgml   |  8 --
 src/backend/catalog/system_views.sql | 14 ++
 src/backend/commands/copy.c  | 37 ++
 src/backend/utils/adt/genfile.c  | 30 +++--
 src/include/catalog/pg_authid.h  |  6 +
 8 files changed, 109 insertions(+), 62 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index cf0a3629bc..4176d98aeb 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -18,6 +18,7 @@
 #include "access/htup_details.h"
 #include "access/reloptions.h"
 #include "access/sysattr.h"
+#include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
 #include "commands/copy.h"
 #include "commands/defrem.h"
@@ -201,24 +202,6 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	List	   *other_options = NIL;
 	ListCell   *cell;
 
-	/*
-	 * Only superusers are allowed to set options of a file_fdw foreign table.
-	 * This is because we don't want non-superusers to be able to control
-	 * which file gets read or which program gets executed.
-	 *
-	 * Putting this sort of permissions check in a validator is a bit of a
-	 * crock, but there doesn't seem to be any other place that can enforce
-	 * the check more cleanly.
-	 *
-	 * Note that the valid_options[] array disallows setting filename and
-	 * program at any options level other than foreign table --- otherwise
-	 * there'd still be a security hole.
-	 */
-	if (catalog == ForeignTableRelationId && !superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- errmsg("only superuser can change options of a file_fdw foreign table")));
-
 	/*
 	 * Check that only options supported by file_fdw, and allowed for the
 	 * current object type, are given.
@@ -264,6 +247,38 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
 		 errmsg("conflicting or redundant options")));
+
+			/*
+			 * Check permissions for changing which file or program is used by
+			 * the file_fdw.
+			 *
+			 * Only members of the role 'pg_read_server_files' are allowed to
+			 * set the 'filename' option of a file_fdw foreign table, while
+			 * only members of the role 'pg_execute_server_program' are
+			 * allowed to set the 'program' option.  This is because we don't
+			 * want regular users to be able to control which file gets read
+			 * or which program gets executed.
+			 *
+			 * Putting this sort of permissions check in a validator is a bit
+			 * of a crock, but there doesn't seem to be any other place that
+			 * can enforce the check more cleanly.
+			 *
+			 * Note that the valid_options[] array disallows setting filename
+			 * and program at any options level other than foreign table ---
+			 * otherwise there'd still be a security hole.
+			 */
+			if (strcmp(def->defname, "filename") == 0 &&
+!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
+ereport(ERROR,
+		(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+		 errmsg("only superuser or a member of the 

Re: BUG #14941: Vacuum crashes

2018-03-06 Thread Bossart, Nathan
On 3/5/18, 7:58 PM, "Andres Freund"  wrote:
> I've pushed the first one now. There seems to be no reason to wait with
> it, even it takes a bit till we can get the second part right.

Thanks!

Nathan



Re: Re: [HACKERS] Can ICU be used for a database's default sort order?

2018-03-06 Thread David Steele
On 3/2/18 1:14 AM, Andres Freund wrote:
> 
> On 2018-02-10 20:45:40 +0500, Andrey Borodin wrote:
>> I've contacted Postgres Professional. Marina Polyakova had kindly provided 
>> their patch.
>> The patch allows to use libc locale with ICU collation as default for 
>> cluster or database.
>>
>> It seems that this patch brings an important long-awaited feature and
>> deserves to be included in the last v11 commitfest.
>> Peter, everyone, do you agree with this? Or should we better adapt this work 
>> through v12 cycle?
>>
>> I'm planning to provide review asap and do necessary changes if required 
>> (this was discussed with Marina and Postgres Professional).
> 
> This patch was submitted for the last v11 commitfest, it's not a trivial
> patch, and hasn't yet been reviewed. I'm afraid the policy is that large
> patches shouldn't be submitted for the last commitfest...  Thus I think
> this should be moved to the next one.

This patch has been moved to the next CF.

Regards,
-- 
-David
da...@pgmasters.net



Re: Re: [PATCH] Add support for ON UPDATE/DELETE actions on ALTER CONSTRAINT

2018-03-06 Thread David Steele
Hi Matheus,

On 3/3/18 1:32 PM, Peter Eisentraut wrote:
> On 2/20/18 10:10, Matheus de Oliveira wrote:
>> Besides that, there is a another change in this patch on current ALTER
>> CONSTRAINT about deferrability options. Previously, if the user did
>> ALTER CONSTRAINT without specifying an option on deferrable or
>> initdeferred, it was implied the default options, so this:
>>
>>     ALTER TABLE tbl
>>     ALTER CONSTRAINT con_name;
>>
>> Was equivalent to:
>>
>>     ALTER TABLE tbl
>>     ALTER CONSTRAINT con_name NOT DEFERRABLE INITIALLY IMMEDIATE;
> 
> Oh, that seems wrong.  Probably, it shouldn't even accept that syntax
> with an empty options list, let alone reset options that are not
> mentioned.  Can you prepare a separate patch for this issue?

Can you prepare the patch that Peter has requested and post on a new
thread?  Please respond here with the reference (or email me directly)
and I will add to the CF.

Meanwhile, I'll push this patch to the next CF as Andres has
recommended, hearing no arguments to the contrary.

Thanks,
-- 
-David
da...@pgmasters.net



Re: Re: Boolean partitions syntax

2018-03-06 Thread David Steele
Hi Amit,

On 3/2/18 2:27 AM, Amit Langote wrote:
> On 2018/03/02 15:58, Andres Freund wrote:
>> On 2018-02-02 17:00:24 -0500, Tom Lane wrote:
>>> Peter Eisentraut  writes:
 There might be other options, but one way to solve this would be to
 treat partition bounds as a general expression in the grammar and then
 check in post-parse analysis that it's a constant.
>>>
>>> That's pretty much what I said upthread.  What I basically don't like
>>> about the current setup is that it's assuming that the bound item is
>>> a bare literal.  Even disregarding future-extension issues, that's bad
>>> because it can't result in an error message smarter than "syntax error"
>>> when someone tries the rather natural thing of writing a more complicated
>>> expression.
>>
>> Given the current state of this patch, with a number of senior
>> developers disagreeing with the design, and the last CF being in
>> progress, I think we should mark this as returned with feedback.
> 
> I see no problem with pursuing this in the next CF if the consensus is
> that we should fix how partition bounds are parsed, instead of adopting
> one of the patches to allow the Boolean literals to be accepted as
> partition bounds.

I'm inclined to mark this patch Returned with Feedback unless I hear
opinions to the contrary.

> That said, after seeing David Rowley's post earlier today [2], it seems
> that we may need to consider this issue a bug rather than a new feature.

Perhaps that should be handled as a bug fix.  Does this patch answer the
need or should a new one be developed?

Thanks,
-- 
-David
da...@pgmasters.net



Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-06 Thread Ildar Musin



On 06.03.2018 16:12, Sergei Kornilov wrote:

Hello, thank you for the review!


Adding a check constraint will also force the full table scan. So I
think it would be better to rephrase it as follows:

Agreed. I updated the docs in the new attached patch slightly differently.


Regarding regression tests it may be useful to set
client_min_messages to 'debug1' before setting "NOT NULL" attribute
for a column. In this case you can tell for sure that
NotNullImpliedByRelConstraints() returned true (i.e. your code
actually did the job) as the special debug message is printed to
the log.

I can not find any debug messages in tests: grep client_min_messages
-irn src/test/ Only some warning level and few error. So i verify in
regression tests only top-level behavior. Or this paragraph was for
other people, not for tests?



Sorry, probably I didn't explain it clearly enough. I meant that your
regression tests can't distinguish cases when the full table scan was
actually performed from the ones where it was skipped due to the
NotNullImpliedByRelConstraints() check. For instance, consider this
piece from the test suite:


# create table atacc1 (test_a int, test_b int);
CREATE TABLE
...

# alter table atacc1 add constraint atacc1_constr_a_valid check(test_a
is not null);
ALTER TABLE

# alter table atacc1 alter test_a set not null;
ALTER TABLE


It is not obvious that full table scan was omitted. But if we set
client_min_messages to 'debug1', we'll be able to see what really
happened from the different debug messages. For example:


# create table atacc1 (test_a int, test_b int);
CREATE TABLE
...

# set client_min_messages to 'debug1';
SET

# alter table atacc1 alter test_a set not null;
DEBUG:  verifying table "atacc1"    <-- full table scan!
ALTER TABLE

# alter table atacc1 alter test_a drop not null;
ALTER TABLE

# alter table atacc1 add constraint atacc1_constr_a_valid check(test_a
is not null);
DEBUG:  verifying table "atacc1"
ALTER TABLE

# alter table atacc1 alter test_a set not null;
DEBUG:  verifying table "atacc1" NOT NULL constraint on test_a attribute
by existed constraints    <-- full scan was skipped!
ALTER TABLE


--
Ildar Musin
i.mu...@postgrespro.ru



Re: Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-03-06 Thread David Steele
Hi Nikita,

On 3/2/18 1:35 AM, Andres Freund wrote:
> 
> On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote:
>> Attached 3rd version of kNN for SP-GiST.
> 
> Given that this was submitted to the last v11 CF, after not being
> developed for a year, I think it's unfortunately too late for v11. As we
> should be concentrating on getting things into v11, I think it'd be
> appropriate to move this to the next CF.

I agree with Andres.  Pushing this patch to the next CF.

Regards,
-- 
-David
da...@pgmasters.net




Re: Changing the autovacuum launcher scheduling; oldest table first algorithm

2018-03-06 Thread Alvaro Herrera
Hello

I haven't read your respective patches yet, but both these threads
brought to memory a patch I proposed a few years ago that I never
completed:

https://www.postgresql.org/message-id/flat/20130124215715.GE4528%40alvh.no-ip.org

In that thread I posted a patch to implement a prioritisation scheme for
autovacuum, based on an equation which was still under discussion when
I abandoned it.  Chris Browne proposed a crazy equation to mix in both
XID age and fraction of dead tuples; probably that idea is worth
studying further.  I tried to implement that in my patch but I probably
didn't do it correctly (because, as I recall, it failed to work as
expected).  Nowadays I think we would also consider the multixact freeze
age, too.
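
(Purely as an illustration of the shape of such a mix -- not the equation
from that thread -- one could imagine something like

    priority = w1 * age(relfrozenxid) / autovacuum_freeze_max_age
             + w2 * n_dead_tup / greatest(reltuples, 1)

with a third weighted term for the multixact freeze age.)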

Maybe that's worth giving a quick look in case some of the ideas there
are useful for the patches now being proposed.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-03-06 Thread Amit Kapila
On Tue, Mar 6, 2018 at 4:53 AM, Andres Freund  wrote:
> Hi,
>
>> diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
>> index 7961b4be6a..b07b7092de 100644
>> --- a/src/backend/executor/nodeLockRows.c
>> +++ b/src/backend/executor/nodeLockRows.c
>> @@ -218,6 +218,11 @@ lnext:
>>  			ereport(ERROR,
>>  					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>>  					 errmsg("could not serialize access due to concurrent update")));
>> +			if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
>> +				ereport(ERROR,
>> +						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> +						 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
>> +
>
> Why are we using ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE rather than
> ERRCODE_T_R_SERIALIZATION_FAILURE?  A lot of frameworks have builtin
> logic to retry serialization failures, and this kind of thing is going
> to be resolved by retrying, no?
>

I think it depends: in some cases a retry can help in deleting the
required tuple, but in other cases, like when the user tries to perform
the delete on a particular partition table, it won't be successful as the
tuple would have been moved.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2018-03-06 Thread David Steele
Hi Jing,

On 3/1/18 2:09 PM, Tom Lane wrote:
> Jing Wang  writes:
>> [ support_CURRENT_DATABASE_keyword_v4.7.patch ]
> 
> TBH, I think we should reject this patch.  While it's not huge,
> it's not trivial either, and I find the grammar changes rather ugly.
> The argument for using the feature to fix pg_dump issues has evaporated,
> but I don't see anything in the discussion suggesting that people see
> a need for it beyond that.
> 
> I particularly object to inventing a CURRENT_DATABASE parameterless
> function.  That's encroaching on user namespace to no purpose whatever,
> as we already have a perfectly good regular function for that.
> 
> Also, from a user standpoint, turning CURRENT_DATABASE into a fully
> reserved word seems like a bad idea.  If nothing else, that breaks
> queries that are relying on the existing current_database() function.
> The parallel to CURRENT_ROLE is not very good, because there at least
> we can point to the SQL spec and say it's reserved according to the
> standard.  CURRENT_DATABASE has no such excuse.

Based on Tom's feedback, and hearing no opinions to the contrary, I have
marked this patch Rejected.

Regards,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-06 Thread Jeevan Chalke
On Tue, Mar 6, 2018 at 4:59 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi Jeevan,
> I am back reviewing this. Here are some comments.
>
> @@ -1415,7 +1413,8 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
>   * the unparameterized Append path we are constructing for the
> parent.
>   * If not, there's no workable unparameterized path.
>   */
> -if (childrel->cheapest_total_path->param_info == NULL)
> +if (childrel->pathlist != NIL &&
> +childrel->cheapest_total_path->param_info == NULL)
>  accumulate_append_subpath(childrel->cheapest_total_path,
> &subpaths, NULL);
>  else
> @@ -1683,6 +1682,13 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
>  RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
>  Path   *subpath;
>
> +if (childrel->pathlist == NIL)
> +{
> +/* failed to make a suitable path for this child */
> +subpaths_valid = false;
> +break;
> +}
> +
> When can childrel->pathlist be NIL?
>
> diff --git a/src/backend/optimizer/plan/createplan.c
> b/src/backend/optimizer/plan/createplan.c
> index 9ae1bf3..f90626c 100644
> --- a/src/backend/optimizer/plan/createplan.c
> +++ b/src/backend/optimizer/plan/createplan.c
> @@ -1670,7 +1670,15 @@ create_sort_plan(PlannerInfo *root, SortPath
> *best_path, int flags)
>  subplan = create_plan_recurse(root, best_path->subpath,
>flags | CP_SMALL_TLIST);
>
> -plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
> NULL);
> +/*
> + * In find_ec_member_for_tle(), child EC members are ignored if they
> don't
> + * belong to the given relids. Thus, if this sort path is based on a
> child
> + * relation, we must pass the relids of it. Otherwise, we will end-up
> into
> + * an error requiring pathkey item.
> + */
> +plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
> +   IS_OTHER_REL(best_path->subpath->parent)
> ?
> +   best_path->path.parent->relids : NULL);
>
>  copy_generic_path_info(>plan, (Path *) best_path);
>
> Please separate this small adjustment in a patch of its own, with some
> explanation of why we need it i.e. now this function can see SortPaths from
> child (other) relations.
>
> +if (child_data)
> +{
> +/* Must be other rel as all child relations are marked OTHER_RELs
> */
> +Assert(IS_OTHER_REL(input_rel));
>
> I think we should check IS_OTHER_REL() and Assert(child_data). That way we
> know
> that the code in the if block is executed for OTHER relation.
>
> -if ((root->hasHavingQual || parse->groupingSets) &&
> +if (!child_data && (root->hasHavingQual || parse->groupingSets) &&
>
> Degenerate grouping will never see child relations, so instead of checking
> for
> child_data, Assert (!IS_OTHER_REL()) inside this block. Add a comment there
> explaining the assertion.
>
> + *
> + * If we are performing grouping for a child relation, fetch can_sort
> from
> + * the child_data to avoid re-calculating same.
>   */
> -can_sort = (gd && gd->rollups != NIL)
> -|| grouping_is_sortable(parse->groupClause);
> +can_sort = child_data ? child_data->can_sort : ((gd &&
> gd->rollups != NIL) ||
> +
> grouping_is_sortable(parse->groupClause));
>
> Instead of adding a conditional here, we can compute these values before
> create_grouping_paths() is called from grouping_planner() and then pass
> them
> down to try_partitionwise_grouping(). I have attached a patch which
> refactors
> this code. Please see if this refactoring is useful. In the attached
> patch, I
> have handled can_sort, can_hash and partial aggregation costs. More on the
> last
> component below.
>

The changes look good to me, and the refactoring will be useful for the
partitionwise patches.

However, would it be good if we added agg_costs into the GroupPathExtraData
too? Also, can we pass this to add_partial_paths_to_grouping_rel() and
add_paths_to_grouping_rel() to avoid passing can_sort, can_hash, and the
cost-related details to them individually?
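
Something of this shape, perhaps (a hypothetical sketch of the struct
being discussed, with the costs folded in):

typedef struct GroupPathExtraData
{
	bool			can_sort;			/* grouping clause is sortable */
	bool			can_hash;			/* grouping clause is hashable */
	bool			can_partial_agg;	/* partial aggregation is possible */
	AggClauseCosts	agg_costs;			/* costs of all aggregate calls */
	AggClauseCosts	agg_partial_costs;	/* costs for the partial phase */
	AggClauseCosts	agg_final_costs;	/* costs for the final phase */
} GroupPathExtraData;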


>
>
>  /*
>   * Figure out whether a PartialAggregate/Finalize Aggregate execution
> @@ -3740,10 +3789,8 @@ create_grouping_paths(PlannerInfo *root,
>   * partial paths for partially_grouped_rel; that way, later code can
>   * easily consider both parallel and non-parallel approaches to
> grouping.
>   */
> -if (try_parallel_aggregation)
> +if (!child_data && !(agg_costs->hasNonPartial ||
> agg_costs->hasNonSerial))
>  {
> -PathTarget *partial_grouping_target;
> -
> [... clipped ...]
> +get_agg_clause_costs(root, havingQual,
>   AGGSPLIT_FINAL_DESERIAL,
> - &agg_final_costs);

Re: Re: [HACKERS] Subscription code improvements

2018-03-06 Thread David Steele
Hi Masahiko,

On 1/30/18 5:00 AM, Masahiko Sawada wrote:
> On Fri, Jan 26, 2018 at 11:41 AM, Peter Eisentraut
>  wrote:
>> On 1/24/18 02:33, Masahiko Sawada wrote:
>>> Thank you for the notification. Since it seems to me that no one is
>>> interested in this patch, it would be better to close it out.
>>> This patch is a refactoring patch, but the subscription code seems to
>>> work fine so far. If a problem appears around subscriptions, I might
>>> propose it again.

This patch is still in the Waiting for Author state but it looks like
you intended to close it.  Should I do so now?

Thanks,
-- 
-David
da...@pgmasters.net



Re: [PATCH] Atomic pgrename on Windows

2018-03-06 Thread Alexander Korotkov
On Tue, Mar 6, 2018 at 5:11 PM, David Steele  wrote:

> On 3/6/18 9:06 AM, Alexander Korotkov wrote:
> >
> > On Tue, Mar 6, 2018 at 5:04 PM, David Steele  > > wrote:
> >
> > On 1/20/18 10:13 AM, Magnus Hagander wrote:
> > >
> > > Unlinking it first seems dangerous, as pointed out by Andres.
> > >
> > What about first trying ReplaceFile() and then if it fails with
> > "target doesn't exist", then call MoveFileEx().
> >
> > Or the other way around -- try MoveFileEx() first since that seems to
> > work most of the time today (if it mostly broke we'd be in trouble
> > already), and if it fails with a sharing violation, try ReplaceFile()?
> > And perhaps end up doing it something similar to what we do with
> > shared memory, which is just to loop over it and try each a couple of
> > times, before giving up and failing?
> >
> > This patch was mistakenly left as Needs Review during the last
> > commitfest but it's pretty clear that a new patch is required.
> >
> > OK!  No objections against marking this patch RWF.
>
> Hmmm, I just noticed this categorized as a bug.  I thought it was a
> refactor.
>

Yes, that's naturally a bug.  Not very critical though.

Even so, it looks like the approach needs a rethink so better to wait
> for that.
>

In this thread we've found at least two possible approaches to fix this
bug.  But both of them need to be implemented and tested.
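
For the record, the second of those (MoveFileEx() first, falling back to
ReplaceFile() on a sharing violation, with a retry loop) might be sketched
like this; entirely untested, and the retry count and delay are arbitrary:

#include <windows.h>

#define RENAME_RETRIES 10

static int
pgrename_win32(const char *from, const char *to)
{
	int			i;

	for (i = 0; i < RENAME_RETRIES; i++)
	{
		if (MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
			return 0;

		if (GetLastError() == ERROR_SHARING_VIOLATION)
		{
			/* ReplaceFile() requires the target file to exist */
			if (ReplaceFile(to, from, NULL,
							REPLACEFILE_IGNORE_MERGE_ERRORS, NULL, NULL))
				return 0;
		}

		Sleep(100);				/* wait 100ms and retry */
	}
	return -1;
}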


> Marked Returned with Feedback.
>

OK!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [PATCH] Atomic pgrename on Windows

2018-03-06 Thread David Steele
On 3/6/18 9:06 AM, Alexander Korotkov wrote:
> 
> On Tue, Mar 6, 2018 at 5:04 PM, David Steele wrote:
> 
> On 1/20/18 10:13 AM, Magnus Hagander wrote:
> >
> > Unlinking it first seems dangerous, as pointed out by Andres.
> >
> > What about first trying ReplaceFile() and then if it fails with "target
> > doesn't exist", then call MoveFileEx().
> >
> > Or the other way around -- try MoveFileEx() first since that seems to
> > work most of the time today (if it mostly broke we'd be in trouble
> > already), and if it fails with a sharing violation, try ReplaceFile()?
> > And perhaps end up doing it something similar to what we do with shared
> > memory, which is just to loop over it and try each a couple of times,
> > before giving up and failing?
> 
> This patch was mistakenly left as Needs Review during the last
> commitfest but it's pretty clear that a new patch is required.
> 
> OK!  No objections against marking this patch RWF.

Hmmm, I just noticed this categorized as a bug.  I thought it was a
refactor.

Even so, it looks like the approach needs a rethink so better to wait
for that.

Marked Returned with Feedback.

Thanks,
-- 
-David
da...@pgmasters.net



Re: JIT compiling with LLVM v11

2018-03-06 Thread Peter Eisentraut
On 3/6/18 04:39, Andres Freund wrote:
> I did, and reproduced. Turned out I just missed the error in the above
> test.
> 
> The bug was caused by one ifdef in get_LifetimeEnd() being wrong
> (function is is overloaded starting in 5 rather than 4). The comment
> above it even had it right...

OK, it's fixed for me now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: [PATCH] Atomic pgrename on Windows

2018-03-06 Thread Alexander Korotkov
Hi, David!

On Tue, Mar 6, 2018 at 5:04 PM, David Steele  wrote:

> On 1/20/18 10:13 AM, Magnus Hagander wrote:
> >
> > On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier wrote:
> >
> > On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov wrote:
> > > Attached patch atomic-pgrename-windows-1.patch fixes this problem.
> > > It appears to be possible to atomically replace a file on Windows –
> > > ReplaceFile() does that.  ReplaceFile() requires the target file to
> > > exist; this is why we still need to call MoveFileEx() when it
> > > doesn't exist.
> >
> > Do you think that it could be safer to unlink the target file first
> > with pgunlink()? This way you make sure that the target file is
> > removed and not locked. This change makes me worrying about the
> > introduction of more race conditions.
> >
> > Unlinking it first seems dangerous, as pointed out by Andres.
> >
> > What about first trying ReplaceFile() and then if it fails with "target
> > doesn't exist", then call MoveFileEx().
> >
> > Or the other way around -- try MoveFileEx() first since that seems to
> > work most of the time today (if it mostly broke we'd be in trouble
> > already), and if it fails with a sharing violation, try ReplaceFile()?
> > And perhaps end up doing it something similar to what we do with shared
> > memory, which is just to loop over it and try each a couple of times,
> > before giving up and failing?
>
> This patch was mistakenly left as Needs Review during the last
> commitfest but it's pretty clear that a new patch is required.
>

OK!  No objections against marking this patch RWF.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Contention preventing locking

2018-03-06 Thread Amit Kapila
On Mon, Mar 5, 2018 at 1:26 PM, Konstantin Knizhnik
 wrote:
>
> On 03.03.2018 16:44, Amit Kapila wrote:
>>
>> On Thu, Mar 1, 2018 at 1:22 PM, Konstantin Knizhnik
>>  wrote:
>>>
>>> On 28.02.2018 16:32, Amit Kapila wrote:

 On Mon, Feb 26, 2018 at 8:26 PM, Konstantin Knizhnik
  wrote:
>>>
>>>
>>> Yes, but two notes:
>>> 1. Tuple lock is used inside heap_* functions, but not in
>>> EvalPlanQualFetch, where the transaction lock is also used.
>>> 2. The tuple lock is held until the end of the update, not until commit
>>> of the transaction. So another transaction can receive control before
>>> this transaction is completed, and contention still takes place.
>>> Contention is reduced and performance is increased only if locks (either
>>> the tuple lock or the xid lock) are held until the end of the
>>> transaction. Unfortunately, that may lead to deadlock.
>>>
>>> My last attempt to reduce contention was to replace the shared lock with
>>> an exclusive one in XactLockTableWait and to remove the unlock from that
>>> function, so that only one transaction can get the xact lock and will
>>> hold it until the end of the transaction. The tuple lock also seems not
>>> to be needed in this case. It shows better performance on the pgrw test,
>>> but on the YCSB benchmark with workload A (50% of updates) performance
>>> was even worse than with vanilla Postgres. And, worst of all, there are
>>> deadlocks in the pgbench tests.
>>>
 I think in this whole process backends may need to wait multiple times
 either on the tuple lock or the xact lock.  It seems the reason for these
 waits is that we immediately release the tuple lock (acquired by
 heap_acquire_tuplock) once the transaction on which we were waiting is
 finished.  AFAICU, the reason for releasing the tuple lock immediately
 instead of at the end of the transaction is that we don't want to
 accumulate too many locks, as that can lead to unbounded use of
 shared memory.  How about if we release the tuple lock at the end of the
 transaction unless the transaction acquires more than a certain
 threshold (say 10 or 50) of such locks, in which case we fall back
 to the current strategy?

>>> Certainly, I have tested such a version. Unfortunately it doesn't help.
>>> The tuple lock uses the tuple's TID, but once a transaction has made the
>>> update, a new version of the tuple is produced with a different TID, and
>>> all new transactions will see that version, so they will not notice this
>>> lock at all.
>>>
>> Sure, but out of all the new transactions, again only one transaction
>> will be allowed to update it, and among the new waiters, only one should
>> get access to it.  The situation should be better than when all the
>> waiters attempt to lock and update the tuple with the same CTID.
>>
>
> I do not argue against the necessity of the tuple lock.
> It certainly helps to reduce contention... But it is still not able to
> completely eliminate the problem (prevent significant performance
> degradation with an increasing number of competing transactions).
>

Yeah, that is what I thought should happen.  I think if we have that
kind of solution, it will perform much better with zheap [1] where
TIDs won't change in many of the cases.

> Better
> results can be achieved with:
> a) Replacing the TID lock with a PK lock and holding that lock until the
> end of the transaction
> b) Chaining xact locks.
>
> I failed to make the second approach work; it still suffers from deadlocks.
> I wonder if the approach with PK locks can be considered promising.
> Should I spend some more time to make it more reliable (support different PK
> types, handle cases of non-HOT updates and tables without a PK, ...)?
> Or is it a dead-end solution in any case?
>

I don't know, maybe you can try to completely sketch out the solution
and see how much impact it has.  If it can be done with reasonable
effort, then maybe it is worth pursuing.
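
For what it's worth, the threshold idea from upthread could be sketched along
the following lines (names and the callsite plumbing are hypothetical, not
actual PostgreSQL internals):

#include "storage/lock.h"

#define TUPLE_LOCK_KEEP_THRESHOLD	50

static int	kept_tuple_locks = 0;

/*
 * Sketch: keep the tuple lock until end of transaction so that exactly one
 * waiter wins the next round, but fall back to immediate release once too
 * many locks have accumulated, to bound lock-table (shared memory) usage.
 */
static void
maybe_release_tuple_lock(const LOCKTAG *tag, LOCKMODE mode)
{
	if (kept_tuple_locks < TUPLE_LOCK_KEEP_THRESHOLD)
	{
		/* Keep it; locks held this way go away at transaction end. */
		kept_tuple_locks++;
		return;
	}

	/* Over the threshold: current strategy, release immediately. */
	LockRelease(tag, mode, false);
}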


[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BYtM5vxzSM2NZm%2BpC37MCwyvtkmJrO_yRBQeZDp9Wa2w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Re: [PATCH] Atomic pgrename on Windows

2018-03-06 Thread David Steele
Hi Alexander,

On 1/20/18 10:13 AM, Magnus Hagander wrote:
> 
> On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier wrote:
> 
> On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov wrote:
> > Attached patch atomic-pgrename-windows-1.patch fixes this problem.  It
> > appears to be possible to atomically replace a file on Windows –
> > ReplaceFile() does that.  ReplaceFile() requires the target file to
> > exist, which is why we still need to call MoveFileEx() when it doesn't.
> 
> Do you think that it could be safer to unlink the target file first
> with pgunlink()? This way you make sure that the target file is
> removed and not locked. This change makes me worry about the
> introduction of more race conditions.
> 
> Unlinking it first seems dangerous, as pointed out by Andres.
> 
> What about first trying ReplaceFile(), and then if it fails with "target
> doesn't exist", calling MoveFileEx()?
> 
> Or the other way around -- try MoveFileEx() first since that seems to
> work most of the time today (if it mostly broke we'd be in trouble
> already), and if it fails with a sharing violation, try ReplaceFile()?
> And perhaps end up doing something similar to what we do with shared
> memory, which is just to loop over it and try each a couple of times
> before giving up and failing?

This patch was mistakenly left as Needs Review during the last
commitfest but it's pretty clear that a new patch is required.

This certainly sounds like a non-trivial change.  Perhaps it should be
submitted for PG12?

Thanks,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2018-03-06 Thread Arthur Zakirov
On Tue, Mar 06, 2018 at 08:09:50PM +0900, Etsuro Fujita wrote:
> Agreed.  I added a comment to that function.  I think that that comment in
> combination with changes to the FDW docs in the patch would help FDW authors
> understand why that is needed.  Please find attached an updated version of
> the patch.

Thank you.

All tests pass and the documentation builds. There was a suggestion [1]
of a different approach, but this patch fixes the issue in a much
simpler way.

Marked as "Ready for Committer".


1 - 
https://www.postgresql.org/message-id/20171005.200631.134118679.horiguchi.kyotaro%40lab.ntt.co.jp

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: General purpose hashing func in pgbench

2018-03-06 Thread Ildar Musin

Hello Teodor,

Thank you for reviewing this patch.

On 06.03.2018 15:53, Teodor Sigaev wrote:

> Patch applies, compiles, pgbench & global "make check" ok, doc
> built ok.


Agree.

> If I understand upthread correctly, the implementation of the Murmur hash
> algorithm is based on Austin Appleby's work:
> https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp
>
> If so, I have a note and some objections:
>
> 1) It seems a good idea to credit Austin Appleby in the
> comments.



Sounds fair, I'll send an updated version soon.


> 2) The reference implementation directly says (link above):
> // 2. It will not produce the same results on little-endian
> //    and big-endian machines.
>
> I don't think that is a good thing for testing and benchmarking, for
> several reasons: it could produce different data collections,
> different selects, different distributions.
>
> 3) Again, from the comments of the reference implementation:
> // Note - This code makes a few assumptions about how your machine behaves -
> // 1. We can read a 4-byte value from any address without crashing
>
> That's not true for all supported platforms. Any box with strict
> alignment will get SIGBUSed here.



I think that both points refer to the fact that the original algorithm
accepts a byte string as input, slices it into 8-byte chunks, and forms
unsigned int values from them. In that case the order of bytes in the
input string does matter, since it may produce different integers on
different architectures. It is also a fair requirement that the byte
string be aligned, as some architectures cannot handle unaligned data.
In this patch, though, I slightly modified the original algorithm so that
it takes unsigned ints as input (not a byte string), so neither of these
points should be a problem, as it seems to me. But I'll double-check it
on a big-endian machine with strict alignment (SPARC).
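
For reference, a MurmurHash2-style hash over a single 64-bit integer (rather
than a byte string) can be sketched as below; since the input is consumed as
an integer value, host byte order and alignment cannot affect the result.
The constants are Austin Appleby's MurmurHash64A constants; the function
name is illustrative, not the one used in the patch:

#include <stdint.h>

static uint64_t
murmur2_int64(uint64_t key, uint64_t seed)
{
	const uint64_t m = 0xc6a4a7935bd1e995ULL;
	const int	r = 47;
	uint64_t	h = seed ^ (sizeof(key) * m);
	uint64_t	k = key;

	/* mix the single 8-byte block */
	k *= m;
	k ^= k >> r;
	k *= m;
	h ^= k;
	h *= m;

	/* final avalanche */
	h ^= h >> r;
	h *= m;
	h ^= h >> r;

	return h;
}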

Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru


