Re: [HACKERS] proposal: schema variables

2018-03-11 Thread Pavel Stehule
2018-03-12 7:49 GMT+01:00 Pavel Luzanov :

> Hi,
>
> I plan to do a usability and feature test review in the next several days.
>
> Is there any chance that it will work on replicas?
> Such a possibility would be very helpful for generating reports.
> Now, the LET command produces an error:
>
> ERROR:  cannot execute LET in a read-only transaction
>
>

> But what if we say that variables are non-transactional?
>

Sure, it should work. Right now I am trying to solve issues at the concept
level - the LET code is based on the DML code base, so there is probably a
check for read-write transactions, but that check is useless for the LET command.
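
A minimal sketch of the behaviour we are aiming for on a hot standby once that
check is relaxed - the variable and table names below are only illustrative:

  -- created on the primary; CREATE VARIABLE is catalog DDL and replicates as usual
  CREATE VARIABLE report_date AS date;

  -- then, in a read-only session on the standby, these should work, because the
  -- value lives only in session memory and writes no WAL
  LET report_date = current_date;
  SELECT count(*) FROM orders WHERE created_at::date = report_date;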

Regards

Pavel


>
> -
> Pavel Luzanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
> On 08.03.2018 21:00, Pavel Stehule wrote:
>
> Hi
>
> 2018-02-07 7:34 GMT+01:00 Pavel Stehule :
>
>> Hi
>>
>> updated patch with your changes in documentation and pg_dump (initial)
>> support
>>
>> The main issue of this patch is storage. We can reuse the local buffers used
>> for temp tables, but they allocate in 8KB chunks and create temp files for
>> every object, which is too much overhead. Storing values just in session
>> memory is too simplistic - then a lot of new code would be needed for when a
>> variable is dropped.
>>
>> I have ideas for how to allow working with a mix of scalar and composite
>> types - that will be the next step of this prototype.
>>
>> Regards
>>
>> Pavel
>>
>
> new update - rebased, plus some initial support for composite values on the
> right-hand side and for custom types; arrays are supported too.
>
> omega=# CREATE VARIABLE xx AS (a int, b numeric);
> CREATE VARIABLE
> omega=# LET xx = (10, 20)::xx;
> LET
> omega=# SELECT xx;
> +---------+
> |   xx    |
> +---------+
> | (10,20) |
> +---------+
> (1 row)
>
> omega=# SELECT xx.a + xx.b;
> +----------+
> | ?column? |
> +----------+
> |       30 |
> +----------+
> (1 row)
>
> omega=# \d xx
> schema variable "public.xx"
> +--------+---------+
> | Column |  Type   |
> +--------+---------+
> | a      | integer |
> | b      | numeric |
> +--------+---------+
>
>
> Regards
>
> Pavel
>
>
>
>


Re: [HACKERS] proposal: schema variables

2018-03-11 Thread Pavel Luzanov

Hi,

I plan to do a usability and feature test review in the next several days.

Is there any chance that it will work on replicas?
Such a possibility would be very helpful for generating reports.
Now, the LET command produces an error:

ERROR:  cannot execute LET in a read-only transaction

But what if we say that variables are non-transactional?

-
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 08.03.2018 21:00, Pavel Stehule wrote:

Hi

2018-02-07 7:34 GMT+01:00 Pavel Stehule :


Hi

updated patch with your changes in documentation and pg_dump
(initial) support

The main issue of this patch is storage. We can reuse the local buffers
used for temp tables, but they allocate in 8KB chunks and create temp
files for every object, which is too much overhead. Storing values just
in session memory is too simplistic - then a lot of new code would be
needed for when a variable is dropped.

I have ideas for how to allow working with a mix of scalar and composite
types - that will be the next step of this prototype.

Regards

Pavel


new update - rebased, plus some initial support for composite values on the
right-hand side and for custom types; arrays are supported too.


omega=# CREATE VARIABLE xx AS (a int, b numeric);
CREATE VARIABLE
omega=# LET xx = (10, 20)::xx;
LET
omega=# SELECT xx;
+---------+
|   xx    |
+---------+
| (10,20) |
+---------+
(1 row)

omega=# SELECT xx.a + xx.b;
+----------+
| ?column? |
+----------+
|       30 |
+----------+
(1 row)

omega=# \d xx
schema variable "public.xx"
+--------+---------+
| Column |  Type   |
+--------+---------+
| a      | integer |
| b      | numeric |
+--------+---------+


Regards

Pavel






Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2018-03-11 Thread Andrey Borodin


> On 12 March 2018, at 1:54, Alexander Korotkov wrote:
> 
> On Wed, Mar 7, 2018 at 8:30 PM, Alvaro Herrera  
> wrote:
> I suggest to create a new function GinPredicateLockPage() that checks
> whether fast update is enabled for the index.  The current arrangement
> looks too repetitive and it seems easy to make a mistake.
> 
> BTW, should we also skip CheckForSerializableConflictIn() when
> fast update is enabled?  AFAICS, now it doesn't cause any errors or
> false positives, but makes useless load.  Is it correct?
> 
BTW to the BTW: I think we should check the pending list size with
GinGetPendingListCleanupSize() here
+
+   /*
+* If fast update is enabled, we acquire a predicate lock on the entire
+* relation as fast update postpones the insertion of tuples into index
+* structure due to which we can't detect rw conflicts.
+*/
+   if (GinGetUseFastUpdate(ginstate->index))
+   PredicateLockRelation(ginstate->index, snapshot);

Because we can do ALTER INDEX ... SET (fastupdate = off), but there will still
be a pending list.
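
For illustration, a rough SQL sketch of that situation (the table and index
names are made up; gin_clean_pending_list() is the existing way to flush
whatever is left over):

  CREATE TABLE gin_t (doc tsvector);
  CREATE INDEX gin_t_idx ON gin_t USING gin (doc) WITH (fastupdate = on);
  INSERT INTO gin_t
    SELECT to_tsvector('english', 'row ' || i) FROM generate_series(1, 10000) i;
  ALTER INDEX gin_t_idx SET (fastupdate = off);  -- already-queued entries stay in the pending list
  SELECT gin_clean_pending_list('gin_t_idx');    -- returns the number of pending-list pages removed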

We discussed this with Shubham back in July and chose some approach that
seemed better, but I can't remember what it was...

Best regards, Andrey Borodin.


Re: Inconsistent behavior in serializable snapshot

2018-03-11 Thread Kuntal Ghosh
On Sun, Mar 11, 2018 at 7:52 PM, Kuntal Ghosh
 wrote:
> Hello hackers,
>
> While working on serializable transaction isolation, I've noticed some
> strange behavior in the first permutation mentioned in
> isolation/specs/read-only-anomaly-2.spec file.
>
> setup
> {
> CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
> INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
> }
>
> # without s3, s1 and s2 commit
> permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"
>
> Here, we can see a serial order T1 <- T2 without any conflict.
> However, if I perform "VACUUM FREEZE bank_account" after the setup
> step, s2wx throws a conflict error:
> ERROR:  could not serialize access due to read/write dependencies
> among transactions
> DETAIL:  Reason code: Canceled on identification as a pivot, during write.
> HINT:  The transaction might succeed if retried.
>
> Is this an expected behavior?
Got the answer.

In this case, when we perform VACUUM FREEZE on the table, the planner
chooses a sequential scan (instead of an index scan) for the SELECT and
UPDATE statements. Hence, we have to take a relation-level SIReadLock
instead of page- and tuple-level SIReadLocks. This causes the
serialization conflict.
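
Just to illustrate (not part of the spec): the difference is visible from a
third session while the permutation is still open:

  SELECT locktype, relation::regclass, page, tuple
    FROM pg_locks
   WHERE mode = 'SIReadLock';
  -- index scan: 'page'/'tuple' entries on bank_account and its primary key index
  -- seqscan (after VACUUM FREEZE): a single 'relation' entry for bank_account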


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-03-11 Thread amul sul
On Sat, Mar 10, 2018 at 5:25 PM, Amit Kapila  wrote:
> On Fri, Mar 9, 2018 at 3:18 PM, amul sul  wrote:
>> On Thu, Mar 8, 2018 at 12:31 PM, Amit Kapila  wrote:
>>> On Thu, Mar 8, 2018 at 11:04 AM, Pavan Deolasee
>>>
>>>> This is just one example. I am almost certain there are many such cases
>>>> that will require careful attention.

>>>
>>> Right, I think we should be able to detect and fix such cases.
>>>
>>
>> I found a couple of places (in heap_lock_updated_tuple, rewrite_heap_tuple,
>> EvalPlanQualFetch & heap_lock_updated_tuple_rec) where ItemPointerEquals is
>> used to check whether a tuple has been updated/deleted.  With the proposed
>> patch ItemPointerEquals() will no longer work as before; we need an
>> additional check for updated/deleted tuples, as proposed in the latest
>> patch[1]. Do let me know your thoughts/suggestions on this, thanks.
>>
>
> I think you have identified the places correctly.  I have a few
> suggestions though.
>
> 1.
> - if (!ItemPointerEquals(&tuple->t_self, ctid))
> + if (!(ItemPointerEquals(&tuple->t_self, ctid) ||
> +   (!ItemPointerValidBlockNumber(ctid) &&
> +(ItemPointerGetOffsetNumber(&tuple->t_self) ==   /* TODO: Condn.
> should be macro? */
> + ItemPointerGetOffsetNumber(ctid)
>
> Can't we write this and similar tests as:
> ItemPointerValidBlockNumber(ctid) &&
> !ItemPointerEquals(&tuple->t_self, ctid)?  It will be a bit simpler to
> understand and will still serve the purpose.
>

Yes, you are correct; we need not worry about offset matching -- the
invalid-block-number check plus ItemPointerEquals is more than enough to
conclude whether the tuple has been deleted.  I will change the condition
accordingly in the next version.

> 2.
>
> if (mytup.t_data->t_infomask & HEAP_XMAX_INVALID ||
>   ItemPointerEquals(&mytup.t_self, &mytup.t_data->t_ctid) ||
> + !HeapTupleHeaderValidBlockNumber(mytup.t_data) ||
>   HeapTupleHeaderIsOnlyLocked(mytup.t_data))
>
> I think it is better to keep the check for
> HeapTupleHeaderValidBlockNumber before ItemPointerEquals, as it
> will first validate that the block number is valid and only then compare
> the complete CTID.

Sure, will do that.

Thanks for the confirmation and suggestions.

Regards,
Amul



WARNING in parallel index creation.

2018-03-11 Thread Jeff Janes
If I run:

pgbench -i -s30

And then create the function:

CREATE OR REPLACE FUNCTION foobar(text)
 RETURNS text
 LANGUAGE plperl
 IMMUTABLE PARALLEL SAFE STRICT COST 1
AS $function$
  return scalar reverse($_[0]);
$function$;

Then when I create an index, I get a warning:

jjanes=# create index on pgbench_accounts (foobar(filler));
WARNING:  cannot set parameters during a parallel operation
WARNING:  cannot set parameters during a parallel operation

If I create the index again within the same session, there is no WARNING.

This only occurs if plperl.on_init is set in the postgresql.conf file.  It
doesn't seem to matter what it is set to; even the empty string triggers
the warning.

plperl.on_init=''

As far as I can tell the index is created correctly despite the warning.
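
A quick way to double-check that, assuming the auto-generated index name:

  SELECT indexrelid::regclass, indisvalid, indisready
    FROM pg_index
   WHERE indexrelid = 'pgbench_accounts_foobar_idx'::regclass;
  -- the amcheck extension's bt_index_check() can be used for a deeper check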

Cheers,

Jeff


Re: inserts into partitioned table may cause crash

2018-03-11 Thread Etsuro Fujita

(2018/03/09 20:18), Etsuro Fujita wrote:

Here are updated patches for PG10 and HEAD.

Other changes:
* Add regression tests based on your test cases shown upthread


I added a few more regression tests and revised the comments.  Please
find attached an updated patch.


Best regards,
Etsuro Fujita
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 2656,2668  CopyFrom(CopyState cstate)
  			if (cstate->transition_capture != NULL)
  			{
  if (resultRelInfo->ri_TrigDesc &&
! 	(resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
! 	 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
  {
  	/*
! 	 * If there are any BEFORE or INSTEAD triggers on the
! 	 * partition, we'll have to be ready to convert their
! 	 * result back to tuplestore format.
  	 */
  	cstate->transition_capture->tcs_original_insert_tuple = NULL;
  	cstate->transition_capture->tcs_map =
--- 2656,2667 
  			if (cstate->transition_capture != NULL)
  			{
  if (resultRelInfo->ri_TrigDesc &&
! 	resultRelInfo->ri_TrigDesc->trig_insert_before_row)
  {
  	/*
! 	 * If there are any BEFORE triggers on the partition,
! 	 * we'll have to be ready to convert their result back to
! 	 * tuplestore format.
  	 */
  	cstate->transition_capture->tcs_original_insert_tuple = NULL;
  	cstate->transition_capture->tcs_map =
***
*** 2803,2814  CopyFrom(CopyState cstate)
  			 * tuples inserted by an INSERT command.
  			 */
  			processed++;
  
! 			if (saved_resultRelInfo)
! 			{
! resultRelInfo = saved_resultRelInfo;
! estate->es_result_relation_info = resultRelInfo;
! 			}
  		}
  	}
  
--- 2802,2814 
  			 * tuples inserted by an INSERT command.
  			 */
  			processed++;
+ 		}
  
! 		/* Restore the saved ResultRelInfo */
! 		if (saved_resultRelInfo)
! 		{
! 			resultRelInfo = saved_resultRelInfo;
! 			estate->es_result_relation_info = resultRelInfo;
  		}
  	}
  
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
***
*** 62,67  static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
--- 62,71 
  	 EState *estate,
  	 bool canSetTag,
  	 TupleTableSlot **returning);
+ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
+ 		EState *estate,
+ 		ResultRelInfo *targetRelInfo,
+ 		TupleTableSlot *slot);
  
  /*
   * Verify that the tuples to be produced by INSERT or UPDATE match the
***
*** 259,265  ExecInsert(ModifyTableState *mtstate,
  {
  	HeapTuple	tuple;
  	ResultRelInfo *resultRelInfo;
- 	ResultRelInfo *saved_resultRelInfo = NULL;
  	Relation	resultRelationDesc;
  	Oid			newId;
  	List	   *recheckIndexes = NIL;
--- 263,268 
***
*** 275,374  ExecInsert(ModifyTableState *mtstate,
  	 * get information on the (current) result relation
  	 */
  	resultRelInfo = estate->es_result_relation_info;
- 
- 	/* Determine the partition to heap_insert the tuple into */
- 	if (mtstate->mt_partition_dispatch_info)
- 	{
- 		int			leaf_part_index;
- 		TupleConversionMap *map;
- 
- 		/*
- 		 * Away we go ... If we end up not finding a partition after all,
- 		 * ExecFindPartition() does not return and errors out instead.
- 		 * Otherwise, the returned value is to be used as an index into arrays
- 		 * mt_partitions[] and mt_partition_tupconv_maps[] that will get us
- 		 * the ResultRelInfo and TupleConversionMap for the partition,
- 		 * respectively.
- 		 */
- 		leaf_part_index = ExecFindPartition(resultRelInfo,
- 			mtstate->mt_partition_dispatch_info,
- 			slot,
- 			estate);
- 		Assert(leaf_part_index >= 0 &&
- 			   leaf_part_index < mtstate->mt_num_partitions);
- 
- 		/*
- 		 * Save the old ResultRelInfo and switch to the one corresponding to
- 		 * the selected partition.
- 		 */
- 		saved_resultRelInfo = resultRelInfo;
- 		resultRelInfo = mtstate->mt_partitions + leaf_part_index;
- 
- 		/* We do not yet have a way to insert into a foreign partition */
- 		if (resultRelInfo->ri_FdwRoutine)
- 			ereport(ERROR,
- 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- 	 errmsg("cannot route inserted tuples to a foreign table")));
- 
- 		/* For ExecInsertIndexTuples() to work on the partition's indexes */
- 		estate->es_result_relation_info = resultRelInfo;
- 
- 		/*
- 		 * If we're capturing transition tuples, we might need to convert from
- 		 * the partition rowtype to parent rowtype.
- 		 */
- 		if (mtstate->mt_transition_capture != NULL)
- 		{
- 			if (resultRelInfo->ri_TrigDesc &&
- (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
-  resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
- 			{
- /*
-  * If there are any BEFORE or INSTEAD triggers on the
-  * partition, we'll have to be ready to convert their result
-  * back to tuplestore format.
-  */
- mtstate->mt_transition_capture->tcs_original_insert

Re: Fixes for missing schema qualifications

2018-03-11 Thread Michael Paquier
On Sat, Mar 10, 2018 at 03:13:09PM -0300, Alvaro Herrera wrote:
> ... and substring() ...

substring(A from B for C) gets parsed.
--
Michael




Re: Foreign keys and partitioned tables

2018-03-11 Thread Alvaro Herrera
[ Resending an email from yesterday.  Something is going very wrong with
my outgoing mail provider :-( ]

Rebase of the prior code, on top of the improved row triggers posted
elsewhere.  I added some more tests too, and fixed a couple of small
bugs.

(This includes the patches I just posted in the row triggers patch)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From a34e786924b54d94dbf28c182aed27d0e92dba06 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 8 Mar 2018 14:01:39 -0300
Subject: [PATCH v3 1/4] add missing CommandCounterIncrement in
 StorePartitionBound

---
 src/backend/catalog/heap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cf36ce4add..2b5377bdf2 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3299,6 +3299,9 @@ StorePartitionBound(Relation rel, Relation parent, 
PartitionBoundSpec *bound)
heap_freetuple(newtuple);
heap_close(classRel, RowExclusiveLock);
 
+   /* Make update visible */
+   CommandCounterIncrement();
+
/*
 * The partition constraint for the default partition depends on the
 * partition bounds of every other partition, so we must invalidate the
-- 
2.11.0

>From 1165c0438c627ea214de9ee4cffa83d89b0aa485 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 8 Mar 2018 14:04:13 -0300
Subject: [PATCH v3 2/4] Add missing CommandCounterIncrement() in partitioned
 index code

---
 src/backend/catalog/pg_constraint.c | 4 
 src/backend/commands/indexcmds.c| 6 ++
 src/backend/commands/tablecmds.c| 2 --
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/pg_constraint.c 
b/src/backend/catalog/pg_constraint.c
index 731c5e4317..38fdf72877 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -18,6 +18,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
+#include "access/xact.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
 #include "catalog/objectaccess.h"
@@ -781,6 +782,9 @@ ConstraintSetParentConstraint(Oid childConstrId, Oid 
parentConstrId)
recordDependencyOn(&depender, &referenced, DEPENDENCY_INTERNAL_AUTO);
 
heap_close(constrRel, RowExclusiveLock);
+
+   /* make update visible */
+   CommandCounterIncrement();
 }
 
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 504806b25b..9ca632865b 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1003,6 +1003,9 @@ DefineIndex(Oid relationId,
ReleaseSysCache(tup);
heap_close(pg_index, RowExclusiveLock);
heap_freetuple(newtup);
+
+   /* make update visible */
+   CommandCounterIncrement();
}
}
else
@@ -2512,5 +2515,8 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 
recordDependencyOn(&partIdx, &partitionTbl, 
DEPENDENCY_AUTO);
}
+
+   /* make our updates visible */
+   CommandCounterIncrement();
}
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 74e020bffc..7ecfbc17a0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14571,8 +14571,6 @@ ATExecAttachPartitionIdx(List **wqueue, Relation 
parentIdx, RangeVar *name)
 
pfree(attmap);
 
-   CommandCounterIncrement();
-
validatePartitionedIndex(parentIdx, parentTbl);
}
 
-- 
2.11.0

>From ee640078fa9bd7662a28058b8b0affe49cdd336f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 21 Nov 2017 15:53:11 -0300
Subject: [PATCH v3 3/4] Allow FOR EACH ROW triggers on partitioned tables

---
 src/backend/catalog/heap.c |   1 +
 src/backend/catalog/index.c|   4 +-
 src/backend/catalog/pg_constraint.c|   3 +
 src/backend/commands/tablecmds.c   |  92 +-
 src/backend/commands/trigger.c | 191 ++--
 src/backend/commands/typecmds.c|   1 +
 src/backend/tcop/utility.c |   3 +-
 src/include/catalog/indexing.h |   2 +
 src/include/catalog/pg_constraint.h|  39 ++--
 src/include/catalog/pg_constraint_fn.h |   1 +
 src/include/commands/trigger.h |   4 +-
 src/test/regress/expected/triggers.out | 277 ++---
 src/test/regress/input/constraints.source  |  16 ++
 src/test/regress/output/constraints.source |  26 +++
 src/test/regress/sql/triggers.sql  | 178 --
 15 files changed, 763 insertions(+), 75 deletions(-)


Re: FOR EACH ROW triggers on partitioned tables

2018-03-11 Thread Thomas Munro
On Fri, Mar 9, 2018 at 7:06 AM, Alvaro Herrera  wrote:
> Thomas Munro wrote:
>> +create trigger failed after update on parted_trig
>> +  referencing old table as old_table
>> +  for each statement execute procedure trigger_nothing();
>>
>> It doesn't fail as you apparently expected.  Perhaps it was supposed
>> to be "for each row" so you could hit your new error with
>> errdetail("Triggers on partitioned tables cannot have transition
>> tables.")?
>
> You're absolutely right.  Fixed in the attached version.

+create trigger failed after update on parted_trig
+  referencing old table as old_table
+  for each row execute procedure trigger_nothing();
+ERROR:  "parted_trig" is a partitioned table
+DETAIL:  Triggers on partitioned tables cannot have transition tables.

I think this should probably say "row-level".  Statement-level
triggers on partitioned tables can have transition tables.
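
For contrast, a statement-level version with a transition table is accepted
(sketch against the same regression-test objects):

  create trigger parted_trig_stmt after update on parted_trig
    referencing old table as old_table
    for each statement execute procedure trigger_nothing();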

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-11 Thread Alvaro Herrera
On 0002:

In terms of docs, I think it's better not to have anything user-facing
in the README.  Consider that users are going to be reading the HTML
docs only, and many of them may not have the README available at all.
So anything that could be useful to users must be in the XML docs only;
keep in the README only stuff that would be useful to a developer (a
section such as "not yet implemented" would belong there, for example).
Stuff that's in the XML should not appear in the README (because DRY).
For the same reason, having the XML docs end with "see the README" seems
a bad idea to me.

UPDATE_RESULT() is a bit weird to me.  I think after staring at it for a
while it looks okay, but why was it such a shock?  In 0002 it's only
used in one place so I would suggest to have it expanded, but I see you
use it in 0003 also, three times I think.  IMO for clarity it seems
better to just have the expanded code rather than the macro.

find_ext_attnums (and perhaps other places) have references to renamed
columns, "starelid" and others.  Also there is this comment:
/* Prepare to scan pg_statistic_ext for entries having indrelid = this rel. */
which is outdated since it uses syscache, not a scan.  Just remove the
comment ...

Please add a comment on what does build_attnums() do.

pg_stats_ext_mcvlist_items is odd.  I suppose you made it take oid to
avoid having to deal with a malicious bytea?  The query in docs is
pretty odd-looking,

  SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts2'));
If we keep the function as is, I would suggest to use LATERAL instead,
  SELECT m.* FROM pg_statistic_ext, pg_mcv_list_items(oid) m WHERE stxname = 'stts2';
but seems like it should be more like this instead:
  SELECT m.* FROM pg_statistic_ext, pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2';
and not have the output formatting function load the data again from the
table.  It'd be a bit like a type-specific UNNEST.

There are a few elog(ERROR) messages.  The vast majority seem to be just
internal messages so they're okay, but there is one that should be
ereport:

+   if (total_length > (1024 * 1024))
+   elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length);
I think we have some precedent for better wording, such as
errmsg("index row size %zu exceeds maximum %zu for index \"%s\""
so I would say
   errmsg("serialized MCV list size %zu exceeds maximum %zu")
though I wonder when this error is thrown -- if this is detected during
analyze, for example, what happens?

There is this FIXME:
+* FIXME Should skip already estimated clauses (using the estimatedclauses
+* bitmap).
Are you planning on implementing this before commit?

There are other FIXMEs also.  This in particular caught my attention:

+   /* merge the bitmap into the existing one */
+   for (i = 0; i < mcvlist->nitems; i++)
+   {
+   /*
+* Merge the result into the bitmap (Min for AND, Max for OR).
+*
+* FIXME this does not decrease the number of matches
+*/
+   UPDATE_RESULT(matches[i], or_matches[i], is_or);
+   }

We come back to UPDATE_RESULT again ... and note how the comment makes
no sense unless you know what UPDATE_RESULT does internally.  This is
one more indication that the macro is not a great thing to have.  Let's
lose it.  But while at it, what to do about the FIXME?

You also have this
+   /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+   if (!HeapTupleIsValid(htup))
+   return NULL;
but what does it mean?  Is it here to cover for some unknown bug?
Should we maybe not have this at all?

Another XXX comment says
+ * XXX All the memory is allocated in a single chunk, so that the caller
+ * can simply pfree the return value to release all of it.

but I would say just remove the XXX and leave the rest of the comment.

There is another XXX comment that says "this is useless", and I agree.
Just take it all out ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fixes for missing schema qualifications

2018-03-11 Thread Alvaro Herrera
Noah Misch wrote:
> On Fri, Mar 09, 2018 at 04:55:38PM +0900, Michael Paquier wrote:
> > --- a/src/backend/catalog/information_schema.sql
> > +++ b/src/backend/catalog/information_schema.sql
> > @@ -186,7 +186,7 @@ CREATE FUNCTION _pg_interval_type(typid oid, mod int4) 
> > RETURNS text
> >  AS
> >  $$SELECT
> >CASE WHEN $1 IN (1186) /* interval */
> > -   THEN upper(substring(format_type($1, $2) from 'interval[()0-9]* 
> > #"%#"' for '#'))
> > +   THEN pg_catalog.upper(substring(pg_catalog.format_type($1, $2) 
> > from 'interval[()0-9]* #"%#"' for '#'))
> > ELSE null
> >END$$;

> This qualifies some functions, but it leaves plenty of unqualified operators.

... and substring() ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FOR EACH ROW triggers on partitioned tables

2018-03-11 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 3/7/18 20:57, Alvaro Herrera wrote:
> > So, unless someone has a brilliant idea on how to construct a column
> > mapping from partitioned table to partition, I'm going back to the
> > design I was proposing earlier, ie., creating individual pg_trigger rows
> > for each partition that are essentially adjusted copies of the ones for
> > the partitioned table.
> 
> Yes, that seems easiest.
> 
> The idea of having only one pg_trigger entry was derived from the
> assumption that we wouldn't need the other ones for anything.  But if
> that doesn't apply, then it's better to just go with the straightforward
> way instead of bending the single-pg_trigger way to our will.

I think you changed the commitfest status to "waiting on author" after
posting this comment, but I had already posted an updated version which
addressed this problem.  I have changed it back to needs-review.

thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary

2018-03-11 Thread Michael Paquier
On Sun, Mar 11, 2018 at 09:56:33AM -0500, Justin Pryzby wrote:
> Hopefully a set of CHUNKS not JUNKS ?

Yes. (laugh)
--
Michael




Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2018-03-11 Thread Alexander Korotkov
On Wed, Mar 7, 2018 at 8:30 PM, Alvaro Herrera 
wrote:

> I suggest to create a new function GinPredicateLockPage() that checks
> whether fast update is enabled for the index.  The current arrangement
> looks too repetitive and it seems easy to make a mistake.
>

BTW, should we also skip CheckForSerializableConflictIn() when
fast update is enabled?  AFAICS, now it doesn't cause any errors or
false positives, but makes useless load.  Is it correct?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Using JIT for VACUUM, COPY, ANALYZE

2018-03-11 Thread Andres Freund
On 2018-03-11 12:38:54 -0700, Andres Freund wrote:
> 
> 
> On March 11, 2018 12:31:33 PM PDT, Pavel Stehule  
> wrote:
> >Hi
> >
> >Today, these tasks can be CPU-limited. Do you think JIT can be used
> >there too?
> 
> Copy definitely, with the others I'm much more doubtful. Don't see anything 
> around their bottlenecks that could be removed by JITing. Haven't looked at 
> profiles of them recently however.

To expand a bit on that: JITing isn't magic - it needs to be able to
remove overhead to be beneficial. That can be removing very frequently
hit branches, indirect jumps and memory accesses. For e.g. expression
evaluation and tuple deforming it's not hard to see how that can be
done, yielding nice speedups. But for vacuuming where a lot of overhead
is in hot pruning and the like there's very little that can be removed.
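
As a hand-wavy illustration (the table is made up; the GUC names are the ones
used elsewhere in this digest), the win shows up on expression-heavy scans
rather than on maintenance commands:

  SET jit_expressions = on;
  SET jit_above_cost = 0;                   -- force JIT just for the demonstration
  EXPLAIN (ANALYZE)
    SELECT sum(a * 2 + b) FROM wide_table;  -- per-tuple expression evaluation and deforming dominate
  VACUUM wide_table;                        -- mostly hot pruning / visibility work; little for JIT to remove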

Greetings,

Andres Freund



Re: Parallel index creation does not properly cleanup after error

2018-03-11 Thread Peter Geoghegan
On Sun, Mar 11, 2018 at 3:22 AM, David Rowley
 wrote:
> Due to the failure during the index build, it appears that the
> PG_TRY/PG_CATCH block in reindex_relation() causes the reindex_index()
> to abort and jump out to the catch block. Here there's a call to
> ResetReindexPending(), which complains as we're still left in parallel
> mode from the aborted _bt_begin_parallel() call which has called
> EnterParallelMode(), but not managed to make it all the way to
> _bt_end_parallel() (called from btbuild()), where ExitParallelMode()
> is normally called.
>
> Subsequent attempts to refresh the materialized view result in an
> Assert failure in list_member_oid()

Thanks for the report.

> I've not debugged that, but I assume it's because
> pendingReindexedIndexes is left as a non-empty list but has had its
> memory context obliterated due to the previous query having ended.

It's not really related to memory lifetime, so much as a corruption of
the state that tracks reindexed indexes within a backend. This is of
course due to that "cannot modify reindex state during a parallel
operation" error you saw.

> The comment in the following fragment is not well honored by the
> ResetReindexPending() since it does not clear the list if there's an
> error.

> A perhaps simple fix would be just to have ResetReindexPending() only
> reset the list to NIL again and not try to raise any error.

I noticed a very similar bug in ResetReindexProcessing() just before
parallel CREATE INDEX was committed. The fix there was simply not
throwing a "can't happen" error. I agree that the same fix should be
used here. It's not worth enforcing !IsInParallelMode() in the reset
functions; just enforcing !IsInParallelMode() in the set functions is
sufficient. Attached patch does this.

-- 
Peter Geoghegan
From 79f6708165c83c39b3e1bf785539bee84a28f650 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sun, 11 Mar 2018 12:18:25 -0700
Subject: [PATCH] Fix corruption of backend REINDEX processing state.

When parallel index builds within the PG_TRY/PG_CATCH block in
reindex_relation() raised any error, reindex_index() jumped out to the
reindex_relation() catch block.  ResetReindexPending() was called from
there, without being prepared for the possibility that the backend is
still in parallel mode due to being in an error/cleanup path.  By
raising an error before the backend's state could be reset, the state
could never get reset.  Reindexing could continually fail within an
affected backend.

To fix, make ResetReindexPending() take the same approach as
ResetReindexProcessing(), and simply don't enforce that we cannot be in
parallel mode.

Author: Peter Geoghegan
Reported-By: David Rowley
Discussion: https://postgr.es/m/CAKJS1f91kq1wfYR8rnRRfKtxyhU2woEA+=whd640uxmyu+o...@mail.gmail.com
---
 src/backend/catalog/index.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 564f206..0da37d9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4054,8 +4054,7 @@ RemoveReindexPending(Oid indexOid)
 static void
 ResetReindexPending(void)
 {
-	if (IsInParallelMode())
-		elog(ERROR, "cannot modify reindex state during a parallel operation");
+	/* This may be called in leader error path */
 	pendingReindexedIndexes = NIL;
 }
 
-- 
2.7.4



Re: Using JIT for VACUUM, COPY, ANALYZE

2018-03-11 Thread Andres Freund


On March 11, 2018 12:31:33 PM PDT, Pavel Stehule  
wrote:
>Hi
>
>Today, these tasks can be CPU-limited. Do you think JIT can be used
>there too?

Copy definitely, with the others I'm much more doubtful. Don't see anything 
around their bottlenecks that could be removed by JITing. Haven't looked at 
profiles of them recently however.


Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Sample values for pg_stat_statements

2018-03-11 Thread Greg Stark
I've often wanted something similar. But I've struggled to come up
with a good way to decide which parameters to keep. And as someone
mentioned, there's the question of how to deal with very large
constants.

The other day I was poking around with pg_stat_statements and jsonlog
and I thought of another way to tackle this problem that I think may
be a better approach. If we logged the queryid in slow error messages
and slow query logs that would let you deal with larger data and also
keep more history without burdening the live system.

If the queryid was a column in the CSV logs (or a field in json logs,
etc) then people who load their logs into a database
would be able to index that column and quickly look up example
queries, sort them by time taken, or analyze them in other ways. Using
jsonlog you could do the same thing in Elasticsearch/Kibana.
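
For example, a hypothetical query of that sort, assuming the proposed queryid
field existed in csvlog and the log had been loaded into the usual
postgres_log table:

  SELECT s.query, l.log_time, l.message
    FROM postgres_log l
    JOIN pg_stat_statements s USING (queryid)
   WHERE l.message LIKE 'duration:%'
   ORDER BY l.log_time DESC
   LIMIT 20;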

I tried to hack this together quickly but it's actually a bit of a
pain for mundane reasons. Our current slow query logs are actually
slow *statement* logs which makes it a bit of an impedance mismatch
with pg_stat_statements which works per planned query. I think the
solution to this would be to drop the slow statement logs and have
pg_stat_statements log slow queries directly in the ExecutorEnd hook.

It would be nice to have the queryid be accessible for other logs as
well like debug_query_str is. I'm not sure the right way to do that
though. I tried just peeking in ActivePortal->plannedstmt but that's
not always set (and in particular is not set at the point slow
statements are logged). And it was causing crashes, presumably
ActivePortal is left uninitialized in some background worker that
doesn't need it?



Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-03-11 Thread Dmitry Dolgov
> On 26 February 2018 at 11:03, Ashutosh Bapat 
>  wrote:
> On Fri, Feb 23, 2018 at 7:35 PM, Robert Haas  wrote:
>> On Fri, Feb 16, 2018 at 12:14 AM, Ashutosh Bapat
>>  wrote:
>>> Appreciate you taking time for review.
>>>
>>> PFA updated version.
>>
>> Committed 0001.
>
> Thanks.
>
> Here's patchset rebased on the latest head. I have fixed all the
> crashes and bugs reported till now.

Hi,

I've stumbled upon this patch and noticed that it no longer compiles after
2af28e6033, where `parttypcoll` was renamed to `partcollation`.



Using JIT for VACUUM, COPY, ANALYZE

2018-03-11 Thread Pavel Stehule
Hi

Today, these tasks can be CPU-limited. Do you think JIT can be used
there too?

Regards

Pavel


Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2018-03-11 Thread Alexander Korotkov
On Sat, Mar 3, 2018 at 2:53 PM, Amit Kapila  wrote:

> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro
> >  If that is indeed a race, could it be fixed by
> > calling PredicateLockPageSplit() at the start of _hash_splitbucket()
> > instead?
> >
>
> Yes, but I think it would be better if we call this once we are sure
> that at least one tuple from the old bucket has been transferred
> (consider if all tuples in the old bucket are dead).


Is that really fair?  For example, a predicate lock can be held by a session
which queried some key but didn't find any corresponding tuple.
Imagine this key should go to the new bucket while all existing
tuples are left in the old bucket.  As I understand it, in this case no locks
would be transferred since no tuples were moved to the new bucket.
So a later insertion into the new bucket wouldn't conflict with the session
that looked for the non-existent key, while it should.  Does that make sense?
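
A rough two-session sketch of that scenario, assuming a table t with a hash
index on column k and serializable isolation (the bucket split is only
indicated by a comment):

  -- session A
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT * FROM t WHERE k = 42;   -- no match; predicate lock taken on the old bucket page
  -- ... a bucket split now maps k = 42 to the new bucket, but no existing
  --     tuples are moved there, so no predicate lock is transferred ...
  -- session B
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  INSERT INTO t VALUES (42);      -- lands in the new bucket; should conflict with A's read, but would not
  COMMIT;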

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2018-03-11 Thread Alexander Korotkov
On Fri, Mar 2, 2018 at 6:57 AM, Thomas Munro 
wrote:

> My thought experiments about pseudo-pages and avoiding the split stuff
> were not intended to get the patch kicked out.  I thought for a while
> that hash indexes were a special case and could benefit from
> dispensing with those trickier problems.  Upon further reflection, for
> interesting size hash indexes pure hash value predicate tags wouldn't
> be much better.  Furthermore, if we do decide we want to use using x %
> max_predicate_locks_per_relation to avoid having to escalate to
> relation predicate locks at the cost of slightly higher collision rate
> then we should consider that for the whole system (including heap page
> predicate locking), not just hash indexes.  Please consider those
> ideas parked for now.


OK.  While our potential pseudo-pages are identified as
"hash_value % some_constant_modulus", real bucket pages are very roughly
identified as "hash_value % number_of_index_pages".  So the page number is
adaptive to the index size, even though that costs us handling of page
splits.  In the same way, locking in other index access methods is adaptive
to the index size, so that should be considered a useful feature which
should be present in hash indexes as well.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [GSOC 18] Performance Farm Project——Initialization Project

2018-03-11 Thread Dave Page
Hi

Maybe I’m missing something (I’ve been offline a lot recently for
unavoidable reasons), but the perf farm project already has a Django
backend initialised and configured to work with community auth, on
community infrastructure.

https://git.postgresql.org/gitweb/?p=pgperffarm.git;a=summary

On Sunday, March 11, 2018, Hongyuan Ma  wrote:

> Hello, mark
> I initialized a Django project and imported the Django REST Framework. Its
> github address is: https://github.com/PGPerfFarm/server-code
> I created some model classes and also wrote scripts in the dbtools folder
> to import simulation data for development. I am hesitating over whether to
> use admin or xadmin as the administrator's backend for the site (I prefer
> to use xadmin).
>
> I also initialized the website's front-end application. Here is its
> address: https://github.com/PGPerfFarm/front-end-code.git
> I wrote the header component as shown:
>
> I hope this effect can enhance the future user experience:).
> This application uses vue.js and element-ui, but if you insist on using
> other js libraries or not using js, please let me know. I will empty this
> project and then rewrite it as you wish.
>
> My next step is to determine the necessary api interface and the
> corresponding components of the front-end application. Then implement them
> one by one.
> Finally, as you can see, I created an organization named PGPerfFarm on
> github without permission. I sent you an invitation letter, and after you
> join, I am willing to hand over the administrator of the organization to
> you.
>
>
> Regards,
> Hongyuan Ma (cs_maleica...@163.com)
>
>
>
>


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: JIT compiling with LLVM v11

2018-03-11 Thread Andres Freund
On 2018-03-11 13:19:57 -0400, Peter Eisentraut wrote:
> On 3/9/18 15:56, Andres Freund wrote:
> > I think that's largely that unnecessary trivial queries get JITed and
> > optimized, because the stats are entirely completely off.
> 
> Right.  I instrumented this a bit, and there are indeed two handfuls of
> queries that exceed the default JIT thresholds, as well as a few that
> trigger JIT because they disable some enable_* planner setting, as
> previously discussed.
> 
> Should we throw in some ANALYZEs to avoid this?

Hm, I'd actually lean to just leave it as is for now. JITing halfway
random queries isn't actually that bad... If we get fed up with the
additional time after a while, we can do something then?


> It's perhaps a bit confusing that some of the jit_* settings take effect
> at plan time and some at execution time.  At the moment, this mainly
> affects me reading the code ;-), but it would also have some effect on
> prepared statements and such.

Not quite sure what you mean?


> Also, jit_tuple_deforming is apparently used only when jit_expressions
> is on.

Right. I've not found a good place to hook into that has enough context
to do JITed deforming otherwise.  I'm inclined to just relegate
jit_tuple_deforming to debugging status (i.e. exclude from show all,
docs etc) for now.


> So, we should work toward more clarity on all these different settings,
> what they are useful for, when to set them, how they interact.

Yep.

Greetings,

Andres Freund



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-03-11 Thread Mark Dilger

> On Mar 3, 2018, at 2:40 PM, Tomas Vondra  wrote:
> 
> An updated patch version, fixing the breakage caused by fd1a421fe6
> twiddling with pg_proc.

Hi Tomas!

Reviewing the sgml documentation, I think something like the following should 
be added:

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a0e6d7062b..108c4ec430 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6496,7 +6496,9 @@ SCRAM-SHA-256$:&l
 An array containing codes for the enabled statistic kinds;
 valid values are:
 d for n-distinct statistics,
-f for functional dependency statistics
+f for functional dependency statistics,
+m for most common values (mcv) statistics, and
+h for histogram statistics
   
  


mark


Re: disable SSL compression?

2018-03-11 Thread Tom Lane
Peter Eisentraut  writes:
> The change in the Debian package I found was to build without zlib at
> all.  So no amount of turning it back on will help.  Whereas the
> upstream change was just to make the default to be off.  But anyway,
> this feature is clearly dying, so we probably shouldn't be trying very
> hard to keep it.

Right; the other point made in the referenced thread was that future
versions of the TLS spec would probably drop compression altogether.
So adding any sort of server or libpq option is work that's going to
be wasted in the long run.

> My proposal is the attached patch that sets the default in libpq to off
> and adjusts the documentation a bit so it doesn't sound like we have
> missed the news altogether.

Seems reasonable as far as it goes, but do we need to make corresponding
server-side changes?

regards, tom lane



Re: Bogus use of canonicalize_qual

2018-03-11 Thread Tom Lane
Dean Rasheed  writes:
> On 10 March 2018 at 20:21, Tom Lane  wrote:
>> If we suppose that we only need to fix it in HEAD, the most attractive
>> answer is to add a parameter distinguishing WHERE and CHECK arguments
>> to canonicalize_qual.

> I agree that this looks like the best choice, but it feels a little
> unsatisfactory to not back-patch a fix for such a glaring bug. You
> could perhaps leave the signature of canonicalize_qual() the same, but
> add a new canonicalize_check() function, and make both thin wrappers
> on top of a local function accepting the is_check parameter.

Hm.  I'd be inclined to create canonicalize_qual_extended(qual, is_check)
and then make canonicalize_qual() call that with is_check = false.
But either way would avoid breaking API compatibility for the back
branches.

I guess the next question is whether we should do it the same way
in HEAD, avoiding a cross-branch difference.  But I don't like that,
because part of the point here IMO is to force any external callers
of canonicalize_qual() to reconsider what they're doing.

regards, tom lane



Re: JIT compiling with LLVM v11

2018-03-11 Thread Peter Eisentraut
On 3/9/18 15:42, Peter Eisentraut wrote:
> The default of jit_above_cost = 50 seems pretty good.  I constructed
> a query that cost about 45 where the run time with and without JIT
> were about even.  This is obviously very limited testing, but it's a
> good start.

Actually, the default in your latest code is 10, which per my
analysis would be too low.  Did you arrive at that setting based on testing?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: JIT compiling with LLVM v11

2018-03-11 Thread Peter Eisentraut
On 3/9/18 15:56, Andres Freund wrote:
> On 2018-03-09 15:28:19 -0500, Peter Eisentraut wrote:
>> On 3/6/18 15:16, Andres Freund wrote:
>>> 2) Don't load the JIT provider until fully needed. Right now
>>>jit_compile_expr() will load the jit provider even if not really
>>>needed. We should probably move the first two return blocks in
>>>llvm_compile_expr() into jit_compile_expr(), to avoid that.
>>
>> I see that you have implemented that, but it doesn't seem to have helped
>> with my make installcheck times.
> 
> What's the exact comparison you're looking at?

I'm just running `time make installcheck` with default settings, as
described in my message from March 6.

> I think that's largely that unnecessary trivial queries get JITed and
> optimized, because the stats are entirely completely off.

Right.  I instrumented this a bit, and there are indeed two handfuls of
queries that exceed the default JIT thresholds, as well as a few that
trigger JIT because they disable some enable_* planner setting, as
previously discussed.

Should we throw in some ANALYZEs to avoid this?

If I set jit_expressions = off, then the timings match again.

It's perhaps a bit confusing that some of the jit_* settings take effect
at plan time and some at execution time.  At the moment, this mainly
affects me reading the code ;-), but it would also have some effect on
prepared statements and such.

Also, jit_tuple_deforming is apparently used only when jit_expressions
is on.

So, we should work toward more clarity on all these different settings,
what they are useful for, when to set them, how they interact.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



GSOC 2018 proposal

2018-03-11 Thread Charles Cui
Hi Aleksander,

   I am currently preparing a proposal for the pg_thrift project. I noticed
that there are several protocols supported by Thrift; which ones have higher
priority? I mean, which ones do I need to implement during this project?





Thanks, Charles.


Re: disable SSL compression?

2018-03-11 Thread Magnus Hagander
On Sun, Mar 11, 2018 at 2:05 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/11/18 04:00, Magnus Hagander wrote:
> > I am not talking about the OpenSSL disabling it. It was disabled on most
> > *distributions* years ago, long before that commit. Which is why I'm
> > still curious as to what platform you actually got it enabled by default
> > on...
>
> Homebrew package
>
> > So for your purposes, you could add a server option to turn it back
> on.
> >
> > Such a server option would also be useful for those users who are
> using
> > OpenSSL <1.1.0 and want to turn off compression on the server side.
> >
> >
> > We'd probably have to put in the distribution specific workarounds like
> > mentioned above to make it actually useful for that.
>
> The change in the Debian package I found was to build without zlib at
> all.  So no amount of turning it back on will help.  Whereas the
> upstream change was just to make the default to be off.  But anyway,
> this feature is clearly dying, so we probably shouldn't be trying very
> hard to keep it.
>
> My proposal is the attached patch that sets the default in libpq to off
> and adjusts the documentation a bit so it doesn't sound like we have
> missed the news altogether.
>
>
I think it's worth mentioning in the docs around "it's now considered
insecure" that it's still an option to use if compression is the main thing
one is looking for, rather than security. As in, it doesn't make it any
less secure than no ssl at all. (obviously not those words)

+1 otherwise.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Faster inserts with mostly-monotonically increasing values

2018-03-11 Thread Claudio Freire
On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee
 wrote:
>
>
> On Sat, Mar 10, 2018 at 12:11 AM, Claudio Freire 
> wrote:
>>
>> On Fri, Mar 9, 2018 at 2:54 PM, Pavan Deolasee 
>> wrote:
>> >
>> >
>>
>> >
>> > So yes, the benefits of the patch go down with higher number of clients,
>> > but
>> > it does not entirely vanish.
>>
>> What if you implement my suggestion?
>>
>> That should improve the multi-client case considerably.
>
>
>
> Yes, I will try that next - it seems like a good idea. So the idea would be:
> check if the block is still the rightmost block and the insertion-key is
> greater than the first key in the page. If those conditions are satisfied,
> then we do a regular binary search within the page to find the correct
> location. This might add an overhead of binary search when keys are strictly
> ordered and a single client is inserting the data. If that becomes a
> concern, we might be able to look for that special case and optimise for
> it too.

Yeah, pretty much that's the idea. Beware, if the new item doesn't
fall in the rightmost place, you still need to check for serialization
conflicts.



[GSOC 18] Performance Farm Project——Initialization Project

2018-03-11 Thread Hongyuan Ma
Hello, mark
I initialized a Django project and imported the Django REST Framework. Its 
github address is: https://github.com/PGPerfFarm/server-code
I created some model classes and also wrote scripts in the dbtools folder to 
import simulation data for development. I am hesitating over whether to use
admin or xadmin as the administrator's backend for the site (I prefer to use
xadmin).


I also initialized the website's front-end application. Here is its address: 
https://github.com/PGPerfFarm/front-end-code.git
I wrote the header component as shown:


I hope this effect can enhance the future user experience:).
This application uses vue.js and element-ui, but if you insist on using other 
js libraries or not using js, please let me know. I will empty this project and 
then rewrite it as you wish.


My next step is to determine the necessary api interface and the corresponding 
components of the front-end application. Then implement them one by one.
Finally, as you can see, I created an organization named PGPerfFarm on github 
without permission. I sent you an invitation letter, and after you join, I am 
willing to hand over the administrator of the organization to you.




Regards,
Hongyuan Ma (cs_maleica...@163.com) 

Re: ALTER TABLE ADD COLUMN fast default

2018-03-11 Thread David Rowley
On 9 March 2018 at 02:11, David Rowley  wrote:
> On 8 March 2018 at 18:40, Andrew Dunstan  
> wrote:
>>  select * from t;
>>  fastdef tps = 107.145811
>>  master  tps = 150.207957
>>
>> "select * from t" used to be about a wash, but with this patch it's
>> got worse. The last two queries were worse and are now better, so
>> that's a win.
>
> How does it compare to master if you drop a column out the table?
> Physical tlists will be disabled in that case too. I imagine the
> performance of master will drop much lower than the all columns
> missing case.

I decided to test this for myself, and the missing version is still
slightly slower than the dropped column version, but not by much. I'm
not personally concerned about this.

The following results are with 1000 column tables with 64 rows each.
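
For readers following along, a sketch of what the three variants mean here
(my reconstruction of the setup, not the exact benchmark script):

  -- "normal": all 1000 columns present in every heap tuple
  CREATE TABLE normal (c1 int, c2 int /* ... through c1000 */);

  -- "missing": columns added after the 64 rows were inserted, so most values
  -- come from the fast-default ("missing") attribute data rather than the tuple
  CREATE TABLE missing (c1 int);
  -- insert the rows, then: ALTER TABLE missing ADD COLUMN c2 int DEFAULT 0; ... etc.

  -- "dropped": like normal but with a dropped column, which also disables
  -- physical tlists in the planner
  CREATE TABLE dropped (c0 int, c1 int /* ... */);
  ALTER TABLE dropped DROP COLUMN c0;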

*** Benchmarking normal table...
tps = 232.794734 (excluding connections establishing)

*** Benchmarking missing table...
tps = 202.827754 (excluding connections establishing)

*** Benchmarking dropped table...
tps = 217.339255 (excluding connections establishing)

There's nothing particularly interesting in the profiles for the
missing and normal case either:

-- Missing case for select * from missing;

  12.87%  postgres  postgres[.] AllocSetAlloc
  10.09%  postgres  libc-2.17.so[.] __strlen_sse2_pminub
   6.35%  postgres  postgres[.] pq_sendcountedtext
   5.97%  postgres  postgres[.] enlargeStringInfo
   5.47%  postgres  postgres[.] ExecInterpExpr
   4.45%  postgres  libc-2.17.so[.] __memcpy_ssse3_back
   4.39%  postgres  postgres[.] palloc
   4.31%  postgres  postgres[.] printtup
   3.81%  postgres  postgres[.] pg_ltoa
   3.72%  postgres  [kernel.kallsyms]   [k] __lock_text_start
   3.38%  postgres  postgres[.] FunctionCall1Coll
   3.11%  postgres  postgres[.] int4out
   2.97%  postgres  postgres[.] appendBinaryStringInfoNT
   2.49%  postgres  postgres[.] slot_getmissingattrs
   1.66%  postgres  postgres[.] pg_server_to_any
   0.99%  postgres  postgres[.] MemoryContextAllocZeroAligned
   0.94%  postgres  postgres[.] expression_tree_walker
   0.93%  postgres  postgres[.] lappend
   0.89%  postgres  [kernel.kallsyms]   [k] __do_page_fault
   0.72%  postgres  [kernel.kallsyms]   [k] clear_page_c_e
   0.72%  postgres  postgres[.] SendRowDescriptionMessage
   0.71%  postgres  postgres[.] expandRelAttrs
   0.64%  postgres  [kernel.kallsyms]   [k] get_page_from_freelist
   0.64%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
   0.62%  postgres  postgres[.] pg_server_to_client
   0.59%  postgres  libc-2.17.so[.] __memset_sse2
   0.58%  postgres  postgres[.] set_pathtarget_cost_width
   0.56%  postgres  postgres[.] OutputFunctionCall
   0.56%  postgres  postgres[.] memcpy@plt
   0.54%  postgres  postgres[.] makeTargetEntry
   0.50%  postgres  postgres[.] SearchCatCache3

-- Normal case for select * from normal;

  12.57%  postgres  postgres[.] AllocSetAlloc
  10.50%  postgres  libc-2.17.so[.] __strlen_sse2_pminub
   7.65%  postgres  postgres[.] slot_deform_tuple
   6.52%  postgres  postgres[.] pq_sendcountedtext
   6.03%  postgres  postgres[.] enlargeStringInfo
   4.51%  postgres  postgres[.] printtup
   4.35%  postgres  postgres[.] palloc
   4.34%  postgres  libc-2.17.so[.] __memcpy_ssse3_back
   3.90%  postgres  postgres[.] pg_ltoa
   3.49%  postgres  [kernel.kallsyms]   [k] __lock_text_start
   3.35%  postgres  postgres[.] int4out
   3.31%  postgres  postgres[.] FunctionCall1Coll
   2.82%  postgres  postgres[.] appendBinaryStringInfoNT
   1.68%  postgres  postgres[.] pg_server_to_any
   1.30%  postgres  postgres[.] SearchCatCache3
   1.01%  postgres  postgres[.] SendRowDescriptionMessage
   0.89%  postgres  postgres[.] lappend
   0.88%  postgres  postgres[.] MemoryContextAllocZeroAligned
   0.79%  postgres  postgres[.] expression_tree_walker
   0.67%  postgres  postgres[.] SearchCatCache1
   0.67%  postgres  postgres[.] expandRelAttrs
   0.64%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
   0.60%  postgres  postgres[.] pg_server_to_client
   0.57%  postgres  [kernel.kallsyms]   [k] __do_page_fault
   0.56%  postgres  postgres[.] OutputFunctionCall
   0.53%  postgres  postgres[.] memcpy@plt
   0.53%  postgres  postgres[.] makeTargetEntry
   0.51%  postgres  [kernel.kallsyms]   [k] get_page_from_freelist

The difference appears to be in ExecInterpExpr, although, that only
accounts for about 5.5%, and we're missi

Re: [bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary

2018-03-11 Thread Justin Pryzby
On Sun, Mar 11, 2018 at 11:04:01PM +0900, Michael Paquier wrote:
> On Fri, Mar 09, 2018 at 08:22:49AM +, Tsunakawa, Takayuki wrote:
> > Thanks for reviewing.  All done.
> 
> Thanks.  Please be careful with the indentation of the patch.  Attached
> is a slightly-updated version with a small modification in
> remove_target_file to make the code cleaner, a proposal of commit
> message and pgindent applied on the code.  I am switching that as ready
> for committer.
> --
> Michael

> From 62950c615d87e3e58998f295ce446eb5c80707e4 Mon Sep 17 00:00:00 2001
> From: Michael Paquier 
> Date: Sun, 11 Mar 2018 22:49:55 +0900
> Subject: [PATCH] Fix handling of removed files on target server with pg_rewind
> 
> After processing the filemap to build the list of chunks that will be
> fetched from the source to rewind the target server, it is possible that
> a file which was previously processed is removed from the source.  A
> simple example of such an occurrence is a WAL segment which gets recycled
> on the target in-between.  When the filemap is processed, files not
> categorized as relation files are first truncated to prepare for the
> full copy which is going to be taken from the source, divided into a
> set of junks.

Hopefully a set of CHUNKS not JUNKS ?

Justin



Inconsistent behavior in serializable snapshot

2018-03-11 Thread Kuntal Ghosh
Hello hackers,

While working on serializable transaction isolation, I've noticed some
strange behavior in the first permutation mentioned in
isolation/specs/read-only-anomaly-2.spec file.

setup
{
CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
}

# without s3, s1 and s2 commit
permutation "s2rx" "s2ry" "s1ry" "s1wy" "s1c" "s2wx" "s2c" "s3c"

Here, we can see a serial order T1 <- T2 without any conflict.
However, if I perform "VACUUM FREEZE bank_account" after the setup
step, s2wx throws a conflict error:
ERROR:  could not serialize access due to read/write dependencies
among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.

Is this expected behavior?
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary

2018-03-11 Thread Michael Paquier
On Fri, Mar 09, 2018 at 08:22:49AM +, Tsunakawa, Takayuki wrote:
> Thanks for reviewing.  All done.

Thanks.  Please be careful with the indentation of the patch.  Attached
is a slightly-updated version with a small modification in
remove_target_file to make the code cleaner, a proposal of commit
message and pgindent applied on the code.  I am switching that as ready
for committer.
--
Michael
From 62950c615d87e3e58998f295ce446eb5c80707e4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sun, 11 Mar 2018 22:49:55 +0900
Subject: [PATCH] Fix handling of removed files on target server with pg_rewind

After processing the filemap to build the list of chunks that will be
fetched from the source to rewing the target server, it is possible that
a file which was previously processed is removed from the source.  A
simple example of such an occurence is a WAL segment which gets recycled
on the target in-between.  When the filemap is processed, files not
categorized as relation files are first truncated to prepare for its
full copy of which is going to be taken from the source, divided into a
set of junks.  However, for a recycled WAL segment, this would result in
a segment which has a zero-byte size.  With such an empty file,
post-rewind recovery thinks that records are saved but they are actually
not because of the truncation which happened when processing the
filemap, resulting in data loss.

In order to fix the problem, make sure that files which are found as
removed on the source when receiving chunks of them are as well deleted
on the target server for consistency.  This ensures that no empty junk
files remain.  If a file has a size so large that it needs more than one
chunk to be fully copied, it could be possible that the deletion is
attempted more than once.  It could be possible as well that the file
has gone away after already copying some chunks on the target.  For
those reasons, the file removals done when processing the file chunks
are lossy, and ignore missing entries.

Patch and report from Tsunakawa Takayuki, reviewed by Michael Paquier.

Discussion: https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8DAAA2%40G01JPEXMBYT05
---
 src/bin/pg_rewind/file_ops.c| 16 
 src/bin/pg_rewind/file_ops.h|  1 +
 src/bin/pg_rewind/libpq_fetch.c | 10 +++---
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c
index 705383d184..f491ed7f5c 100644
--- a/src/bin/pg_rewind/file_ops.c
+++ b/src/bin/pg_rewind/file_ops.c
@@ -29,7 +29,6 @@
 static int	dstfd = -1;
 static char dstpath[MAXPGPATH] = "";
 
-static void remove_target_file(const char *path);
 static void create_target_dir(const char *path);
 static void remove_target_dir(const char *path);
 static void create_target_symlink(const char *path, const char *link);
@@ -134,7 +133,7 @@ remove_target(file_entry_t *entry)
 			break;
 
 		case FILE_TYPE_REGULAR:
-			remove_target_file(entry->path);
+			remove_target_file(entry->path, false);
 			break;
 
 		case FILE_TYPE_SYMLINK:
@@ -165,8 +164,12 @@ create_target(file_entry_t *entry)
 	}
 }
 
-static void
-remove_target_file(const char *path)
+/*
+ * Remove a file from target data directory.  If missing_ok is true, it
+ * is fine for the target file to not exist.
+ */
+void
+remove_target_file(const char *path, bool missing_ok)
 {
 	char		dstpath[MAXPGPATH];
 
@@ -175,8 +178,13 @@ remove_target_file(const char *path)
 
 	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
 	if (unlink(dstpath) != 0)
+	{
+		if (errno == ENOENT && missing_ok)
+			return;
+
 		pg_fatal("could not remove file \"%s\": %s\n",
  dstpath, strerror(errno));
+	}
 }
 
 void
diff --git a/src/bin/pg_rewind/file_ops.h b/src/bin/pg_rewind/file_ops.h
index be580ee4db..9d26cf4f77 100644
--- a/src/bin/pg_rewind/file_ops.h
+++ b/src/bin/pg_rewind/file_ops.h
@@ -15,6 +15,7 @@
 extern void open_target_file(const char *path, bool trunc);
 extern void write_target_range(char *buf, off_t begin, size_t size);
 extern void close_target_file(void);
+extern void remove_target_file(const char *path, bool missing_ok);
 extern void truncate_target_file(const char *path, off_t newsize);
 extern void create_target(file_entry_t *t);
 extern void remove_target(file_entry_t *t);
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 8f8d504455..5914b15017 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -311,15 +311,19 @@ receiveFileChunks(const char *sql)
 		chunk = PQgetvalue(res, 0, 2);
 
 		/*
-		 * It's possible that the file was deleted on remote side after we
-		 * created the file map. In this case simply ignore it, as if it was
-		 * not there in the first place, and move on.
+		 * If a file has been deleted on the source, remove it on the target
+		 * as well.  Note that multiple unlink() calls may happen on the same
+		 * file if multiple data chunks ar

Re: disable SSL compression?

2018-03-11 Thread Peter Eisentraut
On 3/11/18 04:00, Magnus Hagander wrote:
> I am not talking about the OpenSSL disabling it. It was disabled on most
> *distributions* years ago, long before that commit. Which is why I'm
> still curious as to what platform you actually got it enabled by default
> on...

Homebrew package

> So for your purposes, you could add a server option to turn it back on.
> 
> Such a server option would also be useful for those users who are using
> OpenSSL <1.1.0 and want to turn off compression on the server side.
> 
> 
> We'd probably have to put in the distribution specific workarounds like
> mentioned above to make it actually useful for that. 

The change in the Debian package I found was to build without zlib at
all.  So no amount of turning it back on will help.  Whereas the
upstream change was just to make the default off.  But anyway,
this feature is clearly dying, so we probably shouldn't be trying very
hard to keep it.

My proposal is the attached patch that sets the default in libpq to off
and adjusts the documentation a bit so it doesn't sound like we have
missed the news altogether.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 42d7f920c2732b5ea47bffd7bc1dc03b36fb6d05 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 11 Mar 2018 08:53:33 -0400
Subject: [PATCH] Set libpq sslcompression to off by default

Since SSL compression is no longer recommended, turn the default in
libpq from on to off.

OpenSSL 1.1.0 and many distribution packages already turn compression
off by default, so such a server won't accept compression anyway.  So
this will mainly affect users of older OpenSSL installations.

Also update the documentation to make clear that this setting is no
longer recommended.
---
 doc/src/sgml/libpq.sgml  | 24 +---
 src/interfaces/libpq/fe-connect.c|  2 +-
 src/interfaces/libpq/fe-secure-openssl.c |  8 
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index da9421486b..18686bcad3 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1438,19 +1438,21 @@ Parameter Key Words
   sslcompression
   

-If set to 1 (default), data sent over SSL connections will be
-compressed.
-If set to 0, compression will be disabled (this requires
-OpenSSL 1.0.0 or later).
-This parameter is ignored if a connection without SSL is made,
-or if the version of OpenSSL used does not 
support
-it.
+If set to 1, data sent over SSL connections will be compressed.  If
+set to 0, compression will be disabled.  The default is 0.  This
+parameter is ignored if a connection without SSL is made.

+

-Compression uses CPU time, but can improve throughput if
-the network is the bottleneck.
-Disabling compression can improve response time and throughput
-if CPU performance is the limiting factor.
+SSL compression is nowadays considered insecure and its use is no
+longer recommended.  OpenSSL 1.1.0 disables
+compression by default, and many operating system distributions
+disable it in prior versions as well, so setting this parameter to on
+will not have any effect if the server does not accept compression.
+On the other hand, OpenSSL before 1.0.0
+does not support disabling compression, so this parameter is ignored
+with those versions, and whether compression is used depends on the
+server.

   
  
diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index 77eebb0ba1..39c19998c2 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -279,7 +279,7 @@ static const internalPQconninfoOption PQconninfoOptions[] = 
{
"SSL-Mode", "", 12, /* sizeof("verify-full") == 12 
*/
offsetof(struct pg_conn, sslmode)},
 
-   {"sslcompression", "PGSSLCOMPRESSION", "1", NULL,
+   {"sslcompression", "PGSSLCOMPRESSION", "0", NULL,
"SSL-Compression", "", 1,
offsetof(struct pg_conn, sslcompression)},
 
diff --git a/src/interfaces/libpq/fe-secure-openssl.c 
b/src/interfaces/libpq/fe-secure-openssl.c
index cade4e157c..25aa7a21bf 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1188,14 +1188,14 @@ initialize_SSL(PGconn *conn)
SSL_set_verify(conn->ssl, SSL_VERIFY_PEER, verify_cb);
 
/*
-* If the OpenSSL version used supports it (from 1.0.0 on) and the user
-* requested it, disable SSL compression.
+* Set compression option if the OpenSSL version used supports it (from
+* 1.0.0 on).
 */
 #ifdef SSL_OP_NO_COMPRESSION
if (conn->sslc

Re: [HACKERS] why not parallel seq scan for slow functions

2018-03-11 Thread Amit Kapila
On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
 wrote:
>
>

After recent commits, the patch no longer applied cleanly, so I have
rebased it and, along the way, addressed the comments you raised.

> Here are some comments on the patch.
>
> +/*
> + * Except for the topmost scan/join rel, consider gathering
> + * partial paths.  We'll do the same for the topmost 
> scan/join
> This function only works on join relations. Mentioning scan rel is confusing.
>

Okay, removed the 'scan' word from the comment.


> index 6e842f9..5206da7 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -481,14 +481,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
>  }
>
> + *
> + * Also, if this is the topmost scan/join rel (that is, the only 
> baserel),
> + * we postpone this until the final scan/join targelist is available (see
>
> Mentioning join rel here is confusing since we deal with base relations here.
>

Okay, removed the 'join' word from the comment.

> +bms_membership(root->all_baserels) != BMS_SINGLETON)
>
> set_tablesample_rel_pathlist() is also using this method to decide whether
> there are any joins in the query. May be macro-ize this and use that macro at
> these two places?
>

maybe, but I am not sure if it improves the readability.  I am open to
changing it if somebody else also feels it is better to macro-ize this
usage.
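
If we did go that way, it would be roughly a one-liner along these
lines (the name is just a placeholder):

/* sketch: true when the query has only one baserel, hence no joins */
#define IS_SINGLE_BASEREL(root) \
	(bms_membership((root)->all_baserels) == BMS_SINGLETON)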

> - * for the specified relation.  (Otherwise, add_partial_path might delete a
> + * for the specified relation. (Otherwise, add_partial_path might delete a
>
> Unrelated change?
>

oops, removed.

> +/* Add projection step if needed */
> +if (target && simple_gather_path->pathtarget != target)
>
> If the target was copied someplace, this test will fail. Probably we want to
> check contents of the PathTarget structure? Right now copy_pathtarget() is 
> not
> called from many places and all those places modify the copied target. So this
> isn't a problem. But we can't guarantee that in future. Similar comment for
> gather_merge path creation.
>

I am not sure there is any use in copying the path target unless you
want to modify it, but in any case the check used in the patch is
similar to what is already used in multiple places in the code.  See
create_ordered_paths.  So we would need to change all those places
first if we sense any such danger.

> +simple_gather_path = apply_projection_to_path(root,
> +  rel,
> +  simple_gather_path,
> +  target);
> +
>
> Why don't we incorporate those changes in create_gather_path() by passing it
> the projection target instead of rel->reltarget? Similar comment for
> gather_merge path creation.
>

Nothing important, just for the sake of code consistency; some other
places in the code use it this way.  See create_ordered_paths.  Also, I
am not sure there is any advantage in doing it inside
create_gather_path.
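
For reference, the overall shape being discussed is roughly the
following (simplified sketch, not the exact patch code):

	Path	   *cheapest_partial_path = linitial(rel->partial_pathlist);
	Path	   *simple_gather_path;

	simple_gather_path = (Path *)
		create_gather_path(root, rel, cheapest_partial_path,
						   rel->reltarget, NULL, NULL);

	/* Add a projection step only if the final scan/join target differs */
	if (target && simple_gather_path->pathtarget != target)
		simple_gather_path = apply_projection_to_path(root, rel,
													  simple_gather_path,
													  target);

	add_path(rel, simple_gather_path);

This mirrors the create_ordered_paths precedent mentioned above.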

> +/*
> + * Except for the topmost scan/join rel, consider gathering
> + * partial paths.  We'll do the same for the topmost scan/join 
> rel
>
> Mentioning scan rel is confusing here.
>

Okay, changed.

>
>  deallocate tenk1_count;
> +explain (costs off) select ten, costly_func(ten) from tenk1;
>
> verbose output will show that the parallel seq scan's targetlist has
> costly_func() in it. Isn't that what we want to test here?
>

Not exactly; we just want to test whether a parallel plan is selected
when the costly function is used in the target list.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_paths_include_tlist_cost_v9.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-11 Thread Alexander Korotkov
On Sat, Mar 10, 2018 at 5:49 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Mar 9, 2018 at 9:40 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
>> wrote:
>>
>>> Attached an updated patch
>>>
>> fixed these issue. Will review the patch again.
>>
>>
>> Thank you!
>>
>
> I've fixed a bug: _bt_vacuum_needs_cleanup() didn't release a lock when
> the metapage is in the old format.
> The bug was found by Darafei Praliaskouski and Grigory Smolkin, who tested
> this patch.
> Revised patch is attached.
>

I also found that online upgrade of the meta-page didn't work: the
version field wasn't set.
This is fixed in the attached version of the patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-lazy-btree-cleanup-6.patch
Description: Binary data


Re: Parallel Aggregates for string_agg and array_agg

2018-03-11 Thread Tomas Vondra


On 03/11/2018 07:31 AM, David Rowley wrote:
> On 11 March 2018 at 12:11, Tomas Vondra  wrote:
>> On 03/05/2018 04:51 AM, David Rowley wrote:
>>> On 5 March 2018 at 04:54, Tomas Vondra  wrote:
>>> Consider the following slightly backward-looking case;
>>>
>>> select string_agg(',', x::text) from generate_Series(1,10)x;
>>>   string_agg
>>> --
>>>  ,2,3,4,5,6,7,8,9,10,
>>>
>>> Here the delimiter is the number, not the ','. Of course, the
>>> delimiter for the first aggregated result is not shown.  The previous
>>> implementation of the transfn for this aggregate just threw away the
>>> first delimiter, but that's no good for parallel aggregates as the
>>> transfn may be getting called in a parallel worker, in which case
>>> we'll need that delimiter in the combine function to properly join to
>>> the other partial aggregated results matching the same group.
>>>
>>
>> Hmmm, you're right I haven't considered the delimiter might be variable.
>> But isn't just yet another hint that while StringInfo was suitable for
>> aggregate state of non-parallel string_agg, it may not be for the
>> parallel version?
>>
>> What if the parallel string_agg instead used something like this:
>>
>> struct StringAggState
>> {
>> char   *delimiter;/* first delimiter */
>> StringInfoData  str;  /* partial aggregate state */
>> } StringAggState;
>>
>> I think that would make the code cleaner, compared to using the cursor
>> field for that purpose. But maybe it'd make it not possible to share
>> code with the non-parallel aggregate, not sure.
> 
> It would be possible to use something like that. The final function
> would just need to take 'str' and ignore 'delimiter', whereas the
> combine function would need both. However, you could optimize the
> above to just have a single StringInfoData and have a pointer to the
> start of the actual data (beyond the first delimiter), that would save
> us a call to palloc and also allow the state's data to exist in one
> contiguous block of memory, which will be more cache friendly when it
> comes to reading it back again.  The pointer here would, of course,
> have to be an offset into the data array, since repallocs would cause
> problems as the state grew.
> 
> This is kinda the option I was going for with using the cursor. I
> didn't think that was going to be a problem. It seems that cursor was
> invented so that users of StringInfo could store a marker somehow
> along with the string. The comment for cursor read:
> 
>  * cursor is initialized to zero by makeStringInfo or initStringInfo,
>  * but is not otherwise touched by the stringinfo.c routines.
>  * Some routines use it to scan through a StringInfo.
> 
> The use of the cursor field does not interfere with pqformat.c's use
> of it as in the serialization functions we're creating a new
> StringInfo for the 'result'. If anything tries to send the internal
> state, then that's certainly broken. It needs to be serialized before
> that can happen.
> 
> I don't quite see how inventing a new struct would make the code
> cleaner. It would make the serialization and deserialization functions
> have to write and read more fields for the lengths of the data
> contained in the state.
> 
> I understand that the cursor field is used in pqformat.c quite a bit,
> but I don't quite understand why it should get to use that field the
> way it wants, but the serialization and deserialization functions
> shouldn't.  I'd understand that if we were trying to phase out the
> cursor field from StringInfoData, but I can't imagine that's going to
> happen.
> 
> Of course, with that all said. If you really strongly object to how 
> I've done this, then I'll change it to use a new struct type. I had 
> just tried to make the patch footprint as small as possible, and the 
> above text is me just explaining my reasoning behind this, not me 
> objecting to your request for me to change it. Let me know your 
> thoughts after reading the above.
> 

Hmmm, I guess you're right, I didn't really think that through. I
don't object to sticking to the current approach.
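
For the archives, a rough illustration of that approach (sketch only,
not the actual patch; delimiter and value are shown as plain byte
strings and NULL handling is ignored):

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Accumulate into a single StringInfo, using the otherwise-unused
 * cursor field to remember the offset just past the first delimiter,
 * i.e. where the "real" data starts.
 */
static void
string_agg_accum_sketch(StringInfo state, const char *delim, int delimlen,
						const char *value, int valuelen)
{
	bool		first = (state->len == 0);

	appendBinaryStringInfo(state, delim, delimlen);
	if (first)
		state->cursor = state->len;		/* final function starts here */
	appendBinaryStringInfo(state, value, valuelen);
}

The final function would then emit state->data + state->cursor, while
the combine function appends the incoming partial state verbatim,
leading delimiter included, whenever its own state is non-empty.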

> In the meantime, I've attached an updated patch. The only change it 
> contains is updating the "Partial Mode" column in the docs from "No" 
> to "Yes".
> 

Yeah, seems fine to me. I wonder what else would be needed before
switching the patch to RFC. I plan to do that after a bit more testing
sometime early next week, unless someone objects.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Parallel index creation does not properly cleanup after error

2018-03-11 Thread David Rowley
Hi,

I've just stumbled on a bug in the parallel reindexing code.

Example:

-- force parallel index creation
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
set min_parallel_table_scan_size = '0MB';
set min_parallel_index_scan_size = '0kB';

-- example (from the regression tests)
CREATE TABLE mvtest_foo(a, b) AS VALUES(1, 10);
CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
CREATE UNIQUE INDEX ON mvtest_mv(a);
INSERT INTO mvtest_foo SELECT * FROM mvtest_foo;
REFRESH MATERIALIZED VIEW mvtest_mv;
ERROR:  cannot modify reindex state during a parallel operation

Due to the failure during the index build, it appears that
reindex_index() aborts and control jumps out to the catch block of the
PG_TRY/PG_CATCH in reindex_relation(). Here there's a call to
ResetReindexPending(), which complains as we're still left in parallel
mode from the aborted _bt_begin_parallel() call which has called
EnterParallelMode(), but not managed to make it all the way to
_bt_end_parallel() (called from btbuild()), where ExitParallelMode()
is normally called.

Subsequent attempts to refresh the materialized view result in an
Assert failure in list_member_oid()

REFRESH MATERIALIZED VIEW mvtest_mv;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I've not debugged that, but I assume it's because
pendingReindexedIndexes is left as a non-empty list but has had its
memory context obliterated due to the previous query having ended.

The comment in the following fragment is not well honored by
ResetReindexPending(), since it does not clear the list if there's an
error.

PG_CATCH();
{
/* Make sure list gets cleared on error exit */
ResetReindexPending();
PG_RE_THROW();
}

static void
ResetReindexPending(void)
{
if (IsInParallelMode())
elog(ERROR, "cannot modify reindex state during a parallel operation");
pendingReindexedIndexes = NIL;
}

A perhaps simple fix would be just to have ResetReindexPending() only
reset the list to NIL again and not try to raise any error.
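
Something like this (untested sketch):

static void
ResetReindexPending(void)
{
	/*
	 * This is reached from a PG_CATCH block, so raising an error here only
	 * gets in the way of error cleanup; just clear the list, as the comment
	 * at the call site promises.
	 */
	pendingReindexedIndexes = NIL;
}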

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [WIP PATCH] Index scan offset optimisation using visibility map

2018-03-11 Thread Andrey Borodin


> On 10 Mar 2018, at 19:20, Michail Nikolaev  wrote:
> 
> > Also, I think that this check could be removed from loop. We do not expect 
> > that it's state will change during execution, do we?  
> 
> Removed.
Sorry, it seems I expressed what I wanted to say incorrectly.
I meant that this Assert() can be checked once before the loop, not on
every loop iteration.

Best regards, Andrey Borodin.




Re: MCV lists for highly skewed distributions

2018-03-11 Thread Dean Rasheed
On 6 March 2018 at 16:48, Dean Rasheed  wrote:
> On 6 March 2018 at 08:51, John Naylor  wrote:
>> On 3/5/18, Dean Rasheed  wrote:
>>> Attached is an updated patch.
>> Nice. The results look good.
>
> Thanks for the review.
>

So I was about ready to commit this, but decided to do more testing,
because I worry that there may be instances that a user could point to
where it makes estimates worse. Maybe that's inevitable for any change
we might make, and I don't think that should stop us from at least
trying to improve things, but it does make me wary about committing
this without a lot of testing.

In all the testing I've done so far, this looks to be a definite
improvement over the current algorithm, but it looks like it's
possible to do better.

Consider the following test case:

drop table if exists foo;
create temp table foo(a int);
insert into foo
  select * from
   (
     select x/2 from generate_series(0,19999999) g(x)
     union all
     select 0 from generate_series(0,9999999)
     union all
     select 1 from generate_series(0,999999)
     union all
     select 2 from generate_series(0,99999)
     union all
     select 3 from generate_series(0,9999)
     union all
     select 4 from generate_series(0,999)
     union all
     select 5 from generate_series(0,99)
   ) t
   order by random();
alter table foo alter column a set statistics 10000;
analyse foo;

So the table has around 31 million rows, and the stats target is maxed
out -- we're sampling around 10% of the table, and it's not possible
to sample more. Looking at the value a=5, it appears 102 times in the
table, so we can expect it to be seen around 10 times in any sample,
but the minimum count for the new algorithm in this case is 23, so it
won't be included in the MCV list. This leads to it having the same
estimate as all other non-MCV values, which turns out to be rows=5, a
considerable under-estimate.

By comparison, the current algorithm in HEAD does include this value
in the MCV list, so it gets a good estimate. On the other hand, it
generates a complete MCV-list with 10000 entries, most of which are
just random values from the list that appear twice in the table and
also happen to appear twice in the sample. So there are nearly 10000
values that are significantly over-estimated in this case.

So arguably, the new algorithm is still better than the current one
for this data, but the fact that it doesn't include a=5 in the list
bugs me, because the estimate for that single value is consistently
worse. Looking at the distribution of a=5 in the sample, it is a
hypergeometric distribution with N=31000000, n=3000000 and K=102. This
gives a sample mean of 9.8 and a variance of 8.9 (a standard deviation
of 3.0). Also, this is common enough that in fact that distribution
can be reasonably approximated by a normal distribution. The value is
excluded from the MCV list because the spread is deemed too large,
relative to the mean, so the relative error of the MCV-based estimate
would be too high. However, not including it in the MCV list causes it
to have an estimate of 5, which would correspond to a sample mean of
0.5, and the observed sample mean is more than 3 standard deviations
higher than that. So we do actually have enough information to
conclude that the value is almost certainly more common than the
non-MCV selectivity would suggest, and I think that's sufficient
justification for including it in the MCV list.
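
For the record, the arithmetic behind those figures (hypergeometric
mean and variance, using N = 31111100 rows in the table, n = 3000000
sampled rows and K = 102 occurrences of a=5):

\[
\bar{x} = \frac{nK}{N} = \frac{3000000 \times 102}{31111100} \approx 9.8,
\qquad
\sigma^2 = \bar{x}\left(1 - \frac{K}{N}\right)\frac{N - n}{N - 1} \approx 8.9,
\qquad
\sigma \approx 3.0
\]

\[
\text{a non-MCV estimate of 5 rows} \;\Rightarrow\; \text{implied sample mean}
= 5 \cdot \frac{n}{N} \approx 0.48,
\qquad
\frac{9.8 - 0.48}{3.0} \approx 3.1\,\sigma
\]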

The crux of this is that the relative-standard-error algorithm is
making use of 2 pieces of information -- the sample frequency, and an
estimate of its statistical spread. However, there is a 3rd piece of
information that we know, that is not being made use of -- the
selectivity that would be assigned to the value if it were not
included in the MCV list. Looking at just the first 2 pieces of
information ensures that we get decent estimates for the values in the
MCV list (where what constitutes "decent" depends on an arbitrarily
chosen RSE cutoff), but it doesn't take into account the fact that the
values not in the list may have much worse estimates. Making use of
the non-MCV-based estimate allows us to do better -- if the MCV-based
estimate is statistically significantly higher than the non-MCV-based
estimate, then it would almost certainly be better to include that
value in the MCV list, even if its spread is quite large.

This is somewhat similar to the initial idea from Jeff Janes, except
that patch didn't take into account the spread, so it ended up making
the too-many-mcvs problem worse for uniform distributions. There's
also another subtlety with that patch -- when comparing frequencies of
values not in the MCV list, you really have to start from a fully
populated MCV list and work down, rather than starting with a empty
one and working up, because if all the common values have about the
same frequency, the most common amongst them may not be much more
common than the initial average, so you may end up with an empty MCV
list.

Re: PATCH: Unlogged tables re-initialization tests

2018-03-11 Thread Michael Paquier
On Fri, Mar 09, 2018 at 05:23:48PM -0500, Peter Eisentraut wrote:
> This seems like a useful test.
> 
> On 3/5/18 12:35, David Steele wrote:
> > +mkdir($tablespaceDir)
> > +   or die "unable to mkdir \"$tablespaceDir\"";
> 
> Use BAIL_OUT instead of die in tests.

Would it be better to make this practice more uniform?  From the code of
the tests:
$ git grep die -- */t/*.pl | wc -l
50
--
Michael


signature.asc
Description: PGP signature


Re: Bogus use of canonicalize_qual

2018-03-11 Thread Dean Rasheed
On 10 March 2018 at 20:21, Tom Lane  wrote:
> I wrote:
>> Whilst fooling about with predtest.c, I noticed a rather embarrassing
>> error.  Consider the following, rather silly, CHECK constraint:
>> ...
>> So, what to do?  We have a few choices, none ideal:
>
> I'd been assuming that we need to back-patch a fix for this, but after
> further reflection, I'm not so sure.  The bug is only triggered by fairly
> silly CHECK constraints, and given that it's been there a long time (at
> least since 9.2 according to my tests) without any field reports, it seems
> likely that nobody is writing such silly CHECK constraints.
>
> If we suppose that we only need to fix it in HEAD, the most attractive
> answer is to add a parameter distinguishing WHERE and CHECK arguments
> to canonicalize_qual.  That allows symmetrical simplification of constant-
> NULL subexpressions in the two cases, and the fact that the caller now
> has to make an explicit choice of WHERE vs CHECK semantics might help
> discourage people from applying the function in cases where it's not
> clear which one applies.  PFA a patch that does it like that.
>

I agree that this looks like the best choice, but it feels a little
unsatisfactory to not back-patch a fix for such a glaring bug. You
could perhaps leave the signature of canonicalize_qual() the same, but
add a new canonicalize_check() function, and make both thin wrappers
on top of a local function accepting the is_check parameter.
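
That is, something like this (sketch only):

static Expr *canonicalize_qual_internal(Expr *qual, bool is_check);

Expr *
canonicalize_qual(Expr *qual)
{
	return canonicalize_qual_internal(qual, false);	/* WHERE semantics */
}

Expr *
canonicalize_check(Expr *qual)
{
	return canonicalize_qual_internal(qual, true);	/* CHECK semantics */
}

That would keep existing callers (and any third-party code) compiling
unchanged in the back branches.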

Regards,
Dean



Re: initdb help message about WAL segment size

2018-03-11 Thread Michael Paquier
On Sun, Mar 11, 2018 at 08:23:39AM +0100, Magnus Hagander wrote:
> I propose the attached patch, aligning the help message with the docs. Any
> reason not to?

+1 for doing what you are suggesting.
--
Michael


signature.asc
Description: PGP signature


Re: disable SSL compression?

2018-03-11 Thread Magnus Hagander
On Sun, Mar 11, 2018 at 12:36 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/9/18 09:06, Magnus Hagander wrote:
> > What platform does that actually work out of the box on? I have
> > customers who actively want to use it (for compression, not security --
> > replication across limited and metered links), and the amount of
> > workarounds they have to put in place OS level to get it working is
> > increasingly complicated.
>
> It was disabled in OpenSSL 1.1.0:
>

I am not talking about the OpenSSL disabling it. It was disabled on most
*distributions* years ago, long before that commit. Which is why I'm still
curious as to what platform you actually got it enabled by default on...

Like the stuff here:
https://www.postgresql.org/message-id/flat/CAKwe89Cj7KQ3BZDoUXLF5KBZ8X6icKXHi2Y1mDzTut3PNrH2VA%40mail.gmail.com


  *) CRIME protection: disable compression by default, even if OpenSSL is
>  compiled with zlib enabled. Applications can still enable compression
>  by calling SSL_CTX_clear_options(ctx, SSL_OP_NO_COMPRESSION), or by
>  using the SSL_CONF library to configure compression.
>  [Emilia Käsper]
>
> So for your purposes, you could add a server option to turn it back on.

> Such a server option would also be useful for those users who are using
> OpenSSL <1.1.0 and want to turn off compression on the server side.
>
>
We'd probably have to put in the distribution specific workarounds like
mentioned above to make it actually useful for that.
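
For what it's worth, the knob itself would presumably be tiny: a bool
GUC (the name below is invented) consulted wherever the server sets up
its SSL context, using the OpenSSL calls quoted above. A hypothetical
sketch, which of course only helps where OpenSSL itself was built with
zlib:

#ifdef SSL_OP_NO_COMPRESSION
	if (ssl_compression)
		SSL_CTX_clear_options(context, SSL_OP_NO_COMPRESSION);
	else
		SSL_CTX_set_options(context, SSL_OP_NO_COMPRESSION);
#endif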


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/