[HACKERS] RTE_NAMEDTUPLESTORE, enrtuples and comments

2017-06-10 Thread Noah Misch
While completing my annual src/backend/nodes/*funcs.c audit, I noticed defects
in commit 18ce3a4 changes to RangeTblEntry:

1. Field relid is under a comment saying it is valid for RTE_RELATION only.
   Fields coltypes, coltypmods and colcollations are under a comment saying
   they are valid for RTE_VALUES and RTE_CTE only.  But _outRangeTblEntry()
   treats all of the above as valid for RTE_NAMEDTUPLESTORE.  Which is right?

2. New fields enrname and enrtuples are set only for RTE_NAMEDTUPLESTORE, yet
   they're under the comment for RTE_VALUES and RTE_CTE.  This pair needs its
   own comment.

3. Each of _{copy,equal,out,read}RangeTblEntry() silently ignores enrtuples.
   _equalRangeTblEntry() ignores enrname, too.  In each case, the function
   should either use the field or have a comment to note that skipping the
   field is intentional.  Which should it be?

4. This one is not necessarily a defect: I wonder if RangeTblEntry is the
   right place for enrtuples.  It's a concept regularly seen in planner data
   structures but not otherwise seen at parse tree level.
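
As an illustration of what point 3 asks for, the convention elsewhere in
equalfuncs.c is to either compare a field or note the omission; a
hypothetical sketch (not a committed fix) might look like:

    static bool
    _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
    {
        /* ... existing field comparisons ... */
        COMPARE_STRING_FIELD(enrname);
        /* deliberately skip enrtuples: it is a planner-style estimate,
         * not part of the parse-tree identity of the RTE */
        return true;
    }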

nm




Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?

2017-06-10 Thread Mark Kirkwood

On 06/06/17 10:12, Gavin Flower wrote:


On 06/06/17 09:41, Mark Kirkwood wrote:

On 05/06/17 09:30, Tom Lane wrote:


I've been thinking about the behavior discussed in
https://www.postgresql.org/message-id/flat/20170522132017.29944.48391%40wrigleys.postgresql.org
and it seems to me that there are a couple of things we ought to do about it.

First, I think we need a larger hard floor on the number of occurrences
of a value that're required to make ANALYZE decide it is a "most common
value".  The existing coding is willing to believe that anything that
appears at least twice in the sample is a potential MCV, but that design
originated when we were envisioning stats samples of just a few thousand
rows --- specifically, default_statistics_target was originally just 10,
leading to a 3000-row sample size.  So accepting two-appearance values as
MCVs would lead to a minimum MCV frequency estimate of 1/1500.  Now it
could be a tenth or a hundredth of that.

As a round number, I'm thinking that a good floor would be a frequency
estimate of 1/1000.  With today's typical sample size of 30000 rows,
a value would have to appear at least 30 times in the sample to be
believed to be an MCV.  That seems like it gives us a reasonable margin
of error against the kind of sampling noise seen in the above-cited
thread.

Second, the code also has a rule that potential MCVs need to have an
estimated frequency at least 25% larger than what it thinks the "average"
value's frequency is.  A rule of that general form seems like a good idea,
but I now think the 25% threshold is far too small to do anything useful.
In particular, in any case like this where there are more distinct values
than there are sample rows, the "average frequency" estimate will
correspond to less than one occurrence in the sample, so that this rule is
totally useless to filter anything that we would otherwise consider as an
MCV.  I wonder if we shouldn't make it be "at least double the estimated
average frequency".
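
Concretely, the two proposed filters amount to something like this (a
minimal sketch with illustrative names, not the actual analyze.c code):

    #include <stdbool.h>

    /* Would this sample value qualify as an MCV under the proposed rules? */
    static bool
    candidate_is_mcv(int count, int samplerows, double avgfreq)
    {
        double      freq = (double) count / samplerows;

        /* hard floor: estimated frequency must be at least 1/1000 */
        if (freq < 0.001)
            return false;

        /* must be at least double the estimated average frequency,
         * replacing the old "25% above average" rule */
        if (freq < 2.0 * avgfreq)
            return false;

        return true;
    }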



Or possibly calculate the sample standard deviation and make use of 
that to help decide on a more flexible cutoff than twice the avg 
frequency?


Are there any research papers that might help us here (I'm drowning 
in a sea of barely relevant search results for most phrases I've 
tried so far)? I recall there were some that Tom referenced when this 
stuff was originally written.


On the other hand I do have access to some mathematicians 
specializing in statistics - so can get their thoughts on this issue 
if you feel it would be worthwhile.


Cheers

Mark


The standard deviation (sd) is proportional to the square root of the 
number in the sample in a Normal Distribution.


In a Normal Distribution, about 2/3 the values will be within plus or 
minus one sd of the mean.


There seems to be an implicit assumption that the distribution of 
values follows the Normal Distribution - has this been verified? I 
suspect that real data will have a skewed distribution of values, and 
may even be multimodal (multiple peaks).  The Normal Distribution has 
one central peak with 2 tails of the same shape & size.


So in a sample of 100 the sd is proportional to 10%,
for 10,000 the sd is proportional to 1%.

So essentially, the larger the sample size the more reliable is 
knowledge of the most common values (ignoring pathologically extreme 
distributions!) - the measure of reliability depends on the distribution.


How about selecting the cut off as the mean plus one sd, or something 
of that nature?  Note that the cut off point may result in no mcv's 
being selected - especially for small samples.
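
Worked out under the usual binomial sampling assumption: if a value has
true frequency p and we sample n rows, the observed count k and the
relative error of the estimated frequency \hat{p} = k/n behave as

    k \sim \mathrm{Binomial}(n, p), \qquad
    \mathrm{sd}(k) = \sqrt{n p (1 - p)}, \qquad
    \frac{\mathrm{sd}(\hat{p})}{p} \approx \frac{1}{\sqrt{n p}}

so with n = 30000 and p = 1/1000, np = 30 and the relative sd is about
1/sqrt(30), roughly 18% - small enough to trust, which is the intuition
behind requiring a minimum count before believing a value is an MCV.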


If practicable, it would be good to sample real datasets. Suggest 
looking at datasets where the current mechanism looks reasonable, and 
ones where the estimates are too far off.  Also, if possible, try any 
new selection method on the datasets and see what the difference is.


The above is based on what I remember from my university statistics 
papers; I took it up to 4th year level many moons ago.

With respect to the assumption of Normal distribution - it is easy to 
come up with an example that shows it need not be: consider our 
pgbench 'accounts' table. The distribution of 'bid' values is not Normal 
(it is Uniform).


Now I realized I made people think about Normal distributions by 
mentioning computing the standard deviation of the sample (and I 
probably should have said 'standard deviation of the *sample 
frequencies*') - but this was only a suggestion for working out how to 
(maybe) be less arbitrary about how we decide what values to put in the 
MCV list (currently 25% different from the mean frequency).


regards

Mark





Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?

2017-06-10 Thread Alvaro Herrera
Gavin Flower wrote:

> The standard deviation (sd) is proportional to the square root of
> the number in the sample in a Normal Distribution.
> 
> In a Normal Distribution, about 2/3 the values will be within plus
> or minus one sd of the mean.
> 
> There seems to be an implicit assumption that the distribution of
> values follows the Normal Distribution - has this been verified?

The whole problem here is precisely to determine what is the data
distribution -- one side of it is how to represent it for the planner
(which we do by storing a number of distinct values, a list of MCVs and
their respective frequencies, and a histogram representing values not in
the MCV list); the other side is how to figure out what data to put in
the MCV list and histogram (i.e. what to compute during ANALYZE).

If we knew the distribution was a normal, we wouldn't need any of these
things -- we'd just store the mean and standard deviation.
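
(In that hypothetical world every range-selectivity question would reduce
to the normal CDF: for a column with mean \mu and standard deviation
\sigma,

    P(X \le c) = \Phi\left(\frac{c - \mu}{\sigma}\right)

which is precisely why real, skewed, possibly multimodal data needs the
MCV-list-plus-histogram representation instead.)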

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] memory fields from getrusage()

2017-06-10 Thread Tom Lane
Justin Pryzby  writes:
> This comment from ~1996 says:
> https://doxygen.postgresql.org/postgres_8c_source.html
>  4421  * the only stats we don't show here are for memory usage -- i can't
>  4422  * figure out how to interpret the relevant fields in the rusage struct,
>  4423  * and they change names across o/s platforms, anyway. if you can figure
>  4424  * out what the entries mean, you can somehow extract resident set size,
>  4425  * shared text size, and unshared data and stack sizes.

> .. is that really (still) the case for supported platforms?  I'm hoping
> that in 2017 one can just call getrusage() if autoconf says it's okay ??

We already do call getrusage().  The point of that comment is that the
contents of the resulting struct rusage are not very well standardized.
POSIX says only

The <sys/resource.h> header defines the rusage structure that includes
at least the following members:

struct timeval ru_utime   user time used
struct timeval ru_stime   system time used

(seems the same in 1997 and 2008 text).  So the existing code has already
got its neck stuck way out in terms of portability.  Maybe you could push
it further, but I bet you'll find that the situation isn't any better than
it was at the time that comment was written.
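
For reference, a minimal program that stays inside that POSIX guarantee
looks like the following; reading anything beyond ru_utime/ru_stime is
the part that is not portable:

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) != 0)
            return 1;

        /* only these two fields are guaranteed by POSIX */
        printf("user %ld.%06lds, system %ld.%06lds\n",
               (long) ru.ru_utime.tv_sec, (long) ru.ru_utime.tv_usec,
               (long) ru.ru_stime.tv_sec, (long) ru.ru_stime.tv_usec);
        return 0;
    }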

It's entirely possible that we could simplify the code some, because
I suspect that Windows is the only remaining platform that doesn't
HAVE_GETRUSAGE.  But that in itself doesn't mean we can use any more
fields than we do now.

regards, tom lane




[HACKERS] Re: [BUGS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

2017-06-10 Thread Alvaro Herrera
Noah Misch wrote:

> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> for your status update.  Please reacquaint yourself with the policy on open
> item ownership[1] and then reply immediately.  If I do not hear from you by
> 2017-06-11 07:00 UTC, I will transfer this item to release management team
> ownership without further notice.

I volunteer to own this item.  I'll report on Wednesday 14th at the latest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-06-10 Thread Mark Rofail
• After finding the arraycontains function, I implemented
arraycontainselem, which corresponds to the operator @>(anyarray,
anyelem).
   ◦ Please read the attached patch file to view my progress.

• In addition to src/backend/utils/adt/arrayfuncs.c, where I
implemented arraycontainselem:

   ◦ I also edited pg_amop (src/include/catalog/pg_amop.h), since
it stores information about operators associated with access method
operator families.

+DATA(insert ( 2745   2277 2283 5 s 2753 2742 0 ));
{
2745: Oid amopfamily; (denotes gin array_ops)
2277: Oid amoplefttype; (denotes anyarray)
2283: Oid amoprighttype; (denotes anyelem)
5: int16 amopstrategy; /* operator strategy number */ (denotes the new
strategy that is yet to be created)
's': char amoppurpose; (denotes 's' for search)
2753: Oid amopopr; (denotes the new operator Oid)
2742: Oid amopmethod; (denotes gin)
0: Oid amopsortfamily; (0 since search operator)
}

   ◦ And pg_operator (src/include/catalog/pg_operator.h), since it
stores information about operators.
+DATA(insert OID = 2753 (  "@>"   PGNSP PGUID b f f 2277 2283 16 0  0
arraycontainselem 0 0 ));
{
"@>": NameData oprname; /* name of operator */
PGNSP: Oid oprnamespace; /* OID of namespace containing this oper */
PGUID: Oid oprowner; /* operator owner */
'b': char oprkind; /* 'l', 'r', or 'b' */ (denotes infix)
'f': bool oprcanmerge; /* can be used in merge join? */
'f': bool oprcanhash; /* can be used in hash join? */
2277: Oid oprleft; (denotes anyarray)
2283: Oid oprright; (denotes anyelem)
16: Oid oprresult; (denotes boolean)
0: Oid oprcom; /* OID of commutator oper, or 0 if none */ (needs to be
revisited)
0: Oid oprnegate; /* OID of negator oper, or 0 if none */ (needs to be
revisited)
arraycontainselem: regproc oprcode; /* OID of underlying function */
0: regproc oprrest; /* OID of restriction estimator, or 0 */
0: regproc oprjoin; /* OID of join estimator, or 0 */
}
diff --git a/src/backend/access/gin/ginarrayproc.c b/src/backend/access/gin/ginarrayproc.c
index cc7435e030..14fedc8066 100644
--- a/src/backend/access/gin/ginarrayproc.c
+++ b/src/backend/access/gin/ginarrayproc.c
@@ -24,6 +24,7 @@
 #define GinContainsStrategy		2
 #define GinContainedStrategy	3
 #define GinEqualStrategy		4
+#define GinContainsElemStrategy		5
 
 
 /*
@@ -110,7 +111,8 @@ ginqueryarrayextract(PG_FUNCTION_ARGS)
 		case GinOverlapStrategy:
 			*searchMode = GIN_SEARCH_MODE_DEFAULT;
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			if (nelems > 0)
 				*searchMode = GIN_SEARCH_MODE_DEFAULT;
 			else	/* everything contains the empty set */
@@ -171,6 +173,7 @@ ginarrayconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			/* result is not lossy */
 			*recheck = false;
@@ -258,7 +261,8 @@ ginarraytriconsistent(PG_FUNCTION_ARGS)
 }
 			}
 			break;
+		case GinContainsElemStrategy:
 		case GinContainsStrategy:
 			/* must have all elements in check[] true, and no nulls */
 			res = GIN_TRUE;
 			for (i = 0; i < nkeys; i++)
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index d9c8aa569c..e1ff6d33b5 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4215,6 +4215,40 @@ arraycontains(PG_FUNCTION_ARGS)
 }
 
 Datum
+arraycontainselem(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_ANY_ARRAY(0);
+	Datum		elem = PG_GETARG_DATUM(1);
+	Oid			collation = PG_GET_COLLATION();
+	Oid			element_type = AARR_ELEMTYPE(array);
+	ArrayType  *elemarray;
+	bool		result;
+
+	int16		typlen;
+	bool		typbyval;
+	char		typalign;
+
+	/* get required info about the element type */
+	get_typlenbyvalalign(element_type, &typlen, &typbyval, &typalign);
+
+	/* wrap the single element in a one-element array */
+	elemarray = construct_array(&elem, 1, element_type,
+								typlen, typbyval, typalign);
+
+	/* the array contains the element iff it contains that array */
+	result = array_contain_compare((AnyArrayType *) elemarray, array,
+								   collation, true,
+								   &fcinfo->flinfo->fn_extra);
+
+	/* Avoid leaking memory when handed toasted input. */
+	AARR_FREE_IF_COPY(array, 0);
+
+	PG_RETURN_BOOL(result);
+}
+
+Datum
 arraycontained(PG_FUNCTION_ARGS)
 {
 	AnyArrayType *array1 = PG_GETARG_ANY_ARRAY(0);
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index da0228de6b..2da9002577 100644
--- a/src/include/catalog/pg_amop.h
+++ b/src/include/catalog/pg_amop.h
@@ -687,6 +687,8 @@ DATA(insert (	2595   718 600 15 o 3291 783 1970 ));
  */
 DATA(insert (	2745   2277 2277 1 s 2750 2742 0 ));
 DATA(insert (	2745   2277 2277 2 s 2751 2742 0 ));
+/* TODO: link the operator's pg_operator OID */
+DATA(insert ( 2745   2277 2283 5 s 2753 2742 0 ));
 DATA(insert (	2745   2277 2277 3 s 2752 2742 0 ));
 DATA(insert (	2745   2277 2277 4 s 1070 2742 0 ));
 
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index ccbb17efec..626a0b1c49 100644
--- a/src/include/catalog/pg_operator.h
+++ b/

[HACKERS] memory fields from getrusage()

2017-06-10 Thread Justin Pryzby
I'm interested to expose output of the remaining (memory) fields from 
getrusage().

postgres=# SET log_parser_stats='on';
postgres=# SELECT c.oid::regclass, usagecount FROM pg_buffercache b JOIN 
pg_class c USING (relfilenode) WHERE usagecount=1 ;
LOG:  PARSER STATISTICS
DETAIL:  ! system usage stats:
!   0.000197 elapsed 0.000119 user 0.000079 system sec
!   [0.011572 user 0.007714 sys total]
!   0/0 [0/264] filesystem blocks in/out
!   0/39 [0/989] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!   0/0 [2/2] voluntary/involuntary context switches
!   3920 MAX RESIDENT, 0 SHARED, 0 UNSHARED DATA, 0 UNSHARED STACK 
(Kb)

Before that can work for some platforms, it looks like rusagestub.h needs to
have any desired fields added, and ./src/port/getrusage.c should memset(0) in
the non-windows case before adding any reliance on the rest of the structure.
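
A sketch of the second part, assuming the existing src/port/getrusage.c
structure (illustrative, not the actual file):

    int
    getrusage(int who, struct rusage *rusage)
    {
        /* zero everything first, so fields this platform never fills
         * read back as 0 instead of garbage */
        if (rusage != NULL)
            memset(rusage, 0, sizeof(struct rusage));

        /* ... then fill ru_utime/ru_stime (and whatever else the
         * platform provides) as the existing code does ... */
        return 0;
    }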

This comment from ~1996 says:
https://doxygen.postgresql.org/postgres_8c_source.html
 4421  * the only stats we don't show here are for memory usage -- i can't
 4422  * figure out how to interpret the relevant fields in the rusage struct,
 4423  * and they change names across o/s platforms, anyway. if you can figure
 4424  * out what the entries mean, you can somehow extract resident set size,
 4425  * shared text size, and unshared data and stack sizes.

.. is that really (still) the case for supported platforms?  I'm hoping that in
2017 one can just call getrusage() if autoconf says it's okay ??

Justin




Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-10 Thread Alvaro Herrera
Tom Lane wrote:
> I wrote:
> > Yes, we already have guards for those cases, but they return fairly opaque
> > error messages to the tune of "set-valued function called in context that
> > cannot accept a set", because the executor hasn't enough context to do
> > better.  I'd like the messages to be more specific, like "set-valued
> > function cannot appear within CASE" and so on.
> 
> Here's an expanded version of the "bottom up" patch that adjusts some
> parser APIs to allow these additional messages to be thrown.  This changes
> all occurrences of "set-valued function called in context that cannot
> accept a set" in the core regression tests into something more specific.
> There are probably some remaining cases that this doesn't cover, but the
> existing execution-time checks will still catch those.

Interesting stuff.  Here's a small recommendation for a couple of those
new messages.



-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
*** a/src/backend/parser/parse_expr.c
--- b/src/backend/parser/parse_expr.c
***************
*** 1068,1074 **** transformAExprNullIf(ParseState *pstate, A_Expr *a)
  	if (result->opretset)
  		ereport(ERROR,
  				(errcode(ERRCODE_DATATYPE_MISMATCH),
! 				 errmsg("NULLIF operator must not return a set"),
  				 parser_errposition(pstate, a->location)));
  
  	/*
--- 1068,1075 ----
  	if (result->opretset)
  		ereport(ERROR,
  				(errcode(ERRCODE_DATATYPE_MISMATCH),
! 		/* translator: %s is name of a SQL construct, eg NULLIF */
! 				 errmsg("%s operator must not return a set", "NULLIF"),
  				 parser_errposition(pstate, a->location)));
  
  	/*
***************
*** 3041,3047 **** make_distinct_op(ParseState *pstate, List *opname, Node *ltree, Node *rtree,
  	if (((OpExpr *) result)->opretset)
  		ereport(ERROR,
  				(errcode(ERRCODE_DATATYPE_MISMATCH),
! 				 errmsg("IS DISTINCT FROM operator must not return a set"),
  				 parser_errposition(pstate, location)));
  
  	/*
--- 3042,3050 ----
  	if (((OpExpr *) result)->opretset)
  		ereport(ERROR,
  				(errcode(ERRCODE_DATATYPE_MISMATCH),
! 		/* translator: %s is name of a SQL construct, eg NULLIF */
! 				 errmsg("%s operator must not return a set",
! 						"IS DISTINCT FROM"),
  				 parser_errposition(pstate, location)));
  
  	/*



Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-10 Thread Alvaro Herrera
Robert Haas wrote:

> Not this patch's problem directly, but while scrutinizing this it
> crossed my mind that we would need to prohibit constraint triggers
> with transition tables.  It turns out that we do, in the parser:
> 
> create constraint trigger table2_trig
> after insert on table2 referencing new table as new_table
> for each statement execute procedure dump_insert();
> ERROR:  syntax error at or near "referencing"
> 
> Possibly it would be better to allow the syntax and error out in
> CreateTrigger(), so that we can give a better error message.  It's
> certainly not obvious from the syntax summary produced by \h CREATE
> TRIGGER why this doesn't work.

Yeah, I agree.  This doesn't look like actual protection, but just a
"happy accident", and the syntax error message is not helpful either.
We could keep it very simple by just throwing the error right there in
gram.y's production, adding TriggerReferencing in the right place in the
CREATE CONSTRAINT TRIGGER production -- at this stage there doesn't seem
to be a need to expand this any further.
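
A rough sketch of that check (placement and names are assumptions, not a
worked-out patch):

    /* in gram.y's CREATE CONSTRAINT TRIGGER production, after adding
     * TriggerReferencing to the rule: */
    if ($referencing != NIL)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("constraint triggers cannot have transition tables")));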

> This might be more future-proof, too,
> if somebody someday adds support for REFERENCING { OLD | NEW } ROW to
> constraint triggers and fails to realize that there's not a check
> anywhere outside the parser for the table case.

I suppose in the future it would make sense to allow for-each-statement
constraint triggers with transition tables.  It'd probably a useful way
to optimize FK checks for bulk operations.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-10 Thread Josh Berkus
On 06/09/2017 07:54 PM, Greg Stark wrote:
> On 7 June 2017 at 01:01, Josh Berkus  wrote:
>> P3: apparently jsonb_to_tsvector with lang parameter isn't immutable?
>> This means that it can't be used for indexing:
>>
>> libdata=# create index bookdata_fts on bookdata using gin ((
>> to_tsvector('english',bookdata)));
>> ERROR:  functions in index expression must be marked IMMUTABLE
> 
> I don't have a machine handy to check on but isn't this a strange
> thing to do? Isn't there a GIN opclass on jsonb itself which would be
> the default if you didn't have that to_tsvector() call  -- and which
> would also work properly with the jsonb operators?
> 

The above is the documented way to create an FTS index on a JSONB field.

-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] logical replication: \dRp+ and "for all tables"

2017-06-10 Thread Tom Lane
Jeff Janes  writes:
> On Fri, Jun 9, 2017 at 10:20 PM, Masahiko Sawada 
> wrote:
>> On Sat, Jun 10, 2017 at 7:29 AM, Jeff Janes  wrote:
>>> That seems unfortunate.  Should the "for all tables" be included as
>>> another column in \dRp and \dRp+, or at least as a footnote tag in \dRp+ ?

>> +1. I was thinking the same. Attached patch adds "All Tables" column
>> to both \dRp and \dRp+.

> Looks good to me.  Attached with regression test expected output  changes.

This patch confuses me.  In the first place, I don't see the argument for
adding the "all tables" property to \dRp output; it seems out of place
there.  In the second place, this really fails to respond to what I'd call
the main usability problem with \dRp+, which is that the all-tables
property is likely to lead to an unreadably bulky list of affected tables.
What I'd say the patch ought to do is *replace* \dRp+'s list of affected
tables with a notation like "(all tables)" when puballtables is true.
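
In describe.c terms, a sketch of that behavior (hypothetical, not a
patch) would be:

    if (puballtables)
        printTableAddFooter(&cont, _("Tables: (all tables)"));
    else
        /* ... keep the existing per-table footer list ... */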

regards, tom lane




[HACKERS] Getting server crash on Windows when using ICU collation

2017-06-10 Thread Ashutosh Sharma
Hi All,

I am seeing a server crash when running queries that use ICU collations on
Windows.  Following are the steps to reproduce the crash, with the help of
the patch that enables the ICU feature on Windows [1]:
1) psql -d postgres

2) CREATE DATABASE icu_win_test
       TEMPLATE template0
       ENCODING 'UTF8'
       LC_CTYPE 'C'
       LC_COLLATE 'C';

3) \c icu_win_test;

4) icu_win_test=# select 'B' > 'a' COLLATE "de-u-co-standard-x-icu";
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

The backtrace observed in the core dump is,

> postgres.exe!varstr_cmp(char * arg1, int len1, char * arg2, int len2,
unsigned int collid)  Line 1494 C
  postgres.exe!text_cmp(varlena * arg1, varlena * arg2, unsigned int
collid)  Line 1627 C
  postgres.exe!text_gt(FunctionCallInfoData * fcinfo)  Line 1738 + 0x12
bytes C
  postgres.exe!ExecInterpExpr(ExprState * state, ExprContext * econtext,
char * isnull)  Line 650 + 0xa bytes C
  postgres.exe!evaluate_expr(Expr * expr, unsigned int result_type, int
result_typmod, unsigned int result_collation)  Line 4719 C
  postgres.exe!evaluate_function(unsigned int funcid, unsigned int
result_type, int result_typmod, unsigned int result_collid, unsigned int
input_collid, List * args, char funcvariadic, HeapTupleData * func_tuple,
eval_const_expressions_context * context)  Line 4272 + 0x50 bytes C
  postgres.exe!simplify_function(unsigned int funcid, unsigned int
result_type, int result_typmod, unsigned int result_collid, unsigned int
input_collid, List * * args_p, char funcvariadic, char process_args, char
allow_non_const, eval_const_expressions_context * context)  Line 3914 +
0x44 bytes C
  postgres.exe!eval_const_expressions_mutator(Node * node,
eval_const_expressions_context * context)  Line 2627 C
  postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator,
void * context)  Line 2735 + 0x37 bytes C
  postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator,
void * context)  Line 2932 + 0x8 bytes C
  postgres.exe!eval_const_expressions(PlannerInfo * root, Node * node)
 Line 2413 C
  postgres.exe!subquery_planner(PlannerGlobal * glob, Query * parse,
PlannerInfo * parent_root, char hasRecursion, double tuple_fraction)  Line
623 + 0x2d bytes C

RCA:

As seen in the backtrace, the crash is happening in the varstr_cmp()
function.  AFAICU, this crash happens only on Windows because in
varstr_cmp(), if the encoding type is UTF8, we never consider calling the
ICU functions to compare the strings; instead, we directly call the OS
function wcscoll*().

The point is, if the collation provider is ICU, then we should try to call
the ICU functions for collation support instead of the OS functions.  This
is taken care of inside varstr_cmp() for Linux, but surprisingly it is not
handled for Windows.  Please have a look at the code snippet below from
varstr_cmp() to see how it is handled for Linux:

#endif							/* WIN32 */

#ifdef USE_ICU
#ifdef HAVE_UCOL_STRCOLLUTF8
			if (GetDatabaseEncoding() == PG_UTF8)
			{
				UErrorCode	status;

				status = U_ZERO_ERROR;
				result = ucol_strcollUTF8(mylocale->info.icu.ucol,
										  arg1, len1,
										  arg2, len2,
										  &status);
				if (U_FAILURE(status))
					ereport(ERROR,
							(errmsg("collation failed: %s",
									u_errorName(status))));
			}
			else
#endif
			{
				int32_t		ulen1,
							ulen2;
				UChar	   *uchar1,
						   *uchar2;

				ulen1 = icu_to_uchar(&uchar1, arg1, len1);
				ulen2 = icu_to_uchar(&uchar2, arg2, len2);

				result = ucol_strcoll(mylocale->info.icu.ucol,
									  uchar1, ulen1,
									  uchar2, ulen2);
			}
#else							/* not USE_ICU */
			/* shouldn't happen */
			elog(ERROR, "unsupported collprovider: %c", mylocale->provider);
#endif							/* not USE_ICU */
		}
		else
		{
#ifdef HAVE_LOCALE_T
			result = strcoll_l(a1p, a2p, mylocale->info.lt);
#else
			/* shouldn't happen */
			elog(ERROR, "unsupported collprovider: %c", mylocale->provider);
#endif
		}

Fix:

I am currently working on this and will try to come up with the fix asap.
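
One plausible shape for the fix (an assumption on my part, not the actual
patch) is to consult the collation provider before taking the Windows
wcscoll/wcscoll_l path in varstr_cmp(), so ICU collations fall through to
the USE_ICU branch shown above:

    #ifdef WIN32
    	/* take the OS path only when not dealing with an ICU collation */
    	if (GetDatabaseEncoding() == PG_UTF8 &&
    		!(mylocale && mylocale->provider == COLLPROVIDER_ICU))
    	{
    		/* ... existing MultiByteToWideChar + wcscoll code ... */
    	}
    #endif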

[1] -
https://www.postgresql.org/message-id/CAE9k0P%3DQRjtS1a0rgTyKag_%2Bs6XCs7vovV%2BgSkUfYVASog0P8w%40mail.gmail.com


--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


[HACKERS] ICU support on Windows

2017-06-10 Thread Ashutosh Sharma
Hi All,

Currently, we cannot perform an ICU-enabled build of postgres on the
Windows platform.  However, this can be done on Linux platforms using the
'--with-icu' configuration parameter.  Attached is a patch that allows us
to perform an ICU-enabled build of postgres on the Windows platform,
provided that we have the correct version of the ICU libraries installed
on our system.

With the help of the attached patch, we can use the ICU feature on
Windows.  All we have to do is download the ICU libraries from [1] and add
the installation path for the ICU libraries to config.pl, like:

icu => 'E:\Users\pg\icu',

[1]- http://site.icu-project.org/download/53

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


ICU_support_windows.patch
Description: Binary data



[HACKERS] tablesync.c - comment improvements

2017-06-10 Thread Erik Rijkers

tablesync.c - comment improvements

--- src/backend/replication/logical/tablesync.c.orig	2017-06-10 10:20:07.617662465 +0200
+++ src/backend/replication/logical/tablesync.c	2017-06-10 10:45:52.620514397 +0200
@@ -12,18 +12,18 @@
  *	  logical replication.
  *
  *	  The initial data synchronization is done separately for each table,
- *	  in separate apply worker that only fetches the initial snapshot data
- *	  from the publisher and then synchronizes the position in stream with
+ *	  in a separate apply worker that only fetches the initial snapshot data
+ *	  from the publisher and then synchronizes the position in the stream with
  *	  the main apply worker.
  *
- *	  The are several reasons for doing the synchronization this way:
+ *	  There are several reasons for doing the synchronization this way:
  *	   - It allows us to parallelize the initial data synchronization
  *		 which lowers the time needed for it to happen.
  *	   - The initial synchronization does not have to hold the xid and LSN
  *		 for the time it takes to copy data of all tables, causing less
  *		 bloat and lower disk consumption compared to doing the
- *		 synchronization in single process for whole database.
- *	   - It allows us to synchronize the tables added after the initial
+ *		 synchronization in a single process for the whole database.
+ *	   - It allows us to synchronize any tables added after the initial
  *		 synchronization has finished.
  *
  *	  The stream position synchronization works in multiple steps.
@@ -37,7 +37,7 @@
  *		 read the stream and apply changes (acting like an apply worker) until
  *		 it catches up to the specified stream position.  Then it sets the
  *		 state to SYNCDONE.  There might be zero changes applied between
- *		 CATCHUP and SYNCDONE, because the sync worker might be ahead of the
+ *		 CATCHUP and SYNCDONE because the sync worker might be ahead of the
  *		 apply worker.
  *	   - Once the state was set to SYNCDONE, the apply will continue tracking
  *		 the table until it reaches the SYNCDONE stream position, at which
@@ -147,7 +147,7 @@
 }
 
 /*
- * Wait until the relation synchronization state is set in catalog to the
+ * Wait until the relation synchronization state is set in the catalog to the
  * expected one.
  *
  * Used when transitioning from CATCHUP state to SYNCDONE.
@@ -206,12 +206,12 @@
 }
 
 /*
- * Wait until the the apply worker changes the state of our synchronization
+ * Wait until the apply worker changes the state of our synchronization
  * worker to the expected one.
  *
  * Used when transitioning from SYNCWAIT state to CATCHUP.
  *
+ * Returns false if the apply worker has disappeared or the table state
+ * has been reset.
  * reset.
  */
 static bool
@@ -225,7 +225,7 @@
 
 		CHECK_FOR_INTERRUPTS();
 
-		/* Bail if he apply has died. */
+		/* Bail if the apply has died. */
 		LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
 		worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
 		InvalidOid, false);
@@ -333,7 +333,7 @@
 
 	Assert(!IsTransactionState());
 
-	/* We need up to date sync state info for subscription tables here. */
+	/* We need up-to-date sync state info for subscription tables here. */
 	if (!table_states_valid)
 	{
 		MemoryContext oldctx;
@@ -365,7 +365,7 @@
 	}
 
 	/*
-	 * Prepare hash table for tracking last start times of workers, to avoid
+	 * Prepare a hash table for tracking last start times of workers, to avoid
 	 * immediate restarts.  We don't need it if there are no tables that need
 	 * syncing.
 	 */
@@ -401,7 +401,7 @@
 		{
 			/*
 			 * Apply has caught up to the position where the table sync has
-			 * finished.  Time to mark the table as ready so that apply will
+			 * finished.  Mark the table as ready so that apply will
 			 * just continue to replicate it normally.
 			 */
 			if (current_lsn >= rstate->lsn)
@@ -436,7 +436,7 @@
 			else
 
 /*
- * If no sync worker for this table yet, count running sync
+ * If there is no sync worker for this table yet, count running sync
  * workers for this subscription, while we have the lock, for
  * later.
  */
@@ -477,7 +477,7 @@
 
 			/*
 			 * If there is no sync worker registered for the table and there
-			 * is some free sync worker slot, start new sync worker for the
+			 * is some free sync worker slot, start a new sync worker for the
 			 * table.
 			 */
 			else if (!syncworker && nsyncworkers < max_sync_workers_per_subscription)
@@ -551,7 +551,7 @@
 	int			bytesread = 0;
 	int			avail;
 
-	/* If there are some leftover data from previous read, use them. */
+	/* If there is some leftover data from a previous read, use it. */
 	avail = copybuf->len - copybuf->cursor;
 	if (avail)
 	{
@@ -694,7 +694,7 @@
 (errmsg("could not fetch table info for table \"%s.%s\": %s",
 		nspname, relname, res->err)));
 
-	/* We don't know number of rows coming, so allocate enough space. */
+	/*