Re: [HACKERS] Statement-level rollback

2018-01-07 Thread Simon Riggs
On 6 November 2017 at 12:36, MauMau  wrote:
> when I submit the next revision of my patch.

When will the next version be posted?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical decoding fast-forward and slot advance

2018-01-07 Thread Simon Riggs
On 31 December 2017 at 10:44, Petr Jelinek  wrote:

> Attached is patch which adds ability to do fast-forwarding while
> decoding. That means wal is consumed as fast as possible and changes are
> not given to output plugin for sending. The implementation is less
> invasive than I originally though it would be. Most of it is just
> additional filter condition in places where we would normally filter out
> changes because we don't yet have full snapshot.

Looks good.

The precise definition of "slot advance" or "fast forward" isn't
documented in the patch. If we advance past everything, why is there
not just one test in LogicalDecodingProcessRecord() to say if
(ctx->fast_forward)? Why put it in multiple decoding subroutines?
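For illustration, here is a minimal sketch of the single-check shape being asked
about; the struct members and helper names below are hypothetical stand-ins, not
the patch's actual code:

#include <stdbool.h>

/* Stand-in types; the real ones live in the logical decoding headers. */
typedef struct LogicalDecodingContext { bool fast_forward; } LogicalDecodingContext;
typedef struct XLogReaderState XLogReaderState;

/* Hypothetical helpers for snapshot/slot bookkeeping and plugin output. */
extern void AdvanceDecodingState(LogicalDecodingContext *ctx, XLogReaderState *rec);
extern void SendChangesToOutputPlugin(LogicalDecodingContext *ctx, XLogReaderState *rec);

static void
process_record(LogicalDecodingContext *ctx, XLogReaderState *rec)
{
	/* snapshot and slot bookkeeping must still run while fast-forwarding */
	AdvanceDecodingState(ctx, rec);

	/* the single test: never hand anything to the output plugin */
	if (ctx->fast_forward)
		return;

	SendChangesToOutputPlugin(ctx, rec);
}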

If ctx->fast_forward is set it might throw off other ops, so it would
be useful to see some Asserts elsewhere to make sure we understand and
avoid breakage.

In pg_replication_slot_advance() the moveto variable is set to
PG_GETARG_LSN(1) and then unconditionally overwritten before it is
used for anything. Why?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel append plan instability/randomness

2018-01-07 Thread Tom Lane
Amit Khandekar  writes:
> The fact that b_star gets moved from 5th position to the first
> position in the scans indicates that its cost shoots up from 1.04 to
> a value greater than 1.16. It does not look like a case where two
> costs are almost the same, due to which their positions swap sometimes. I
> am trying to figure out what else it can be ...

The gut feeling I had upon seeing the failure was that the plan shape
depends on the order in which rows happen to be read from the system
catalogs by a heapscan.  I've not tried to run that idea to ground yet.

regards, tom lane



Re: Parallel append plan instability/randomness

2018-01-07 Thread Amit Khandekar
On 8 January 2018 at 10:10, Amit Kapila  wrote:
> regression=# explain select round(avg(aa)), sum(aa) from a_star;
>   QUERY PLAN
> ---
>  Finalize Aggregate  (cost=2.30..2.31 rows=1 width=40)
>->  Gather  (cost=2.28..2.29 rows=3 width=40)
>  Workers Planned: 3
>  ->  Partial Aggregate  (cost=2.28..2.29 rows=1 width=40)
>->  Parallel Append  (cost=0.00..2.20 rows=15 width=4)
>  ->  Seq Scan on d_star  (cost=0.00..1.16 rows=16 width=4)
>  ->  Seq Scan on f_star  (cost=0.00..1.16 rows=16 width=4)
>  ->  Seq Scan on e_star  (cost=0.00..1.07 rows=7 width=4)
>  ->  Seq Scan on b_star  (cost=0.00..1.04 rows=4 width=4)
>  ->  Seq Scan on c_star  (cost=0.00..1.04 rows=4 width=4)
>  ->  Seq Scan on a_star  (cost=0.00..1.03 rows=3 width=4)
> (11 rows)
>
> The above indicates that paths are listed in the order as expected.
> What makes you think that the order of sub-scans can be random?  Is it
> possible that the number of rows in child relations can vary across
> runs?
>
> One theory that can explain the above failure is that the costs of
> scanning some of the sub-paths are very close, due to which sometimes
> the results can vary.  If that is the case, then probably using a
> fuzz_factor in the cost comparison (as is done in the attached patch) can
> improve the situation; maybe we have to consider some other factors
> like the number of rows in each subpath.  However, it might be better to
> first somehow reproduce this case and see what is going wrong, any
> ideas?

The fact that b_star gets moved from 5th position to the first
position in the scans indicates that its cost shoots up from 1.04 to
a value greater than 1.16. It does not look like a case where two
costs are almost the same, due to which their positions swap sometimes. I
am trying to figure out what else it can be ...

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: heads up: Fix for intel hardware bug will lead to performance regressions

2018-01-07 Thread Michael Paquier
On Mon, Jan 8, 2018 at 1:32 PM, Thomas Munro
 wrote:
> Also pgarch.c, syncrep.c, walsender.c and walreceiver.c use
> PostmasterIsAlive() every time through their loops[1] generating extra
> syscalls, one instance of which has caused complaints before[1] on a
> system where the syscall was expensive (arguably because that kernel
> needs some work but still, it's an example of the thing you asked
> about).
>
> [1] 
> https://www.postgresql.org/message-id/flat/20160915135755.GC19008%40genua.de

Or we could replace calls to PostmasterIsAlive() by checks on
WL_POSTMASTER_DEATH? At least for the WAL sender portion it looks
doable.
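
For reference, a simplified sketch of how such a loop could lean on
WL_POSTMASTER_DEATH instead of a per-iteration PostmasterIsAlive() call; the
sleep time is a placeholder variable and this is not the actual walsender code:

for (;;)
{
	int			rc;

	/* ... do one unit of work ... */

	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
				   sleeptime,		/* placeholder timeout */
				   WAIT_EVENT_WAL_SENDER_MAIN);

	/* postmaster death is reported by the latch wait itself */
	if (rc & WL_POSTMASTER_DEATH)
		proc_exit(1);

	ResetLatch(MyLatch);
}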
-- 
Michael



Re: Parallel append plan instability/randomness

2018-01-07 Thread Amit Kapila
On Sun, Jan 7, 2018 at 5:44 AM, Tom Lane  wrote:
> According to buildfarm member silverfish,
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=silverfish=2018-01-06%2008%3A53%3A38
>
> it's possible to sometimes get this failure in the regression tests:
>
> *** 
> /mnt/buildfarm/buildroot/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out
> Tue Dec 19 20:24:02 2017
> --- 
> /mnt/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/select_parallel.out
>   Sat Jan  6 09:21:39 2018
> ***
> *** 75,84 
>Workers Planned: 3
>->  Partial Aggregate
>  ->  Parallel Append
>->  Seq Scan on d_star
>->  Seq Scan on f_star
>->  Seq Scan on e_star
> -  ->  Seq Scan on b_star
>->  Seq Scan on c_star
>->  Seq Scan on a_star
>   (11 rows)
> --- 75,84 
>Workers Planned: 3
>->  Partial Aggregate
>  ->  Parallel Append
> +  ->  Seq Scan on b_star
>->  Seq Scan on d_star
>->  Seq Scan on f_star
>->  Seq Scan on e_star
>->  Seq Scan on c_star
>->  Seq Scan on a_star
>   (11 rows)
>
> Irreproducible failures in the regression tests are not very acceptable.
> Furthermore, considering that the query being tested is
>
> explain (costs off)
>   select round(avg(aa)), sum(aa) from a_star;
>
> it seems to me that the "expected" order of the sub-scans is mighty
> random to begin with.
>

I think order of sub-scans can be random if the number of rows in
child relations can vary across runs.  For the above case, the
subpaths (non-partial-paths) are always in the descending order of
their cost and I can see that by running it locally.  On my local m/c,
output is as below:

regression=# explain select round(avg(aa)), sum(aa) from a_star;
  QUERY PLAN
---
 Finalize Aggregate  (cost=2.30..2.31 rows=1 width=40)
   ->  Gather  (cost=2.28..2.29 rows=3 width=40)
 Workers Planned: 3
 ->  Partial Aggregate  (cost=2.28..2.29 rows=1 width=40)
   ->  Parallel Append  (cost=0.00..2.20 rows=15 width=4)
 ->  Seq Scan on d_star  (cost=0.00..1.16 rows=16 width=4)
 ->  Seq Scan on f_star  (cost=0.00..1.16 rows=16 width=4)
 ->  Seq Scan on e_star  (cost=0.00..1.07 rows=7 width=4)
 ->  Seq Scan on b_star  (cost=0.00..1.04 rows=4 width=4)
 ->  Seq Scan on c_star  (cost=0.00..1.04 rows=4 width=4)
 ->  Seq Scan on a_star  (cost=0.00..1.03 rows=3 width=4)
(11 rows)

The above indicates that paths are listed in the order as expected.
What makes you think that the order of sub-scans can be random?  Is it
possible that the number of rows in child relations can vary across
runs?

One theory that can explain the above failure is that the costs of
scanning some of the sub-paths are very close, due to which sometimes
the results can vary.  If that is the case, then probably using a
fuzz_factor in the cost comparison (as is done in the attached patch) can
improve the situation; maybe we have to consider some other factors
like the number of rows in each subpath.  However, it might be better to
first somehow reproduce this case and see what is going wrong, any
ideas?
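
To illustrate the fuzz_factor idea, here is a self-contained sketch of a fuzzy
cost comparison; the 1% slop value is an arbitrary placeholder and the attached
patch's actual logic may differ:

#include <stdbool.h>

typedef double Cost;

/*
 * Consider cost a "clearly" greater than cost b only if it exceeds b by
 * more than a small fuzz factor; otherwise treat the two as a tie and
 * break it on a secondary, stable key such as the subpath's row count.
 */
static bool
cost_greater_fuzzy(Cost a, Cost b)
{
	const double fuzz_factor = 1.01;	/* placeholder: 1% slop */

	return a > b * fuzz_factor;
}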

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_pa_cost_comp_v1.patch
Description: Binary data


Re: Condition variable live lock

2018-01-07 Thread Tom Lane
Thomas Munro  writes:
> On Mon, Jan 8, 2018 at 12:38 PM, Tom Lane  wrote:
>> Concretely, as per attached.

> +1 for the idea.  Haven't looked at the code yet but I'll review this
> and the proclist patch shortly.

Thanks.  BTW, I realized that there is a second (and perhaps more
important) reason why we can only prepare one CV sleep at a time:
we only have one cvWaitLink in our PGPROC.  So I'm now inclined
to word the revised comment in ConditionVariablePrepareToSleep as

/*
 * If some other sleep is already prepared, cancel it; this is necessary
 * because we have just one static variable tracking the prepared sleep,
 * and also only one cvWaitLink in our PGPROC.  It's okay to do this
 * because whenever control does return to the other test-and-sleep loop,
 * its ConditionVariableSleep call will just re-establish that sleep as
 * the prepared one.
 */

regards, tom lane



Re: Condition variable live lock

2018-01-07 Thread Thomas Munro
On Mon, Jan 8, 2018 at 12:38 PM, Tom Lane  wrote:
> I wrote:
>> Actually ... perhaps a better design would be to have
>> ConditionVariable[PrepareTo]Sleep auto-cancel any prepared sleep for
>> a different condition variable, analogously to what we just did in
>> ConditionVariableBroadcast, on the same theory that whenever control
>> returns to the other CV wait loop it can re-establish the relevant
>> state easily enough.  I have to think that if the use of CVs grows
>> much, the existing restriction is going to become untenable anyway,
>> so why not just get rid of it?
>
> Concretely, as per attached.

+1 for the idea.  Haven't looked at the code yet but I'll review this
and the proclist patch shortly.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: heads up: Fix for intel hardware bug will lead to performance regressions

2018-01-07 Thread Thomas Munro
On Fri, Jan 5, 2018 at 6:28 AM, Robert Haas  wrote:
> On Tue, Jan 2, 2018 at 5:23 PM, Andres Freund  wrote:
>> Note that real-world scenarios probably will see somewhat smaller
>> impact, as this was measured over a loopback unix sockets which'll have
>> smaller overhead itself than proper TCP sockets + actual network.
>
> What about scenarios with longer-running queries?
>
> Is it feasible to think about reducing the number of system calls we
> issue in cases that weren't previously worth optimizing?

Maybe the places where syscall rate is controlled by arbitrary buffer
sizes?  Examples: 8kB BufFile buffers and 128kB replication stream
buffers.  Just an idea, not sure if it's worth looking into; maybe we
already spend enough time filling those buffers that a 50% syscall
markup won't hurt.
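
As a rough illustration of how much those buffer sizes matter, a trivial sketch
of the syscall count needed to push a given amount of data through a user-space
buffer (pure arithmetic, not PostgreSQL code):

#include <stddef.h>

/*
 * Number of read()/write() calls needed to move nbytes through a buffer
 * of buf_size bytes.  For 1 GB of data: ~131072 calls with an 8 kB buffer
 * versus ~8192 calls with a 128 kB buffer.
 */
static size_t
syscalls_needed(size_t nbytes, size_t buf_size)
{
	return (nbytes + buf_size - 1) / buf_size;
}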

-- 
Thomas Munro
http://www.enterprisedb.com



Re: to_timestamp TZH and TZM format specifiers

2018-01-07 Thread Nikita Glukhov

On 03.01.2018 21:34, Tom Lane wrote:


Andrew Dunstan  writes:

This small and simple standalone patch extracted from the SQL/JSON work
would allow the user to supply a string with a time zone specified as
hh:mm thus:
 SELECT to_timestamp('2011-12-18 11:38 -05:20', 'YYYY-MM-DD HH12:MI
 TZH:TZM');
  to_timestamp
 --
  Sun Dec 18 08:58:00 2011 PST

I see that Oracle's to_timestamp supports these format codes, so +1
if you've checked that the behavior is compatible with Oracle.  The
most obvious possible gotcha is whether + is east or west of GMT,
but also there's formatting questions like what the field width is
and whether leading zeroes are printed.

Also, I'm unimpressed that you've not bothered to implement the
to_char direction.  That moves this from a feature addition to
a kluge, IMO, especially since that ought to be the easier direction.


BTW, I had not known this before, but according to the page I'm
looking at

https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212

Oracle also supports "TZD" to mean a time zone abbreviation (their
example is "PDT") and "TZR" to mean a time zone name (their example
is "US/Pacific", so yes they mean the IANA zone names).  Those seem
remarkably useful, so I'm surprised we've not added support for them.



The patch seems pretty straightforward to me, and it's required for the
jsonpath patches which would be the next cab off the rank in the
SQL/JSON work.

I'm quite confused as to why a patch that alleges to be implementing
SQL-standard behavior would be depending on an Oracle-ism.  That's
not an argument against this patch, but it is a question about the
SQL/JSON work.

regards, tom lane


TZH and TZM specifiers are required by standard for SQL/JSON item method
.datetime() (Feature F411, “Time zone specification”).  To be fully
compliant, we should also support RR,  and FF1-FF9 specifiers.

.datetime() item method is used for conversion of JSON string items to
SQL/JSON datetime items.  Its optional argument "format" determines
target datetime type:


=# SELECT jsonb '"10-03-2017 12:34 +05:20"' @* 
'$.datetime("DD-MM-").type()';
 ?column?
--
 "date"
(1 row)

=# SELECT jsonb '"10-03-2017 12:34 +05:20"' @* '$.datetime("DD-MM- 
HH24:MI").type()';
   ?column?
---
 "timestamp without time zone"
(1 row)

=# SELECT jsonb '"10-03-2017 12:34 +05:20"' @* '$.datetime("DD-MM- HH24:MI 
TZH:TZM").type()';
  ?column?

 "timestamp with time zone"
(1 row)

-- automatic datetime type recognition for ISO-formatted strings
=# SELECT jsonb '"2017-10-03 12:34:56 +05:20"' @* '$.datetime().type()';
  ?column?

 "timestamp with time zone"
(1 row)



Here are corresponding excerpts from the SQL-2016 standard:

9.44 Datetime templates

 ::=
  {  }...

 ::=
  
  | 

 ::=
  
  | 
  | 
  | 
  | 
  | 
  | 
  | 
  | 
  | 
  | 
  | 
  | 
  | 

 ::=
  
  | 
  | 
  | 
  | 
  | 
  | 
  | 

 ::=  | YYY | YY | Y
 ::=  | RR
 ::= MM
 ::= DD
 ::= DDD
 ::= HH | HH12
 ::= HH24
 ::= MI
 ::= SS
 ::= S
 ::=
  FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
 ::= A.M. | P.M.
 ::= TZH
 ::= TZM


9.39 SQL/JSON path language: syntax and semantics
...
10) If  JDT is specified, then the value of JDT shall
conform to the lexical grammar of a  in the Format of
Subclause 9.44, “Datetime templates”.

  a) If JDT contains
 ,
 ,
 ,
 , or
 ,
 then JDT is dated.

  b) If JDT contains
 ,
 ,
 ,
 ,
 ,
 , or
 ,
 then JDT is timed.

 The fractional seconds precision FSP of JDT is
 Case:

 i) If JDT contains 
FF1, FF2, FF3, FF4, FF5, FF6, FF7, FF8, or FF9,
then 1 (one), 2, 3, 4, 5, 6, 7, 8, or 9, respectively.

 ii) Otherwise, 0 (zero).

  c) If JDT contains
  or
 ,
 then JDT is zoned.

  d) If JDT is zoned, then JDT shall be timed.

  e) JDT shall be dated or timed or both.
  
  f) The implicit datetime data type IDT of JDT is

 Case:
 i) If JDT is dated, timed, and zoned, then TIMESTAMP (FSP) WITH TIME ZONE.
 ii) If JDT is dated, timed, and not zoned, then
 TIMESTAMP (FSP) WITHOUT TIME ZONE.
 iii) If JDT is timed and zoned, then TIME (FSP) WITH TIME ZONE.
 iv) If JDT is timed and not zoned, then TIME (FSP) WITHOUT TIME ZONE.
 v) If JDT is dated but not timed and not zoned, then DATE.
  ...


(RR/ specifiers explanation)

9.43 Converting a formatted character string to a datetime
 ...
  5) Let NOW be the value of CURRENT_TIMESTAMP.
 Let CY be the YEAR field of NOW.
 Let CYLIT be an  of four s whose value is CY.
 Let CM be the MONTH field of NOW.
 Let CMLIT be an  of two s whose value is CM.

  6) Case:
 a) If CT contains a  YY, then:
i) Let YYPOS be an  whose value is the regular
   

Re: Condition variable live lock

2018-01-07 Thread Tom Lane
I wrote:
> Actually ... perhaps a better design would be to have
> ConditionVariable[PrepareTo]Sleep auto-cancel any prepared sleep for
> a different condition variable, analogously to what we just did in
> ConditionVariableBroadcast, on the same theory that whenever control
> returns to the other CV wait loop it can re-establish the relevant
> state easily enough.  I have to think that if the use of CVs grows
> much, the existing restriction is going to become untenable anyway,
> so why not just get rid of it?

Concretely, as per attached.

regards, tom lane

diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index 0b9d676..b01864c 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -50,10 +50,6 @@ ConditionVariableInit(ConditionVariable *cv)
  * However, if the first test of the exit condition is likely to succeed,
  * it's more efficient to omit the ConditionVariablePrepareToSleep call.
  * See comments in ConditionVariableSleep for more detail.
- *
- * Only one condition variable can be used at a time, ie,
- * ConditionVariableCancelSleep must be called before any attempt is made
- * to sleep on a different condition variable.
  */
 void
 ConditionVariablePrepareToSleep(ConditionVariable *cv)
@@ -76,10 +72,14 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv)
 	}
 
 	/*
-	 * It's not legal to prepare a sleep until the previous sleep has been
-	 * completed or canceled.
+	 * If some other sleep is already prepared, cancel it; this is necessary
+	 * because we have just one static variable tracking the prepared sleep.
+	 * It's okay to do this because whenever control does return to the other
+	 * test-and-sleep loop, its ConditionVariableSleep call will just
+	 * re-establish that sleep as the prepared one.
 	 */
-	Assert(cv_sleep_target == NULL);
+	if (cv_sleep_target != NULL)
+		ConditionVariableCancelSleep();
 
 	/* Record the condition variable on which we will sleep. */
 	cv_sleep_target = cv;
@@ -128,16 +128,16 @@ ConditionVariableSleep(ConditionVariable *cv, uint32 wait_event_info)
 	 * recommended because it avoids manipulations of the wait list, or not
 	 * met initially, in which case preparing first is better because it
 	 * avoids one extra test of the exit condition.
+	 *
+	 * If we are currently prepared to sleep on some other CV, we just cancel
+	 * that and prepare this one; see ConditionVariablePrepareToSleep.
 	 */
-	if (cv_sleep_target == NULL)
+	if (cv_sleep_target != cv)
 	{
 		ConditionVariablePrepareToSleep(cv);
 		return;
 	}
 
-	/* Any earlier condition variable sleep must have been canceled. */
-	Assert(cv_sleep_target == cv);
-
 	do
 	{
 		CHECK_FOR_INTERRUPTS();
diff --git a/src/include/storage/condition_variable.h b/src/include/storage/condition_variable.h
index 7dac477..32e645c 100644
--- a/src/include/storage/condition_variable.h
+++ b/src/include/storage/condition_variable.h
@@ -40,9 +40,7 @@ extern void ConditionVariableInit(ConditionVariable *cv);
  * ConditionVariableSleep.  Spurious wakeups are possible, but should be
  * infrequent.  After exiting the loop, ConditionVariableCancelSleep must
  * be called to ensure that the process is no longer in the wait list for
- * the condition variable.  Only one condition variable can be used at a
- * time, ie, ConditionVariableCancelSleep must be called before any attempt
- * is made to sleep on a different condition variable.
+ * the condition variable.
  */
 extern void ConditionVariableSleep(ConditionVariable *cv, uint32 wait_event_info);
 extern void ConditionVariableCancelSleep(void);


Re: [HACKERS] [PATCH] Generic type subscripting

2018-01-07 Thread Tom Lane
Andres Freund  writes:
> On 2018-01-07 17:39:00 -0500, Tom Lane wrote:
>> Now on the other hand, maybe the right way to go is to embrace a similar
>> approach to what I did for plpgsql param eval, and let the datatype
>> control what gets generated as the expression execution step.

> I'll note that I'm not convinced that the goal this paragraph states and
> having the datatype control the entire expression step are fully
> dependent on each other. It seems quite possible to have
> ExecInitSubscriptingRef() call a datatype specific function that returns
> C callbacks.

Yeah, that's a good point.  We could define the compile support function
as simply being allowed to examine the expression tree and give back
the address of the callback function to use, with the rest of the compiled
expression structure being predetermined.  The more general approach would
only be useful if you imagine some sort of high-performance extension that
wants to compile specialized expr steps for its subscripting activity.
Probably the need for that is pretty far away yet.
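
To make the shape of that concrete, here is a hedged sketch of a datatype
handler handing back a struct of C callbacks; every name below is hypothetical,
not an existing API:

#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for types that would come from postgres.h / execExpr.h. */
typedef uintptr_t Datum;
typedef struct SubscriptingRefState SubscriptingRefState;

/* Hypothetical callback signatures with a plain C calling convention. */
typedef Datum (*subscript_fetch_fn) (Datum containerval,
									 SubscriptingRefState *state,
									 bool *isnull);
typedef Datum (*subscript_assign_fn) (Datum containerval,
									  SubscriptingRefState *state,
									  bool *isnull);

/* What the datatype's compile-support function would hand back. */
typedef struct SubscriptingMethods
{
	subscript_fetch_fn	fetch;		/* evaluate container[subscripts] */
	subscript_assign_fn assign;		/* store into container[subscripts] */
} SubscriptingMethods;

/*
 * The expression-compilation code would call this once per expression and
 * stash the returned pointers in the expression step it builds, so
 * execution never needs to go through FunctionCallN.
 */
typedef const SubscriptingMethods *(*subscript_compile_fn) (void);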

BTW, if you wanted to change the way plpgsql param callbacks work to be
like this design (ie push the actual generation of the ExprStep back into
core, with plpgsql just supplying a callback address), I wouldn't object.

regards, tom lane



Re: [HACKERS] [PATCH] Generic type subscripting

2018-01-07 Thread Andres Freund
Hi,

Tom pointed me towards this thread. I've not followed the topic, so
I might miss a bit of context while commenting on expression eval
related bits...

On 2018-01-07 17:39:00 -0500, Tom Lane wrote:
> On the executor side of things, I suspect Andres will be unhappy that
> you are making ExprEvalStep part of the API for datatypes --- he
> objected to my exposing it to plpgsql param eval in
> https://postgr.es/m/20171220174243.n4y3hgzf7xd3m...@alap3.anarazel.de
> and there was a lot more reason to do so there than there is here,
> IMO.

Indeed.


> It looks like what you actually need is just the SubscriptingRefState and
> an isnull flag, so it might be better to specify that the fetch and assign
> functions have signatures like
>   Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull)
> (representing both of the last two arguments as INTERNAL at SQL level).

That'd definitely be better.


> Now on the other hand, maybe the right way to go is to embrace a similar
> approach to what I did for plpgsql param eval, and let the datatype
> control what gets generated as the expression execution step.  The main
> point here would be to let the datatype provide the address of a callback
> function that gets executed for a subscripting step, rather than having it
> specify the OID of a pg_proc entry to call.  There would be two big wins
> from that:
> 
> * The callback function would have a plain C call signature, so we would
> not have to go through FunctionCallN, saving a few cycles.  This is
> attractive because it would pretty much eliminate any concern about this
> patch making array access slower at execution time.

I'll note that I'm not convinced that the goal this paragraph states and
having the datatype control the entire expression step are fully
dependent on each other. It seems quite possible to have
ExecInitSubscriptingRef() call a datatype specific function that returns
C callbacks.

> The two disadvantages I can see of approaching things this way are:
> 
> * There'd be at least some connection of subscriptable types to
> expression compilation, which is what Andres was objecting to in the
> message I cited above.  Possibly we could alleviate that a bit by
> providing helper functions that mask exactly what goes into the
> expression step structs, but I'm not sure that that gets us far.

Yea, I'm not the greatest fan of that. With plpgsql it's at least
something in-core that's exposed, but I suspect the subscripting
interface will get used outside of core, and I really want to whack some
of the expression stuff around some more.


> Actually, we could make it *exactly* like that, and have the
> registered handler give back a struct full of function pointers rather
> than doing anything much itself.  Maybe that's an even better way to
> go.

I'd definitely advocate for that.

Greetings,

Andres Freund



Re: Intermittent crashes on brolga in join.sql test

2018-01-07 Thread Andrew Dunstan


On 01/07/2018 03:51 PM, Thomas Munro wrote:
> Hi Andrew,
>
> As mentioned over here, but probably lost in the noise of that thread:
>
> https://www.postgresql.org/message-id/CAEepm%3D2r1svRUfeR1c9Z4UegkPdMw4gxyok4jCWf-h7kdqHHAA%40mail.gmail.com
>
> ... there is an intermittent crash in my new Parallel Hash code that
> occurs only on your 32 bit Windows XP + Cygwin animal "brolga".
> Unfortunately I haven't managed to come up with a theory to explain
> this using only brain power and tea.
>
> Would you mind trying to get a backtrace from that system?  Either
> from a regular make check failure, or perhaps it can be provoked by
> running the attached script?  I don't know anything about Cygwin
> myself but I see that it's possible to get a core file that gdb can
> load with "dumper":
>
> https://stackoverflow.com/questions/320001/using-a-stackdump-from-cygwin-executable
> https://cygwin.com/cygwin-ug-net/dumper.html
>
> Thanks,
>



This machine is now very old, and doesn't have gdb installed. The
chances of my getting a compatible one now are close to zero.

A better bet might be for me to revive my other 32 bit Cygwin instance,
cockatiel.

I'll look into that.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Expression based aggregate transition / combine function invocation

2018-01-07 Thread Andres Freund

> Here's a considerably polished variant of this patch.

And for realz.
>From 3dbc5e9f88ec7d1958b170fa890b7d3d136f709e Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 7 Jan 2018 14:42:04 -0800
Subject: [PATCH] Expression evaluation based agg transition invocation.

Previously aggregate transition and combination functions were invoked
by special case code in nodeAgg.c, evaluating input and filters
separately using the expression evaluation machinery. That turns out
to not be great for performance for several reasons:

- repeated expression evaluations have some cost
- the transition functions invocations are poorly predicted, as
  commonly there are multiple aggregates in a query, resulting in the
  same callstack invoking different functions.
- filter and input computation had to be done separately
- the special case code made it hard to implement JITing of the whole
  transition function invocation

Address this by building one large expression that computes input,
evaluates filters, and invokes transition functions.

This leads to moderate speedups in queries bottlenecked by aggregate
computations, and enables large speedups for similar cases once JITing
is done.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3...@alap3.anarazel.de
---
 src/backend/executor/execExpr.c   | 426 -
 src/backend/executor/execExprInterp.c | 362 +-
 src/backend/executor/nodeAgg.c| 867 --
 src/include/executor/execExpr.h   |  79 +++-
 src/include/executor/executor.h   |   4 +-
 src/include/executor/nodeAgg.h| 285 +++
 src/include/nodes/execnodes.h |   5 +-
 7 files changed, 1235 insertions(+), 793 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 16f908037c8..73fcd53147f 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -43,6 +43,7 @@
 #include "optimizer/planner.h"
 #include "pgstat.h"
 #include "utils/builtins.h"
+#include "utils/datum.h"
 #include "utils/lsyscache.h"
 #include "utils/typcache.h"
 
@@ -61,6 +62,7 @@ static void ExecInitFunc(ExprEvalStep *scratch, Expr *node, List *args,
 			 Oid funcid, Oid inputcollid,
 			 ExprState *state);
 static void ExecInitExprSlots(ExprState *state, Node *node);
+static void ExecPushExprSlots(ExprState *state, LastAttnumInfo *info);
 static bool get_last_attnums_walker(Node *node, LastAttnumInfo *info);
 static void ExecInitWholeRowVar(ExprEvalStep *scratch, Var *variable,
 	ExprState *state);
@@ -71,6 +73,10 @@ static bool isAssignmentIndirectionExpr(Expr *expr);
 static void ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
 	   ExprState *state,
 	   Datum *resv, bool *resnull);
+static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
+	  ExprEvalStep *scratch,
+	  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
+	  int transno, int setno, int setoff, bool ishash);
 
 
 /*
@@ -2250,30 +2256,42 @@ static void
 ExecInitExprSlots(ExprState *state, Node *node)
 {
 	LastAttnumInfo info = {0, 0, 0};
-	ExprEvalStep scratch;
 
 	/*
 	 * Figure out which attributes we're going to need.
 	 */
	get_last_attnums_walker(node, &info);
 
+	ExecPushExprSlots(state, &info);
+}
+
+/*
+ * Add steps deforming the ExprState's inner/out/scan slots as much as
+ * indicated by info. This is useful when building an ExprState covering more
+ * than one expression.
+ */
+static void
+ExecPushExprSlots(ExprState *state, LastAttnumInfo *info)
+{
+	ExprEvalStep scratch;
+
 	/* Emit steps as needed */
-	if (info.last_inner > 0)
+	if (info->last_inner > 0)
 	{
 		scratch.opcode = EEOP_INNER_FETCHSOME;
-		scratch.d.fetch.last_var = info.last_inner;
+		scratch.d.fetch.last_var = info->last_inner;
		ExprEvalPushStep(state, &scratch);
 	}
-	if (info.last_outer > 0)
+	if (info->last_outer > 0)
 	{
 		scratch.opcode = EEOP_OUTER_FETCHSOME;
-		scratch.d.fetch.last_var = info.last_outer;
+		scratch.d.fetch.last_var = info->last_outer;
		ExprEvalPushStep(state, &scratch);
 	}
-	if (info.last_scan > 0)
+	if (info->last_scan > 0)
 	{
 		scratch.opcode = EEOP_SCAN_FETCHSOME;
-		scratch.d.fetch.last_var = info.last_scan;
+		scratch.d.fetch.last_var = info->last_scan;
		ExprEvalPushStep(state, &scratch);
 	}
 }
@@ -2775,3 +2793,397 @@ ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
 		}
 	}
 }
+
+/*
+ * Build transition/combine function invocations for all aggregate transition
+ * / combination function invocations in a grouping sets phase. This has to
+ * invoke all sort based transitions in a phase (if doSort is true), all hash
+ * based transitions (if doHash is true), or both (both true).
+ *
+ * The resulting expression will, for each set of transition values, first
+ * check for filters, evaluate aggregate input, check that that input is not
+ * NULL for a strict transition function, and then finally invoke the
+ * 

Re: Expression based aggregate transition / combine function invocation

2018-01-07 Thread Andres Freund
Hi,

On 2017-11-27 16:31:21 -0800, Andres Freund wrote:
> this is part of my work to make expression evaluation JITable. In a lot
> of analytics queries the major bottleneck is transition function
> invocation (makes sense, hardly anyone wants to see billions of
> rows). Therefore for JITing to be really valuable transition function
> stuff needs to be JITable.
>
> Excerpt from the preliminary commit message:
>
>   Previously aggregate transition and combination functions were invoked
>   by special case code in nodeAgg.c, evaluating input and filters
>   separately using the expression evaluation machinery. That turns out
>   to not be great for performance for several reasons:
>   - repeated expression evaluations have some cost
>   - the transition functions invocations are poorly predicted
>   - filter and input computation had to be done separately
>   - the special case code made it hard to implement JITing of the whole
> transition function invocation
>
>   Address this by building one large expression that computes input,
>   evaluates filters, and invokes transition functions.
>
>   This leads to moderate speedups in queries bottlenecked by aggregate
>   computations, and enables large speedups for similar cases once JITing
>   is done.

> While this gets rid of a substantial amount of duplication between the
> infrastructure for transition and combine functions, it still increases
> codesize a bit.

There's still two callers of advance_transition_function() left, namely
process_ordered_aggregate_{single,multi}. Rearchitecting this so they
also go through expression-ified transition invocation seems like
material for a separate patch, this is complicated enough...


> Todo / open Questions:
> - Location of transition function building functions. Currently they're
>   in execExpr.c. That allows not to expose a bunch of functions local to
>   it, but requires exposing some aggregate structs to the world. We
>   could go the other way round as well.

I've left this as is.


> - Right now we waste a bunch of time by having to access transition
>   states indexed by both grouping set number and the transition state
>   offset therein. It'd be nicer if we could cheaply reduce the number of
>   indirections, but I can't quite see how without adding additional
>   complications.

I've left this as is.


Here's a considerably polished variant of this patch. I plan to do
another round of polishing next week, and then push it, unless somebody
else has comments.

Regards,

Andres



Re: [HACKERS] [PATCH] Generic type subscripting

2018-01-07 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On 4 January 2018 at 03:05, Tom Lane  wrote:
>> I wonder what happened to the plan to separate assignment and fetch into two
>> different node types. I can see that that didn't happen so far as primnodes.h
>> is concerned, but you've made some changes that seem to assume it did happen.

> There was one version of this patch that followed this plan. It turns out that
> it's quite unnatural approach (at least within the current implementation),
> because I had to duplicate or reuse a lot of code for those two node types.

I'm not entirely convinced by that argument.  Sure, there would be a lot of
duplicative code in the node support functions, but that's inherent in our
approach to those functions.  I'm more concerned about readability,
especially given that exposing this feature to extensions is going to set
all these decisions in concrete forever.

> Here is a new rebased version of the patch
> with incorporated improvements that you've mentioned.

I spent a couple of hours looking through this.  I'm still pretty unhappy
with the state of the parser APIs.  In the first place, you haven't
documented what those APIs are in any meaningful way.  I do not think it's
adequate to tell people to go read array_subscript_parse as the first and
only way to understand what a subscript parse function must do.  We do
often expect extension authors to pick up small details that way, but
there should be a textual API spec of some sort --- for example, look at
the index AM API specs in indexam.sgml, which give pretty clear high-level
definitions of what each AM API function has to do.

Part of the reason why I'm insistent on that is that I think it will
expose that the division of labor between the core parser and the
datatype-specific parse function is still a mess.  One particular sore
spot is the question of who decides what the return data type of a
subscripting function is.  Right now you seem to be making that decision
in the core parser, at least for the assignment case, which is pretty
bad from an extensibility standpoint and also leads to all of this
messiness:

* You changed transformArrayType so that it doesn't throw an error if
the given type isn't an array --- without, I note, updating either the
function header comment or the internal comment that you falsified.

* That in turn means that where transformAssignmentSubscripts thinks
it's determining the result type, it may or may not be producing
InvalidOid instead (again, with no comment to warn the reader).

* And then you had to kludge transformAssignmentIndirection horribly
(and I'm not sure you kludged it enough, either, because there are
still a bunch of places where it uses targetTypeId without any concern
for the possibility that that's zero).  It doesn't seem to me to be
acceptable to just ignore coercion failure, as that code does now.
If it's no longer the responsibility of this function to guarantee
that the result is of the right type, why is it attempting coercion
at all?  In any case you failed to update its header comment to
explain what it's doing differently than before.

In short the division of labor in this area still needs to be thought
about.  I don't think we really want transformAssignmentSubscripts
determining the required input data type at all; that should be
farmed out to the datatype-specific code.  I'm also pretty unconvinced
about refnestedfunc --- why do we need that?

I also notice that you still haven't done anything about the problem
of the subscripting operation possibly yielding a different typmod
or collation than the container type has.  It was okay to let that
go for awhile, but we're not shipping it like this, because it's
going to be awfully hard to change that struct type once extensions
are building them.

While I'm on the topic, I am not really happy with s/array/container/
as you've done in some of this code.  To my mind, "container type"
includes composite types.  Particularly in the parse_target code, where
we're currently dealing with either composites or arrays, making it say
that we're dealing with either composites or containers is just a recipe
for confusion.  Unfortunately I can't think of a better word offhand,
but some attention to this is needed.  As far as the comments go,
we might be able to use the term "subscriptable type", but not sure if
that will work for function/variable names.

On the executor side of things, I suspect Andres will be unhappy that
you are making ExprEvalStep part of the API for datatypes --- he
objected to my exposing it to plpgsql param eval in
https://postgr.es/m/20171220174243.n4y3hgzf7xd3m...@alap3.anarazel.de
and there was a lot more reason to do so there than there is here, IMO.
It looks like what you actually need is just the SubscriptingRefState and
an isnull flag, so it might be better to specify that the fetch and assign
functions have signatures like
Datum fetch(Datum val, 

Re: proposal: alternative psql commands quit and exit

2018-01-07 Thread Everaldo Canuto
Changing my patch will make psql different from other SQL clients I use
(sqlplus and mysql).

I am happy with my custom version of psql so you can cancel/delete/ignore
my patch.


On Sun, Jan 7, 2018 at 12:36 AM, Tom Lane  wrote:

> Everaldo Canuto  writes:
> > Can we proceed with current behavior and change some day when we have a
> > consensus of a better way to handle this and then change quit/exit and
> help
> > together?
>
> I think it's quite clear that there is *not* a consensus for your
> patch as submitted.  Please adjust it as per this discussion.
>
> regards, tom lane
>



-- 
Everaldo Canuto
+55 11 953-908-387 | everaldo.can...@gmail.com

01100101 01110110 01100101 01110010
0111 01101100 01100100 0110


Intermittent crashes on brolga in join.sql test

2018-01-07 Thread Thomas Munro
Hi Andrew,

As mentioned over here, but probably lost in the noise of that thread:

https://www.postgresql.org/message-id/CAEepm%3D2r1svRUfeR1c9Z4UegkPdMw4gxyok4jCWf-h7kdqHHAA%40mail.gmail.com

... there is an intermittent crash in my new Parallel Hash code that
occurs only on your 32 bit Windows XP + Cygwin animal "brolga".
Unfortunately I haven't managed to come up with a theory to explain
this using only brain power and tea.

Would you mind trying to get a backtrace from that system?  Either
from a regular make check failure, or perhaps it can be provoked by
running the attached script?  I don't know anything about Cygwin
myself but I see that it's possible to get a core file that gdb can
load with "dumper":

https://stackoverflow.com/questions/320001/using-a-stackdump-from-cygwin-executable
https://cygwin.com/cygwin-ug-net/dumper.html

Thanks,

-- 
Thomas Munro
http://www.enterprisedb.com


lots-of-parallel-hash-rescans.sh
Description: Bourne shell script


Re: jsonpath

2018-01-07 Thread Pavel Stehule
2018-01-07 21:20 GMT+01:00 Andrew Dunstan :

>
>
> On 01/07/2018 03:16 PM, Pavel Stehule wrote:
> >
> >
> > 2018-01-07 21:14 GMT+01:00 Andrew Dunstan:
> >
> >
> >
> > On 01/07/2018 03:00 PM, Pavel Stehule wrote:
> > >
> > >
> > > 2018-01-07 20:31 GMT+01:00 Andrew Dunstan:
> > >
> > >
> > > Next up in the proposed SQL/JSON feature set I will review
> > the three
> > > jsonpath patches. These are attached, extracted from the
> > patch set
> > > Nikita posted on Jan 2nd.
> > >
> > >
> > > Note that they depend on the TZH/TZM patch (see separate email
> > > thread).
> > > I'd like to be able to commit that patch pretty soon.
> > >
> > >
> > > I did few tests - and it is working very well.
> > >
> > >
> >
> >
> > You mean on the revised TZH/TZM patch that I posted?
> >
> >
> > no -  I tested jsonpath implementation
> >
> >
>
>
> OK. You can help by also reviewing and testing the small patch at
>  02b0-41e50a1c3023%402ndQuadrant.com>
>

I'll look there

Regards

Pavel


> thanks
>
> cheers
>
> andrew
>
> --
> Andrew Dunstan    https://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>


Re: jsonpath

2018-01-07 Thread Andrew Dunstan


On 01/07/2018 03:00 PM, Pavel Stehule wrote:
>
>
> 2018-01-07 20:31 GMT+01:00 Andrew Dunstan:
>
>
> Next up in the proposed SQL/JSON feature set I will review the three
> jsonpath patches. These are attached, extracted from the patch set
> Nikita posted on Jan 2nd.
>
>
> Note that they depend on the TZH/TZM patch (see separate email
> thread).
> I'd like to be able to commit that patch pretty soon.
>
>
> I did few tests - and it is working very well.
>
>


You mean on the revised TZH/TZM patch that I posted?

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonpath

2018-01-07 Thread Pavel Stehule
2018-01-07 20:31 GMT+01:00 Andrew Dunstan :

>
> Next up in the proposed SQL/JSON feature set I will review the three
> jsonpath patches. These are attached, extracted from the patch set
> Nikita posted on Jan 2nd.
>
>
> Note that they depend on the TZH/TZM patch (see separate email thread).
> I'd like to be able to commit that patch pretty soon.
>

I did few tests - and it is working very well.

regards

Pavel


>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan    https://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>


Re: [PROPOSAL] Shared Ispell dictionaries

2018-01-07 Thread Arthur Zakirov
On Sun, Dec 31, 2017 at 06:28:13PM +0300, Arthur Zakirov wrote:
> 
> There are issues to do:
> - add the GUC-variable for hash table limit
> - fix bugs
> - improve comments
> - performance testing
> 

Here is the second version of the patch.

0002-Retreive-shmem-location-for-ispell-v2.patch:

Fixed some bugs and added the GUC variable "shared_dictionaries".

Added documentation for it. I'm not sure about the order of configuration
parameters in section "19.4.1. Memory". Now "shared_dictionaries" goes after
"shared_buffers". Maybe it would be good to make a patch which sorts the
parameters in alphabetical order?

0003-Store-ispell-structures-in-shmem-v2.patch:

Fixed some bugs; regression tests pass now. I added more comments
and fixed old ones. I also tested with Hunspell dictionaries [1]. They
work well too.

Results of performance testing of Ispell and Hunspell dictionaries will
be ready soon.


1 - github.com/postgrespro/hunspell_dicts

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..25614f2d31 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -498,7 +498,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
strcpy(Conf->Spell[Conf->nspell]->word, word);
Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-   ? cpstrdup(Conf, flag) : VoidString;
+   ? MemoryContextStrdup(Conf->buildCxt, flag) : VoidString;
Conf->nspell++;
 }
 
@@ -1040,7 +1040,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
entry->flag.i = i;
}
else
-   entry->flag.s = cpstrdup(Conf, s);
+   entry->flag.s = MemoryContextStrdup(Conf->buildCxt, s);
 
entry->flagMode = Conf->flagMode;
entry->value = val;
@@ -1536,6 +1536,9 @@ nextline:
return;
 
 isnewformat:
+   pfree(recoded);
+   pfree(pstr);
+
if (oldformat)
ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e4a01699e4..858423354e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1355,6 +1355,36 @@ include_dir 'conf.d'
   
  
 
+ 
+  shared_dictionaries (integer)
+  
+   shared_dictionaries configuration 
parameter
+  
+  
+  
+   
+Sets the maximum number of text search dictionaries loaded into shared
+memory.  The default is 10 dictionaries.
+   
+
+   
+Currently controls only loading of Ispell
+dictionaries (see ).
+After compiling the dictionary it will be copied into shared memory.
+Another backends on first use of the dictionary will use it from shared
+memory, so it doesn't need to compile the dictionary second time.
+DictFile and AffFile are used to
+search the dictionary in shared memory.
+   
+
+   
+If the number of simultaneously loaded dictionaries reaches the maximum
+allowed number then a new dictionary will be loaded into local memory 
of
+a backend.
+   
+  
+ 
+
  
   huge_pages (enum)
   
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..c7dce8cac5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -44,6 +44,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "tsearch/ts_shared.h"
 #include "utils/backend_random.h"
 #include "utils/snapmgr.h"
 
@@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
size = add_size(size, SyncScanShmemSize());
size = add_size(size, AsyncShmemSize());
size = add_size(size, BackendRandomShmemSize());
+   size = add_size(size, TsearchShmemSize());
 #ifdef EXEC_BACKEND
size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -271,6 +273,11 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
AsyncShmemInit();
BackendRandomShmemInit();
 
+   /*
+* Set up shared memory to tsearch
+*/
+   TsearchShmemInit();
+
 #ifdef EXEC_BACKEND
 
/*
diff --git a/src/backend/storage/lmgr/lwlock.c 
b/src/backend/storage/lmgr/lwlock.c
index 71caac1a1f..2446db7266 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -520,6 +520,7 @@ RegisterLWLockTranches(void)
  "shared_tuplestore");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
+   

Re: WIP: BRIN multi-range indexes

2018-01-07 Thread Tomas Vondra


On 12/20/2017 03:37 AM, Mark Dilger wrote:
> 
>> On Dec 19, 2017, at 5:16 PM, Tomas Vondra  
>> wrote:
>>
>>
>>
>> On 12/19/2017 08:38 PM, Mark Dilger wrote:
>>>
 On Nov 18, 2017, at 12:45 PM, Tomas Vondra  
 wrote:

 Hi,

 Apparently there was some minor breakage due to duplicate OIDs, so here
 is the patch series updated to current master.

 regards

 -- 
 Tomas Vondra  http://www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
 <0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz><0002-BRIN-bloom-indexes.patch.gz><0003-BRIN-multi-range-minmax-indexes.patch.gz><0004-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz>
>>>
>>>
>>> After applying these four patches to my copy of master, the regression
>>> tests fail for F_SATISFIES_HASH_PARTITION 5028 as attached.
>>>
>>
>> D'oh! There was an incorrect OID referenced in pg_opclass, which was
>> also used by the satisfies_hash_partition() function. Fixed patches
>> attached.
> 
> Your use of type ScanKey in src/backend/access/brin/brin.c is a bit 
> confusing.  A
> ScanKey is defined elsewhere as a pointer to ScanKeyData.  When you define
> an array of ScanKeys, you use pointer-to-pointer style:
> 
> +   ScanKey   **keys,
> +  **nullkeys;
> 
> But when you allocate space for the array, you don't treat it that way:
> 
> +   keys = palloc0(sizeof(ScanKey) * bdesc->bd_tupdesc->natts);
> +   nullkeys = palloc0(sizeof(ScanKey) * bdesc->bd_tupdesc->natts);
> 
> But then again when you use nullkeys, you treat it as a two-dimensional array:
> 
> +   nullkeys[keyattno - 1][nnullkeys[keyattno - 1]] = key;
> 
> and likewise when you allocate space within keys:
> 
> +keys[keyattno - 1] = palloc0(sizeof(ScanKey) * scan->numberOfKeys);
> 
> Could you please clarify?  I have been awake a bit too long; hopefully, I am
> not merely missing the obvious.
> 

Yeah, that's wrong - it should be "sizeof(ScanKey *)" instead. It's
harmless, though, because ScanKey itself is a pointer, so the size is
the same.
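
For anyone puzzled by why it is harmless, a tiny standalone illustration (the
struct body is a stand-in, not the real access/skey.h definition):

#include <stdio.h>

typedef struct ScanKeyData { int sk_flags; /* ... */ } ScanKeyData;
typedef ScanKeyData *ScanKey;

int
main(void)
{
	/* ScanKey is itself a pointer type, so both expressions are pointer-sized */
	printf("%zu %zu\n", sizeof(ScanKey), sizeof(ScanKey *));
	return 0;
}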

Attached is an updated version of the patch series, significantly
reworking and improving the multi-minmax part (the rest of the patch is
mostly as it was before).

I've significantly refactored and cleaned up the multi-minmax part, and
I'm much happier with it - no doubt there's room for further improvement
but overall it's much better.

I've also added proper sgml docs for this part, and support for more
data types including variable-length ones (all integer types, numeric,
float-based types including timestamps, uuid, and a couple of others).

At the API level, I needed to add one extra support procedure that
measures the distance between two values of the data type. This is needed
because we only keep a limited number of intervals for each range, and
once in a while we need to decide which of them to "merge" (and we
simply merge the closest ones).
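
For illustration, a minimal sketch of what such a distance support procedure
computes for a simple fixed-width type; this ignores the fmgr calling
convention and is not the function shipped in the patch:

/*
 * Distance between two int4-like values: the opclass uses this to decide
 * which two intervals are closest and can be merged when a range has
 * accumulated too many of them.
 */
static double
int4_distance(int a, int b)
{
	double		d = (double) a - (double) b;

	return (d < 0) ? -d : d;
}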

I've passed the indexes through significant testing and fixed a couple
of silly bugs / memory leaks. Let's see if there are more.

Performance-wise, the CREATE INDEX seems a bit slow - it's about an
order of magnitude slower than regular BRIN. Some of that is expected
(we simply need to do more stuff to maintain multiple ranges), but
perhaps there's room for additional improvements - that's something I'd
like to work on next.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Pass-all-keys-to-BRIN-consistent-function-at-once.patch.gz
Description: application/gzip


0002-BRIN-bloom-indexes.patch.gz
Description: application/gzip


0003-BRIN-multi-range-minmax-indexes.patch.gz
Description: application/gzip


0004-Move-IS-NOT-NULL-checks-to-bringetbitmap.patch.gz
Description: application/gzip


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2018-01-07 Thread Masahiko Sawada
On Fri, Jan 5, 2018 at 1:39 AM, Robert Haas  wrote:
> On Tue, Jan 2, 2018 at 1:09 AM, Mithun Cy  wrote:
>> So in case of N_RELEXTLOCK_ENTS = 1 we can see regression as high as 25%?

Thank you for the performance measurement!

> So now the question is: what do these results mean for this patch?
>
> I think that the chances of someone simultaneously bulk-loading 16 or
> more relations that all happen to hash to the same relation extension
> lock bucket is pretty darn small.  Most people aren't going to be
> running 16 bulk loads at the same time in the first place, and if they
> are, then there's a good chance that at least some of those loads are
> either actually to the same relation, or that many or all of the loads
> are targeting the same filesystem and the bottleneck will occur at
> that level, or that the loads are to relations which hash to different
> buckets.  Now, if we want to reduce the chances of hash collisions, we
> could boost the default value of N_RELEXTLOCK_ENTS to 2048 or 4096.
>
> However, if we take the position that no hash collision probability is
> low enough and that we must eliminate all chance of false collisions,
> except perhaps when the table is full, then we have to make this
> locking mechanism a whole lot more complicated.  We can no longer
> compute the location of the lock we need without first taking some
> other kind of lock that protects the mapping from {db_oid, rel_oid} ->
> {memory address of the relevant lock}.  We can no longer cache the
> location where we found the lock last time so that we can retake it.
> If we do that, we're adding extra cycles and extra atomics and extra
> code that can harbor bugs to every relation extension to guard against
> something which I'm not sure is really going to happen.  Something
> that's 3-8% faster in a case that occurs all the time and as much as
> 25% slower in a case that virtually never arises seems like it might
> be a win overall.
>
> However, it's quite possible that I'm not seeing the whole picture
> here.  Thoughts?
>

I agree that the chances of the case where throughput gets worse are
pretty small and we can get a performance improvement in common cases.
Also, we could mistakenly overestimate the number of blocks we need to
add because of false collisions. Thereby the performance might get worse
and we might extend a relation more than necessary, but I think the
chances are small. Considering further parallel operations (e.g. parallel
loading, parallel index creation, etc.), multiple processes will be
taking a relext lock on the same relation. Given that, the benefit of
this patch, which improves the speed of acquiring/releasing the lock,
would be significant.

In short, I personally think the current patch is simple and the result
is not bad. But if the community cannot accept these degradations, we
have to deal with the problem. For example, we could make the length
of the relext lock array configurable by users. That way, users can reduce
the possibility of collisions. Or we could improve the relext lock
manager to eliminate false collisions by changing it to an
open-addressing hash table. The code would get more complex, but false
collisions would not happen unless the array is full.
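
For readers following along, a simplified sketch of the fixed-array mapping
under discussion; the hash mixing below is a stand-in, not the patch's actual
code:

#include <stdint.h>

#define N_RELEXTLOCK_ENTS 1024		/* length of the fixed lock array */

/*
 * Map {database OID, relation OID} to one of N_RELEXTLOCK_ENTS extension
 * locks.  Two unrelated relations that land in the same slot share a lock
 * (a "false collision"); enlarging the array or switching to an
 * open-addressing hash table are the mitigations discussed above.
 */
static uint32_t
relextlock_bucket(uint32_t dboid, uint32_t reloid)
{
	uint32_t	h = dboid * 0x9e3779b9u ^ reloid * 0x85ebca6bu;

	return h % N_RELEXTLOCK_ENTS;
}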

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-07 Thread Michael Paquier
On Sun, Jan 7, 2018 at 9:38 AM, Tom Lane  wrote:
> ISTM that if this patch gets rid of a large fraction of the uses of
> pg_strcasecmp in the backend, which is what I expect it should, then
> it might not be out of reach to go through all the surviving ones to
> make sure they are not processing strings that originate in the parser.

Yeah, that's why I think that it is important for this patch to
provide as well tests to make sure that all the code paths are working
as they should with this patch.
-- 
Michael



Re: [HACKERS] Runtime Partition Pruning

2018-01-07 Thread David Rowley
On 7 January 2018 at 00:03, David Rowley  wrote:
> I've fixed this in the attached, but I did so by calling
> adjust_appendrel_attrs() from the nodeAppend.c, which did, of course,
> mean that the AppendRelInfo needed to be given to the executor. I was
> also a bit unsure what exactly I should be doing in primnodes.h, since
> I've put PartitionPruneInfo in there, but AppendRelInfo is not. I
> stuck a quick declaration of AppendRelInfo in primnode.h with an XXX
> comment so we don't forget to think about that again.

Actually, this was not a very smart fix for the problem. It seems much
better to make the prune qual part of PartitionPruneInfo and just have
the planner translate the qual to what's required for the partition
that the PartitionPruneInfo belongs to. This means we no longer need
to use the Append's qual to store the prune qual and that all the
pruning information for one partition is now neatly in a single
struct.

I've attached a patch which does things like this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_prune_drowley_v7.patch
Description: Binary data


Re: MCV lists for highly skewed distributions

2018-01-07 Thread John Naylor
I wrote:

> I'll be travelling for a few days, but I'll do some testing on some
> data sets soon.

I've attached some preliminary test methods and results. Probably not
detailed or rigorous enough, but I wanted to share this much before
digging deeper. I created some test data, ran analyze a few times and
eyeballed the results, of which I give a typical example. I used a low
stat target to make it easier to see the general picture. Suggestions
welcome.

-John Naylor


test_analyze_highly_skewed_v1.sql
Description: application/sql


Re: [HACKERS] Replication status in logical replication

2018-01-07 Thread Simon Riggs
On 26 December 2017 at 00:26, Masahiko Sawada  wrote:
> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek
>  wrote:
>> On 21/11/17 22:06, Masahiko Sawada wrote:
>>>
>>> After investigation, I found out that my previous patch was wrong
>>> direction. I should have changed XLogSendLogical() so that we can
>>> check the read LSN and set WalSndCaughtUp = true even after read a
>>> record without wait. Attached updated patch passed 'make check-world'.
>>> Please review it.
>>>
>>
>> Hi,
>>
>> This version looks good to me and seems to be in line with what we do in
>> physical replication.
>>
>> Marking as ready for committer.
>>
>
> Thank you for reviewing this patch!

The patch calls this AFTER processing the record
   if (sentPtr >= GetFlushRecPtr())
but it seems better to call GetFlushRecPtr() before we process the
record, otherwise the flush pointer might have moved forwards while we
process the record and it wouldn't catch up. (Physical replication
works like that).
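
In other words, something like the following shape (a simplified sketch, not
the actual walsender code):

XLogRecPtr	flushPtr;

/* sample the flush pointer before decoding, as physical replication does */
flushPtr = GetFlushRecPtr();

/* ... read and process one record, advancing sentPtr ... */

if (sentPtr >= flushPtr)
	WalSndCaughtUp = true;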

New patch version attached for discussion before commit. (v4)

I'd rather not call it at all at that point though, so if we made
RecentFlushPtr static at the module level rather than within
WalSndWaitForWal we could use it here also. That's a bit more invasive
for backpatching, so not implemented that here.

Overall, I find setting WalSndCaughtUp = false at the top of
XLogSendLogical() to be incredibly ugly and I would like to remove it.
It can't be correct to have a static status variable that oscillates
between false and true with every record. This seems to be done
because of the lack of a logical initialization call. Petr? Peter?
Version with this removed (v4alt2)

I've removed the edit that fusses over English grammar: both ways are correct.

> I think this patch can be
> back-patched to 9.4 as Simon mentioned.

This patch appears to cause this DEBUG1 message

"standby \"%s\" has now caught up with primary"

which probably isn't the right message, but might be OK to backpatch.

Thoughts on better wording?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


logical_repl_caught_up_v4.patch
Description: Binary data


logical_repl_caught_up_v4alt2.patch
Description: Binary data


Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2018-01-07 Thread Simon Riggs
On 30 March 2017 at 13:11, Peter Moser  wrote:
> 2017-03-01 10:56 GMT+01:00 Peter Moser :
>> A similar walkthrough for ALIGN will follow soon.
>>
>> We are thankful for any suggestion or ideas, to be used to write a
>> good SGML documentation.
>
> The attached README explains the ALIGN operation step-by-step with a
> TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
> input, show how we rewrite it during parser stage, and show how the
> final execution generates result tuples.

Sorry, this was too complex for me.

Can we get a much simpler example please?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] plpgsql - additional extra checks

2018-01-07 Thread Pavel Stehule
Hi

2018-01-07 0:59 GMT+01:00 Stephen Frost :

> Greetings Pavel,
>
> * Pavel Stehule (pavel.steh...@gmail.com) wrote:
> > 2017-11-30 3:44 GMT+01:00 Michael Paquier :
> > > At least documentation needs patching, or this is going to generate
> > > warnings on HEAD at compilation. I am moving this to next CF for lack
> > > of reviews, and the status is waiting on author as this needs at least
> > > a couple of doc fixes.
> >
> > fixed doc attached
>
> Looks like this patch should have been in "Needs Review" state, not
> "Waiting for author", since it does apply, build and pass make
> check-world, but as I'm signed up to review it, I'll do so here:
>
> diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
> index 7d23ed437e..efa48bc13c 100644
> --- a/doc/src/sgml/plpgsql.sgml
> +++ b/doc/src/sgml/plpgsql.sgml
> @@ -4963,6 +4963,11 @@ a_output := a_output || $$ if v_$$ ||
> referrer_keys.kind || $$ like '$$
>  so you are advised to test in a separate development environment.
> 
>
> +   
> +The setting plpgsql.extra_warnings to
> all is a
> +good idea in developer or test environments.
> +   
>
> Better language for this would be:
>
> Setting plpgsql.extra_warnings, or
> plpgsql.extra_errors, as appropriate, to
> all is encouraged in development and/or testing
> environments.
>
> @@ -4979,6 +4984,30 @@ a_output := a_output || $$ if v_$$ ||
> referrer_keys.kind || $$ like '$$
>   
>  
> 
> +
> +   
> +strict_multi_assignment
> +
> + 
> +  Some PL/PgSQL commands allows to assign
> a values to
> +  more than one variable. The number of target variables should not be
> +  equal to number of source values. Missing values are replaced by
> NULL
> +  value, spare values are ignored. More times this situation signalize
> +  some error.
> + 
> +
> +   
>
> Better language:
>
> Some PL/PgSQL commands allow assigning values
> to more than one variable at a time, such as SELECT INTO.  Typically,
> the number of target variables and the number of source variables should
> match, though PL/PgSQL will use NULL for
> missing values and extra variables are ignored.  Enabling this check
> will cause PL/PgSQL to throw a WARNING or
> ERROR whenever the number of target variables and the number of source
> variables are different.
>
> +   
> +too_many_rows
> +
> + 
> +  When result is assigned to a variable by INTO
> clause,
> +  checks if query returns more than one row. In this case the
> assignment
> +  is not deterministic usually - and it can be signal some issues in
> design.
> + 
> +
> +   
>
>
> Better language:
>
> Enabling this check will cause PL/PgSQL to
> check if a given query returns more than one row when an
> INTO clause is used.  As an INTO statement will only
> ever use one row, having a query return multiple rows is generally
> either inefficient and/or nondeterministic and therefore is likely an
> error.
>
> @@ -4997,6 +5026,34 @@ WARNING:  variable "f1" shadows a previously
> defined variable
>  LINE 3: f1 int;
>  ^
>  CREATE FUNCTION
> +
> +
> +  The another example shows the effect of plpgsql.extra_
> warnings
> +  set to strict_multi_assignment:
> +
>
> Better language:
>
> The below example shows the effects of setting
> plpgsql.extra_warnings to
> strict_multi_assignment:
>
> diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
> index ec480cb0ba..0879e84cd2 100644
> --- a/src/pl/plpgsql/src/pl_exec.c
> +++ b/src/pl/plpgsql/src/pl_exec.c
> @@ -3629,6 +3629,24 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
> longtcount;
> int rc;
> PLpgSQL_expr *expr = stmt->sqlstmt;
> +   booltoo_many_rows_check;
> +   int too_many_rows_level;
> +
> +   if (plpgsql_extra_errors & PLPGSQL_XCHECK_TOOMANYROWS)
> +   {
> +   too_many_rows_check = true;
> +   too_many_rows_level = ERROR;
> +   }
> +   else if (plpgsql_extra_warnings & PLPGSQL_XCHECK_TOOMANYROWS)
> +   {
> +   too_many_rows_check = true;
> +   too_many_rows_level = WARNING;
> +   }
> +   else
> +   {
> +   too_many_rows_check = false;
> +   too_many_rows_level = NOTICE;
> +   }
>
>
> I'm not sure why we need two variables here- couldn't we simply look at
> too_many_rows_level?  eg: too_many_rows_level >= WARNING ? ...
>
> Not as big a deal, but I would change it to be 'check_too_many_rows' as
> a variable name too.
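>
> Something along these lines, reusing the names from the quoted patch (a
> sketch of the suggestion, not tested code):
>
> int			too_many_rows_level = 0;	/* 0 means the check is disabled */
>
> if (plpgsql_extra_errors & PLPGSQL_XCHECK_TOOMANYROWS)
> 	too_many_rows_level = ERROR;
> else if (plpgsql_extra_warnings & PLPGSQL_XCHECK_TOOMANYROWS)
> 	too_many_rows_level = WARNING;
>
> /* ... later, when deciding how many rows to fetch ... */
>
> if (stmt->strict || stmt->mod_stmt || too_many_rows_level >= WARNING)
> 	tcount = 2;
> else
> 	tcount = 1;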
>
> @@ -3678,7 +3696,7 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
>  */
> if (stmt->into)
> {
> -   if (stmt->strict || stmt->mod_stmt)
> +   if (stmt->strict || stmt->mod_stmt || too_many_rows_check)
> tcount = 2;
> else
> tcount = 1;
>
> The