Re: proposal: alternative psql commands quit and exit

2017-12-07 Thread Sergei Kornilov
Why not just use the Ctrl+D shortcut? This EOF signal works in bash, mysql,
psql, and any other CLI tool I can remember.



Is it possible and worthy to optimize scanRTEForColumn()?

2017-12-07 Thread Rui Hai Jiang
Hello,

When I run a select query, e.g. select id from t,  all columns in table "t" are 
checked to see if a column named "id" exists or not, and a Var is created for 
"id" if the column does exist.

Function scanRTEForColumn() does this job.

But I see that in scanRTEForColumn() the loop does not stop when a match is
found; it continues to compare all the remaining columns, which wastes
computation.


I guess there may be reasons for this, but I don't know them yet.  I just
wonder whether it is possible and worthwhile to optimize this. Please note
that "select *" does not call this function.


Node *
scanRTEForColumn(ParseState *pstate, RangeTblEntry *rte, char *colname,
                 int location, int fuzzy_rte_penalty,
                 FuzzyAttrMatchState *fuzzystate)
{
    Node       *result = NULL;
    int         attnum = 0;
    Var        *var;
    ListCell   *c;

    foreach(c, rte->eref->colnames)
    {
        const char *attcolname = strVal(lfirst(c));

        attnum++;
        if (strcmp(attcolname, colname) == 0)
        {
            if (result)
                ereport(ERROR,
                        (errcode(ERRCODE_AMBIGUOUS_COLUMN),
                         errmsg("column reference \"%s\" is ambiguous",
                                colname),
                         parser_errposition(pstate, location)));
            var = make_var(pstate, rte, attnum, location);
            /* Require read access to the column */
            markVarForSelectPriv(pstate, var, rte);
            result = (Node *) var;
        }

        /* Updating fuzzy match state, if provided. */
        if (fuzzystate != NULL)
            updateFuzzyAttrMatchState(fuzzy_rte_penalty, fuzzystate,
                                      rte, attcolname, colname, attnum);
    }

    /*
     * If we have a unique match, return it.  Note that this allows a user
     * alias to override a system column name (such as OID) without error.
     */
    if (result)
        return result;

    /* ... rest of the function elided ... */
}
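
By the way, the ereport(ERRCODE_AMBIGUOUS_COLUMN) inside the loop suggests
one reason the scan cannot simply stop at the first hit: the parser has to
notice when more than one column of the RTE matches the name.  A minimal way
to reach that check, I think (table names made up):

CREATE TABLE t1 (id int);
CREATE TABLE t2 (id int);

-- the join alias "j" exposes two columns named "id", so an unqualified
-- reference must be reported as ambiguous rather than resolved to
-- whichever match the loop happened to find first
SELECT id FROM (t1 JOIN t2 ON t1.id = t2.id) AS j;
-- ERROR:  column reference "id" is ambiguous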


Re: Speeding up pg_upgrade

2017-12-07 Thread Bruce Momjian
On Thu, Dec  7, 2017 at 10:37:30AM -0500, Stephen Frost wrote:
> Alexander,
> 
> * Alexander Kukushkin (cyberd...@gmail.com) wrote:
> > Couple of months ago we at Zalando upgraded a few databases of different
> > sizes to 9.6.
> 
> Thanks for sharing your experience!
> 
> > During preparations for the upgrade I've found 2.5 pain-points:
> > 
> > 1. We are using schema-based api deployment. Basically ~every week we
> > create a new schema in the database and hundreds of stored procedures in it.
> > Of course we remove old API schemas and try not to keep more than the
> > last 10. Before the upgrade we basically dropped all API schemas except the
> > one used in production.
> > And even in this case dump-restore phase was taking much more time than
> > relinking of datafiles.
> > Unfortunately I don't have any numbers right now, but usually a run of
> > pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
> > spent in dump-restore.
> 
> Ok, so eliminating 2/3 of the time would mean bringing it down to more
> like 10 seconds.  That certainly seems worthwhile to me.  With the
> linking time being much less than the dump/restore, we could at least
> consider moving forward with Bruce's original idea where we do the
> dump/restore while the system is online but then the linking with it
> offline and get a serious performance boost out of it.  That also avoids
> the issue with new files showing up while the system is running that I
> brought up when we were talking about having the linking done with the
> system online.
> 
> > 2 ANALYZE phase is a pain. I think everybody agrees with it.
> > 
> > 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> > reasonable, except one case: some of the columns might have non default
> > statistics target.
> 
> Ok, if the stage-1 is very fast and performance is reasonable enough
> after that then perhaps it's not so bad to keep it as-is for now and
> focus on the dump/restore time.  That said, we should certainly also
> work on improving this too.
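
(As a hedged aside on the non-default statistics target point above: a
per-column target, set with something like

ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 1000;

overrides default_statistics_target, so if I recall correctly even the
reduced-target first pass of vacuumdb --analyze-in-stages has to build
full-sized statistics for that column.  Table and column names above are
made up.)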

I think the big problem with a two-stage pg_upgrade is that the user steps
are more complex, so the question is what percentage of users are going to
use the two-stage method.  The bad news is that only a small percentage of
the users who would benefit from it will use it, and some who will not
benefit from it will use it.  Also, this is going to require significant
server changes, which will have to be maintained.

I think we need some statistics on how many users are going to benefit
from this, and how are users supposed to figure out whether they will
benefit from it?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



plpgsql: remove useless distinctions between record and row cases

2017-12-07 Thread Tom Lane
I've been fooling around with rewriting plpgsql's composite-variable
handling, along the way to getting it to handle domains over composite.
I noticed that there's some really unnecessary complication in places
where it insists on separating "row" variables from "record" variables.
More usually, we'd handle that by using a generically-typed pointer and
then doing a node-type check where it's actually necessary to distinguish;
which it turns out is exactly one place, exec_move_row().  So attached is
a simple patch that eliminates the duplicative coding.

Barring objection, I'd like to push this shortly.

regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index ec480cb..1959d6d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -272,8 +272,7 @@ static ParamListInfo setup_unshared_param_list(PLpgSQL_execstate *estate,
 		  PLpgSQL_expr *expr);
 static void plpgsql_param_fetch(ParamListInfo params, int paramid);
 static void exec_move_row(PLpgSQL_execstate *estate,
-			  PLpgSQL_rec *rec,
-			  PLpgSQL_row *row,
+			  PLpgSQL_variable *target,
 			  HeapTuple tup, TupleDesc tupdesc);
 static HeapTuple make_tuple_from_row(PLpgSQL_execstate *estate,
 	PLpgSQL_row *row,
@@ -281,8 +280,7 @@ static HeapTuple make_tuple_from_row(PLpgSQL_execstate *estate,
 static HeapTuple get_tuple_from_datum(Datum value);
 static TupleDesc get_tupdesc_from_datum(Datum value);
 static void exec_move_row_from_datum(PLpgSQL_execstate *estate,
-		 PLpgSQL_rec *rec,
-		 PLpgSQL_row *row,
+		 PLpgSQL_variable *target,
 		 Datum value);
 static char *convert_value_to_string(PLpgSQL_execstate *estate,
 		Datum value, Oid valtype);
@@ -425,13 +423,15 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
 	if (!fcinfo->argnull[i])
 	{
 		/* Assign row value from composite datum */
-		exec_move_row_from_datum(&estate, NULL, row,
+		exec_move_row_from_datum(&estate,
+ (PLpgSQL_variable *) row,
  fcinfo->arg[i]);
 	}
 	else
 	{
 		/* If arg is null, treat it as an empty row */
-		exec_move_row(&estate, NULL, row, NULL, NULL);
+		exec_move_row(&estate, (PLpgSQL_variable *) row,
+	  NULL, NULL);
 	}
 	/* clean up after exec_move_row() */
 	exec_eval_cleanup(&estate);
@@ -2327,7 +2327,7 @@ exec_stmt_forc(PLpgSQL_execstate *estate, PLpgSQL_stmt_forc *stmt)
 		set_args.sqlstmt = stmt->argquery;
 		set_args.into = true;
 		/* XXX historically this has not been STRICT */
-		set_args.row = (PLpgSQL_row *)
+		set_args.target = (PLpgSQL_variable *)
 			(estate->datums[curvar->cursor_explicit_argrow]);
 
 		if (exec_stmt_execsql(estate, &set_args) != PLPGSQL_RC_OK)
@@ -3755,8 +3755,7 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 	{
 		SPITupleTable *tuptab = SPI_tuptable;
 		uint64		n = SPI_processed;
-		PLpgSQL_rec *rec = NULL;
-		PLpgSQL_row *row = NULL;
+		PLpgSQL_variable *target;
 
 		/* If the statement did not return a tuple table, complain */
 		if (tuptab == NULL)
@@ -3764,13 +3763,8 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 	(errcode(ERRCODE_SYNTAX_ERROR),
 	 errmsg("INTO used with a command that cannot return data")));
 
-		/* Determine if we assign to a record or a row */
-		if (stmt->rec != NULL)
-			rec = (PLpgSQL_rec *) (estate->datums[stmt->rec->dno]);
-		else if (stmt->row != NULL)
-			row = (PLpgSQL_row *) (estate->datums[stmt->row->dno]);
-		else
-			elog(ERROR, "unsupported target");
+		/* Fetch target's datum entry */
+		target = (PLpgSQL_variable *) estate->datums[stmt->target->dno];
 
 		/*
 		 * If SELECT ... INTO specified STRICT, and the query didn't find
@@ -3794,7 +3788,7 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 		 errdetail ? errdetail_internal("parameters: %s", errdetail) : 0));
 			}
 			/* set the target to NULL(s) */
-			exec_move_row(estate, rec, row, NULL, tuptab->tupdesc);
+			exec_move_row(estate, target, NULL, tuptab->tupdesc);
 		}
 		else
 		{
@@ -3813,7 +3807,7 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 		 errdetail ? errdetail_internal("parameters: %s", errdetail) : 0));
 			}
 			/* Put the first result row into the target */
-			exec_move_row(estate, rec, row, tuptab->vals[0], tuptab->tupdesc);
+			exec_move_row(estate, target, tuptab->vals[0], tuptab->tupdesc);
 		}
 
 		/* Clean up */
@@ -3946,8 +3940,7 @@ exec_stmt_dynexecute(PLpgSQL_execstate *estate,
 	{
 		SPITupleTable *tuptab = SPI_tuptable;
 		uint64		n = SPI_processed;
-		PLpgSQL_rec *rec = NULL;
-		PLpgSQL_row *row = NULL;
+		PLpgSQL_variable *target;
 
 		/* If the statement did not return a tuple table, complain */
 		if (tuptab == NULL)
@@ -3955,13 +3948,8 @@ exec_stmt_dynexecute(PLpgSQL_execstate *estate,
 	(errcode(ERRCODE_SYNTAX_ERROR),
 	 errmsg("INTO used with a command that cannot return data")));
 
-		/* Determine if we assign to a record or a row */
-		if (stmt->rec 

Re: proposal: alternative psql commands quit and exit

2017-12-07 Thread Laurenz Albe
Everaldo Canuto wrote:
> Some of us unfortunately have to work with multiple databases like Oracle or
> MySQL. Their respective clients, mysql and sqlplus, use "quit" or "exit" to
> exit the SQL client.
> 
> Oracle's sqlplus uses "quit" or "exit", and the MySQL client can be exited
> using "quit" and "exit"; for compatibility with psql, it also supports "\q"
> and "\quit".
> 
> Postgres psql already supports "\q" and "\quit", but I think it could be
> cool if it also supported "exit" and "quit"; talking to friends, I saw that
> I am the only one who sometimes tries to exit psql with "exit".
> The attached patch implements this way to exit psql.

I am -1 on that, because I don't think it is good to break the simple rule
that every psql command starts with a backslash.

It might help newcomers, but it will confuse people who know psql.

And frankly, anybody who has used sqlplus is used to suffering anyway.

Yours,
Laurenz Albe



Re: Partition-wise aggregation/grouping

2017-12-07 Thread Jeevan Chalke
On Fri, Dec 8, 2017 at 3:08 AM, legrand legrand  wrote:

> Hello,
>
> I'm testing Partition wise and found a strange result in V8 patch
> see  init_star_schema_agg.sql
> 
>
> Explain gives
>   ->  Seq Scan on facts_p2  (...) (actual time=0.012..0.012 rows=1 loops=1)
> for partitions that are joined with empty partitions
>
> I was expecting a message saying that partition facts_p2 was not accessed
>
> Am I wrong ?
>

Is this related to partition-wise aggregation, given that you say you found
this behaviour with the v8 patch?

I have tried your testcase on master and I see a similar plan.

I had a look at the plan and did not see any empty relation at planning
time.


> Regards
> PAscal
>
>
>
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-
> f1928748.html
>
>


-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 66449694

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb



RE: User defined data types in Logical Replication

2017-12-07 Thread Huong Dangminh
Hi Sawada-san,

> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada 
> wrote:
> > On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong 
> wrote:
> >> Hi Sawada-san,
> >>
> >> Sorry for my late response.
> >>
> >> On 2017/12/05 0:11, Masahiko Sawada wrote:
> >>
> >> There is one more case where a user-defined data type is not supported
> >> in Logical Replication: when the remote data type's name does not exist
> >> on the subscriber.
> >>
> >> In relation.c:logicalrep_typmap_gettypname we search the syscache for an
> >> OID by the remote data type's name and map it; if the name does not
> >> exist in the syscache we will be faced with the below error.
> >>
> >> if (!OidIsValid(entry->typoid))
> >> ereport(ERROR,
> >>
> >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >>  errmsg("data type \"%s.%s\" required
> >> for logical replication does not exist",
> >> entry->nspname,
> >> entry->typname)));
> >>
> >> I think, it is not necessary to check typoid here in order to avoid
> >> above case, is that right?
> >>
> >> I think it's not right. We should end up with an error in the case
> >> where the same type name doesn't exist on subscriber. With your
> >> proposed patch,
> >> logicalrep_typmap_gettypname() can return an invalid string
> >> (entry->typname) in that case, which can be a cause of SEGV.
> >>
> >> Thanks, I think you are right.
> >> # I thought that entry->typname was received from walsender, and it
> >> is already qualified in logicalrep_write_typ.
> >> # But we also need to check it in the subscriber, because the info
> >> could be lost in transmission.
> >
> > Oops, the last sentence of my previous mail was wrong.
> > logicalrep_typmap_gettypname() doesn't return an invalid string since
> > entry->typname is initialized with a type name got from wal sender.
> >
> > After more thought, we might not need to raise an error even if the
> > same data type does not exist on both publisher and subscriber. Because
> > data is sent after being converted to its text representation and is
> > then converted to a data type according to the local table definition,
> > subscribers don't always need to have the same data type. If that's
> > right, slot_store_error_callback() doesn't need to find a
> > corresponding local data type OID but just finds the corresponding
> > type name by remote data type OID. What do you think?
> >
> > --- a/src/backend/replication/logical/worker.c
> > +++ b/src/backend/replication/logical/worker.c
> > @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
> > DLIST_STATIC_INIT(lsn_mapping);
> >
> >  typedef struct SlotErrCallbackArg
> >  {
> > -    LogicalRepRelation *rel;
> > -    int        attnum;
> > +    LogicalRepRelMapEntry *rel;
> > +    int        remote_attnum;
> > +    int        local_attnum;
> >  } SlotErrCallbackArg;
> >
> > Since LogicalRepRelMapEntry has a map of local attributes to remote
> > ones we don't need to have two attribute numbers.
> >

Sorry for the late reply.

> Attached the patch incorporating what I have in mind. Please review it.

Thanks for the patch; I will review it this weekend.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/


Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

2017-12-07 Thread Amit Kapila
On Fri, Dec 8, 2017 at 12:06 AM, Robert Haas  wrote:
> On Wed, Dec 6, 2017 at 1:05 AM, Amit Kapila  wrote:
>> Right and seeing that I have prepared the patch (posted above [1])
>> which fixes it such that it will resemble the non-parallel case.
>> Ideally, it would have obviated the need for my previous patch which
>> got committed as 778e78ae.  However, now that is committed, I could
>> think of below options:
>>
>> 1. I shall rebase it atop what is committed and actually, I have done
>> that in the attached patch.  I have also prepared a regression test
>> case patch just to show the output with and without the patch.
>> 2. For sort node, we can fix it by having some local_info same as
>> shared_info in sort node and copy the shared_info in that or we could
>> reinstate the pointer to the DSM in ExecSortReInitializeDSM() by
>> looking it up in the TOC as suggested by Thomas. If we go this way,
>> then we need a similar fix for hash node as well.
>
> Well, the patch you've actually attached makes the bug go away by
> removing a net of 53 lines of code.  The other approach would probably
> add code.  So I am tempted to go with the approach you have here.  I
> would probably change the T_HashState and T_SortState cases in
> ExecParallelReInitializeDSM so that they still exist, but just do
> something like this:
>
> case T_HashState:
> case T_SortState:
> /* these nodes have DSM state, but no reinitialization is required */
> break;
>
> That way, it will be more clear to future readers of this code that
> the lack of a reinitialize function is not an oversight, and the
> compiler should optimize these cases away, merging them with the
> default case.
>

Okay, I have adjusted the patch accordingly.  I have also added a
regression test which should produce the same result across different
runs, see if that looks okay to you, then it is better to add such a
test as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_accum_instr_parallel_workers_v6.patch
Description: Binary data


Re: User defined data types in Logical Replication

2017-12-07 Thread Masahiko Sawada
On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada  wrote:
> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong  wrote:
>> Hi Sawada-san,
>>
>> Sorry for my late response.
>>
>> On 2017/12/05 0:11, Masahiko Sawada wrote:
>>
>> There is one more case where a user-defined data type is not supported in
>> Logical Replication: when the remote data type's name does not exist on
>> the subscriber.
>>
>> In relation.c:logicalrep_typmap_gettypname we search the syscache for an
>> OID by the remote data type's name and map it; if the name does not exist
>> in the syscache we will be faced with the below error.
>>
>> if (!OidIsValid(entry->typoid))
>> ereport(ERROR,
>>
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>  errmsg("data type \"%s.%s\" required for
>> logical replication does not exist",
>> entry->nspname,
>> entry->typname)));
>>
>> I think, it is not necessary to check typoid here in order to avoid above
>> case, is that right?
>>
>> I think it's not right. We should end up with an error in the case where the
>> same type name doesn't exist on subscriber. With your proposed patch,
>> logicalrep_typmap_gettypname() can return an invalid string (entry->typname)
>> in that case, which can be a cause of SEGV.
>>
>> Thanks, I think you are right.
>> # I thought that entry->typname was received from walsender, and it is
>> already qualified in logicalrep_write_typ.
>> # But we also need to check it in the subscriber, because the info could
>> be lost in transmission.
>
> Oops, the last sentence of my previous mail was wrong.
> logicalrep_typmap_gettypname() doesn't return an invalid string since
> entry->typname is initialized with a type name got from wal sender.
>
> After more thought, we might not need to raise an error even if the
> same data type does not exist on both publisher and subscriber. Because
> data is sent after being converted to its text representation and is
> then converted to a data type according to the local table definition,
> subscribers don't always need to have the same data type. If that's
> right, slot_store_error_callback() doesn't need to find a
> corresponding local data type OID but just finds the corresponding
> type name by remote data type OID. What do you think?
>
> --- a/src/backend/replication/logical/worker.c
> +++ b/src/backend/replication/logical/worker.c
> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
> DLIST_STATIC_INIT(lsn_mapping);
>
>  typedef struct SlotErrCallbackArg
>  {
> -    LogicalRepRelation *rel;
> -    int        attnum;
> +    LogicalRepRelMapEntry *rel;
> +    int        remote_attnum;
> +    int        local_attnum;
>  } SlotErrCallbackArg;
>
> Since LogicalRepRelMapEntry has a map of local attributes to remote
> ones we don't need to have two attribute numbers.
>
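
As a hedged sketch of the kind of setup being discussed in the quoted text
(object names made up), the publisher might have a type that does not exist
on the subscriber at all:

-- publisher
CREATE TYPE rainbow AS ENUM ('red', 'green', 'blue');
CREATE TABLE t (id int PRIMARY KEY, c rainbow);
CREATE PUBLICATION pub FOR TABLE t;

-- subscriber declares the same column as plain text
CREATE TABLE t (id int PRIMARY KEY, c text);
CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION pub;

Since each value arrives in its text representation and is fed to the local
column's input function, the column types need not match as long as the
text form converts cleanly.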

Attached the patch incorporating what I have in mind. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_slot_store_error_callback_v4.patch
Description: Binary data


Re: How to use set/reset role in contrib_regression test?

2017-12-07 Thread Michael Paquier
On Fri, Dec 8, 2017 at 8:12 AM, David G. Johnston
 wrote:
> Probably -bugs (or -general if you are unsure about the buggy-ness) would
> have been a more appropriate list since this is all direct SQL that looks to
> be broken.  No use moving it now, though.

On HEAD, REL_10_STABLE or REL9_6_STABLE, say with two users:
=# create role popo1 superuser login;
CREATE ROLE
Time: 9.400 ms
=# create role popo2 login;
CREATE ROLE

And then by creating objects:
=# select current_role;
 current_user
--------------
 popo1
(1 row)
=# set role popo2;
SET
=> select current_role;
 current_user
--------------
 popo2
(1 row)
=> create table aa (a int);
CREATE TABLE
=> select current_role;
 current_user
--------------
 popo2
(1 row)
=> reset role;
RESET
=# select current_role;
 current_user
--------------
 popo1
(1 row)

Per the information you are giving, I would have expected to see "popo2" as
the user for the last query even after issuing RESET ROLE. However, I do not
see any problem here.

> Reporting the version(s) you are running would be helpful.

Indeed.
-- 
Michael



Re: Usage of epoch in txid_current

2017-12-07 Thread Amit Kapila
On Wed, Dec 6, 2017 at 11:26 PM, Andres Freund  wrote:
>
>> Either way, it is not clear to me how we will keep it
>> updated after recovery.  Right now, the mechanism is quite simple, at
>> the beginning of a recovery we take the value of nextXid from
>> checkpoint record and then if any xlog record indicates xid that
>> follows nextXid, we advance it.  Here, the point to note is that we
>> take the xid from the WAL record (which means that it assumes xids are
>> non-contiguous or some xids are consumed without being logged) and
>> increment it.  Unless we plan to change something in that logic (like
>> storing 64-bit xids in WAL records), it is not clear to me how to make
>> it work.  OTOH, recovering value of epoch which increments only at
>> wraparound seems fairly straightforward as described in my previous
>> email.
>
> I think it should be fairly simple if simply add the 64bit xid to the
> existing clog extension WAL records.
>

IIUC, you mean that we should log the 64-bit xid value in the CLOG_ZEROPAGE
record while extending clog, and that we could do that only at wraparound.
Maybe doing it every time doesn't hurt either, but I think doing it at
wraparound should be sufficient.

Just to be clear, I am not planning to pursue writing a patch for this
at the moment.  So, if anybody else is interested or if Andres wants
to write it, I can help in the review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Logical replication without a Primary Key

2017-12-07 Thread Petr Jelinek
On 07/12/17 21:19, Robert Haas wrote:
> On Thu, Dec 7, 2017 at 2:53 PM, Chapman Flack  wrote:
>> On 12/07/2017 02:38 PM, Joshua D. Drake wrote:
>>> A    B    C
>>> foo,bar,baz
>>> foo,bar,baz
>>>
>>> And then I say:
>>>
>>> UPDATE test set A = 1 where C = baz
>>>
>>> I have updated two rows because there is no primary key to identify the
>>> differences. Both of those rows should be updated and thus replicated
>>
>> Would the subscriber see two records reporting update of a
>> foo,bar,baz row to 1, so it would do that to (arbitrarily)
>> one of them the first time, and (necessarily) the other, the
>> second time?
> 
> Exactly.
> 
> (I think.)
> 

Yes, that's how it was designed to work.
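
For anyone wanting to try it, a minimal sketch of the scenario (names
assumed; note that publishing UPDATEs on a table without a primary key
requires REPLICA IDENTITY FULL):

CREATE TABLE test (a text, b text, c text);   -- no primary key
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test VALUES ('foo','bar','baz'), ('foo','bar','baz');
CREATE PUBLICATION pub FOR TABLE test;

-- the publisher sends one UPDATE message per modified row; on the
-- subscriber each message matches one of the identical rows, so both
-- rows end up updated, as described above
UPDATE test SET a = '1' WHERE c = 'baz';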

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-12-07 Thread Thomas Munro
On Fri, Dec 8, 2017 at 2:23 PM, Thomas Munro
 wrote:
> On Fri, Dec 8, 2017 at 1:57 PM, Peter Geoghegan  wrote:
>> 1. Thomas' barrier abstraction was added by commit 1145acc7. I think
>> that you should use a static barrier in tuplesort.c now, and rip out
>> the ConditionVariable fields in the Sharedsort struct.
>
> ... So I think we'd need to add an extra barrier
> function that lets you change the party size of a static barrier.

Something like the attached (untested), which would allow
_bt_begin_parallel() to call BarrierInit(&barrier, request + 1), then
BarrierForgetParticipants(&barrier, request -
pcxt->nworkers_launched), and then all the condition variable loop
stuff can be replaced with a well placed call to
BarrierArriveAndWait(&barrier, WAIT_EVENT_SOMETHING_SOMETHING).

-- 
Thomas Munro
http://www.enterprisedb.com


barrier-forget-participants-v1.patch
Description: Binary data


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-12-07 Thread Thomas Munro
On Fri, Dec 8, 2017 at 1:57 PM, Peter Geoghegan  wrote:
> 1. Thomas' barrier abstraction was added by commit 1145acc7. I think
> that you should use a static barrier in tuplesort.c now, and rip out
> the ConditionVariable fields in the Sharedsort struct. It's only a
> slightly higher level of abstraction for tuplesort.c, which makes only
> a small difference given the simple requirements of tuplesort.c.
> However, I see no reason to not go that way if that's the new
> standard, which it is. This looks like it will be fairly easy.

I thought about this too.  A static barrier seems ideal for it, except
for one tiny detail.  We'd initialise the barrier with the number of
participants, and then after launching we get to find out how many
workers were really launched using pcxt->nworkers_launched, which may
be a smaller number.  If it's a smaller number, we need to adjust the
barrier to the smaller party size.  We can't do that by calling
BarrierDetach() n times, because Andres convinced me to assert that
you didn't try to detach from a static barrier (entirely reasonably)
and I don't really want a process to be 'detaching' on behalf of
someone else anyway.  So I think we'd need to add an extra barrier
function that lets you change the party size of a static barrier.
Yeah, that sounds like a contradiction...  but it's not the same as
the attach/detach workflow because static parties *start out
attached*, which is a very important distinction (it means that client
code doesn't have to futz about with phases, or in other words the
worker doesn't have to consider the possibility that it started up
late and missed all the action and the sort is finished).  The tidiest
way to provide this new API would, I think, be to change the internal
function BarrierDetachImpl() to take a parameter n and reduce
barrier->participants by that number, and then add a function
BarrierForgetParticipants(barrier, n) [insert better name] and have it
call BarrierDetachImpl().  Then the latter's assertion that
!static_party could move out to BarrierDetach() and
BarrierArriveAndDetach().  Alternatively, we could use the dynamic API
(see earlier parentheses about phases).

The end goal would be that code like this can use
BarrierInit(&barrier, participants), then (if necessary)
BarrierForgetParticipants(&barrier, nonstarters), and then they all
just have to call BarrierArriveAndWait() at the right time and that's
all.  Nice and tidy.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Postgres with pthread

2017-12-07 Thread Craig Ringer
On 8 December 2017 at 03:58, Andres Freund  wrote:

> On 2017-12-07 11:26:07 +0800, Craig Ringer wrote:
> > PostgreSQL's architecture conflates "connection", "session" and
> "executor"
> > into one somewhat muddled mess.
>
> How is the executor entangled in the other two?
>
>
Executor in the postgres sense isn't, so I chose the word poorly.

"Engine of execution" maybe. What I'm getting at is that we tie up more
resources than should ideally be necessary when a session is idle,
especially idle in transaction. But I guess a lot of that is really down to
memory allocated and not returned to the OS (because like other C programs
we can't do that), etc. The key resources like PGXACT entries aren't
something we can release while idle in a transaction after all.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-12-07 Thread Peter Geoghegan
On Thu, Dec 7, 2017 at 12:25 AM, Rushabh Lathia
 wrote:
> 0001-Add-parallel-B-tree-index-build-sorting_v14.patch

Cool. I'm glad that we now have a patch that applies cleanly against
master, while adding very little to buffile.c. It feels like we're
getting very close here.

>> * You didn't point out the randomAccess restriction in tuplesort.h.
>>
>
> I did, it's there in the file header comments.

I see what you wrote in tuplesort.h here:

> + * algorithm, and are typically only used for large amounts of data. Note
> + * that parallel sorts is not support for random access to the sort result.

This should say "...are not supported when random access is requested".

> Added handling for parallel_leader_participation as well as deleted
> compile time option force_single_worker.

I still see this:

> +
> +/*
> + * A parallel sort with one worker process, and without any leader-as-worker
> + * state may be used for testing the parallel tuplesort infrastructure.
> + */
> +#ifdef NOT_USED
> +#define FORCE_SINGLE_WORKER
> +#endif

Looks like you missed this FORCE_SINGLE_WORKER hunk -- please remove it, too.

>> The parallel_leader_participation docs will also need to be updated.
>>
>
> Done.

I don't see this. There is no reference to
parallel_leader_participation in the CREATE INDEX docs, nor is there a
reference to CREATE INDEX in the parallel_leader_participation docs.

> Also performed more testing with the patch, with
> parallel_leader_participation
> ON and OFF.  Found one issue, where earlier we always used to call
> _bt_leader_sort_as_worker() but now need to skip the call if
> parallel_leader_participation
> is OFF.

Hmm. I think the local variable within _bt_heapscan() should go back.
Its value should be directly taken from parallel_leader_participation
assignment, once. There might be some bizarre circumstances where it
is possible for the value of parallel_leader_participation to change
in flight, causing a race condition: we start with the leader as a
participant, and change our mind later within
_bt_leader_sort_as_worker(), causing the whole CREATE INDEX to hang
forever.

Even if that's impossible, it seems like an improvement in style to go
back to one local variable controlling everything.

Style issue here:

> + long start_block = file->numFiles * BUFFILE_SEG_SIZE;
> + int newNumFiles = file->numFiles + source->numFiles;

Shouldn't start_block conform to the surrounding camelCase style?

Finally, two new thoughts on the patch, that are not responses to
anything you did in v14:

1. Thomas' barrier abstraction was added by commit 1145acc7. I think
that you should use a static barrier in tuplesort.c now, and rip out
the ConditionVariable fields in the Sharedsort struct. It's only a
slightly higher level of abstraction for tuplesort.c, which makes only
a small difference given the simple requirements of tuplesort.c.
However, I see no reason to not go that way if that's the new
standard, which it is. This looks like it will be fairly easy.

2. Does the plan_create_index_workers() cost model need to account for
parallel_leader_participation, too, when capping workers? I think that
it does.

The relevant planner code is:

> +   /*
> +* Cap workers based on available maintenance_work_mem as needed.
> +*
> +* Note that each tuplesort participant receives an even share of the
> +* total maintenance_work_mem budget.  Aim to leave workers (where
> +* leader-as-worker Tuplesortstate counts as a worker) with no less than
> +* 32MB of memory.  This leaves cases where maintenance_work_mem is set to
> +* 64MB immediately past the threshold of being capable of launching a
> +* single parallel worker to sort.
> +*/
> +   sort_mem_blocks = (maintenance_work_mem * 1024L) / BLCKSZ;
> +   min_sort_mem_blocks = (32768L * 1024L) / BLCKSZ;
> +   while (parallel_workers > min_parallel_workers &&
> +  sort_mem_blocks / (parallel_workers + 1) < min_sort_mem_blocks)
> +   parallel_workers--;

This parallel CREATE INDEX planner code snippet is about the need to
have low per-worker maintenance_work_mem availability prevent more
parallel workers from being added to the number that we plan to
launch. Each worker tuplesort state needs at least 32MB. We clearly
need to do something here.
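
(To make the arithmetic concrete, assuming the default BLCKSZ of 8192: with
maintenance_work_mem = 64MB we get sort_mem_blocks = 8192 and
min_sort_mem_blocks = 4096, so with parallel_workers = 1 the two
participants get 8192 / 2 = 4096 blocks = 32MB each and the loop stops,
while a hypothetical second worker would drop the share to about 2730
blocks and be capped away -- the "immediately past the threshold" case the
comment refers to.)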

While it's always true that "leader-as-worker Tuplesortstate counts as
a worker" in v14, I think that it should only be true in the next
revision of the patch when parallel_leader_participation is actually
true (IOW, we should only add 1 to parallel_workers within the loop
invariant in that case). The reason why we need to consider
parallel_leader_participation within this plan_create_index_workers()
code is simple: During execution, _bt_leader_sort_as_worker() uses
"worker tuplesort states"/btshared->scantuplesortstates to determine
how much of a share of maintenance_work_mem each worker tuplesort
gets. Our planner code needs to take that into account, now that the
nbtsort.c parallel_leader_participa

Re: Add %r substitution for psql prompts to show recovery status

2017-12-07 Thread Tatsuo Ishii
> On Wed, Dec 6, 2017 at 9:19 PM, Ian Barwick  
> wrote:
>> Note this substitution sends a "pg_is_in_recovery()" query to the server
>> each time it's encountered; unless there's something I'm overlooking I
>> think that's the only reliable way to determine current recovery status.
> 
> That seems kinda painful.
> 
> And what happens in an aborted transaction?

Yeah. I think we need some help from the backend for this. For example, a
parameter status message could be used here: if PostgreSQL moves to the
recovery state or vice versa, PostgreSQL sends the parameter status
message to the frontend.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-12-07 Thread Amit Langote
Hi David.

On 2017/12/07 19:48, David Rowley wrote:
> On 30 November 2017 at 11:15, Robert Haas  wrote:
>> Committed 0004 after reviewing the code and testing that it seems to
>> work as advertised.
>>
>> 0005 looks like it might need to be split into smaller patches.  More
>> broadly, the commit messages you wrote for for 0005, 0006, and 0008
>> don't seem to me to do a great job explaining the motivation for the
>> changes which they make.  They tell me what the patches do, but not
>> why they are doing it.  If there's an email in this thread that
>> explains that stuff, please point me to it and I'll go back and reread
>> it more carefully; if not, I think I definitely need some more
>> explanation both of the mission of each patch and the reason why the
>> patch set is divided up in the way that it is.
> 
> Hi Amit,
> 
> It looks like just 0005 to 0008 remain of this and I see that the v13
> 0005 patch no longer applies to current master.
> 
> Are you working on splitting this up as requested by Robert above?
> 
> I can continue reviewing this once patches are available that apply to
> current master.

I'm still working on that.  I will be able to submit a new version sometime
early next week, that is, if I don't manage to do it by this evening (Japan
time).  Sorry that it's taking a bit longer.

Thanks,
Amit




Re: [HACKERS] SERIALIZABLE with parallel query

2017-12-07 Thread Thomas Munro
On Thu, Nov 30, 2017 at 2:44 PM, Thomas Munro
 wrote:
> On Thu, Nov 30, 2017 at 2:32 PM, Michael Paquier
>  wrote:
>> Could this question be answered? The patch still applies so I am
>> moving it to next CF.

Rebased, 'cause it broke.

-- 
Thomas Munro
http://www.enterprisedb.com


ssi-parallel-v9.patch
Description: Binary data


Re: [HACKERS] Parallel Hash take II

2017-12-07 Thread Andres Freund
On 2017-12-08 12:07:04 +1300, Thomas Munro wrote:
> I suppose if we wanted to make this type of problem go away, but still
> keep files for forensic purposes, we could introduce a "restart
> number", and stick it into the top level temporary directory's name.
> That way you'd never be able to collide with files created before a
> crash-restart, and we could add O_EXCL so it'd become an error to try
> to create the same filename again.

I'm deeply unconvinced by the "forensic" argument to not do temp file
cleanup after crash restarts. That causes problems like the one we're
debating upthread in the first place, so I'm wholly unconvinced we
should add to that further by adding another layer of complexity.

My personal opinion is that we should just do temp file cleanup after
crash restarts, and document restart_after_crash = false as the solution
for investigating crashes.  I don't want to hold up this patch with a
discussion of that however, so I'm ok with your fix.

Greetings,

Andres Freund



Re: [HACKERS] Transaction control in procedures

2017-12-07 Thread Andrew Dunstan


On 12/06/2017 09:41 AM, Peter Eisentraut wrote:
> On 12/5/17 13:33, Robert Haas wrote:
>> On Tue, Dec 5, 2017 at 1:25 PM, Peter Eisentraut
>>  wrote:
>>> I think ROLLBACK in a cursor loop might not make sense, because the
>>> cursor query itself could have side effects, so a rollback would have to
>>> roll back the entire loop.  That might need more refined analysis before
>>> it could be allowed.
>> COMMIT really has the same problem; if the cursor query has side
>> effects, you can't commit those side effects piecemeal as the loop
>> executed and have things behave sanely.
> The first COMMIT inside the loop would commit the cursor query.  This
> isn't all that different from what you'd get now if you coded this
> manually using holdable cursors or just plain client code.  Clearly, you
> can create a mess if the loop body interacts with the loop expression,
> but that's already the case.
>
> But if you coded something like this yourself now and ran a ROLLBACK
> inside the loop, the holdable cursor would disappear (unless previously
> committed), so you couldn't proceed with the loop.
>
> The SQL standard for persistent stored modules explicitly prohibits
> COMMIT and ROLLBACK in cursor loop bodies.  But I think people will
> eventually want it.
>
>>>> - COMMIT or ROLLBACK inside a procedure with a SET clause attached,
>>> That also needs to be prohibited because of the way the GUC nesting
>>> currently works.  It's probably possible to fix it, but it would be a
>>> separate effort.
>>>
>>>> and/or while SET LOCAL is in effect either at the inner or outer
>>>> level.
>>> That seems to work fine.
>> These two are related -- if you don't permit anything that makes
>> temporary changes to GUCs at all, like SET clauses attached to
>> functions, then SET LOCAL won't cause any problems.  The problem is if
>> you do a transaction operation when something set locally is in the
>> stack of values, but not at the top.
> Yes, that's exactly the problem.  So right now I'm just preventing the
> problematic scenario.  So fix that, one would possibly have to replace
> the stack by something not quite a stack.
>
>
> New patch attached.
>



In general this looks good. However, I'm getting runtime errors in
plperl_elog.c on two different Linux platforms (Fedora 25, Ubuntu
16.04). There seems to be something funky going on. And we do need to
work out why the commented out plperl test isn't working.


Detailed comments:

Referring to anonymous blocks might be a bit mystifying for some
readers, unless we note that they are invoked via DO.

I think this sentence should probably be worded a bit differently:

(And of course, BEGIN and END have different meanings in PL/pgSQL.)

Perhaps "Note that" instead of "And of course,".

Why aren't the SPI functions that error out or return 0 just void
instead of returning an int?

The docs say SPI_set_non_atomic() returns 0 on success, but the code
says otherwise.

This sentence in the comment in SPI_Commit() is slightly odd:

This restriction is particular to PLs implemented on top of SPI.

Perhaps "required by" rather than "particular to" would make it read better.

The empty free_commit() and free_rollback() functions in pl_funcs.c look
slightly odd. I realize that the compiler should optimize the calls
away, but it seems an odd style.


One other thing I wondered about was what if a PL function (say plperl)
used SPI to open an explicit cursor and then looped over it? If there
were a commit or rollback inside that loop we wouldn't have the same
protections we have in plpgsql, ISTM. I haven't tried this yet, so I'm
just speculating about what might happen.



cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




proposal: alternative psql commands quit and exit

2017-12-07 Thread Everaldo Canuto
Some of us unfortunately have to work with multiple databases like Oracle
or MySQL. Their respective clients, mysql and sqlplus, use "quit" or "exit"
to exit the SQL client.

Oracle's sqlplus uses "quit" or "exit", and the MySQL client can be exited
using "quit" and "exit"; for compatibility with psql, it also supports "\q"
and "\quit".

Postgres psql already supports "\q" and "\quit", but I think it could be
cool if it also supported "exit" and "quit"; talking to friends, I saw that
I am the only one who sometimes tries to exit psql with "exit".

The attached patch implements this way to exit psql.

Regards

-- 
Everaldo Canuto


psql-exit-quit.patch
Description: Binary data


Re: [HACKERS] Parallel Hash take II

2017-12-07 Thread Thomas Munro
On Fri, Dec 8, 2017 at 12:07 PM, Thomas Munro
 wrote:
> I suppose if we wanted to make this type of problem go away, but still
> keep files for forensic purposes, we could introduce a "restart
> number", and stick it into the top level temporary directory's name.
> That way you'd never be able to collide with files created before a
> crash-restart, and we could add O_EXCL so it'd become an error to try
> to create the same filename again.

Or we could teach crash-restart to move the top level directory (in
each tablespace) to pgsql_tmp.old, so we'd keep the temporary files
from one previous lifetime only.  That'd prevent unlimited space
eating in multiple crash scenarios, and we could more comfortably say
that it's entirely safe to delete that directory manually in cases
like this:

https://www.postgresql.org/message-id/flat/4f3c89a224ff4660baa62a2b79fb0f1d%40ITUPW-EXMBOX3B.UniNet.unisa.edu.au

-- 
Thomas Munro
http://www.enterprisedb.com



Re: How to use set/reset role in contrib_regression test?

2017-12-07 Thread David G. Johnston
On Thu, Dec 7, 2017 at 3:55 PM, Jeremy Finzel  wrote:

> Hello!  I hope this is the right list for extension dev questions?
>
> SELECT CURRENT_ROLE;
>  current_user
> --------------
>  jfinzel
> (1 row)
>
> SET ROLE test_pgl_ddl_deploy;
> CREATE SCHEMA special;
> CREATE TABLE special.foo (id serial primary key, foo text, bar text);
> CREATE TABLE special.bar (id serial primary key, super text, man text);
>
> RESET ROLE;
> SELECT CURRENT_ROLE;
>     current_user
> ---------------------
>  test_pgl_ddl_deploy
> (1 row)
>
> What am I missing here?  Any comments much appreciated.
>
>
Probably -bugs (or -general if you are unsure about the buggy-ness) would
have been a more appropriate list since this is all direct SQL that looks
to be broken.  No use moving it now, though.

Reporting the version(s) you are running would be helpful.

David J.


Re: [HACKERS] Parallel Hash take II

2017-12-07 Thread Thomas Munro
On Sat, Dec 2, 2017 at 4:46 PM, Thomas Munro
 wrote:
> On Sat, Dec 2, 2017 at 3:54 PM, Thomas Munro
>  wrote:
>> On Sat, Dec 2, 2017 at 1:55 PM, Andres Freund  wrote:
>>> - As we don't clean temp files after crash-restarts it isn't impossible
>>>   to have a bunch of crash-restarts and end up with pids *and* per-pid
>>>   shared file set counters reused. Which'd lead to conflicts. Do we care?
>>
>> We don't care.  PathNameCreateTemporaryDir() on a path that already
>> exists will return silently.  PathNameCreateTemporaryFile() on a path
>> that already exists, as mentioned in a comment and following an
>> existing convention, will open and truncate it.  So either it was
>> really an orphan and that is a continuation of our traditional
>> recycling behaviour, or the calling code has a bug and used the same
>> path twice and it's not going to end well.
>
> On further reflection, there are problems with that higher up.  (1)
> Even though PathNameCreateTemporaryFile() will happily truncate and
> reuse an orphaned file when BufFileCreateShared() calls it,
> BufFileOpenShared() could get confused by the orphaned files.  It
> could believe that XXX.1 is a continuation of XXX.0, when in fact it
> is junk left over from a crash restart.  Perhaps BufFileCreateShared()
> needs to delete XXX.{N+1} if it exists, whenever it creates XXX.{N}.
> That will create a gap in the series of existing files that will cause
> BufFileOpenShared()'s search to terminate.  (2) BufFileOpenShared()
> thinks that the absence of an XXX.0 file means there is no BufFile by
> this name, when it could mistakenly open pre-crash junk due to a
> colliding name.  I use that condition as information, but I think I
> can fix that easily by using SharedTuplestoreParticipant::npage == 0
> to detect that there should be no file, rather than trying to open it,
> and then I can define this problem away by declaring that
> BufFileOpenShared() on a name that you didn't call
> BufFileCreateShared() on invokes undefined behaviour.  I will make it
> so.

Here is a patch to deal with that problem.  Thoughts?

I suppose if we wanted to make this type of problem go away, but still
keep files for forensic purposes, we could introduce a "restart
number", and stick it into the top level temporary directory's name.
That way you'd never be able to collide with files created before a
crash-restart, and we could add O_EXCL so it'd become an error to try
to create the same filename again.

I'll post a new SharedTuplestore and Parallel Hash patch set shortly.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-defenses-against-pre-crash-files-to-BufFileOpenS.patch
Description: Binary data


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-07 Thread David Rowley
On 8 December 2017 at 11:23, Robert Haas  wrote:
> On Thu, Dec 7, 2017 at 4:57 PM, David Rowley
>  wrote:
>> That's true, but is it worth inventing/maintaining an ATTACH syntax
>> for that? It's not a very common case that multiple matching indexes
>> exist. If it is worth it, do we need DETACH at all?
>
> I think it is totally worth it.  A little piece of extra DDL syntax
> isn't really costing us anything.  Your proposed approach
> probably still has obscure failure cases - e.g. I bet the
> deadlock-avoidance logic in parallel restore won't know about the
> dependency on a seemingly-unrelated object.  Also, the use of a
> seemingly-useless REPLACE syntax in every dump file will probably
> confuse at least a few users, and maybe a few developers, too.  I
> think there is considerable value, both for the system and for the
> humans who use it, in having something that has *exactly* the
> semantics we want rather than *almost* the semantics we want.
>
> I suppose if we want to get cute, we could have ONLY the ATTACH
> syntax; if you attach an index for a partition that already has an
> index attached, that could mean attach to the new one instead of the
> old one (i.e. replace).  But I would just add support for both ATTACH
> and REPLACE and call it good.

ATTACH/REPLACE sounds fine. My objection was more about the
DETACH/ATTACH method to replace an index.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



How to use set/reset role in contrib_regression test?

2017-12-07 Thread Jeremy Finzel
Hello!  I hope this is the right list for extension dev questions?

I am finding odd behavior running make installcheck for a postgres
extension.  The user running the suite is a superuser ("jfinzel").  Once I
create any object as a test role, reset role does not work, although it
does work to do set role jfinzel.  See:

SELECT CURRENT_ROLE;
 current_user
--------------
 jfinzel
(1 row)

SET ROLE test_pgl_ddl_deploy;
SELECT CURRENT_ROLE;
    current_user
---------------------
 test_pgl_ddl_deploy
(1 row)

RESET ROLE;
SELECT CURRENT_ROLE;
 current_user
--------------
 jfinzel
(1 row)

SET ROLE test_pgl_ddl_deploy;
CREATE SCHEMA special;
CREATE TABLE special.foo (id serial primary key, foo text, bar text);
CREATE TABLE special.bar (id serial primary key, super text, man text);
SELECT CURRENT_ROLE;
    current_user
---------------------
 test_pgl_ddl_deploy
(1 row)

RESET ROLE;
SELECT CURRENT_ROLE;
    current_user
---------------------
 test_pgl_ddl_deploy
(1 row)

SET SESSION AUTHORIZATION DEFAULT;
RESET ROLE;
SELECT CURRENT_ROLE;
    current_user
---------------------
 test_pgl_ddl_deploy
(1 row)


What am I missing here?  Any comments much appreciated.

Thanks,
Jeremy Finzel


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-07 Thread Robert Haas
On Thu, Dec 7, 2017 at 4:57 PM, David Rowley
 wrote:
>> Sure, that would fix the problem I'm concerned about, but creating the
>> parent index first, as a shell, and then creating each child and
>> attaching it to the parent *also* fixes the problem I'm concerned
>> about, and without dragging any bystander objects into the fray.
>
> That's true, but is it worth inventing/maintaining an ATTACH syntax
>> for that? It's not a very common case that multiple matching indexes
>> exist. If it is worth it, do we need DETACH at all?

I think it is totally worth it.  A little piece of extra DDL syntax
isn't really costing us anything.  Your proposed approach
probably still has obscure failure cases - e.g. I bet the
deadlock-avoidance logic in parallel restore won't know about the
dependency on a seemingly-unrelated object.  Also, the use of a
seemingly-useless REPLACE syntax in every dump file will probably
confuse at least a few users, and maybe a few developers, too.  I
think there is considerable value, both for the system and for the
humans who use it, in having something that has *exactly* the
semantics we want rather than *almost* the semantics we want.

I suppose if we want to get cute, we could have ONLY the ATTACH
syntax; if you attach an index for a partition that already has an
index attached, that could mean attach to the new one instead of the
old one (i.e. replace).  But I would just add support for both ATTACH
and REPLACE and call it good.
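
Purely as a hedged sketch of what that might look like at the SQL level (the
exact spelling is of course not settled):

ALTER INDEX p_a_idx ATTACH PARTITION p1_a_idx;
ALTER INDEX p_a_idx REPLACE PARTITION p1_a_idx WITH p1_a_idx2;

where the first form attaches an existing index on partition p1 to the
parent index, and the second swaps it for another matching index.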

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-07 Thread David Rowley
On 7 December 2017 at 23:56, Beena Emerson  wrote:
> Currently Amit's v13 patches do not apply on HEAD, and I was
> working on 487a0c1518af2f3ae2d05b7fd23d636d687f28f3, which is the last
> commit where all of Amit's v13 patches apply cleanly.

Thanks.

I was just looking over this and was wondering about the following case:

drop table if exists p;
create table p (a int not null, b int not null) partition by range (a);
create table p1 partition of p for values from (0) to (1000);
create table p2 partition of p for values from (1000) to (2000);
create table p3 partition of p for values from (2000) to (3000);
create table p4 partition of p for values from (3000) to (4000);

create index on p1 (a);
create index on p2 (a);
create index on p3 (a);
create index on p4 (a);


insert into p select x,x from generate_series(1,3999) x;

drop table if exists t;
create table t (a int not null);

insert into t select generate_Series(1,10);

analyze p;

analyze t;

set enable_mergejoin=0;
set enable_hashjoin=0;

explain analyze select * from p inner join t on p.a = t.a;

The patch gives me:

                               QUERY PLAN
---------------------------------------------------------------------
 Nested Loop (actual time=0.032..0.159 rows=10 loops=1)
   ->  Seq Scan on t (actual time=0.012..0.013 rows=10 loops=1)
   ->  Append (actual time=0.004..0.013 rows=1 loops=10)
 ->  Index Scan using p1_a_idx on p1 (actual time=0.004..0.004
rows=1 loops=10)
   Index Cond: (a = t.a)
 ->  Index Scan using p2_a_idx on p2 (actual time=0.003..0.003
rows=0 loops=10)
   Index Cond: (a = t.a)
 ->  Index Scan using p3_a_idx on p3 (actual time=0.002..0.002
rows=0 loops=10)
   Index Cond: (a = t.a)
 ->  Index Scan using p4_a_idx on p4 (actual time=0.003..0.003
rows=0 loops=10)
   Index Cond: (a = t.a)
 Planning time: 0.472 ms
 Execution time: 0.241 ms
(13 rows)

but I expected to get (never executed) for p2, p3 and p4.

The following code makes me think you intend this to work:

@@ -280,6 +438,10 @@ ExecReScanAppend(AppendState *node)
 {
  int i;

+ /* Determine subplans to scan based on the new Params */
+ if (node->ps.chgParam != NULL && node->join_clauses)
+ set_append_subplan_indexes(node, node->join_clauses);
+

It just does not, because node->join_clauses is NULL.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-07 Thread David Rowley
On 8 December 2017 at 10:51, Robert Haas  wrote:
> On Thu, Dec 7, 2017 at 4:43 PM, David Rowley
>  wrote:
>> And yeah, this does nothing for making sure we choose the correct
>> index if more than one matching index exists on the leaf partition,
>> but perhaps we can dump a series of
>>
>> ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx;
>> ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx;
>>
>> ... which would be no-ops most of the time, but at least would ensure
>> we use the correct index. (Likely we could fix the FOREIGN KEY
>> constraint choosing the first matching index with some variation of
>> this syntax)
>
> Sure, that would fix the problem I'm concerned about, but creating the
> parent index first, as a shell, and then creating each child and
> attaching it to the parent *also* fixes the problem I'm concerned
> about, and without dragging any bystander objects into the fray.

That's true, but is it worth inventing/maintaining an ATTACH syntax
for that? It's not a very common case that multiple matching indexes
exist. If it is worth it, do we need DETACH at all?


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-07 Thread Robert Haas
On Thu, Dec 7, 2017 at 4:43 PM, David Rowley
 wrote:
> And yeah, this does nothing for making sure we choose the correct
> index if more than one matching index exists on the leaf partition,
> but perhaps we can dump a series of
>
> ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx;
> ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx;
>
> ... which would be no-ops most of the time, but at least would ensure
> we use the correct index. (Likely we could fix the FOREIGN KEY
> constraint choosing the first matching index with some variation of
> this syntax)

Sure, that would fix the problem I'm concerned about, but creating the
parent index first, as a shell, and then creating each child and
attaching it to the parent *also* fixes the problem I'm concerned
about, and without dragging any bystander objects into the fray.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-07 Thread David Rowley
On 8 December 2017 at 10:07, Robert Haas  wrote:
> I do agree with you that an index which is currently enforcing a
> unique constraint has to remain continuously valid -- or if it
> unavoidably doesn't, for example if we allowed an unlogged unique
> index on a logged table, then we'd have to do something unpleasant
> like disallow inserts and updates to the key column until that gets
> fixed.  However, that doesn't seem to preclude gracefully swapping out
> indexes for individual partitions; instead of providing a DETACH
> operation, we could provide a REPLACE operation that effectively does
> DETACH + ATTACH.

Yeah, that's what I've been trying to push for. I've mentioned a hand-wavy
REPLACE syntax a few times.

> It's possible that we are not that far apart here.  I don't like the
> ATTACH syntax because it's like what we do for partitions; I like it
> because it solves the pg_dump problem.  And it seems to me that your
> reservations are more about DETACH than ATTACH.  I have no issue with
> punting DETACH to the curb, recasting it as REPLACE, or whatever.

I just don't quite see the pg_dump problem with my proposed approach.

Let me pseudo write out what I imagine in a pg_dump file.

CREATE TABLE p (a INT NOT NULL, b INT NOT NULL) PARTITION BY RANGE (a);

CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (1) TO (10);
CREATE TABLE p2 PARTITION OF p FOR VALUES FROM (10) TO (20);



-- create indexes on partitions
CREATE INDEX p1_a_idx ON p1 (a); -- normal create index
CREATE INDEX p2_a_idx ON p2 (a); -- normal create index.

COMMENT ON INDEX p1_a_idx IS 'the comment';

-- create partitioned indexes

-- checks that indexes exist for each partition of p with matching columns,
-- AM and unique property, then performs a metadata-only change to
-- say there's an index on this table; it also updates each index it uses to
-- say its parent index is this index. If someone happens to have edited
-- the dump file to remove one of the indexes on a leaf partition then the
-- following would have to create that index again.
CREATE INDEX p_a_idx ON p (a);

--
-- PostgreSQL database dump complete
--

We'd just need to ensure the leaf partitions' indexes are created in an
order that takes the dependencies into account.

And yeah, this does nothing for making sure we choose the correct
index if more than one matching index exists on the leaf partition,
but perhaps we can dump a series of

ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx;
ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx;

... which would be no-ops most of the time, but at least would ensure
we use the correct index. (Likely we could fix the FOREIGN KEY
constraint choosing the first matching index with some variation of
this syntax)

Also, DROP INDEX should be disallowed on a leaf partition index which
is in use for a partitioned index. I imagine pg_index just needs a new
column indparentid which would be InvalidOid if it's not used. I'm
just not that clear on if that column should be set to the leaf
partition's direct parent, or the parent that the CREATE INDEX was
done on.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Add %r substitution for psql prompts to show recovery status

2017-12-07 Thread David G. Johnston
On Wed, Dec 6, 2017 at 7:19 PM, Ian Barwick 
wrote:

> A possible alternative would be only to check the status each time a new
> database connection is made, but that wouldn't catch the case where the
> server has been promoted.
>

​Can we cache a false pg_is_in_recovery response and return that instead of
querying the server?

David J.
​


Re: Partition-wise aggregation/grouping

2017-12-07 Thread legrand legrand
Hello,

I'm testing partition-wise aggregation and found a strange result with the V8 patch;
see init_star_schema_agg.sql
  

Explain gives  
  ->  Seq Scan on facts_p2  (...) (actual time=0.012..0.012 rows=1 loops=1)
for partitions that are joined with empty partitions

I was expecting a message saying that partition facts_p2 was not accessed

Am I wrong ?
Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Add %r substitution for psql prompts to show recovery status

2017-12-07 Thread Robert Haas
On Wed, Dec 6, 2017 at 9:19 PM, Ian Barwick  wrote:
> Note this substitution sends a "pg_is_in_recovery()" query to the server
> each time it's encountered; unless there's something I'm overlooking I
> think that's the only reliable way to determine current recovery status.

That seems kinda painful.

And what happens in an aborted transaction?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-07 Thread Robert Haas
On Tue, Dec 5, 2017 at 7:42 PM, David Rowley
 wrote:
> On 6 December 2017 at 11:35, Robert Haas  wrote:
>> What are we giving up by explicitly attaching
>> the correct index?
>
> The part I don't like about the ATTACH and DETACH of partitioned index
> is that it seems to be trying to just follow the syntax we use to
> remove a partition from a partitioned table, however, there's a huge
> difference between the two, as DETACHing a partition from a
> partitioned table leaves the partitioned table in a valid state, it
> simply just no longer contains the detached partition. With the
> partitioned index, we leave the index in an invalid state after a
> DETACH. It can only be made valid again once another leaf index has
> been ATTACHED again and that we've verified that all other indexes on
> every leaf partition is also there and are valid. If we're going to
> use these indexes to answer queries, then it seems like we should try
> to keep them valid so that queries can actually use them for
> something.

I think keeping them valid is a great goal, just like I like low
interest rates and a chicken in every pot.  However, I'm pretty
skeptical of our ability to always succeed in meeting that goal with
absolutely zero corner cases.  What about a CREATE INDEX CONCURRENTLY
that fails midway through, or similarly DROP INDEX CONCURRENTLY?
Those operations can leave around artifacts in the unpartitioned table
case, and I bet they will also leave around artifacts for partitioned
tables, and maybe there will be cases where they don't leave the same
artifacts for every table in the hierarchy.  Even if they do, there is
future development to consider.  Maybe REINDEX INDEX CONCURRENTLY will
carry a possibility of creating a mix of states.  Or maybe someone
will implement Simon's idea from a few years ago of allowing an
unlogged index on a permanent table, with the index being somehow
marked not-valid after a restart.  In that situation, each one would
need to be reindexed independently to become valid.

I do agree with you that an index which is currently enforcing a
unique constraint has to remain continuously valid -- or if it
unavoidably doesn't, for example if we allowed an unlogged unique
index on a logged table, then we'd have to do something unpleasant
like disallow inserts and updates to the key column until that gets
fixed.  However, that doesn't seem to preclude gracefully swapping out
indexes for individual partitions; instead of providing a DETACH
operation, we could provide a REPLACE operation that effectively does
DETACH + ATTACH.

It's possible that we are not that far apart here.  I don't like the
ATTACH syntax because it's like what we do for partitions; I like it
because it solves the pg_dump problem.  And it seems to me that your
reservations are more about DETACH than ATTACH.  I have no issue with
punting DETACH to the curb, recasting it as REPLACE, or whatever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

2017-12-07 Thread Thomas Munro
On Fri, Dec 8, 2017 at 7:36 AM, Robert Haas  wrote:
> On Wed, Dec 6, 2017 at 1:05 AM, Amit Kapila  wrote:
>> Right and seeing that I have prepared the patch (posted above [1])
>> which fixes it such that it will resemble the non-parallel case.
>
> Long story short, I like the patch.

LGTM.  There might be an argument for clearing the instrumentation
every time on the basis that you might finish up keeping data from a
non-final loop when a worker opted not to do anything in the final
loop, but I'm not going to make that argument because I don't think it
matters.   The patch makes the tests in
test-hash-join-rescan-instr-v1.patch pass (from my previous message).
Please also consider that test patch for commit.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Postgres with pthread

2017-12-07 Thread Andres Freund
Hi,

On 2017-12-07 20:48:06 +, Greg Stark wrote:
> But then I thought about it a bit and I do wonder. I don't know how
> well we test having multiple portals doing all kinds of different
> query plans with their execution interleaved.

Cursors test that pretty well.


> And I definitely have doubts whether you can start SPI sessions from
> arbitrary points in the executor expression evaluation and don't know
> what state you can leave and resume them from on subsequent
> evaluations...

SPI being weird doesn't really have that much bearing on the executor
structure imo. But I'm unclear what you'd use SPI for that really
necessitates that. We don't suspend execution in the middle of function
execution...

Greetings,

Andres Freund



Re: Postgres with pthread

2017-12-07 Thread Greg Stark
On 7 December 2017 at 19:58, Andres Freund  wrote:
> On 2017-12-07 11:26:07 +0800, Craig Ringer wrote:
>> PostgreSQL's architecture conflates "connection", "session" and "executor"
>> into one somewhat muddled mess.
>
> How is the executor entangled in the other two?

I was going to ask the same question. AFAICS it's the one part of
Postgres that isn't muddled at all -- it's crystal clear that
"connection" == "session" as far as the backend is concerned and
"executor context" is completely separate.

But then I thought about it a bit and I do wonder. I don't know how
well we test having multiple portals doing all kinds of different
query plans with their execution interleaved. And I definitely have
doubts whether you can start SPI sessions from arbitrary points in the
executor expression evaluation and don't know what state you can leave
and resume them from on subsequent evaluations...



-- 
greg



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-07 Thread Andres Freund
On 2017-12-07 17:41:56 -0300, Alvaro Herrera wrote:
> > > Looking at 0002: I agree with the stuff being done here.  I think a
> > > couple of these checks could be moved one block outwards in terms of
> > > scope; I don't see any reason why the check should not apply in that
> > > case.  I didn't catch any place missing additional checks.
> > 
> > I think I largely put them into the inner blocks because they were
> > guaranteed to be reached in those cases (the horizon has to be before the
> > cutoff etc), and that way additional branches are avoided.
> 
> Hmm, it should be possible to call vacuum with a very low freeze_min_age
> (which sets a very recent relfrozenxid), then shortly thereafter call it
> with a large one, no?  So it's not really guaranteed ...

Fair point!

- Andres



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-07 Thread Alvaro Herrera
Hello,

Andres Freund wrote:

> On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
> > > I've played around quite some with the attached patch. So far, after
> > > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > > the situation worse for already existing corruption. HOT pruning can
> > > change the exact appearance of existing corruption a bit, but I don't
> > > think it can make the corruption meaningfully worse.  It's a bit
> > > annoying and scary to add so many checks to backbranches but it kinda
> > > seems required.  The error message texts aren't perfect, but these are
> > > "should never be hit" type elog()s so I'm not too worried about that.
> > 
> > Looking at 0002: I agree with the stuff being done here.  I think a
> > couple of these checks could be moved one block outwards in terms of
> > scope; I don't see any reason why the check should not apply in that
> > case.  I didn't catch any place missing additional checks.
> 
> I think I largely put them into the inner blocks because they were
> guaranteed to be reached in those cases (the horizon has to be before the
> cutoff etc), and that way additional branches are avoided.

Hmm, it should be possible to call vacuum with a very low freeze_min_age
(which sets a very recent relfrozenxid), then shortly thereafter call it
with a large one, no?  So it's not really guaranteed ...


> > Despite these being "shouldn't happen" conditions, I think we should
> > turn these up all the way to ereports with an errcode and all, and also
> > report the XIDs being complained about.  No translation required,
> > though.  Other than those changes and minor copy editing a commit
> > (attached), 0002 looks good to me.
> 
> Hm, I don't really care one way or another. I do see that you used
> errmsg() in some places, errmsg_internal() in others. Was that
> intentional?

Eh, no, my intention was to make these all errmsg_internal() to avoid
translation (serves no purpose here).  Feel free to update the remaining
ones.


> > I started thinking it'd be good to report block number whenever anything
> > happened while scanning the relation.  The best way to go about this
> > seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> > a WIP untested patch to add that.  (I'm not proposing this for
> > back-patch for now, mostly because I don't have the time/energy to push
> > for it right now.)
> 
> That seems like a good idea. There are some cases where that could
> increase log spam noticeably (uninitialized blocks), but that seems
> acceptable.

Yeah, I noticed that and I agree it seems ok.

> > +   if (info->blkno != InvalidBlockNumber)
> > +   errcontext("while scanning page %u of relation %s",
> > +  info->blkno, 
> > RelationGetRelationName(info->relation));
> > +   else
> > +   errcontext("while vacuuming relation %s",
> > +  RelationGetRelationName(info->relation));
> 
> Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
> replacing scanning with vacuuming?

Makes sense.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-07 Thread Andres Freund
On 2017-12-07 12:08:38 +0900, Michael Paquier wrote:
> > Your commit message does a poor job of acknowledging prior work on
> > diagnosing the problem starting from Dan's initial test case and patch.
> 
> (Nit: I have extracted from the test case of Dan an isolation test,
> which Andres has reduced to a subset of permutations. Peter G. also
> complained about what is visibly the same bug we are discussing here
> but without a test case.)

The previous versions of the test case didn't actually hit the real
issues, so I don't think that matters much.

Greetings,

Andres Freund



Re: Logical replication without a Primary Key

2017-12-07 Thread Andres Freund
On 2017-12-07 11:38:51 -0800, Joshua D. Drake wrote:
> On 12/07/2017 10:49 AM, Robert Haas wrote:
> > On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
> >  wrote:
> > > No it won't, it will update only one row, it does not try to find
> > > multiple matching rows.
> > Good, because that's exactly what it should do.  I mean, if you have
> > on the master two tuples that are identical, and you update one of
> > them, then the replica had better update exactly one of them as well.
> > Since they are identical, it doesn't matter *which* one gets updated
> > on the replica, but if you update *both* of them on the replica, then
> > things are out of sync.
> 
> Well I think that is a problem actually. If I have:
> 
> A    B   C
> foo,bar,baz
> foo,bar,baz
> 
> And then I say:
> 
> UPDATE test set A = 1 where C = baz
> 
> I have updated two rows because there is no primary key to identify the
> differences. Both of those rows should be updated and thus replicated
> otherwise, logical replication (of this specific table) provides inaccurate
> data on the subscriber.

Not a problem. If you updated both rows, then there's two cases:
a) the update actually changed the column values. In which case the first
   per-row change that's replicated updates the first row, but the second one
   won't again find it as matching in all columns.
b) the update didn't actually change anything. In which case the same
   row gets updated twice, but because the column values didn't change,
   that doesn't matter.
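
To make case a) concrete, here is a minimal sketch against the table from the
example above (the text columns and the REPLICA IDENTITY FULL setting are my
assumptions for illustration; a table with no primary key needs a full replica
identity before UPDATEs can be published at all):

CREATE TABLE test (a text, b text, c text);
ALTER TABLE test REPLICA IDENTITY FULL;  -- needed to publish UPDATEs without a PK
INSERT INTO test VALUES ('foo', 'bar', 'baz'), ('foo', 'bar', 'baz');

UPDATE test SET a = '1' WHERE c = 'baz'; -- touches both rows on the publisher
-- Two per-row changes are replicated.  On the subscriber, the first change
-- matches one of the identical rows and rewrites it to ('1','bar','baz');
-- that row no longer matches the second change's old values, so the second
-- change finds and updates the remaining ('foo','bar','baz') row instead.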

Greetings,

Andres Freund



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-07 Thread Andres Freund
Hi,

On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
> > I've played around quite some with the attached patch. So far, after
> > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > the situation worse for already existing corruption. HOT pruning can
> > change the exact appearance of existing corruption a bit, but I don't
> > think it can make the corruption meaningfully worse.  It's a bit
> > annoying and scary to add so many checks to backbranches but it kinda
> > seems required.  The error message texts aren't perfect, but these are
> > "should never be hit" type elog()s so I'm not too worried about that.
> 
> Looking at 0002: I agree with the stuff being done here.  I think a
> couple of these checks could be moved one block outwards in terms of
> scope; I don't see any reason why the check should not apply in that
> case.  I didn't catch any place missing additional checks.

I think I largely put them into the inner blocks because they were
guaranteed to be reached in those cases (the horizon has to be before the
cutoff etc), and that way additional branches are avoided.


> Despite these being "shouldn't happen" conditions, I think we should
> turn these up all the way to ereports with an errcode and all, and also
> report the XIDs being complained about.  No translation required,
> though.  Other than those changes and minor copy editing a commit
> (attached), 0002 looks good to me.

Hm, I don't really care one way or another. I do see that you used
errmsg() in some places, errmsg_internal() in others. Was that
intentional?

> I started thinking it'd be good to report block number whenever anything
> happened while scanning the relation.  The best way to go about this
> seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> a WIP untested patch to add that.  (I'm not proposing this for
> back-patch for now, mostly because I don't have the time/energy to push
> for it right now.)

That seems like a good idea. There are some cases where that could
increase log spam noticeably (uninitialized blocks), but that seems
acceptable.


> +static void
> +lazy_scan_heap_cb(void *arg)
> +{
> + LazyScanHeapInfo *info = (LazyScanHeapInfo *) arg;
> +
> + if (info->blkno != InvalidBlockNumber)
> + errcontext("while scanning page %u of relation %s",
> +info->blkno, 
> RelationGetRelationName(info->relation));
> + else
> + errcontext("while vacuuming relation %s",
> +RelationGetRelationName(info->relation));
> +}

Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
replacing scanning with vacuuming?

Greetings,

Andres Freund



Re: plpgsql test layout

2017-12-07 Thread Pavel Stehule
2017-12-07 20:08 GMT+01:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 11/14/17 11:51, Pavel Stehule wrote:
> > One option would be to create a new test setup under src/pl/pgsql(/src)
> > and move some of the test material from the main test suite there.  Some
> > of the test cases in the main test suite are more about SPI and triggers
> > and such, so it makes sense to keep these in the main line.  Of course
> > finding the cut-off might be hard.  Or maybe we'll just start with new
> > stuff from now on.
> >
> > Any thoughts?
> >
> > +1
>
> Here is a first attempt.
>

looks ok



> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Logical replication without a Primary Key

2017-12-07 Thread Robert Haas
On Thu, Dec 7, 2017 at 2:53 PM, Chapman Flack  wrote:
> On 12/07/2017 02:38 PM, Joshua D. Drake wrote:
>> A    B   C
>> foo,bar,baz
>> foo,bar,baz
>>
>> And then I say:
>>
>> UPDATE test set A = 1 where C = baz
>>
>> I have updated two rows because there is no primary key to identify the
>> differences. Both of those rows should be updated and thus replicated
>
> Would the subscriber see two records reporting update of a
> foo,bar,baz row to 1, so it would do that to (arbitrarily)
> one of them the first time, and (necessarily) the other, the
> second time?

Exactly.

(I think.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-07 Thread Andres Freund
Hi,

On 2017-12-06 13:21:15 -0300, Alvaro Herrera wrote:
> I think you've done a stellar job of identifying what the actual problem
> was.  I like the new (simpler) coding of that portion of
> HeapTupleSatisfiesVacuum.

Thanks!

> freeze-the-dead is not listed in isolation_schedule; an easy fix.

Yea, I'd sent an update about that, stupidly forgot to git amend the
commit...


> I confirm that the test crashes with an assertion failure without the
> code fix, and that it doesn't with it.
> 
> I think the comparison to OldestXmin should be reversed:
> 
>   if (!TransactionIdPrecedes(xmax, OldestXmin))
>   return HEAPTUPLE_RECENTLY_DEAD;
> 
>   return HEAPTUPLE_DEAD;
> 
> This way, an xmax that has exactly the OldestXmin value will return
> RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
> OldestXmin value itself is supposed to be still possibly visible to
> somebody).

Yes, I think you're right. That's a bug.


> Your commit message does a poor job of acknowledging prior work on
> diagnosing the problem starting from Dan's initial test case and patch.

Yea, you're right. I was writing it with 14h of jetlag, apparently that
does something to your brain...

Greetings,

Andres Freund



Re: Signals in a BGW

2017-12-07 Thread Chapman Flack
On 12/07/2017 02:58 PM, Andres Freund wrote:
> On 2017-12-07 12:35:07 -0500, Robert Haas wrote:
>> On Wed, Dec 6, 2017 at 6:59 PM, Chapman Flack  wrote:
>>>> The default handler is bgworker_die; see src/backend/postmaster/bgworker.c.
>>>> I don't know if elog() is safe in a signal handler, but I guess in
>>>> the absence of non-reentrant errcontext functions...
>>>
>>> That does seem bold, doesn't it?
>>
>> Yes, I think it's flat busted.
> 
> We've had that discussion a couple times. The consensus so far has been
> that FATALing is considered kinda ok, everything else not. But it
> definitely has caused issues in the past, mostly due to malloc being
> entered while already in malloc.

Well, bgworker.c:bgworker_die() FATALs, so that might be kinda ok,
but postgres.c:FloatExceptionHandler() ERRORs, so what's up with that?

Has it just not caused much trouble because the existing math
operators test their operands rather than relying on exceptions,
and it might only get called in case of some extension that did
float operations without checking?

I admit I've been trying to think of a better thing it could do,
and it seems challenging ... ideally you'd want an ERROR to happen,
and soon (before trying to evaluate more of the expression), from
code that might not be checking right away ... but how could that
be made to work?

-Chap



Re: Postgres with pthread

2017-12-07 Thread Andres Freund
On 2017-12-07 11:26:07 +0800, Craig Ringer wrote:
> PostgreSQL's architecture conflates "connection", "session" and "executor"
> into one somewhat muddled mess.

How is the executor entangled in the other two?

Greetings,

Andres Freund



Re: Signals in a BGW

2017-12-07 Thread Andres Freund
On 2017-12-07 12:35:07 -0500, Robert Haas wrote:
> On Wed, Dec 6, 2017 at 6:59 PM, Chapman Flack  wrote:
> >> The default handler is bgworker_die; see src/backend/postmaster/bgworker.c.
> >> I don't know if elog() is safe in a signal handler, but I guess in
> >> the absence of non-reentrant errcontext functions...
> >
> > That does seem bold, doesn't it?
> 
> Yes, I think it's flat busted.

We've had that discussion a couple times. The consensus so far has been
that FATALing is considered kinda ok, everything else not. But it
definitely has caused issues in the past, mostly due to malloc being
entered while already in malloc.

Greetings,

Andres Freund



Re: Logical replication without a Primary Key

2017-12-07 Thread Chapman Flack
On 12/07/2017 02:38 PM, Joshua D. Drake wrote:

> A    B   C
> foo,bar,baz
> foo,bar,baz
> 
> And then I say:
> 
> UPDATE test set A = 1 where C = baz
> 
> I have updated two rows because there is no primary key to identify the
> differences. Both of those rows should be updated and thus replicated

Would the subscriber see two records reporting update of a
foo,bar,baz row to 1, so it would do that to (arbitrarily)
one of them the first time, and (necessarily) the other, the
second time?

Or is that not the way it would work?

-Chap



Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-12-07 Thread Alexander Korotkov
On Wed, Nov 29, 2017 at 3:10 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I'll try to provide meaningful review next week.
>>
>
> Cool, thanks.
>

Andrey Borodin complained through Telegram that he can't apply my patches
using "git apply".  After investigation of this problem, it appears that
the reason is that I still use the git context-diff setup from our wiki
instructions (https://wiki.postgresql.org/wiki/Working_with_Git).  I've looked
around some threads and realized that I'm probably the only guy who still
sends context-diff patches.

Attached patchset in unified diff format.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-cube-knn-fix-3.patch
Description: Binary data


0002-cube-knn-negative-coordinate-3.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2017-12-07 Thread Peter Eisentraut
On 12/7/17 08:31, Peter Eisentraut wrote:
> On 12/4/17 10:15, Nikhil Sontakke wrote:
>> PFA, latest patch for this functionality.
> 
> This probably needs documentation updates for the logical decoding chapter.

You need the attached patch to be able to compile without warnings.

Also, the regression tests crash randomly for me at

frame #4: 0x00010a6febdb
postgres`heap_prune_record_prunable(prstate=0x7ffee5578990, xid=0)
at pruneheap.c:625
   622   * This should exactly match the PageSetPrunable macro.  We can't store
   623   * directly into the page header yet, so we update working state.
   624   */
-> 625  Assert(TransactionIdIsNormal(xid));
   626  if (!TransactionIdIsValid(prstate->new_prune_xid) ||
   627  TransactionIdPrecedes(xid, prstate->new_prune_xid))
   628  prstate->new_prune_xid = xid;

Did you build with --enable-cassert?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 0beddc4b47d160d4fcd9e99d23a90c4273b32c41 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 7 Dec 2017 14:42:04 -0500
Subject: [PATCH] fixup! Original patch

---
 contrib/test_decoding/test_decoding.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/test_decoding.c 
b/contrib/test_decoding/test_decoding.c
index 362683feef..a709e2ff92 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -74,7 +74,7 @@ static void pg_decode_message(LogicalDecodingContext *ctx,
  Size sz, const char *message);
 static bool pg_filter_prepare(LogicalDecodingContext *ctx,
  ReorderBufferTXN *txn,
- char *gid);
+ const char *gid);
 static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
  ReorderBufferTXN *txn,
  XLogRecPtr prepare_lsn);
@@ -290,7 +290,7 @@ pg_decode_commit_txn(LogicalDecodingContext *ctx, 
ReorderBufferTXN *txn,
 /* Filter out unnecessary two-phase transactions */
 static bool
 pg_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
-   char *gid)
+   const char *gid)
 {
TestDecodingData *data = ctx->output_plugin_private;
 
-- 
2.15.1



Re: Logical replication without a Primary Key

2017-12-07 Thread Joshua D. Drake

On 12/07/2017 10:49 AM, Robert Haas wrote:

On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
 wrote:

No it won't, it will update only one row, it does not try to find
multiple matching rows.

Good, because that's exactly what it should do.  I mean, if you have
on the master two tuples that are identical, and you update one of
them, then the replica had better update exactly one of them as well.
Since they are identical, it doesn't matter *which* one gets updated
on the replica, but if you update *both* of them on the replica, then
things are out of sync.


Well I think that is a problem actually. If I have:

A    B   C
foo,bar,baz
foo,bar,baz

And then I say:

UPDATE test set A = 1 where C = baz

I have updated two rows because there is no primary key to identify the 
differences. Both of those rows should be updated and thus replicated 
otherwise, logical replication (of this specific table) provides 
inaccurate data on the subscriber.


Thanks,

JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
* Unless otherwise stated, opinions are my own.   *




Re: [HACKERS] A design for amcheck heapam verification

2017-12-07 Thread Peter Geoghegan
On Tue, Nov 28, 2017 at 9:54 PM, Peter Geoghegan  wrote:
> On Tue, Nov 28, 2017 at 9:50 PM, Michael Paquier
>  wrote:
>>> Would that address your concern? There would be an SQL interface, but
>>> it would be trivial.
>>
>> That's exactly what I think you should do, and mentioned so upthread.
>> A SQL interface can also show a good example of how developers can use
>> this API.

Attached revision, v5, adds a new test harness -- test_bloomfilter.

This can be used to experimentally verify that the implementation meets
the well-known "1% false positive rate with 9.6 bits per element" standard. It
manages to do exactly that:

postgres=# set client_min_messages = 'debug1';
SET
postgres=# SELECT test_bloomfilter(power => 23, nelements => 873813,
seed => -1, tests => 3);
DEBUG:  beginning test #1...
DEBUG:  bloom_work_mem (KB): 1024
DEBUG:  false positives: 8630 (rate: 0.009876, proportion bits set:
0.517625, seed: 1373191603)
DEBUG:  beginning test #2...
DEBUG:  bloom_work_mem (KB): 1024
DEBUG:  false positives: 8623 (rate: 0.009868, proportion bits set:
0.517623, seed: 406665822)
DEBUG:  beginning test #3...
DEBUG:  bloom_work_mem (KB): 1024
WARNING:  false positives: 8840 (rate: 0.010117, proportion bits set:
0.517748, seed: 398116374)
 test_bloomfilter
--

(1 row)

Here, we repeat the same test 3 times, varying only the seed value
used for each run.

The last message is a WARNING because we exceed the 1% threshold
(hard-coded into test_bloomfilter.c), though only by a tiny margin,
due only to random variations in seed value. We round up to 10 bits
per element for the regression tests. That's where the *actual*
"nelements" argument comes from within the tests, so pg_regress tests
should never see the WARNING (if they do, that counts as a failure).

I've experimentally observed that we get the 1% false positive rate
with any possible bitset when "nelements" works out at 9.6 bitset bits
per element. Inter-run variation is tiny. With 50 tests, I didn't
observe these same Bloom filter parameters produce a false positive
rate that came near to 1.1%. 1.01% or 1.02% was about as bad as it
got.
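
For anyone who wants to sanity-check those numbers, the classic estimate
p = (1 - e^(-k*n/m))^k lines up with the output above if you assume k = 7
hash functions at m/n = 9.6 bits per element (the k value is my assumption
here, not something taken from the patch):

-- back-of-the-envelope check only, not part of the patch
SELECT 1 - exp(-7 / 9.6)           AS est_proportion_bits_set,  -- ~0.5177
       power(1 - exp(-7 / 9.6), 7) AS est_false_positive_rate;  -- ~0.0100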

There is a fairly extensive README, which I hope will clear up the
theory behind the bloomfilter.c strategy on bitset size and false
positives. Also, there was a regression that I had to fix in
bloomfilter.c, in seeding. It didn't reliably cause variation in the
false positives. And, there was bitrot with the documentation that I
fixed up.

-- 
Peter Geoghegan
From 47f2c6cd398244f11c3490b644cd225beac9ae31 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 24 Aug 2017 20:58:21 -0700
Subject: [PATCH 1/2] Add Bloom filter data structure implementation.

A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership.  Callers will sometimes incur false
positives, but never false negatives.  The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.

Two classic applications of Bloom filters are cache filtering, and data
synchronization testing.  Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.

This commit adds a test harness extension module, test_bloomfilter.  It
can be used to get a sense of how the Bloom filter implementation
performs under varying conditions.
---
 src/backend/lib/Makefile   |   4 +-
 src/backend/lib/README |   2 +
 src/backend/lib/bloomfilter.c  | 313 +
 src/include/lib/bloomfilter.h  |  27 ++
 src/test/modules/Makefile  |   1 +
 src/test/modules/test_bloomfilter/.gitignore   |   4 +
 src/test/modules/test_bloomfilter/Makefile |  21 ++
 src/test/modules/test_bloomfilter/README   |  72 +
 .../test_bloomfilter/expected/test_bloomfilter.out |  25 ++
 .../test_bloomfilter/sql/test_bloomfilter.sql  |  22 ++
 .../test_bloomfilter/test_bloomfilter--1.0.sql |  10 +
 .../modules/test_bloomfilter/test_bloomfilter.c| 138 +
 .../test_bloomfilter/test_bloomfilter.control  |   4 +
 13 files changed, 641 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/lib/bloomfilter.c
 create mode 100644 src/include/lib/bloomfilter.h
 create mode 100644 src/test/modules/test_bloomfilter/.gitignore
 create mode 100644 src/test/modules/test_bloomfilter/Makefile
 create mode 100644 src/test/modules/test_bloomfilter/README
 create mode 100644 src/test/modules/test_bloomfilter/expected/test_bloomfilter.out
 create mode 100644 src/test/modules/test_bloomfilter/sql/test_bloomfilter.sql
 create mode 100644 src/test/modules/test_bloomfilter/test_bloomfilter--1.0.sql
 create mode 100644 src/test/modules/test_bloomfilter/test_bloomfilter.c
 create mode 100644 src/test/modules/test_bloomfilter/test_bloomfilter.control

diff --git a/src/backend/lib/Makefil

Re: Transform for pl/perl

2017-12-07 Thread Peter Eisentraut
On 12/1/17 13:15, Tom Lane wrote:
> Robert Haas  writes:
>> On Fri, Dec 1, 2017 at 12:30 AM, Michael Paquier
>>  wrote:
>>> Patch moved to CF 2018-01. Perhaps a committer will look at it at some 
>>> point.
> 
>> FWIW, although I like the idea, I'm not going to work on the patch.  I
>> like Perl, but I neither know its internals nor use plperl.  I hope
>> someone else will be interested.
> 
> I've been assuming (and I imagine other committers think likewise) that
> Peter will pick this up at some point, since the whole transform feature
> was his work to begin with.  If he doesn't want to touch it, maybe he
> should say so explicitly so that other people will feel free to take it.

I'll take a look.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Speeding up pg_upgrade

2017-12-07 Thread Stephen Frost
Tom, David,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Well, if it's dropped, I think we need to make sure that users are aware
> > of that going in and that's why I was suggesting a switch.  If you've
> > got a better idea for that, great, but having certain pg_upgrade
> > migrations require running ANALYZE and some migrations not require it is
> > something we need to make users *very* clear about.  No, I don't think a
> > note in the release notes is really enough..
> 
> Seems like we could make this reasonably transparent if pg_upgrade
> continues to emit an analyze script that you're supposed to run
> afterwards.  It just has to vary how much that script does.

That's post-upgrade though, meaning it's entirely likely to catch
someone by surprise.  That's the issue that I have with it.

That said, I'm remembering that there's a '--check' option to
pg_upgrade.  Let's make sure to include in that --check output what
additional post-upgrade work will have to happen; that'll work well
enough to deal with this issue, and I have little sympathy for people who
run pg_upgrade blindly without running --check first, given that
there's an option for it.

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Thu, Dec 7, 2017 at 12:04 PM, Stephen Frost  wrote:
> > If you've
> > got a better idea for that, great, but having certain pg_upgrade
> > migrations require running ANALYZE and some migrations not require it is
> > something we need to make users *very* clear about.  No, I don't think a
> > note in the release notes is really enough..
> 
> Ideally this would be covered by:
> 
> """
> 13. Post-Upgrade processing
> 
> If any post-upgrade processing is required, pg_upgrade will issue warnings
> as it completes. It will also generate script files that must be run by the
> administrator. The script files will connect to each database that needs
> post-upgrade processing. Each script should be run using: ...
> """
> 
> https://www.postgresql.org/docs/10/static/pgupgrade.html
> 
> Then whether ANALYZE is instructed as per #14 would be conditional.
> 
> It's arguable whether a warning/notice saying "ANALYZE doesn't need to be run"
> would be useful.

I'm fine with updating the docs to make it conditional but our users
need an easy way to figure out if their particular from/to PG versions
require ANALYZE to be run or not, so they're able to prepare for it.
Having that be included in the --check output would work to handle that
nicely.

Also, really, we should update those steps to have --check run earlier,
imv, and before saying "stop the clusters".

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: Speeding up pg_upgrade

2017-12-07 Thread David G. Johnston
On Thu, Dec 7, 2017 at 12:04 PM, Stephen Frost  wrote:

> If you've
> got a better idea for that, great, but having certain pg_upgrade
> migrations require running ANALYZE and some migrations not require it is
> something we need to make users *very* clear about.  No, I don't think a
> note in the release notes is really enough..
>

​Ideally this would be covered by:

"""
13. Post-Upgrade processing

​If any post-upgrade processing is required, pg_upgrade will issue warnings
as it completes. It will also generate script files that must be run by the
administrator. The script files will connect to each database that needs
post-upgrade processing. Each script should be run using: ...
"""

https://www.postgresql.org/docs/10/static/pgupgrade.html

Then whether ANALYZE is instructed as per #14 would be conditional.

It's arguable whether a warning/notice saying "ANALYZE doesn't need to be run"
would be useful.

David J.


Re: Speeding up pg_upgrade

2017-12-07 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Yeah, there's that.  But the rate of change in pg_statistic hasn't been
>> *that* large.  Alvaro might be right that we can design some transmission
>> procedure that allows stats to be forward-migrated when compatible and
>> dropped when not.

> Well, if it's dropped, I think we need to make sure that users are aware
> of that going in and that's why I was suggesting a switch.  If you've
> got a better idea for that, great, but having certain pg_upgrade
> migrations require running ANALYZE and some migrations not require it is
> something we need to make users *very* clear about.  No, I don't think a
> note in the release notes is really enough..

Seems like we could make this reasonably transparent if pg_upgrade
continues to emit an analyze script that you're supposed to run
afterwards.  It just has to vary how much that script does.
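
For what it's worth, the staged approach that script (or vacuumdb
--analyze-in-stages) takes today is roughly equivalent to the following
per-database sketch; the exact statistics targets here are from memory and
are an assumption on my part:

SET default_statistics_target = 1;   -- minimal stats, so the planner can limp along
ANALYZE;
SET default_statistics_target = 10;  -- somewhat better stats
ANALYZE;
RESET default_statistics_target;     -- full statistics
ANALYZE;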

regards, tom lane



Re: plpgsql test layout

2017-12-07 Thread Peter Eisentraut
On 11/14/17 11:51, Pavel Stehule wrote:
> One option would be to create a new test setup under src/pl/pgsql(/src)
> and move some of the test material from the main test suite there.  Some
> of the test cases in the main test suite are more about SPI and triggers
> and such, so it makes sense to keep these in the main line.  Of course
> finding the cut-off might be hard.  Or maybe we'll just start with new
> stuff from now on.
> 
> Any thoughts?
> 
> +1

Here is a first attempt.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 00821d8a460f5e4f39376a87531cfd6be2354684 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 7 Dec 2017 14:03:29 -0500
Subject: [PATCH] Start a separate test suite for plpgsql

The plpgsql.sql test file in the main regression tests is now by far the
largest after numeric_big, making editing and managing the test cases
very cumbersome.  The other PLs have their own test suites split up into
smaller files by topic.  It would be nice to have that for plpgsql as
well.  So, to get that started, set up test infrastructure in
src/pl/plpgsql/src/ and split out the recently added procedure test
cases into a new file there.  That file now mirrors the test cases added
to the other PLs, making managing those matching tests a bit easier too.
---
 src/pl/plpgsql/src/.gitignore|  3 ++
 src/pl/plpgsql/src/Makefile  | 14 
 src/pl/plpgsql/src/expected/plpgsql_call.out | 41 +++
 src/pl/plpgsql/src/sql/plpgsql_call.sql  | 47 ++
 src/test/regress/expected/plpgsql.out| 41 ---
 src/test/regress/sql/plpgsql.sql | 49 
 6 files changed, 105 insertions(+), 90 deletions(-)
 create mode 100644 src/pl/plpgsql/src/expected/plpgsql_call.out
 create mode 100644 src/pl/plpgsql/src/sql/plpgsql_call.sql

diff --git a/src/pl/plpgsql/src/.gitignore b/src/pl/plpgsql/src/.gitignore
index 92387fa3cb..ff6ac965fd 100644
--- a/src/pl/plpgsql/src/.gitignore
+++ b/src/pl/plpgsql/src/.gitignore
@@ -1,3 +1,6 @@
 /pl_gram.c
 /pl_gram.h
 /plerrcodes.h
+/log/
+/results/
+/tmp_check/
diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index 95348179ac..64991c3115 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -24,6 +24,8 @@ OBJS = pl_gram.o pl_handler.o pl_comp.o pl_exec.o \
 
 DATA = plpgsql.control plpgsql--1.0.sql plpgsql--unpackaged--1.0.sql
 
+REGRESS = plpgsql_call
+
 all: all-lib
 
 # Shared library stuff
@@ -65,6 +67,18 @@ pl_gram.c: BISONFLAGS += -d
 plerrcodes.h: $(top_srcdir)/src/backend/utils/errcodes.txt 
generate-plerrcodes.pl
$(PERL) $(srcdir)/generate-plerrcodes.pl $< > $@
 
+
+check: submake
+   $(pg_regress_check) $(REGRESS_OPTS) $(REGRESS)
+
+installcheck: submake
+   $(pg_regress_installcheck) $(REGRESS_OPTS) $(REGRESS)
+
+.PHONY: submake
+submake:
+   $(MAKE) -C $(top_builddir)/src/test/regress pg_regress$(X)
+
+
 distprep: pl_gram.h pl_gram.c plerrcodes.h
 
 # pl_gram.c, pl_gram.h and plerrcodes.h are in the distribution tarball,
diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out 
b/src/pl/plpgsql/src/expected/plpgsql_call.out
new file mode 100644
index 00..d0f35163bc
--- /dev/null
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -0,0 +1,41 @@
+--
+-- Tests for procedures / CALL syntax
+--
+CREATE PROCEDURE test_proc1()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+NULL;
+END;
+$$;
+CALL test_proc1();
+-- error: can't return non-NULL
+CREATE PROCEDURE test_proc2()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+RETURN 5;
+END;
+$$;
+CALL test_proc2();
+ERROR:  cannot return a value from a procedure
+CONTEXT:  PL/pgSQL function test_proc2() while casting return value to 
function's return type
+CREATE TABLE test1 (a int);
+CREATE PROCEDURE test_proc3(x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+INSERT INTO test1 VALUES (x);
+END;
+$$;
+CALL test_proc3(55);
+SELECT * FROM test1;
+ a  
+
+ 55
+(1 row)
+
+DROP PROCEDURE test_proc1;
+DROP PROCEDURE test_proc2;
+DROP PROCEDURE test_proc3;
+DROP TABLE test1;
diff --git a/src/pl/plpgsql/src/sql/plpgsql_call.sql 
b/src/pl/plpgsql/src/sql/plpgsql_call.sql
new file mode 100644
index 00..38fd220e8f
--- /dev/null
+++ b/src/pl/plpgsql/src/sql/plpgsql_call.sql
@@ -0,0 +1,47 @@
+--
+-- Tests for procedures / CALL syntax
+--
+
+CREATE PROCEDURE test_proc1()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+NULL;
+END;
+$$;
+
+CALL test_proc1();
+
+
+-- error: can't return non-NULL
+CREATE PROCEDURE test_proc2()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+RETURN 5;
+END;
+$$;
+
+CALL test_proc2();
+
+
+CREATE TABLE test1 (a int);
+
+CREATE PROCEDURE test_proc3(x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+INSERT INTO test1 VALUES (x);
+END;
+$$;
+
+CALL test_proc3(55);
+
+SELECT * FROM test1;
+
+
+DROP PROCEDURE test_proc1;
+DROP PROCEDURE test_pr

Re: Speeding up pg_upgrade

2017-12-07 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > If we go down that route, since this makes a pretty serious difference
> > in terms of what the user has to deal with post-pg_upgrade, I'd suggest
> > we require an additional option for the user to pass when stats aren't
> > going to be migrated, so they are aware of that.
> 
> -1 ... you are forgetting that a lot of systems wrap pg_upgrade in some
> sort of vendor-supplied upgrade script.  Nanny switches don't help;
> the vendors will just start passing them automatically.

That really depends on the packagers.

> > Of course, this might end up having an entirely different effect: it
> > might mean that we're suddenly a lot shier about changing the stats in a
> > backwards-incompatible way, just as we now are basically stuck with the
> > existing on-disk heap format..
> 
> Yeah, there's that.  But the rate of change in pg_statistic hasn't been
> *that* large.  Alvaro might be right that we can design some transmission
> procedure that allows stats to be forward-migrated when compatible and
> dropped when not.

Well, if it's dropped, I think we need to make sure that users are aware
of that going in and that's why I was suggesting a switch.  If you've
got a better idea for that, great, but having certain pg_upgrade
migrations require running ANALYZE and some migrations not require it is
something we need to make users *very* clear about.  No, I don't think a
note in the release notes is really enough..

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: Speeding up pg_upgrade

2017-12-07 Thread Tom Lane
Stephen Frost  writes:
> If we go down that route, since this makes a pretty serious difference
> in terms of what the user has to deal with post-pg_upgrade, I'd suggest
> we require an additional option for the user to pass when stats aren't
> going to be migrated, so they are aware of that.

-1 ... you are forgetting that a lot of systems wrap pg_upgrade in some
sort of vendor-supplied upgrade script.  Nanny switches don't help;
the vendors will just start passing them automatically.

> Of course, this might end up having an entirely different effect: it
> might mean that we're suddenly a lot shier about changing the stats in a
> backwards-incompatible way, just as we now are basically stuck with the
> existing on-disk heap format..

Yeah, there's that.  But the rate of change in pg_statistic hasn't been
*that* large.  Alvaro might be right that we can design some transmission
procedure that allows stats to be forward-migrated when compatible and
dropped when not.

regards, tom lane



Re: Mention ordered datums in PartitionBoundInfoData comment

2017-12-07 Thread Robert Haas
On Tue, Dec 5, 2017 at 1:03 AM, Ashutosh Bapat
 wrote:
> Sorry. Thanks for pointing it out. Fixed in the attached patch.

+ * The datums in datums array are arranged in the increasing order defined by

Suggest: in increasing order as defined

There's a second place where the same change is needed.

+ * resp. For range and list partitions this simply means that the datums in the

I think you should spell out "respectively" instead of abbreviating to "resp".

+ * datums array are arranged in the increasing order defined by the partition
+ * key collation.

It's not just the collation but also, and I think more importantly,
the operator class.   And there can be multiple columns, and thus
multiple opclasses/collations.  Maybe "defined by the partition key's
operator classes and collations".

+ * PartitionBoundInfoData structures for two partitioned tables with exactly
+ * same bounds look exactly same.

This doesn't seem to me to add much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up pg_upgrade

2017-12-07 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> The reason pg_upgrade hasn't done that in the past is not wishing to
>> assume that the new version does stats identically to the old version.
>> Since we do in fact add stats or change stuff around from time to time,
>> that's not a negligible consideration.

> Sure, but the new version can probably limp along with incomplete stats
> until the next natural ANALYZE runs -- the system is operational in much
> shorter time than if you have to make it all wait for the post-upgrade
> full-database ANALYZE run.  The serialization step is so that the
> underlying representation doesn't have to remain identical -- surely the
> new server would be able to represent whatever the old server was able
> to, regardless of any improvement made.

Well, this is assuming a lot of stuff not in evidence about how the
"serialization format" is designed and how we insert the data in the
new installation.  But if you think you can come up with something
that can handle such cases, go for it.

(In the spirit of full disclosure, I actually wrote code that allowed
dumping and reloading stats while I was at Salesforce.  But I've forgotten
the details of that design, and anyway I'm pretty sure it didn't handle
any cross-version scenarios, else I probably would have offered it to
the community.)

regards, tom lane



Re: Logical replication without a Primary Key

2017-12-07 Thread Robert Haas
On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
 wrote:
> No it won't, it will update only one row, it does not try to find
> multiple matching rows.

Good, because that's exactly what it should do.  I mean, if you have
on the master two tuples that are identical, and you update one of
them, then the replica had better update exactly one of them as well.
Since they are identical, it doesn't matter *which* one gets updated
on the replica, but if you update *both* of them on the replica, then
things are out of sync.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Support Parallel Append plan nodes.

2017-12-07 Thread Robert Haas
On Thu, Dec 7, 2017 at 6:01 AM, Amit Khandekar  wrote:
> I have not managed to make the regression test cases execute the above
> not-covered case. I could do that with my local test that I have, but
> that test has lots of data, so it is slow, and not suitable for
> regression. In select_parallel.sql, by the time a worker w1 gets into
> the function choose_next_subplan_for_worker(), an earlier worker w2
> must have already wrapped around the pa_next_plan at place_2, so this
> worker doesn't get a chance to hit place_1. It's a timing issue. I
> will see if I can solve this.

Well, it seems to me that just running the test case with
parallel_leader_participation = off might be good enough.  It would
hit a decent chunk of this logic, and that's better than what we have
today.  If we're determined to hit the wraparound case the way to do
it is probably to make the first child of the append something that
takes a long time to execute and the second one something quick.  But,
as you say, it's hard to do such things in the regression tests
without making them slow, and I have observed Tom to dislike things
that make the regression tests slow.
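
A minimal sketch of such a test might look like the following; the table
name and the extra cost settings are made-up assumptions chosen only to
coax the planner into a Parallel Append:

SET parallel_leader_participation = off;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM pa_test;  -- hypothetical partitioned table with a few partitions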

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up pg_upgrade

2017-12-07 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > It seems pretty clear to me that we should somehow transfer stats from
> > the old server to the new one.  Shouldn't it just be a matter of
> > serializing the MCV/histogram/ndistinct values, then have capabilities
> > to load on the new server?
> 
> The reason pg_upgrade hasn't done that in the past is not wishing to
> assume that the new version does stats identically to the old version.
> Since we do in fact add stats or change stuff around from time to time,
> that's not a negligible consideration.

Sure, but the new version can probably limp along with incomplete stats
until the next natural ANALYZE runs -- the system is operational in much
shorter time than if you have to make it all wait for the post-upgrade
full-database ANALYZE run.  The serialization step is so that the
underlying representation doesn't have to remain identical -- surely the
new server would be able to represent whatever the old server was able
to, regardless of any improvement made.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

2017-12-07 Thread Robert Haas
On Wed, Dec 6, 2017 at 1:05 AM, Amit Kapila  wrote:
> Right and seeing that I have prepared the patch (posted above [1])
> which fixes it such that it will resemble the non-parallel case.
> Ideally, it would have obviated the need for my previous patch which
> got committed as 778e78ae.  However, now that is committed, I could
> think of below options:
>
> 1. I shall rebase it atop what is committed and actually, I have done
> that in the attached patch.  I have also prepared a regression test
> case patch just to show the output with and without the patch.
> 2. For sort node, we can fix it by having some local_info same as
> shared_info in sort node and copy the shared_info in that or we could
> reinstate the pointer to the DSM in ExecSortReInitializeDSM() by
> looking it up in the TOC as suggested by Thomas. If we go this way,
> then we need a similar fix for hash node as well.

Well, the patch you've actually attached makes the bug go away by
removing a net of 53 lines of code.  The other approach would probably
add code.  So I am tempted to go with the approach you have here.  I
would probably change the T_HashState and T_SortState cases in
ExecParallelReInitializeDSM so that they still exist, but just do
something like this:

case T_HashState:
case T_SortState:
/* these nodes have DSM state, but no reinitialization is required */
break;

That way, it will be more clear to future readers of this code that
the lack of a reinitialize function is not an oversight, and the
compiler should optimize these cases away, merging them with the
default case.

I was a little worried when I first opened this patch that it might be
imposing a one-size-fits-all solution; just because sort and hash want
to report details from the last execution, there could be some other
executor node that wants to do otherwise.  But on reflection, that's
just fine: an individual executor node is free to accumulate stats
across rescans if it wants to do so.  It's merely that sort and hash
don't want to do that.  In fact, it's really the current system that
is imposing a straightjacket: no matter what the individual node types
choose to do, rescans are a pile of fail.

Long story short, I like the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up pg_upgrade

2017-12-07 Thread Stephen Frost
Robert, all,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane  wrote:
> > Alvaro Herrera  writes:
> >> It seems pretty clear to me that we should somehow transfer stats from
> >> the old server to the new one.  Shouldn't it just be a matter of
> >> serializing the MCV/histogram/ndistinct values, then have capabilities
> >> to load on the new server?
> >
> > The reason pg_upgrade hasn't done that in the past is not wishing to
> > assume that the new version does stats identically to the old version.
> > Since we do in fact add stats or change stuff around from time to time,
> > that's not a negligible consideration.
> 
> Yes, but we don't do that for every release.  We could put rules into
> pg_upgrade about which releases changed the stats format incompatibly,
> and not transfer the stats when crossing between two releases with
> incompatible formats.  That's more than zero effort, of course, but it
> might be worth it.  We've already got CATALOG_VERSION_NO,
> XLOG_PAGE_MAGIC, PG_CONTROL_VERSION, PG_PROTOCOL_LATEST,
> BTREE_VERSION, HASH_VERSION, BRIN_CURRENT_VERSION,
> GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
> PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
> K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.

If we go down that route, since this makes a pretty serious difference
in terms of what the user has to deal with post-pg_upgrade, I'd suggest
we require an additional option for the user to pass when stats aren't
going to be migrated, so they are aware of that.

The concern I have here is that we end up changing things in v13 and
suddenly everyone has to re-analyze, but they didn't have to when going from
10->11 or 11->12, and they'll get caught off-guard by it.

Of course, this might end up having an entirely different effect: it
might mean that we're suddenly a lot shier about changing the stats in a
backwards-incompatible way, just as we now are basically stuck with the
existing on-disk heap format..

> Now, I have to admit that I find the process of trying to remember to
> bump the correct set of version numbers in every commit just a tad
> frustrating; it adds a cognitive burden I'd just as well skip.

Agreed, would be great if we could improve on this.

> However, the failure to transfer stats over the years seems to have
> actually caused real problems for many users, so I think in this case
> we might be best off sucking it up and adding one more version number.

Yes, it's definitely been an issue for users.

> We might even want to make it a little more fine-grained and track it
> separately by data type, but I'm not sure if that's really worth it.

This would have the nice property that we could just re-analyze the data
types where things changed, something that's more likely to happen with
new data types than existing ones, I'd guess, and so that might be much
more reasonable for users.

Thanks!

Stephen




Re: Speeding up pg_upgrade

2017-12-07 Thread Robert Haas
On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> It seems pretty clear to me that we should somehow transfer stats from
>> the old server to the new one.  Shouldn't it just be a matter of
>> serializing the MCV/histogram/ndistinct values, then have capabilities
>> to load on the new server?
>
> The reason pg_upgrade hasn't done that in the past is not wishing to
> assume that the new version does stats identically to the old version.
> Since we do in fact add stats or change stuff around from time to time,
> that's not a negligible consideration.

Yes, but we don't do that for every release.  We could put rules into
pg_upgrade about which releases changed the stats format incompatibly,
and not transfer the stats when crossing between two releases with
incompatible formats.  That's more than zero effort, of course, but it
might be worth it.  We've already got CATALOG_VERSION_NO,
XLOG_PAGE_MAGIC, PG_CONTROL_VERSION, PG_PROTOCOL_LATEST,
BTREE_VERSION, HASH_VERSION, BRIN_CURRENT_VERSION,
GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.
Now, I have to admit that I find the process of trying to remember to
bump the correct set of version numbers in every commit just a tad
frustrating; it adds a cognitive burden I'd just as well skip.
However, the failure to transfer stats over the years seems to have
actually caused real problems for many users, so I think in this case
we might be best off sucking it up and adding one more version number.
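
To make that concrete, here is a minimal, self-contained sketch of the
"rules" idea; every name, constant and version number below is invented for
illustration, and nothing like it exists in pg_upgrade today:

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    int         major_version;  /* e.g. 906 for 9.6, 1000 for 10 */
    int         stats_format;   /* bumped only when the stats layout changes */
} StatsFormatRule;

/* hypothetical history of statistics format changes */
static const StatsFormatRule rules[] = {
    {906, 1},
    {1000, 1},
    {1100, 2},                  /* pretend v11 changes the format */
};

static int
stats_format_for(int major_version)
{
    int         result = -1;

    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (rules[i].major_version <= major_version)
            result = rules[i].stats_format;
    return result;
}

int
main(void)
{
    int         old_ver = 1000;
    int         new_ver = 1100;
    bool        transfer = stats_format_for(old_ver) == stats_format_for(new_ver);

    printf("transfer statistics: %s\n",
           transfer ? "yes" : "no, re-ANALYZE needed after upgrade");
    return 0;
}

The per-data-type refinement mentioned below would just turn stats_format
into a per-type mapping instead of a single number.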

We might even want to make it a little more fine-grained and track it
separately by data type, but I'm not sure if that's really worth it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2017-12-07 Thread Robert Haas
On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
 wrote:
> I'm afraid that creating a function that implements quite different
> algorithms depending on a global parameter seems very hacky and would lead
> to misunderstandings. I do understand the need of backward compatibility,
> but I'd opt for the lesser evil. Perhaps a good idea would be to change the
> name to 'substring_similarity()' and introduce the new function
> 'word_similarity()' later, for example in the next major version release.

That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now.  Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Postgres with pthread

2017-12-07 Thread Robert Haas
On Wed, Dec 6, 2017 at 10:20 PM, Craig Ringer  wrote:
> Personally I think it's a pity we didn't land up here before the foundations
> for parallel query went in - DSM, shm_mq, DSA, etc. I know the EDB folks at
> least looked into it though, and presumably there were good reasons to go in
> this direction. Maybe that was just "community will never accept threaded
> conversion" at the time, though.

Yep.  Never is a long time, but it took 3 release cycles to get a
user-visible feature as it was, and if I'd tried to insist on a
process->thread conversion first I suspect we'd still be stuck on that
point today.  Perhaps we would have gotten as far as getting that much
done, but that wouldn't make parallel query be done on top of it.

> Now we have quite a lot of homebrew infrastructure to consider if we do a
> conversion.
>
> That said, it might in some ways make it easier. shm_mq, for example, would
> likely convert to a threaded backend with minimal changes to callers, and
> probably only limited changes to shm_mq itself. So maybe these
> abstractions will prove to have been a win in some ways. Except DSA, and
> even then it could serve as a transitional API...

Yeah, I don't feel too bad about what we've built.  Even if it
ultimately goes away, it will have served the useful purpose of
proving that parallel query is a good idea and can work.  Besides,
shm_mq is just a ring buffer for messages; that's not automatically
something that we don't want just because we move to threads.  If it
goes away, which I think not unlikely, it'll be because something else
is faster.

Also, it's not as if only parallel query structures might have been
designed differently if we had been using threads all along.
dynahash, for example, is quite unlike most concurrent hash tables and
a big part of the reason is that it has to cope with being situated in
a fixed-size chunk of shared memory.  More generally, the whole reason
there's no cheap, straightforward palloc_shared() is the result of the
current design, and it seems very unlikely we wouldn't have that quite
apart from parallel query.  Install pg_stat_statements without a
server restart?  Yes, please.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUGS] pg_trgm word_similarity inconsistencies or bug

2017-12-07 Thread François CHAHUNEAU
Hello Alexander,
This is fine with us. Yes, separate thresholds seem preferable.
Best Regards


From: Alexander Korotkov 
Sent: Thursday, December 7, 2017 4:38:59 PM
To: Jan Przemysław Wójcik; Cristiano Coelho
Cc: pgsql-b...@postgresql.org; François CHAHUNEAU; Artur Zakirov; pgsql-hackers
Subject: Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

On Tue, Nov 7, 2017 at 7:24 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
On Tue, Nov 7, 2017 at 3:51 PM, Jan Przemysław Wójcik <jan.przemyslaw.woj...@gmail.com> wrote:
my statement about the function usefulness was probably too categorical,
though I had in mind the current name of the function.

I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need of backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.

Good point.  I've no complaints about that.  I'm going to propose corresponding 
patch to the next commitfest.

I've written a draft patch for fixing this inconsistency.  Please, find it in 
attachment.  This patch doesn't contain proper documentation and comments yet.

I've called the existing behavior subset_similarity().  I didn't use the name 
substring_similarity(), because it doesn't really look for a substring with 
appropriate padding, but rather searches for a continuous subset of trigrams.  
For index search over subset similarity, the %>>, <<%, <->>>, <<<-> operators are 
provided.  I've added an extra arrow sign to denote that these operators look 
deeper into the string.

Simultaneously, word_similarity() now forces extent bounds to be word bounds.  
Now word_similarity() behaves similarly to the my_word_similarity() proposed on 
stackoverflow.

# with data(t) as (
values
('message'),
('message s'),
('message sag'),
('message sag sag'),
('message sag sage')
)
select t, subset_similarity('sage', t), word_similarity('sage', t)
from data;
        t         | subset_similarity | word_similarity
------------------+-------------------+----------------
 message          |               0.6 |             0.3
 message s        |               0.8 |        0.363636
 message sag      |                 1 |             0.5
 message sag sag  |                 1 |             0.5
 message sag sage |                 1 |               1
(5 rows)

The difference here is only in the 'message s' row, because word_similarity() 
allows matching one word to two or more while my_word_similarity() doesn't 
allow that.  In this case word_similarity() returns the similarity between 'sage' 
and 'message s'.

# select similarity('sage', 'message s');
 similarity
------------
   0.363636
(1 row)

I think the behavior of word_similarity() appears better here, because a typo 
can break a word in two.

I also wonder whether word_similarity() and subset_similarity() should share the 
same threshold value for indexed search.  subset_similarity() typically returns 
higher values than word_similarity().  Thus, it probably makes sense to split 
their threshold values.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Signals in a BGW

2017-12-07 Thread Robert Haas
On Wed, Dec 6, 2017 at 6:59 PM, Chapman Flack  wrote:
>> The default handler is bgworker_die ; see src/backend/postmaster
>> /bgworker.c
>> . I don't know if elog() is safe in a signal handler, but I guess in
>> the absence of non-reentrant errcontext functions...
>
> That does seem bold, doesn't it?

Yes, I think it's flat busted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up pg_upgrade

2017-12-07 Thread Tom Lane
Alvaro Herrera  writes:
> It seems pretty clear to me that we should somehow transfer stats from
> the old server to the new one.  Shouldn't it just be a matter of
> serializing the MCV/histogram/ndistinct values, then have capabilities
> to load on the new server?

The reason pg_upgrade hasn't done that in the past is not wishing to
assume that the new version does stats identically to the old version.
Since we do in fact add stats or change stuff around from time to time,
that's not a negligible consideration.

regards, tom lane



Re: Speeding up pg_upgrade

2017-12-07 Thread Justin Pryzby
On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  ... we discussed speeding up pg_upgrade.
> 
> There are clusters that take a long time to dump the schema from the old
> cluster

Maybe it isn't representative of a typical case, but I can offer a data point:

For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB
(for which ~90% is on a ZFS tablespace with compression).  We have what's
traditionally considered to be an excessive number of child tables, which works
okay since planning time is unimportant to us for the report queries which hit
them.  Some of the tables are wide (historically up to 1600 columns).  Some of
those have default values on nearly every column, and pg_attrdef was large
(>500MB), causing pg_dump --section pre-data to be slow (10+ minutes).  Since
something similar is run by pg_upgrade, I worked around the issue for now by
dropping defaults on the historic children in advance of upgrades (at some
point I'll figure out what I have to do to allow DROPing DEFAULTs).  It's not
the first time we've seen an issue with a larger number of children*columns.

Our slowest pg_upgrade was ~40min, caused by column defaults in a case where I
failed to re-DROP DEFAULTs after our first scheduled upgrade date was pushed
back by over a month.  Most of the rest were completed in less than 15min.

Justin



Re: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i

2017-12-07 Thread Robert Haas
On Wed, Dec 6, 2017 at 9:42 PM, Bossart, Nathan  wrote:
> On 12/6/17, 8:25 PM, "Robert Haas"  wrote:
>> Please give me a little time to see if I can speed up this test enough
>> to fix this problem.  If that doesn't work out, then we can rip this
>> out.
>
> Just in case it got missed earlier, here’s a patch that speeds it
> up enough to pass with CLOBBER_CACHE_ALWAYS enabled.  Instead of
> doing database-wide operations, it just uses a partitioned table.

Yeah, that looks like a reasonable approach to try.  Committed, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up pg_upgrade

2017-12-07 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> Stephen Frost wrote:
> > * Alexander Kukushkin (cyberd...@gmail.com) wrote:
> 
> > > 2 ANALYZE phase is a pain. I think everybody agrees with it.
> > > 
> > > 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> > > reasonable, except one case: some of the columns might have non default
> > > statistics target.
> > 
> > Ok, if the stage-1 is very fast and performance is reasonable enough
> > after that then perhaps it's not so bad to keep it as-is for now and
> > focus on the dump/restore time.  That said, we should certainly also
> > work on improving this too.
> 
> It seems pretty clear to me that we should somehow transfer stats from
> the old server to the new one.  Shouldn't it just be a matter of
> serializing the MCV/histogram/ndistinct values, then have capabilities
> to load on the new server?  I suppose it'd just be used during binary
> upgrade, but the point seems painful enough for a lot of users.
> Obviously it would not be the raw contents of pg_statistic{,_ext}, but
> rather something a bit higher-level.

Right, I think that's what Bruce was getting at and certainly makes
sense to me as well.  I agree that it's a definite pain point for
people.  One complication is going to be custom data types, of course..

Thanks!

Stephen




Re: Speeding up pg_upgrade

2017-12-07 Thread Alvaro Herrera
Stephen Frost wrote:
> Alexander,

> * Alexander Kukushkin (cyberd...@gmail.com) wrote:

> > 2 ANALYZE phase is a pain. I think everybody agrees with it.
> > 
> > 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> > reasonable, except one case: some of the columns might have non default
> > statistics target.
> 
> Ok, if the stage-1 is very fast and performance is reasonable enough
> after that then perhaps it's not so bad to keep it as-is for now and
> focus on the dump/restore time.  That said, we should certainly also
> work on improving this too.

It seems pretty clear to me that we should somehow transfer stats from
the old server to the new one.  Shouldn't it just be a matter of
serializing the MCV/histogram/ndistinct values, then have capabilities
to load on the new server?  I suppose it'd just be used during binary
upgrade, but the point seems painful enough for a lot of users.
Obviously it would not be the raw contents of pg_statistic{,_ext}, but
rather something a bit higher-level.
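
Just to make "a bit higher-level" concrete, here is a purely hypothetical
sketch of the kind of per-column record such a dump/load facility might
exchange; every field name below is invented and no such structure exists in
PostgreSQL today:

#include <stdio.h>

typedef struct
{
    const char *schema_name;
    const char *table_name;
    const char *column_name;
    double      n_distinct;
    const char *most_common_vals;   /* text form, e.g. "{new,shipped,done}" */
    const char *most_common_freqs;  /* e.g. "{0.5,0.3,0.2}" */
    const char *histogram_bounds;   /* e.g. "{1,10,100,1000}" */
} ColumnStatsDump;

int
main(void)
{
    ColumnStatsDump dump = {
        "public", "orders", "status",
        3.0, "{new,shipped,done}", "{0.5,0.3,0.2}", "{}"
    };

    /* a loader on the new cluster would parse this back into its own
     * pg_statistic representation, converting values through each
     * type's input function */
    printf("%s.%s.%s ndistinct=%g mcv=%s\n",
           dump.schema_name, dump.table_name, dump.column_name,
           dump.n_distinct, dump.most_common_vals);
    return 0;
}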

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Speeding up pg_upgrade

2017-12-07 Thread Stephen Frost
Alexander,

* Alexander Kukushkin (cyberd...@gmail.com) wrote:
> Couple of months ago we at Zalando upgraded a few databases of different
> sizes to 9.6.

Thanks for sharing your experience!

> During preparations to the upgrade I've found 2.5 pain-points:
> 
> 1. We are using schema-based api deployment. Basically ~every week we
> create a new schema in the database and hundreds of stored procedures in it.
> Of course we remove old API schemas and try not to keep more than the
> last 10. Before the upgrade we basically dropped all API schemas except the
> one used in production.
> And even in this case the dump-restore phase was taking much more time than
> the relinking of datafiles.
> Unfortunately I don't have any numbers right now, but usually a run of
> pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
> spent in dump-restore.

Ok, so eliminating 2/3 of the time would mean bringing it down to more
like 10 seconds.  That certainly seems worthwhile to me.  With the
linking time being much less than the dump/restore, we could at least
consider moving forward with Bruce's original idea where we do the
dump/restore while the system is online but then the linking with it
offline and get a serious performance boost out of it.  That also avoids
the issue with new files showing up while the system is running that I
brought up when we were talking about having the linking done with the
system online.

> 2 ANALYZE phase is a pain. I think everybody agrees with it.
> 
> 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> reasonable, except one case: some of the columns might have non default
> statistics target.

Ok, if the stage-1 is very fast and performance is reasonable enough
after that then perhaps it's not so bad to keep it as-is for now and
focus on the dump/restore time.  That said, we should certainly also
work on improving this too.

> It breaks `vacuumdb --analyze-in-stages`, because for those specific
> columns it will not use the default_statistics_target value provided by
> vacuumdb.
> What I did was reset those non-default values right before running
> pg_upgrade and restore them only when the analyze was completed. Of course,
> after that I re-analyzed those columns.

Ah, yeah, ouch, that's unfortunate..  I wonder if there's something we
could do there to fix it..

Thanks!

Stephen




Re: Postgres with pthread

2017-12-07 Thread Craig Ringer
On 7 December 2017 at 19:55, Konstantin Knizhnik 
wrote:

>
> Pros:
> 1. Simplified memory model: no need in DSM, shm_mq, DSA, etc
>

shm_mq would remain useful, and the others could only be dropped if you
also dropped process-model support entirely.


> 1. Breaks compatibility with existed extensions and adds more requirements
> for authors of new extension
>

Depends on how much frightening preprocessor magic you're willing to use,
doesn't it? ;)

Wouldn't be surprised if simple extensions (C functions etc) stayed fairly
happy, but it'd be hazardous enough in terms of library use etc that
deliberate breakage may be better.


> 2. Problems with integration of single-threaded PLs: Python, Lua,...
>

Yeah, that's going to hurt. Especially since most non-plpgsql code out
there will be plperl and plpython. Breaking that's not going to be an
option, but nobody's going to be happy if all postgres backends must
contend for the same Python GIL. Plus it'd be deadlock-city.

That's nearly a showstopper right there. Especially since with a quick look
around it looks like the cPython GIL is per-DLL (at least on Windows) not
per-interpreter-state, so spawning separate interpreter states per-thread
may not be sufficient. That makes sense given that cPython itself is
thread-aware; otherwise it'd have a really hard time figuring out which GIL
and interpreter state to look at when in a cPython-spawned thread.


> 3. Worser protection from programming errors, included errors in
> extensions.
>

Mainly contaminating memory of unrelated processes, or the postmaster.

I'm not worried about outright crashes. On any modern system it's not
significantly worse to take down the postmaster than it is to have it do
its own recovery. A modern init will restart it promptly. (If you're not
running postgres under an init daemon for production then... well, you
should be.)


> 4. Lack of explicit separation of shared and private memory leads to more
> synchronization errors.
>

Accidentally clobbering postmaster memory/state would be my main worry
there.

Right now we gain a lot of protection from our copy-on-write
shared-nothing-by-default model, and we rely on it in quite a lot of places
where backends merrily stomp on inherited postmaster state.

The more I think about it, the less enthusiastic I am, really.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Logical replication without a Primary Key

2017-12-07 Thread Petr Jelinek
On 07/12/17 15:32, Joshua D. Drake wrote:
> On 12/07/2017 05:30 AM, Peter Eisentraut wrote:
>>
>>> How does that work? Is it using one of the hidden columns on a row?
>> It means that for example if an update record is produced, the entire
>> row is included in the record as the key.
> 
> Thanks Peter, Craig also responded,
> 
> The confusion I have is what if we have two rows that are identical and
> now that I think about it we would just update both rows, yes? That
> would make sense because it would produce two updated rows.
> 

No it won't, it will update only one row, it does not try to find
multiple matching rows.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical replication without a Primary Key

2017-12-07 Thread David G. Johnston
On Thursday, December 7, 2017, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 12/6/17 19:03, Joshua D. Drake wrote:
>
> >
> > How does that work? Is it using one of the hidden columns on a row?
>
> It means that for example if an update record is produced, the entire
> row is included in the record as the key.
>
>
IOW, IIUC, whether defined or not the user data portion of the table must,
as a whole, provide a natural unique key if you are going to use logical
replication.  If two records only differ in their OID (or maybe
ctid?) value you will have a problem.

David J.


Re: Logical replication without a Primary Key

2017-12-07 Thread Craig Ringer
On 7 December 2017 at 22:32, Joshua D. Drake  wrote:

>
> The confusion I have is what if we have two rows that are identical and
> now that I think about it we would just update both rows, yes? That would
> make sense because it would produce two updated rows.
>
>
I expect so, but honestly, maybe it's easier to just add a pg_regress test
to check and demonstrate the behaviour, or do a manual test?

(a pg_regress test would make sense to add anyway, though, and might be
easier)

The only way you could update only one of two identical rows would be if
you did so by ctid (since if it were a table with oids, the rows wouldn't
be identical anymore). It'd make no sense to send the ctid to the
downstream since it'd bear no relationship to the downstream table. I
expect it'd probably update both rows. If it doesn't, it probably should.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Logical replication without a Primary Key

2017-12-07 Thread Joshua D. Drake

On 12/07/2017 05:30 AM, Peter Eisentraut wrote:



How does that work? Is it using one of the hidden columns on a row?

It means that for example if an update record is produced, the entire
row is included in the record as the key.


Thanks Peter, Craig also responded,

The confusion I have is what if we have two rows that are identical and 
now that I think about it we would just update both rows, yes? That 
would make sense because it would produce two updated rows.


Thanks,

JD







--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2017-12-07 Thread Alexander Korotkov
On Tue, Nov 7, 2017 at 7:24 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Nov 7, 2017 at 3:51 PM, Jan Przemysław Wójcik <
> jan.przemyslaw.woj...@gmail.com> wrote:
>
>> my statement about the function usefulness was probably too categorical,
>> though I had in mind the current name of the function.
>>
>> I'm afraid that creating a function that implements quite different
>> algorithms depending on a global parameter seems very hacky and would lead
>> to misunderstandings. I do understand the need of backward compatibility,
>> but I'd opt for the lesser evil. Perhaps a good idea would be to change
>> the
>> name to 'substring_similarity()' and introduce the new function
>> 'word_similarity()' later, for example in the next major version release.
>>
>
> Good point.  I've no complaints about that.  I'm going to propose
> corresponding patch to the next commitfest.
>

I've written a draft patch for fixing this inconsistency.  Please, find it
in attachment.  This patch doesn't contain proper documentation and
comments yet.

I've called the existing behavior subset_similarity().  I didn't use the name
substring_similarity(), because it doesn't really look for a substring
with appropriate padding, but rather searches for a continuous subset of
trigrams.  For index search over subset similarity, the %>>, <<%, <->>>, <<<->
operators are provided.  I've added an extra arrow sign to denote that these
operators look deeper into the string.

Simultaneously, word_similarity() now forces extent bounds to be word
bounds.  Now word_similarity() behaves similarly to the my_word_similarity()
proposed on stackoverflow.

# with data(t) as (
values
('message'),
('message s'),
('message sag'),
('message sag sag'),
('message sag sage')
)
select t, subset_similarity('sage', t), word_similarity('sage', t)
from data;
        t         | subset_similarity | word_similarity
------------------+-------------------+----------------
 message          |               0.6 |             0.3
 message s        |               0.8 |        0.363636
 message sag      |                 1 |             0.5
 message sag sag  |                 1 |             0.5
 message sag sage |                 1 |               1
(5 rows)

The difference here is only in the 'message s' row, because word_similarity()
allows matching one word to two or more while my_word_similarity() doesn't
allow that.  In this case word_similarity() returns the similarity between
'sage' and 'message s'.

# select similarity('sage', 'message s');
 similarity
------------
   0.363636
(1 row)

I think the behavior of word_similarity() appears better here, because a typo
can break a word in two.

I also wonder whether word_similarity() and subset_similarity() should share
the same threshold value for indexed search.  subset_similarity() typically
returns higher values than word_similarity().  Thus, it probably makes
sense to split their threshold values.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pg-trgm-word-subset-similarity-1.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2017-12-07 Thread Peter Eisentraut
On 12/4/17 10:15, Nikhil Sontakke wrote:
> PFA, latest patch for this functionality.

This probably needs documentation updates for the logical decoding chapter.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical replication without a Primary Key

2017-12-07 Thread Peter Eisentraut
On 12/6/17 19:03, Joshua D. Drake wrote:
> -Hackers,
> 
> In the docs it says:
> 
> "
> If the table does not have any suitable key, then it can be set to 
replica identity “full”, which means the entire row becomes the key.
> 
> "
> 
> How does that work? Is it using one of the hidden columns on a row?

It means that for example if an update record is produced, the entire
row is included in the record as the key.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: compress method for spgist - 2

2017-12-07 Thread Alexander Korotkov
On Thu, Dec 7, 2017 at 3:17 AM, Nikita Glukhov 
wrote:

> On 06.12.2017 21:49, Alexander Korotkov wrote:
>
> On Wed, Dec 6, 2017 at 6:08 PM, Nikita Glukhov 
> wrote:
>
>> On 05.12.2017 23:59, Alexander Korotkov wrote:
>>
>> On Tue, Dec 5, 2017 at 1:14 PM, Darafei Praliaskouski 
>> wrote:
>>
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world:  not tested
>>> Implements feature:   not tested
>>> Spec compliant:   not tested
>>> Documentation:tested, passed
>>>
>>> I've read the updated patch and see my concerns addressed.
>>>
>>> I'm looking forward to SP-GiST compress method support, as it will allow
>>> usage of SP-GiST index infrastructure for PostGIS.
>>>
>>> The new status of this patch is: Ready for Committer
>>>
>>
>> I went through this patch.  I found documentation changes to be not
>> sufficient.  And I've made some improvements.
>>
>> In particular, I didn't understand why is reconstructedValue claimed to
>> be of spgConfigOut.leafType while it should be of spgConfigIn.attType both
>> from general logic and code.  I've fixed that.  Nikita, correct me if I'm
>> wrong.
>>
>>
>> I think we are reconstructing a leaf datum, so documentation was correct
>> but the code in spgWalk() and freeScanStackEntry() wrongly used attType
>> instead of attLeafType. Fixed patch is attached.
>>
>
> Reconstructed datum is used for index-only scan.  Thus, it should be
> original indexed datum of attType, unless we have decompress method and
> pass reconstructed datum through it.
>
> But if we have compress method and do not have decompress method then
> index-only scan seems to be impossible.
>

Sorry, I didn't realize that we don't return reconstructed value
immediately, but return only leafValue provided by leafConsistent.  In this
case, leafConsistent is responsible to convert value from
spgConfigOut.leafType to spgConfigIn.attType.

TBH, a practical example of an SP-GiST opclass with both a compress method and
index-only scan support doesn't come to my mind, because a compress method is
typically needed when we have a lossy representation of index keys.  For
example, in GiST all the opclasses where the compress method does useful work
use a lossy representation of keys.  Nevertheless, it's good not to cut off the
possibility of index-only scans when spgConfigOut.leafType !=
spgConfigIn.attType.  And laying responsibility for the conversion on
leafConsistent seems like an elegant way to do this.  So, that's OK for me.

Also, I wonder should we check for existence of compress method when
>> attType and leafType are not the same in spgvalidate() function?  We don't
>> do this for now.
>>
>> I've added compress method existence check to spgvalidate().
>>
> It would be nice to avoid calling the config method twice.  A possible
> option could be to memorize the difference between the attribute type and leaf type
> in the high bit of functionset, which is guaranteed to be free.
>
> I decided to simply set the compress method's bit in functionset when this
> method is not required.
>

Looks good to me.

Now, this patch is ready for committer from my point of view.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Postgres with pthread

2017-12-07 Thread Konstantin Knizhnik



On 07.12.2017 00:58, Thomas Munro wrote:

Using a ton of thread local variables may be a useful stepping stone,
but if we want to be able to separate threads/processes from sessions
eventually then I guess we'll want to model sessions as first class
objects and pass them around explicitly or using a single TLS variable
current_session.


It was my primary intention.
Unfortunately, separating all static variables into some kind of session
context requires much more effort:

we have to change all accesses to such variables.

But please note that, from a performance point of view, access to
__thread variables is not more expensive than access to a static variable or
to the fields of a session context structure through current_session.
And there is no extra space overhead for them.
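
To illustrate (a toy sketch, not actual PostgreSQL code): annotating a static
keeps every call site unchanged, while the session-context approach forces
every access to be rewritten, even though the generated TLS access costs
about the same either way:

#include <stdio.h>

/* approach 1: just annotate the existing static variable */
static __thread int MyCounter = 0;

/* approach 2: collect state into a context reached via one TLS pointer */
typedef struct SessionContext
{
    int         counter;
} SessionContext;

static __thread SessionContext *current_session;

static void
bump(void)
{
    MyCounter++;                    /* call sites stay as they were */
    current_session->counter++;     /* every such site must be rewritten */
}

int
main(void)
{
    SessionContext session = {0};

    current_session = &session;
    bump();
    printf("%d %d\n", MyCounter, current_session->counter);
    return 0;
}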


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Postgres with pthread

2017-12-07 Thread Konstantin Knizhnik

Hi

On 06.12.2017 20:08, Andres Freund wrote:


4. Rewrite file descriptor cache to be global (shared by all threads).
That one I'm very unconvinced of, that's going to add a ton of new
contention.


Do you mean lock contention because of the mutex I used to synchronize
access to the shared file descriptor cache, or contention for file
descriptors?
Right now each thread has its own virtual file descriptors, so they are
not shared between threads.
But there is a common LRU, restricting the total number of open descriptors
in the process.


Actually I have no other choice if I want to support thousands of
connections.
If each thread has its own private descriptor cache (as it is now for
processes) and its size is estimated based on the open file quota,
then there will be millions of opened file descriptors.

Concerning contention for the mutex, I do not think that it is a problem.
At least I have to say that performance (with 100 connections) is
significantly improved and shows almost the same speed as for 10 connections
after I have rewritten the file descriptor cache and made it global
(my original implementation just made all fd.c static variables
thread-local, so each thread had its separate pool).


It is possible to go further and share file descriptors between threads
and use pwrite/pread instead of seek+read/write.

But we still need a mutex to implement the LRU l2list and the free handler list.
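
A rough, self-contained sketch of that scheme, just to show the shape of it
(this is not the actual fd.c change): the table lookup, and in a real version
the LRU bookkeeping, take the mutex, while the I/O itself is a positional
pread() on the shared descriptor and needs neither the mutex nor lseek():

#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_OPEN_FILES 64

typedef struct
{
    bool        used;
    char        path[256];
    int         fd;
} FdSlot;

static FdSlot fd_cache[MAX_OPEN_FILES];
static pthread_mutex_t fd_lock = PTHREAD_MUTEX_INITIALIZER;

/* look up (or open and register) a descriptor shared by all threads */
static int
shared_open(const char *path)
{
    int         fd = -1;

    pthread_mutex_lock(&fd_lock);
    for (int i = 0; i < MAX_OPEN_FILES; i++)
        if (fd_cache[i].used && strcmp(fd_cache[i].path, path) == 0)
            fd = fd_cache[i].fd;        /* hit: an LRU touch would go here */
    if (fd < 0 && (fd = open(path, O_RDONLY)) >= 0)
        for (int i = 0; i < MAX_OPEN_FILES; i++)
            if (!fd_cache[i].used)
            {
                fd_cache[i].used = true;
                fd_cache[i].fd = fd;
                snprintf(fd_cache[i].path, sizeof(fd_cache[i].path), "%s", path);
                break;                  /* a full table would evict the LRU entry */
            }
    pthread_mutex_unlock(&fd_lock);
    return fd;
}

int
main(void)
{
    char        buf[64] = {0};
    int         fd = shared_open("/etc/hosts");

    /* the read is positional, so no shared file offset and no lock needed */
    if (fd >= 0 && pread(fd, buf, sizeof(buf) - 1, 0) > 0)
        printf("%s\n", buf);
    return 0;
}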



Re: Postgres with pthread

2017-12-07 Thread Konstantin Knizhnik

I want to thank everybody for the feedback and a lot of useful comments.
I am very pleased with the community's interest in this topic and will
continue research in this direction.

Some more comments from my side:

My original intention was to implement some kind of built-in connection
pooling for Postgres: being able to execute several transactions in one
backend.
It requires the use of some kind of lightweight multitasking (coroutines). The
obvious candidate for it is libcore.


In this case we also need to solve the problem with static variables.
And __thread will not help in this case. We have to collect all static
variables into some structure (context)
and replace any reference to such a variable with indirection through a
pointer. It will be much harder to implement than annotating variable
definitions with __thread:
it will require changing all accesses to the variables, so almost all
Postgres code has to be refactored.


Another problem with this approach is that we need asynchronous disk IO
for it. Unfortunately there is no good file AIO implementation for Linux.
Certainly we can spawn a dedicated IO thread (or threads) and queue IO
requests to it. But such an architecture seems to become quite complex.
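
For illustration only, the smallest possible shape of such a design: a request
queue drained by one worker thread. A real version would need completion
notification, error reporting, batching and several workers, which is exactly
where the complexity comes from.

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct IoRequest
{
    int         fd;
    const char *buf;
    size_t      len;
    off_t       offset;
    struct IoRequest *next;
} IoRequest;

static IoRequest *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

/* called by a backend/coroutine instead of writing directly */
static void
submit_io(IoRequest *req)
{
    pthread_mutex_lock(&queue_lock);
    req->next = queue_head;
    queue_head = req;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

/* the dedicated IO thread: pop requests and perform the writes */
static void *
io_worker(void *arg)
{
    (void) arg;
    for (;;)
    {
        IoRequest  *req;

        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL)
            pthread_cond_wait(&queue_cond, &queue_lock);
        req = queue_head;
        queue_head = req->next;
        pthread_mutex_unlock(&queue_lock);

        if (pwrite(req->fd, req->buf, req->len, req->offset) < 0)
            perror("pwrite");
    }
    return NULL;
}

int
main(void)
{
    const char *msg = "queued write\n";
    pthread_t   worker;
    int         fd = open("/tmp/io_sketch.out", O_CREAT | O_WRONLY, 0600);
    IoRequest   req = {fd, msg, strlen(msg), 0, NULL};

    pthread_create(&worker, NULL, io_worker, NULL);
    submit_io(&req);
    sleep(1);                   /* crude: let the worker drain the queue */
    return 0;
}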


Also, cooperative multitasking itself is not able to load all CPU cores.
So we need to have several physical processes/threads which will execute
coroutines.
In theory such an architecture should provide the best performance and
scalability (handling hundreds of thousands of client connections). But in
practice there are a lot of pitfalls:
1. Right now each backend has its own local relation, catalog and prepared
statement caches. For a large database these caches can be quite large:
several megabytes.
So such coroutines become not really "lightweight". The obvious
solution is to have global caches or to combine global and local caches.
But it once again requires significant changes in postgres.
2. A large number of sessions makes the current approach with the procarray
almost unusable: we need to provide some alternative implementation of
snapshots, for example CSN based.

3. All locking mechanisms have to be rewritten.

So this approach almost excludes the possibility of evolving the existing
postgres code base and requires a "revolution": rewriting most Postgres
components from scratch and refactoring almost all other postgres code.

This is why I had to abandon moving in this direction.

Replacing processes with threads can be considered just a first step
and requires changes in many postgres components if we really want to
get significant advantages from it.
But at least such work can be split into several phases, and it is
possible for some time to support both the multithreaded and multiprocess
models in the same codebase.
Below I want to summarize the most important (from my point of view)
arguments pro/contra the multithreaded model that I got from your feedback:


Pros:
1. Simplified memory model: no need for DSM, shm_mq, DSA, etc.
2. Efficient integration of PLs supporting multithreaded execution,
first of all Java
3. Smaller memory footprint, faster context switching, more efficient use
of the TLB


Contras:
1. Breaks compatibility with existing extensions and adds more
requirements for authors of new extensions

2. Problems with integration of single-threaded PLs: Python, Lua,...
3. Worse protection from programming errors, including errors in extensions.
4. Lack of explicit separation of shared and private memory leads to
more synchronization errors.
Right now in Postgres there is a strict distinction between shared memory
and private memory, so it is clear to the programmer
whether (s)he is working with shared data and so needs some kind of
synchronization to avoid race conditions.

In pthreads all memory is shared and more care is needed to work with it.

So pthreads can help to increase scalability, but still do not help much
in the implementation of built-in connection pooling, autonomous
transactions,...


The current 50% improvement in SELECT speed for a large number of connections
certainly cannot be considered enough motivation for such radical
changes to the Postgres architecture.
But it is just a first step, and many more benefits can be obtained by
adapting Postgres to this model.
It is hard for me to estimate now all the complexity of switching to the
thread model and all the advantages we can get from it.
First of all I am going to repeat my benchmarks on SMP computers with a
large number of cores (so that 100 or more active backends can be really
useful even in the case of connection pooling).



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Runtime Partition Pruning

2017-12-07 Thread Beena Emerson
Hello,

On Thu, Dec 7, 2017 at 12:52 PM, Beena Emerson  wrote:
>
> 1. Only runtime pruning - David's case1
> explain analyse execute ab_q1 (2,3);
>QUERY PLAN
> -
>  Append  (cost=0.00..395.10 rows=9 width=8) (actual time=0.101..0.101
> rows=0 loops=1)
>Runtime Partition Pruning: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a1_b1  (cost=0.00..43.90 rows=1 width=8) (never 
> executed)
>  Filter: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a1_b2  (cost=0.00..43.90 rows=1 width=8) (never 
> executed)
>  Filter: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a1_b3  (cost=0.00..43.90 rows=1 width=8) (never 
> executed)
>  Filter: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a2_b1  (cost=0.00..43.90 rows=1 width=8) (never 
> executed)
>  Filter: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a2_b2  (cost=0.00..43.90 rows=1 width=8) (never 
> executed)
>  Filter: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a2_b3  (cost=0.00..43.90 rows=1 width=8) (actual
> time=0.007..0.007 rows=0 loops=1)
>  Filter: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a3_b1  (cost=0.00..43.90 rows=1 width=8) (never 
> executed)
>  Filter: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a3_b2  (cost=0.00..43.90 rows=1 width=8) (never 
> executed)
>  Filter: ((a = $1) AND (b = $2))
>->  Seq Scan on ab_a3_b3  (cost=0.00..43.90 rows=1 width=8) (never 
> executed)
>  Filter: ((a = $1) AND (b = $2))
>  Planning time: 0.780 ms
>  Execution time: 0.220 ms
> (22 rows)
>
> 2. Runtime pruning after optimizer pruning - David's case 2.
> ((a >= 4) AND (a <= 5)  is used during optimizer pruning and only (a =
> $1) is used for runtime pruning.
> =#  explain (analyse, costs off, summary off) execute ab_q1 (4);
> QUERY PLAN
> ---
>  Append (actual time=0.062..0.062 rows=0 loops=1)
>Runtime Partition Pruning: (a = $1)
>->  Seq Scan on ab_a4 (actual time=0.005..0.005 rows=0 loops=1)
>  Filter: ((a >= 4) AND (a <= 5) AND (a = $1))
>->  Seq Scan on ab_a5 (never executed)
>  Filter: ((a >= 4) AND (a <= 5) AND (a = $1))
> (6 rows)
>

FYI,

The v4 version of the patch accidentally included the
choose_custom_plan hack I had used to force the runtime pruning in the
above cases (1, 2); the hack has been removed in v5. So with only the patch
applied, it would continue to give the output with the const and
not the Param, because the custom plan is preferred over the generic
one. This was pointed out in the initial post of this thread. Just to
compare, I continued using the hack for the tests to show the
behaviour changes.

A different case would need to be used to test the behaviour which
picks the generic plan.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-07 Thread Beena Emerson
Hello David,

On Thu, Dec 7, 2017 at 4:07 PM, David Rowley
 wrote:
> On 7 December 2017 at 20:22, Beena Emerson  wrote:
>> PFA the updated patch.
>
> Hi Beena,
>
> Thanks for updating this.
>
> Can you list the patches which are required for this to apply to
> today's master branch?
>

Thanks for looking into this.

Currently Amit's v13 patches do not apply on HEAD, and I was
working on 487a0c1518af2f3ae2d05b7fd23d636d687f28f3, which is the last
commit where all of Amit's v13 patches apply cleanly.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] path toward faster partition pruning

2017-12-07 Thread David Rowley
On 30 November 2017 at 11:15, Robert Haas  wrote:
> Committed 0004 after reviewing the code and testing that it seems to
> work as advertised.
>
> 0005 looks like it might need to be split into smaller patches.  More
> broadly, the commit messages you wrote for for 0005, 0006, and 0008
> don't seem to me to do a great job explaining the motivation for the
> changes which they make.  They tell me what the patches do, but not
> why they are doing it.  If there's an email in this thread that
> explains that stuff, please point me to it and I'll go back and reread
> it more carefully; if not, I think I definitely need some more
> explanation both of the mission of each patch and the reason why the
> patch set is divided up in the way that it is.

Hi Amit,

It looks like just 0005 to 0008 remain of this and I see that the v13
0005 patch no longer applies to current master.

Are you working on splitting this up as requested by Robert above?

I can continue reviewing this once patches are available that apply to
current master.

Many thanks

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2017-12-07 Thread David Rowley
On 7 December 2017 at 20:22, Beena Emerson  wrote:
> PFA the updated patch.

Hi Beena,

Thanks for updating this.

Can you list the patches which are required for this to apply to
today's master branch?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Transform for pl/perl

2017-12-07 Thread Anthony Bykov
On Thu, 7 Dec 2017 12:54:55 +0300
Anthony Bykov  wrote:

> On Fri, 1 Dec 2017 15:49:21 -0300
> Alvaro Herrera  wrote:
> 
> > A few very minor comments while quickly paging through:
> > 
> > 1. non-ASCII tests are going to cause problems in one platform or
> > another.  Please don't include that one.
> > 
> > 2. error messages
> >a) please use ereport() not elog()
> >b) conform to style guidelines: errmsg() start with lowercase,
> > others are complete phrases (start with uppercase, end with period)
> >c) replace
> >   "The type you was trying to transform can't be represented in
> > JSONB" maybe with
> >   errmsg("could not transform to type \"%s\"", "jsonb"),
> >   errdetail("The type you are trying to transform can't be
> > represented in JSON") d) same errmsg() for the other error; figure
> > out suitable errdetail.
> > 
> > 3. whitespace: don't add newlines to while, DirectFunctionCallN,
> > pnstrdup.
> > 
> > 4. the "relocatability" test seems pointless to me.
> > 
> > 5. "#undef _" looks bogus.  Don't do it.
> >   
> 
> Hello,
> thank you for your time.
> 
> 1. I really think that it might be a good practice to test non-ASCII
>   symbols on platforms where it is possible. Maybe a locale-dependent
>   Makefile? I've already spent quite a lot of time trying to find
> possible solutions and I have no results. So, I've deleted these
> tests. Maybe there is a better solution I don't know about?
> 
> 2. Thank you for this one. Writing those errors was a real pain for
>   me. I've changed those things in the new patch.
> 
> 3. I've fixed all the whitespace issues you were talking about in the new
>   version of the patch.
> 
> 4. The relocatability test is needed in order to check that the extension is
>   still relocatable. With this test I've tried to prove the
> "relocatable=true" setting in the *.control files. So, I've decided to leave
> it in the next version of the patch.
> 
> 5. If I delete #undef _, I will get the warning:
>   /usr/lib/x86_64-linux-gnu/perl/5.22/CORE/config.h:3094:0: warning: "_" redefined
>    #define _(args) args
>
>   In file included from ../../src/include/postgres.h:47:0,
>                    from jsonb_plperl.c:12:
>   ../../src/include/c.h:971:0: note: this is the location of the previous definition
>    #define _(x) gettext(x)
>   This #undef was meant to fix the warning. I hope the new comment next
>   to #undef contains all the explanation needed.
> 
> Please, find a new version of the patch in attachments to this
> message.
> 
> 
> --
> Anthony Bykov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

Forgot the patch.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/Makefile b/contrib/Makefile
index 8046ca4..53d44fe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -75,9 +75,9 @@ ALWAYS_SUBDIRS += sepgsql
 endif
 
 ifeq ($(with_perl),yes)
-SUBDIRS += hstore_plperl
+SUBDIRS += hstore_plperl jsonb_plperl
 else
-ALWAYS_SUBDIRS += hstore_plperl
+ALWAYS_SUBDIRS += hstore_plperl jsonb_plperl
 endif
 
 ifeq ($(with_python),yes)
diff --git a/contrib/jsonb_plperl/Makefile b/contrib/jsonb_plperl/Makefile
new file mode 100644
index 000..cd86553
--- /dev/null
+++ b/contrib/jsonb_plperl/Makefile
@@ -0,0 +1,40 @@
+# contrib/jsonb_plperl/Makefile
+
+MODULE_big = jsonb_plperl
+OBJS = jsonb_plperl.o $(WIN32RES)
+PGFILEDESC = "jsonb_plperl - jsonb transform for plperl"
+
+PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plperl
+
+EXTENSION = jsonb_plperlu jsonb_plperl
+DATA = jsonb_plperlu--1.0.sql jsonb_plperl--1.0.sql
+
+REGRESS = jsonb_plperl jsonb_plperl_relocatability jsonb_plperlu jsonb_plperlu_relocatability
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/jsonb_plperl
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+# We must link libperl explicitly
+ifeq ($(PORTNAME), win32)
+# these settings are the same as for plperl
+override CPPFLAGS += -DPLPERL_HAVE_UID_GID -Wno-comment
+# ... see silliness in plperl Makefile ...
+SHLIB_LINK += $(sort $(wildcard ../../src/pl/plperl/libperl*.a))
+else
+rpathdir = $(perl_archlibexp)/CORE
+SHLIB_LINK += $(perl_embed_ldflags)
+endif
+
+# As with plperl we need to make sure that the CORE directory is included
+# last, probably because it sometimes contains some header files with names
+# that clash with some of ours, or with some that we include, notably on
+# Windows.
+override CPPFLAGS := $(CPPFLAGS) $(perl_embed_ccflags) -I$(perl_archlibexp)/CORE
diff --git a/contrib/jsonb_plperl/expected/jsonb_plperl.out b/contrib/jsonb_plperl/expected/jsonb_plperl.out
new file mode 100644
index 000..9032f7e
--- /dev/null
+++ b/contrib/jsonb_plperl/expected/jsonb_plperl.out
@@ -0,0 +1,235 @@
+CREATE EXTENSION jsonb_plperl CASCADE;
+NOTICE:  installing required extension "plperl"
+-- test hash -> jsonb
+CREATE

Re: Transform for pl/perl

2017-12-07 Thread Anthony Bykov
On Fri, 1 Dec 2017 15:49:21 -0300
Alvaro Herrera  wrote:

> A few very minor comments while quickly paging through:
> 
> 1. non-ASCII tests are going to cause problems in one platform or
> another.  Please don't include that one.
> 
> 2. error messages
>a) please use ereport() not elog()
>b) conform to style guidelines: errmsg() start with lowercase,
> others are complete phrases (start with uppercase, end with period)
>c) replace
>   "The type you was trying to transform can't be represented in
> JSONB" maybe with
>   errmsg("could not transform to type \"%s\"", "jsonb"),
>   errdetail("The type you are trying to transform can't be
> represented in JSON") d) same errmsg() for the other error; figure
> out suitable errdetail.
> 
> 3. whitespace: don't add newlines to while, DirectFunctionCallN,
> pnstrdup.
> 
> 4. the "relocatability" test seems pointless to me.
> 
> 5. "#undef _" looks bogus.  Don't do it.
> 

Hello,
thank you for your time.

1. I really think that it might be a good practice to test non-ASCII
  symbols on platforms where it is possible. Maybe a locale-dependent
  Makefile? I've already spent quite a lot of time trying to find possible
  solutions and I have no results. So, I've deleted these tests. Maybe
  there is a better solution I don't know about?

2. Thank you for this one. Writing those errors was a real pain for
  me. I've changed those things in the new patch.

3. I've fixed all the whitespace issues you were talking about in the new
  version of the patch.

4. The relocatability test is needed in order to check that the extension is
  still relocatable. With this test I've tried to prove the
  "relocatable=true" setting in the *.control files. So, I've decided to leave
  it in the next version of the patch.

5. If I delete #undef _, I will get the warning:
/usr/lib/x86_64-linux-gnu/perl/5.22/CORE/config.h:3094:0: warning: "_" redefined
 #define _(args) args

In file included from ../../src/include/postgres.h:47:0,
                 from jsonb_plperl.c:12:
../../src/include/c.h:971:0: note: this is the location of the previous definition
 #define _(x) gettext(x)
  This #undef was meant to fix the warning. I hope the new comment next
  to #undef contains all the explanation needed.

Please, find a new version of the patch in attachments to this message.


--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-07 Thread Michael Paquier
On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera  wrote:
> Looking at 0002: I agree with the stuff being done here.

The level of detail you are providing with a proper error code is an
improvement over the first version proposed, in my opinion.

> I think a
> couple of these checks could be moved one block outwards in terms of
> scope; I don't see any reason why the check should not apply in that
> case. I didn't catch any place missing additional checks.

In FreezeMultiXactId() wouldn't it be better to issue an error as well
for this assertion?
Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));
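
Something along these lines, perhaps, if that Assert were promoted in the same
style as the hunks quoted below (just a fragment for discussion; the wording
is mine, not from the patch):

    if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("multixact member xid %u precedes xid cutoff %u",
                                 members[i].xid, cutoff_xid)));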

> Despite these being "shouldn't happen" conditions, I think we should
> turn these up all the way to ereports with an errcode and all, and also
> report the XIDs being complained about.  No translation required,
> though.  Other than those changes and minor copy editing a commit
> (attached), 0002 looks good to me.

+   if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+   TransactionIdDidCommit(xid))
+   ereport(ERROR,
+   (errcode(ERRCODE_DATA_CORRUPTED),
+errmsg("can't freeze committed xmax %u", xid)));
The usual wording used in errmsg is not the "can't" but "cannot".

+   ereport(ERROR,
+   (errcode(ERRCODE_DATA_CORRUPTED),
+errmsg_internal("uncommitted Xmin %u from
before xid cutoff %u needs to be frozen",
+xid, cutoff_xid)));
"Xmin" I have never seen, but "xmin" I did.

> I started thinking it'd be good to report block number whenever anything
> happened while scanning the relation.  The best way to go about this
> seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> a WIP untested patch to add that.  (I'm not proposing this for
> back-patch for now, mostly because I don't have the time/energy to push
> for it right now.)

I would recommend starting a new thread and adding that patch to the
next commitfest, as you would get more visibility and input from other
folks on -hackers. It looks like a good idea to me.
-- 
Michael



Re: Speeding up pg_upgrade

2017-12-07 Thread Alexander Kukushkin
Hi,


> Yes, dump/reload of analyze statistics seems like a better use of time.
> I have avoided it since it locks us into supporting the text
> respresentation of data type, but at this point it might be worth it.
>
>
A couple of months ago we at Zalando upgraded a few databases of different
sizes to 9.6.
During preparations to the upgrade I've found 2.5 pain-points:

1. We are using schema-based api deployment. Basically ~every week we
create a new schema in the database and hundreds of stored procedures in it.
Of course we remove old API schemas and try not to keep more than the
last 10. Before the upgrade we basically dropped all API schemas except the
one used in production.
And even in this case the dump-restore phase was taking much more time than
the relinking of datafiles.
Unfortunately I don't have any numbers right now, but usually a run of
pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
spent in dump-restore.

2 ANALYZE phase is a pain. I think everybody agrees with it.

2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
reasonable, except one case: some of the columns might have non default
statistics target.
It breaks `vacuumdb --analyze-in-stages`, because for those specific
columns it will not use the default_statistics_target value provided by
vacuumdb.
What I did was reset those non-default values right before running
pg_upgrade and restore them only when the analyze was completed. Of course,
after that I re-analyzed those columns.



Regards,
Alexander Kukushkin

