Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-05-05 Thread Noah Misch
On Fri, May 05, 2017 at 03:36:39AM +1200, David Rowley wrote:
> On 2 May 2017 at 00:10, David Rowley  wrote:
> > On 20 April 2017 at 07:29, Euler Taveira  wrote:
> >> 2017-04-19 1:32 GMT-03:00 Michael Paquier :
> >>>
> >>> I vote for "location" -> "lsn". I would expect complains about the
> >>> current inconsistency at some point, and the function names have been
> >>> already changed for this release..
> >
> > OK, so I've created a draft patch which does this.
> 
> I ended up adding this to the open items list.  I feel it's best to be
> on there so that we don't forget about this.
> 
> If we decide not to do it then we can just remove it from the list,
> but it would be a shame to release the beta having forgotten to make
> this change.
> 
> Do any of the committers who voted for this change feel inclined to
> pick this patch up?

I'll echo that question.  This open item lacks a clear owner.  One might argue
that 806091c caused it by doing the backward-compatibility breaks that
inspired this patch, but that's a link too tenuous to create ownership.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2017-05-05 Thread Noah Misch
On Fri, May 05, 2017 at 08:23:33AM +1200, Thomas Munro wrote:
> On Fri, May 5, 2017 at 12:39 AM, Neha Sharma
>  wrote:
> > While testing the feature we encountered one more crash,below is the
> > scenario to reproduce.
> >
> > create table t1 ( a int);
> > create table t2 ( a int);
> > insert into t1 values (11),(12),(13);
> >
> > create or replace function my_trig() returns trigger
> > language plpgsql as $my_trig$
> > begin
> > insert into t2(select a from new_table);
> > RETURN NEW;
> > end;
> > $my_trig$;
> >
> > create trigger my_trigger
> > after truncate or update  on t1
> > referencing new table as new_table old table as oldtab
> > for each statement
> > execute procedure my_trig();
> >
> > truncate t1;
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
> 
> Thanks.  Reproduced here.  The stack looks like this:
> 
> frame #3: 0x000103e5e8b0
> postgres`ExceptionalCondition(conditionName="!((trigdata->tg_event)
> & 0x0003) == 0x0000) || (((trigdata->tg_event) & 0x0003)
> == 0x0002) || (((trigdata->tg_event) & 0x0003) == 0x0001))
> && (((trigdata->tg_event) & 0x0018) == 0x0000) &&
> !(trigdata->tg_event & 0x0020) && !(trigdata->tg_event &
> 0x0040)) || (trigdata->tg_oldtable == ((void*)0) &&
> trigdata->tg_newtable == ((void*)0)))", errorType="FailedAssertion",
> fileName="trigger.c", lineNumber=2045) + 128 at assert.c:54
> frame #4: 0x000103a6f542
> postgres`ExecCallTriggerFunc(trigdata=0x7fff5c40bad0, tgindx=0,
> finfo=0x7fd8ba0817b8, instr=0x,
> per_tuple_context=0x7fd8b906f928) + 258 at trigger.c:2039
> frame #5: 0x000103a754ed
> postgres`AfterTriggerExecute(event=0x7fd8ba092460,
> rel=0x0001043fd9c0, trigdesc=0x7fd8ba068758,
> finfo=0x7fd8ba0817b8, instr=0x,
> per_tuple_context=0x7fd8b906f928,
> trig_tuple_slot1=0x,
> trig_tuple_slot2=0x) + 1469 at trigger.c:3860
> frame #6: 0x000103a73080
> postgres`afterTriggerInvokeEvents(events=0x7fd8ba07fb00,
> firing_id=1, estate=0x7fd8ba090440, delete_ok='\x01') + 592 at
> trigger.c:4051
> frame #7: 0x000103a72b7b
> postgres`AfterTriggerEndQuery(estate=0x7fd8ba090440) + 203 at
> trigger.c:4227
> frame #8: 0x000103a498aa
> postgres`ExecuteTruncate(stmt=0x7fd8ba059f40) + 2026 at
> tablecmds.c:1485
> 
> There's an assertion that it's (one of INSERT, UPDATE, DELETE, an
> AFTER trigger, not deferred) *or* there are no transition tables.
> Here it's TRUNCATE and there are transition tables, so it fails:
> 
> /*
>  * Protect against code paths that may fail to initialize transition table
>  * info.
>  */
> Assert(((TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) ||
>  TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event) ||
>  TRIGGER_FIRED_BY_DELETE(trigdata->tg_event)) &&
> TRIGGER_FIRED_AFTER(trigdata->tg_event) &&
> !(trigdata->tg_event & AFTER_TRIGGER_DEFERRABLE) &&
> !(trigdata->tg_event & AFTER_TRIGGER_INITDEFERRED)) ||
>(trigdata->tg_oldtable == NULL && trigdata->tg_newtable == NULL));
> 
> 
> We can't possibly support transition tables on TRUNCATE (the whole
> point of TRUNCATE is not to inspect all the rows so we can't collect
> them), and we already reject ROW triggers on TRUNCATE, so we should
> reject transition tables on STATEMENT triggers for TRUNCATE at
> creation time too.  See attached.  Thoughts?

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Kevin,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] delta relations in AFTER triggers

2017-05-05 Thread Noah Misch
On Thu, May 04, 2017 at 09:51:03PM +1200, Thomas Munro wrote:
> On Thu, May 4, 2017 at 9:12 PM, Prabhat Sahu
>  wrote:
> > I have been testing this for a while and observed a server crash while
> > referencing a table column value in a trigger procedure for an AFTER
> > DELETE trigger.
> >
> > -- Steps to reproduce:
> > CREATE TABLE t1(c1 int);
> > CREATE TABLE t2(cc1 int);
> > INSERT INTO t1 VALUES (10);
> > INSERT INTO t2 VALUES (10);
> >
> > CREATE OR REPLACE FUNCTION trig_func() RETURNS trigger AS
> > $$ BEGIN
> > DELETE FROM t1 WHERE c1 IN (select OLD.cc1 from my_old);
> > RETURN OLD;
> > END; $$ LANGUAGE PLPGSQL;
> >
> > CREATE TRIGGER trg1
> >   AFTER DELETE ON t2
> > REFERENCING OLD TABLE AS my_old
> > FOR EACH ROW
> >   EXECUTE PROCEDURE trig_func();
> >
> > DELETE FROM t2 WHERE cc1 =10;
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
> 
> Reproduced here.  The stack looks like this:
> 
> frame #3: 0x00010f06f8b0
> postgres`ExceptionalCondition(conditionName="!(readptr->eflags &
> 0x0002)", errorType="FailedAssertion", fileName="tuplestore.c",
> lineNumber=1237) + 128 at assert.c:54
> frame #4: 0x00010f0cbc85
> postgres`tuplestore_rescan(state=0x7ff219840200) + 85 at
> tuplestore.c:1237
> frame #5: 0x00010eced9b1
> postgres`ExecReScanNamedTuplestoreScan(node=0x7ff21d007840) + 81
> at nodeNamedtuplestorescan.c:197
> frame #6: 0x00010eca46a6
> postgres`ExecReScan(node=0x7ff21d007840) + 822 at execAmi.c:216
> frame #7: 0x00010ece7eca
> postgres`ExecNestLoop(node=0x7ff21d006310) + 538 at
> nodeNestloop.c:148
> 
> I think the problem is that the tuplestore read pointer hasn't been
> opened with the "rewindable" flag.  It works for me with the attached.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Kevin,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-05-05 Thread Jaime Casanova
On 5 May 2017 at 22:38, Vinayak Pokale  wrote:
>
> On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet 
> wrote:
>>
>> Hello,
>>
>> I've implemented a new command named PROGRESS to monitor progression of
>> long running SQL queries in a backend process.
>>
> Thank you for the patch.
>

sorry if i'm bikeshedding too soon but... why a command instead of a function?
something like pg_progress_backend() would be in sync with
pg_cancel_backend()/pg_terminate_backend(), and the result of such a
function could be used by a tool to examine a slow query's status

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-05-05 Thread Vinayak Pokale
On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet 
wrote:

> Hello,
>
> I've implemented a new command named PROGRESS to monitor progression of
> long running SQL queries in a backend process.
>
> Thank you for the patch.

I am testing your patch, but after applying it some other regression tests
failed.

$ make installcheck
13 of 178 tests failed.

Regards,
Vinayak


Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-05-05 Thread Stephen Frost
Noah,

On Fri, May 5, 2017 at 23:19 Noah Misch  wrote:

> On Thu, May 04, 2017 at 05:47:02PM -0400, Stephen Frost wrote:
> > * Amit Langote (amitlangot...@gmail.com) wrote:
> > > On Wed, May 3, 2017 at 12:05 PM, Stephen Frost 
> wrote:
> > > > Assuming this looks good to you, I'll push it tomorrow, possibly with
> > > > other minor adjustments and perhaps a few more tests.
> > >
> > > Your latest patch looks good to me.
> >
> > Found a few more issues (like pg_dump not working against older versions
> > of PG, because the queries for older versions hadn't been adjusted) and
> > did a bit more tidying.
> >
> > I'll push this in a couple of hours.
>
> This PostgreSQL 10 open item is past due for your status update.  Kindly
> send
> a status update within 24 hours, and include a date for your subsequent
> status
> update.  Refer to the policy on open item ownership:
>
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
> I see you did commit a related patch.  Is this item done?


Yes, I believe it to be. I anticipate reviewing the code in this area more
in the coming weeks, but this specific issue can be marked as resolved.

Thanks!

Stephen


Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-05-05 Thread Noah Misch
On Thu, May 04, 2017 at 05:47:02PM -0400, Stephen Frost wrote:
> * Amit Langote (amitlangot...@gmail.com) wrote:
> > On Wed, May 3, 2017 at 12:05 PM, Stephen Frost  wrote:
> > > Assuming this looks good to you, I'll push it tomorrow, possibly with
> > > other minor adjustments and perhaps a few more tests.
> > 
> > Your latest patch looks good to me.
> 
> Found a few more issues (like pg_dump not working against older versions
> of PG, because the queries for older versions hadn't been adjusted) and
> did a bit more tidying.
> 
> I'll push this in a couple of hours.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I see you did commit a related patch.  Is this item done?




Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-05 Thread Amit Kapila
On Sat, May 6, 2017 at 4:41 AM, Amit Kapila  wrote:
> On Fri, May 5, 2017 at 9:32 PM, Robert Haas  wrote:
>> On Fri, May 5, 2017 at 11:15 AM, Amit Kapila  wrote:
>>> I am not saying to rip those changes.  Most of your mail stresses
>>> the 30-second timeout which you have added in the patch to
>>> detect timeouts during failures (rollback processing).  I have no
>>> objection to that part of the patch, except that there is still a race
>>> condition around PQsendQuery, and you indicate that it is better to
>>> deal with it as a separate patch, to which I agree.
>>
>> OK.
>>
>>> The point on which I am
>>> not in total agreement is about failures other than the timeout.
>>>
>>> +pgfdw_exec_cleanup_query(PGconn *conn, const char *query)
>>> {
>>> ..
>>> + /* Get the result of the query. */
>>> + if (pgfdw_get_cleanup_result(conn, endtime, &result))
>>> + return false;
>>> +
>>> + /* Issue a warning if not successful. */
>>> + if (PQresultStatus(result) != PGRES_COMMAND_OK)
>>> + {
>>> + pgfdw_report_error(WARNING, result, conn, true, query);
>>> + return false;
>>> + }
>>> ..
>>> }
>>>
>>> In the above code, if there is a failure due to timeout then it will
>>> return from the first statement (pgfdw_get_cleanup_result); however, if
>>> there is any other failure, then it will issue a warning and return
>>> false.  I am not sure if there is a need to return false in the second
>>> case, because even without that it will work fine (and there is a
>>> chance that it won't drop the connection), but maybe your point is that
>>> it is better to be consistent for all kinds of error handling in this path.
>>
>> There are three possible queries that can be executed by
>> pgfdw_exec_cleanup_query; let's consider them individually.
>>
>> 1. ABORT TRANSACTION.  If ABORT TRANSACTION fails, I think that we
>> have no alternative but to close the connection.  If we don't, then
>> the next local connection that tries to use this connection will
>> probably also fail, because it will find the connection inside an
>> aborted transaction rather than not in a transaction.  If we do close
>> the connection, the next transaction will try to reconnect and
>> everything will be fine.
>>
>> 2. DEALLOCATE ALL.  If DEALLOCATE ALL fails, we do not have to close
>> the connection, but the reason why we're running that statement in the
>> first place is because we've potentially lost track of which
>> statements are prepared on the remote side.  If we don't drop the
>> connection, we'll still be confused about that.  Reconnecting will fix
>> it.
>>
>
> There is a comment in the code which explains why currently we ignore
> errors from DEALLOCATE ALL.
>
> * DEALLOCATE ALL only exists in 8.3 and later, so this
> * constrains how old a server postgres_fdw can
> * communicate with.  We intentionally ignore errors in
> * the DEALLOCATE, so that we can hobble along to some
> * extent with older servers (leaking prepared statements
> * as we go; but we don't really support update operations
> * pre-8.3 anyway).
>
> It is not entirely clear to me whether it applies only to the commit case
> or in general.  From the code, it appears that the comment holds true for
> both commit and abort.  If it is for both, then after this patch the
> above comment will no longer be valid, and you might want to update it
> in case we want to proceed with your proposed changes for the DEALLOCATE
> ALL statement.
>

Apart from this, I don't see any other problem with the patch.
Additionally, I have run the regression tests with the patch and
everything passed.

As a side note, why do we want the 30-second timeout only for
rollback-related commands and not for COMMIT and DEALLOCATE ALL as
well?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-05 Thread Thomas Munro
On Sat, May 6, 2017 at 7:34 AM, Robert Haas  wrote:
> On Thu, May 4, 2017 at 10:20 PM, David Rowley
>  wrote:
>> Now I'm not going to pretend that this patch is ready for
>> prime time. I've not yet worked out how to properly report sync-scan
>> locations without risking reporting later pages after reporting the
>> end of the scan. What I have at the moment could cause a report to be
>> missed if SYNC_SCAN_REPORT_INTERVAL is not divisible by the batch
>> size. I'm also not sure how batching like this affects read-ahead, but
>> at least the numbers above speak for something. Although none of the
>> pages in this case came from disk.
>
> This kind of approach has also been advocated within EnterpriseDB, and
> I immediately thought of the read-ahead problem.  I think we need more
> research into how Parallel Seq Scan interacts with OS readahead
> behavior on various operating systems.  It seems possible that Parallel
> Seq Scan frustrates operating system read-ahead even without this
> change on at least some systems (because maybe they can only detect
> ascending block number requests within a single process), and even more
> possible that you run into problems when the block number requests are
> no longer precisely in order (which, at present, they should be, or
> very close).  If it turns out to be a problem, either currently or
> with this patch, we might need to add explicit prefetching logic to
> Parallel Seq Scan.

I don't know much about this stuff, but I was curious to go looking at
source code.  I hope someone will correct me if I'm wrong but here's
what I could glean:

In Linux, each process that opens a file gets its own 'file'
object[1][5].  Each of those has its own 'file_ra_state'
object[2][3], used by ondemand_readahead[4] for sequential read
detection.  So I speculate that page-at-a-time parallel seq scan must
look like random access to Linux.

In FreeBSD the situation looks similar.  Each process that opens a
file gets a 'file' object[8] which has members 'f_seqcount' and
'f_nextoff'[6].  These are used by the 'sequential_heuristics'
function[7] which affects the ioflag which UFS/FFS uses to control
read ahead (see ffs_read).  So I speculate that page-at-a-time
parallel seq scan must look like random access to FreeBSD too.

In both cases I suspect that if you'd inherited the file descriptor (or
sent it to the other process via obscure tricks), it would actually
work, because they'd have the same 'file' entry; but that's clearly not
workable for md.c.

Experimentation required...

[1] 
https://github.com/torvalds/linux/blob/a3719f34fdb664ffcfaec2160ef20fca7becf2ee/include/linux/fs.h#L837
[2] 
https://github.com/torvalds/linux/blob/a3719f34fdb664ffcfaec2160ef20fca7becf2ee/include/linux/fs.h#L858
[3] 
https://github.com/torvalds/linux/blob/a3719f34fdb664ffcfaec2160ef20fca7becf2ee/include/linux/fs.h#L817
[4] 
https://github.com/torvalds/linux/blob/a3719f34fdb664ffcfaec2160ef20fca7becf2ee/mm/readahead.c#L376
[5] http://www.makelinux.net/ldd3/chp-3-sect-3 "There can be numerous
file structures representing multiple open descriptors on a single
file, but they all point to a single inode structure."
[6] 
https://github.com/freebsd/freebsd/blob/7e6cabd06e6caa6a02eeb86308dc0cb3f27e10da/sys/sys/file.h#L180
[7] 
https://github.com/freebsd/freebsd/blob/7e6cabd06e6caa6a02eeb86308dc0cb3f27e10da/sys/kern/vfs_vnops.c#L477
[8] Page 319 of 'Design and Implementation of the FreeBSD Operating
System' 2nd Edition

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [patch] Build pgoutput with MSVC

2017-05-05 Thread MauMau
From: Michael Paquier
Magnus has already pushed it as 28d1c8c.

+if exist src\backend\replication\libpqwalreceiver\win32ver.rc del /q
src\backend\replication\pgoutput\win32ver.rc
This is not right by the way, you need to check for the existence of
the file in pgoutput/, not libpqwalreceiver/.


I'm relieved to hear that's already committed.  Oh, how careless I
was, thanks.

Regards
MauMau





[HACKERS] Draft release notes for next week's back-branch releases

2017-05-05 Thread Tom Lane
I've written $SUBJECT ... you can find them at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=54dbd4dc78b045ffcc046b9a43681770c3992dd4

and in an hour or so they should be built in the devel docs at

https://www.postgresql.org/docs/devel/static/release-9-6-3.html

As usual, this one page contains entries for all back branches;
I'll split them up later.

Please review by Sunday.

regards, tom lane




Re: [HACKERS] [patch] Build pgoutput with MSVC

2017-05-05 Thread Michael Paquier
On Sat, May 6, 2017 at 7:07 AM, MauMau  wrote:
> From: Magnus Hagander
> If that's all that's required, I'll just go ahead and commit it right
> away, including the clean.bat.
>
> I think the problem with clean.bat isn't cleaning up pgoutput.dll --
> that one goes in a different directory. But it does need to clean up
> the win32ver.rc file that gets dropped there automatically.
>
> The attached patch itself seems broken (it has some sort of byte order
> marker at the beginning, but removing that still breaks with "patch
> unexpectedly ends in middle of line patch:  Only garbage was found
> in the patch input.". But I can just copy/paste it manually :)
>
>
> Thanks, fixed clean.bat, too.  My original patch was in UTF-16
> unexpectedly.  With Git Shell in GitHub Desktop on Windows, "git diff
>> filename" seems to produce output in UTF-16.  I guess that's due to
> PowerShell.  I think there's nothing else to do, so please commit
> this.  Also, I added an item in the Open Items page.

Magnus has already pushed it as 28d1c8c.

+if exist src\backend\replication\libpqwalreceiver\win32ver.rc del /q
src\backend\replication\pgoutput\win32ver.rc
This is not right by the way, you need to check for the existence of
the file in pgoutput/, not libpqwalreceiver/.
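Presumably the intended line (untested; path corrected per the remark above) would check for and delete the same file:

```
if exist src\backend\replication\pgoutput\win32ver.rc del /q src\backend\replication\pgoutput\win32ver.rc
```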
-- 
Michael




Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-05 Thread Amit Kapila
On Fri, May 5, 2017 at 9:32 PM, Robert Haas  wrote:
> On Fri, May 5, 2017 at 11:15 AM, Amit Kapila  wrote:
>> I am not saying to rip those changes.  Most of your mail stresses
>> the 30-second timeout which you have added in the patch to
>> detect timeouts during failures (rollback processing).  I have no
>> objection to that part of the patch, except that there is still a race
>> condition around PQsendQuery, and you indicate that it is better to
>> deal with it as a separate patch, to which I agree.
>
> OK.
>
>> The point on which I am
>> not in total agreement is about failures other than the timeout.
>>
>> +pgfdw_exec_cleanup_query(PGconn *conn, const char *query)
>> {
>> ..
>> + /* Get the result of the query. */
>> + if (pgfdw_get_cleanup_result(conn, endtime, &result))
>> + return false;
>> +
>> + /* Issue a warning if not successful. */
>> + if (PQresultStatus(result) != PGRES_COMMAND_OK)
>> + {
>> + pgfdw_report_error(WARNING, result, conn, true, query);
>> + return false;
>> + }
>> ..
>> }
>>
>> In the above code, if there is a failure due to timeout then it will
>> return from the first statement (pgfdw_get_cleanup_result); however, if
>> there is any other failure, then it will issue a warning and return
>> false.  I am not sure if there is a need to return false in the second
>> case, because even without that it will work fine (and there is a
>> chance that it won't drop the connection), but maybe your point is that
>> it is better to be consistent for all kinds of error handling in this path.
>
> There are three possible queries that can be executed by
> pgfdw_exec_cleanup_query; let's consider them individually.
>
> 1. ABORT TRANSACTION.  If ABORT TRANSACTION fails, I think that we
> have no alternative but to close the connection.  If we don't, then
> the next local connection that tries to use this connection will
> probably also fail, because it will find the connection inside an
> aborted transaction rather than not in a transaction.  If we do close
> the connection, the next transaction will try to reconnect and
> everything will be fine.
>
> 2. DEALLOCATE ALL.  If DEALLOCATE ALL fails, we do not have to close
> the connection, but the reason why we're running that statement in the
> first place is because we've potentially lost track of which
> statements are prepared on the remote side.  If we don't drop the
> connection, we'll still be confused about that.  Reconnecting will fix
> it.
>

There is a comment in the code which explains why currently we ignore
errors from DEALLOCATE ALL.

* DEALLOCATE ALL only exists in 8.3 and later, so this
* constrains how old a server postgres_fdw can
* communicate with.  We intentionally ignore errors in
* the DEALLOCATE, so that we can hobble along to some
* extent with older servers (leaking prepared statements
* as we go; but we don't really support update operations
* pre-8.3 anyway).

It is not entirely clear to me whether it applies only to the commit case
or in general.  From the code, it appears that the comment holds true for
both commit and abort.  If it is for both, then after this patch the
above comment will no longer be valid, and you might want to update it
in case we want to proceed with your proposed changes for the DEALLOCATE
ALL statement.

> 3. ROLLBACK TO SAVEPOINT %d; RELEASE SAVEPOINT %d.  If this fails, the
> local toplevel transaction is doomed.  It would be wrong to try to
> commit on the remote side, because there could be remote changes made
> by the aborted subtransaction which must not survive, and the ROLLBACK
> TO SAVEPOINT %d command is essential in order for us to get rid of
> them, and that has already failed.  So we must eventually abort the
> remote transaction, which we can do either by closing the connection
> or by trying to execute ABORT TRANSACTION.  The former, however, is
> quick, and the latter is very possibly going to involve a long hang,
> so closing the connection seems better.
>
> In all of these cases, the failure of one of these statements seems to
> me to *probably* mean that the remote side is just dead, in which case
> dropping the connection is the only option.  But even if the remote
> side is still alive and the statement failed for some other reason --
> say somebody changed kwlist.h so that ABORT now has to be spelled
> ABURT -- returning true rather than false just causes us to end up
> with a remote connection whose state is out of step with the local
> server state, and such a connection is not going to be usable.  I
> think part of your point is that in case 3, we might still get back to
> a usable connection if ABORT TRANSACTION is successfully executed
> later,
>

Yes.

> but that's not very likely and, even if it would have happened,
> reconnecting instead doesn't really leave us any worse off.
>

I see your point.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] PG 10 release notes

2017-05-05 Thread Bruce Momjian
On Fri, May  5, 2017 at 02:28:50PM +0200, Petr Jelinek wrote:
> On 05/05/17 02:46, Bruce Momjian wrote:
> > I am getting tired of saying this.  When I am writing the release notes,
> > I am trying to figure out how it affects our shipped code, and the only
> > "decoding" I know of is test_decoding.  My message was this:
> 
> I actually think the main misunderstanding here comes from the
> test_decoding name.  test_decoding does not decode
> anything, and there are no external "decoders".  The decoding happens in
> core; logical decoding just provides a C API for plugins to consume the
> decoded info (i.e., heap tuples and similar).  The test_decoding module
> *tests* the decoding API, and the external projects use the decoding API
> as well, but they don't really decode anything; their role is mostly
> filtering and deciding the wire protocol.

Yes, I thought test_decoding did the decoding itself rather than just
using a backend-internal decoder.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] [patch] Build pgoutput with MSVC

2017-05-05 Thread MauMau
From: Magnus Hagander
If that's all that's required, I'll just go ahead and commit it right
away, including the clean.bat.

I think the problem with clean.bat isn't cleaning up pgoutput.dll -- 
that one goes in a different directory. But it does need to clean up
the win32ver.rc file that gets dropped there automatically.

The attached patch itself seems broken (it has some sort of byte order
marker at the beginning, but removing that still breaks with "patch
unexpectedly ends in middle of line patch:  Only garbage was found
in the patch input.". But I can just copy/paste it manually :)


Thanks, fixed clean.bat, too.  My original patch was in UTF-16
unexpectedly.  With Git Shell in GitHub Desktop on Windows, "git diff
> filename" seems to produce output in UTF-16.  I guess that's due to
PowerShell.  I think there's nothing else to do, so please commit
this.  Also, I added an item in the Open Items page.

Regards
MauMau


msvc_build_pgoutput_v2.patch
Description: Binary data



Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-05 Thread Peter Geoghegan
On Fri, May 5, 2017 at 12:40 PM, Robert Haas  wrote:
> One idea that crossed my mind is to just have workers write all of
> their output tuples to a temp file and have the leader read them back
> in.  At some cost in I/O, this would completely eliminate the overhead
> of workers waiting for the leader.  In some cases, it might be worth
> it.  At the least, it could be interesting to try a prototype
> implementation of this with different queries (TPC-H, maybe) and see
> what happens.  It would give us some idea how much of a problem
> stalling on the leader is in practice.  Wait event monitoring could
> possibly also be used to figure out an answer to that question.

The use of temp files in all cases was effective in my parallel
external sort patch, relative to what I imagine an approach built on a
gather node would get you, but not because of the inherent slowness of
a Gather node. I'm not so sure that Gather is actually inherently
slow, given the interface it supports.

While incremental, retail processing of each tuple is flexible and
composable, it will tend to be slow compared to an approach based on
batch processing (for tasks where you happen to be able to get away
with batch processing). This is true for all the usual reasons --
better locality of access, better branch prediction properties, lower
"effective instruction count" due to having very tight inner loops,
and so on.
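(A toy sketch, not PostgreSQL code, of the batching effect described above; the function names are purely illustrative. The batched loop computes the same result with far fewer per-tuple control transfers, which is where the locality and branch-prediction wins come from:)

```python
# Toy illustration of batch vs. retail (one-tuple-at-a-time) processing.

def retail_sum(tuples):
    total = 0
    for t in tuples:          # one "executor call" per tuple
        total += t
    return total

def batch_sum(tuples, batch_size=1024):
    total = 0
    for i in range(0, len(tuples), batch_size):
        # tight inner loop over a contiguous batch: better locality,
        # fewer per-tuple control transfers
        total += sum(tuples[i:i + batch_size])
    return total
```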

I agree with Andres that we shouldn't put too much effort into
modelling concurrency ahead of optimizing serial performance. The
machine's *aggregate* memory bandwidth should be used as efficiently
as possible, and parallelism is just one (very important) tool for
making that happen.

-- 
Peter Geoghegan





Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-05 Thread Andres Freund
Hi,


On 2017-05-05 15:29:40 -0400, Robert Haas wrote:
> On Thu, May 4, 2017 at 9:37 PM, Andres Freund  wrote:
> It's pretty easy (but IMHO not very interesting) to measure internal
> contention in the Parallel Seq Scan node.  As David points out
> downthread, that problem isn't trivial to fix, but it's not that hard,
> either.

Well, I think it's important that we do some basic optimization before
we start building a more elaborate costing model, because we'll
otherwise codify assumptions that won't be true for long.  Changing an
established costing model once it's out there is really hard, because
you will always hurt someone.  I think some of the contention is easy
enough to remove, and some of the IO concurrency issues as well.


> I do believe that there is a problem with too much concurrent
> I/O on things like:
> 
> Gather
> -> Parallel Seq Scan on lineitem
> -> Hash Join
>-> Seq Scan on lineitem
> 
> If that goes to multiple batches, you're probably wrenching the disk
> head all over the place - multiple processes are now reading and
> writing batch files at exactly the same time.

At least on rotating media. On decent SSDs you usually can have so many
concurrent IOs out there, that it's not actually easy to overwhelm a
disk that way.  While we obviously still need to pay some attention to
spinning disks, I also think that being good on decent (not great) SSDs
is more important.  We shouldn't optimize for things to run most
performant on hydra with its slow storage system ;)

We probably should take something like effective_io_concurrency into
account, but that'd require a smarter default than we currently have.
It's not that hard to estimate an upper bound of parallelism with a
meaningful effective_io_concurrency - we'd probably have to move the
effective_io_concurrency handling in bitmapscans to be a plan parameter
though, so it's divided by the number of workers.
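(A minimal sketch of the "divided by the number of workers" idea, assuming a hypothetical helper name; each worker keeps at least one in-flight request so prefetching never shuts off entirely:)

```python
def per_worker_io_concurrency(effective_io_concurrency, nworkers):
    """Hypothetical helper: split a global prefetch budget across the
    processes of one parallel scan.  Clamp to at least 1 so every
    worker can keep one request in flight."""
    nprocs = max(1, nworkers)
    return max(1, effective_io_concurrency // nprocs)
```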



> I also strongly suspect
> that contention on individual buffers can turn into a problem on
> queries like this:
> 
> Gather (Merge)
> -> Merge Join
>   -> Parallel Index Scan
>   -> Index Scan
> 
> The parallel index scan surely has some upper limit on concurrency,
> but if that is exceeded, what will tend to happen is that processes
> will sleep.  On the inner side, though, every process is doing a full
> index scan and chances are good that they are doing it more or less in
> lock step, hitting the same buffers at the same time.

Hm. Not sure how big that problem is practically.  But I wonder if it's
worthwhile to sleep more if you hit contention or something like that...


> I think there are two separate questions here:
> 
> 1. How do we reduce or eliminate contention during parallel query execution?
> 2. How do we make the query planner smarter about picking the optimal
> number of workers?
> 
> I think the second problem is both more difficult and more
> interesting.  I think that no matter how much work we do on #1, there
> are always going to be cases where the amount of effective parallelism
> has some finite limit - and I think the limit will vary substantially
> from query to query.  So, without #2, we'll either leave a lot of
> money on the table for queries that can benefit from using a large
> number of workers, or we'll throw extra workers uselessly at queries
> where they don't help (or even make things worse).

Yea, I agree that 2) is more important in the long run, but I do think
that my point that we shouldn't put too much effort into modelling
concurrency before doing basic optimization still stands.

Greetings,

Andres Freund




Re: [HACKERS] Row Level Security UPDATE Confusion

2017-05-05 Thread Stephen Frost
Rod,

* Rod Taylor (rod.tay...@gmail.com) wrote:
> UPDATE seems to work as described (unable to create records you cannot
> select); both the single rule version and multi-rule version seem to work
> the same.

I'm glad they work the same now, as they were always intended to.

Regarding the "rules", they're actually based on what the ACL rules are.
In particular, an UPDATE with a WHERE clause requires SELECT rights- and
so the code mandates that the UPDATE can see both the row-to-be-updated
and the row-post-updating, but you're able to INSERT records which you
can't then see, as an INSERT doesn't generally require SELECT
privileges.

> This combination works too though it seems funny that UPDATE can create an
> entry it cannot reverse. I can set the value to 100 (going to -1 fails) but
> the UPDATE cannot see the record to set it back. I can see use cases for
> it, for example you might be able to promote someone to manager but not
> demote a manager to front-desk. We also allow INSERT on tables you cannot
> delete from, so it's not inconsistent.
> 
> CREATE POLICY split_select ON t FOR SELECT TO split USING (value > 0);
> CREATE POLICY split_update ON t FOR UPDATE TO split USING (value < 10) WITH
> CHECK (value > 2);
> SET session authorization split;
> update t set value = 100 where value = 4; -- 1 record changed
> update t set value = 5 where value = 100; -- 0 records changed

Being able to UPDATE records to change them in a way that you're not
able to subsequently UPDATE them seems reasonable to me, at least.

> However, despite INSERT also functioning the same for both styles of
> commands it's definitely not obeying the `cannot give away records` rule:

Right, this is because INSERT doesn't generally require SELECT
privileges, and therefore doesn't require the USING clause to be
checked also.

We could certainly debate if the approach of applying policies based on
the privileges required for the command is the "correct" approach, but
it's definitely what was intended and generally works well, based on
what I've seen thus far in terms of actual usage.  Admittedly, it's a
bit of an edge case which doesn't come up very often either, so perhaps
we should consider changing this down the road, but for now we should go
ahead and fix the obvious unintentional bug in the code around ALL
policies and back-patch that as a bug fix, we can then consider if
changes should be made here in future releases independently.

Assuming there aren't objections to that, I'll plan to push this fix
later tonight or tomorrow.

Thanks!

Stephen




Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-05 Thread Robert Haas
On Thu, May 4, 2017 at 10:36 PM, David Rowley
 wrote:
> On 3 May 2017 at 07:13, Robert Haas  wrote:
>> Multiple people (including David Rowley
>> as well as folks here at EnterpriseDB) have demonstrated that for
>> certain queries, we can actually use a lot more workers and everything
>> works great.  The problem is that for other queries, using a lot of
>> workers works terribly.  The planner doesn't know how to figure out
>> which it'll be - and honestly, I don't either.
>
> For me, it seems pretty much related to the number of tuples processed
> on a worker, vs how many they return. As a general rule, I'd say the
> higher this ratio, the higher the efficiency ratio will be for the
> worker. Although that's not taking into account contention points
> where workers must wait for fellow workers to complete some operation.
> I think parallel_tuple_cost is a good GUC to have, perhaps we can be
> smarter about the use of it when deciding on how many workers should
> be used.

It does seem pretty clear that Gather is horrifyingly slow and
therefore often the gating factor, but I think it's also clear that
even if you removed the overhead of Gather completely, there will
always be *something* that limits scaling -- and I bet we're quite a
long way from the point where that thing is typically the amount of
hardware that you have.

One idea that crossed my mind is to just have workers write all of
their output tuples to a temp file and have the leader read them back
in.  At some cost in I/O, this would completely eliminate the overhead
of workers waiting for the leader.  In some cases, it might be worth
it.  At the least, it could be interesting to try a prototype
implementation of this with different queries (TPC-H, maybe) and see
what happens.  It would give us some idea how much of a problem
stalling on the leader is in practice.  Wait event monitoring could
possibly also be used to figure out an answer to that question.
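(A minimal single-process sketch of the spool-to-temp-file idea, assuming illustrative names; each "worker" writes its whole output to a temp file and the "leader" reads the files back in, paying I/O to avoid any worker-waits-for-leader stall:)

```python
import os
import pickle
import tempfile

def worker_spool(tuples):
    """Worker side: write all output tuples to a temp file and hand the
    leader nothing but the file's path."""
    fd, path = tempfile.mkstemp(suffix=".tuples")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(tuples, f)
    return path

def leader_gather(paths):
    """Leader side: read each worker's spool file back in, in order,
    then clean up."""
    rows = []
    for path in paths:
        with open(path, "rb") as f:
            rows.extend(pickle.load(f))
        os.unlink(path)
    return rows
```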

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-05 Thread Robert Haas
On Thu, May 4, 2017 at 10:20 PM, David Rowley
 wrote:
> Now I'm not going to pretend that this patch is ready for the
> prime-time. I've not yet worked out how to properly report sync-scan
> locations without risking reporting later pages after reporting the
> end of the scan. What I have at the moment could cause a report to be
> missed if SYNC_SCAN_REPORT_INTERVAL is not divisible by the batch
> size. I'm also not sure how batching like this affect read-aheads, but
> at least the numbers above speak for something. Although none of the
> pages in this case came from disk.

This kind of approach has also been advocated within EnterpriseDB, and
I immediately thought of the read-ahead problem.  I think we need more
research into how Parallel Seq Scan interacts with OS readahead
behavior on various operating systems.  It seems possible that Parallel
Seq Scan frustrates operating system read-ahead even without this
change on at least some systems (because maybe they can only detect
ascending block number requests within a single process) and even more
possible that you run into problems when the block number requests are
no longer precisely in order (which, at present, they should be, or
very close).  If it turns out to be a problem, either currently or
with this patch, we might need to add explicit prefetching logic to
Parallel Seq Scan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-05 Thread Robert Haas
On Thu, May 4, 2017 at 9:37 PM, Andres Freund  wrote:
> Have those benchmarks, even in a very informal form, been shared /
> collected / referenced centrally?  I'd be very interested to know where
> the different contention points are. Possibilities:
>
> - in non-resident workloads: too much concurrent IOs submitted, leading
>   to overly much random IO
> - internal contention in the parallel node, e.g. parallel seqscan
> - contention on PG components like buffer mapping, procarray, clog
> - contention on individual buffers, e.g. btree root pages, or small
>   tables in nestloop joins
> - just too many context switches, due to ineffective parallelization
>
> probably multiple of those are a problem, but without trying to figure
> them out, we're going to have a hard time to develop a better costing
> model...

It's pretty easy (but IMHO not very interesting) to measure internal
contention in the Parallel Seq Scan node.  As David points out
downthread, that problem isn't trivial to fix, but it's not that hard,
either.  I do believe that there is a problem with too much concurrent
I/O on things like:

Gather
-> Parallel Seq Scan on lineitem
-> Hash Join
   -> Seq Scan on lineitem

If that goes to multiple batches, you're probably wrenching the disk
head all over the place - multiple processes are now reading and
writing batch files at exactly the same time.  I also strongly suspect
that contention on individual buffers can turn into a problem on
queries like this:

Gather (Merge)
-> Merge Join
  -> Parallel Index Scan
  -> Index Scan

The parallel index scan surely has some upper limit on concurrency,
but if that is exceeded, what will tend to happen is that processes
will sleep.  On the inner side, though, every process is doing a full
index scan and chances are good that they are doing it more or less in
lock step, hitting the same buffers at the same time.

Also consider:

Finalize HashAggregate
-> Gather
  -> Partial HashAggregate
-> Parallel Seq Scan

Suppose that the average group contains ten items which will tend to
be widely spaced across the table.  As you add workers, the number of
workers across which any given group gets spread increases.  There's
probably a "sweet spot" here.  Up to a certain point, adding workers
makes it faster, because the work of the Seq Scan and the Partial
HashAggregate gets spread across more processes.  However, adding
workers also increases the number of rows that pass through the Gather
node, because as you add workers more groups end up being split across
workers, or across more workers.  That means more and more of the
aggregation starts happening in the Finalize HashAggregate rather than
the Partial HashAggregate.  If you had for example 20 workers here
almost nothing would be happening in the Partial HashAggregate,
because chances are good that each of the 10 rows in each group would
be encountered by a different worker, so that'd probably be
counterproductive.
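(The group-splitting effect above can be put in closed form under a simplifying assumption: if each of a group's rows lands on a uniformly random worker, the expected number of distinct workers that touch a group of g rows with w workers is w*(1-(1-1/w)^g). A toy sketch:)

```python
def expected_workers_per_group(group_size, nworkers):
    """Expected number of distinct workers that see at least one row of
    a group, assuming each row lands on a uniformly random worker.
    As this approaches group_size, the Partial HashAggregate stops
    reducing anything and all the work moves to the Finalize step."""
    w = nworkers
    return w * (1.0 - (1.0 - 1.0 / w) ** group_size)
```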

I think there are two separate questions here:

1. How do we reduce or eliminate contention during parallel query execution?
2. How do we make the query planner smarter about picking the optimal
number of workers?

I think the second problem is both more difficult and more
interesting.  I think that no matter how much work we do on #1, there
are always going to be cases where the amount of effective parallelism
has some finite limit - and I think the limit will vary substantially
from query to query.  So, without #2, we'll either leave a lot of
money on the table for queries that can benefit from using a large
number of workers, or we'll throw extra workers uselessly at queries
where they don't help (or even make things worse).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Row Level Security UPDATE Confusion

2017-05-05 Thread Rod Taylor
Hmm.

UPDATE seems to work as described (unable to create records you cannot
select); both the single rule version and multi-rule version seem to work
the same.

This combination works too though it seems funny that UPDATE can create an
entry it cannot reverse. I can set the value to 100 (going to -1 fails) but
the UPDATE cannot see the record to set it back. I can see use cases for
it, for example you might be able to promote someone to manager but not
demote a manager to front-desk. We also allow INSERT on tables you cannot
delete from, so it's not inconsistent.

CREATE POLICY split_select ON t FOR SELECT TO split USING (value > 0);
CREATE POLICY split_update ON t FOR UPDATE TO split USING (value < 10) WITH
CHECK (value > 2);
SET session authorization split;
update t set value = 100 where value = 4; -- 1 record changed
update t set value = 5 where value = 100; -- 0 records changed


However, despite INSERT also functioning the same for both styles of
commands it's definitely not obeying the `cannot give away records` rule:

CREATE USER simple;
CREATE USER split;
CREATE TABLE t(value int);
grant select, update, insert, delete on table t to simple, split;

INSERT INTO t values (1), (2);

ALTER TABLE t ENABLE ROW LEVEL SECURITY;
CREATE POLICY simple_all ON t TO simple USING (value > 0) WITH CHECK (true);


CREATE POLICY split_select ON t FOR SELECT TO split USING (value > 0);
CREATE POLICY split_insert ON t FOR INSERT TO split WITH CHECK (true);


SET session authorization simple;
INSERT INTO t VALUES (3), (-3);
SELECT * FROM t;
 value
---
 1
 2
 3
(3 rows)


SET session authorization split;
INSERT INTO t VALUES (4), (-4);
SELECT * FROM t;
 value
---
 1
 2
 3
 4
(4 rows)


SET session authorization default;
SELECT * FROM t;
 value
---
 1
 2
 3
-3
 4
-4
(6 rows)


regards,

Rod



On Fri, May 5, 2017 at 12:10 PM, Stephen Frost  wrote:

> Rod, Robert,
>
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Fri, Apr 14, 2017 at 9:16 AM, Stephen Frost 
> wrote:
> > > I agreed already up-thread that there's an issue there and will be
> > > looking to fix it.  That comment was simply replying to Rod's point
> that
> > > the documentation could also be improved.
> >
> > OK, thanks.  The wrap for the next set of minor releases is, according
> > to my understanding, scheduled for Monday, so you'd better jump on
> > this soon if you're hoping to get a fix out this time around.
>
> The attached patch against master fixes this issue.  Rod, if you get a
> chance, would be great for you to check that you no longer see a
> difference between the single ALL policy and the split SELECT/UPDATE
> policies.
>
> Thanks!
>
> Stephen
>



-- 
Rod Taylor


Re: [HACKERS] idea: custom log_line_prefix components besides application_name

2017-05-05 Thread Robert Haas
On Thu, May 4, 2017 at 10:59 AM, Chapman Flack  wrote:
> invalid input syntax for integer: "21' && 1=2)) Uni/**/ON
> SEl/**/eCT 0x646665743166657274,0x646665743266657274,
> 0x646665743366657274 -- "

Now that is choice.  I wonder what specific database system that's targeting...

> I just wonder if anybody thinks web apps, and therefore this
> scenario, are common enough these days to maybe justify one
> or two more GUCs with their own log_line_prefix escapes, such
> as app_client_addr or app_user. Naturally they would only be
> as reliable as the app setting them, and uninterpreted by
> PostgreSQL itself, and so functionally no different from the
> uninterpreted string already available as application_name.
> The benefit is perhaps to be clearer than just overloading
> application_name to carry two or three pieces of information
> (and perhaps privacy, if you care about app user identities and
> source IPs showing up in ps titles).
>
> Worth considering, or is application_name Good Enough?

I mean, if there were a list of things that needed to propagated that
was (1) lengthy and (2) universally agreed, then we'd probably want
more than one field.  But your list is pretty short, so I guess I
don't see why you can't just join them together with a punctuation
mark of your choice and call it good.
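(A minimal sketch of that join-with-punctuation approach, with invented field names; the assumption, noted in the code, is that no field contains the separator character:)

```python
SEP = "|"  # assumption: none of the packed fields contain this character

def pack_application_name(app, app_user, app_client_addr):
    """Pack several pieces of app-level identity into the single
    application_name string the app would set on its connection."""
    return SEP.join((app, app_user, app_client_addr))

def unpack_application_name(value):
    """Split a packed application_name back into its three fields,
    e.g. when post-processing the server log."""
    app, app_user, app_client_addr = value.split(SEP, 2)
    return app, app_user, app_client_addr
```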

I might be missing something, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Not getting error if ALTER SUBSCRIPTION syntax is wrong.

2017-05-05 Thread Robert Haas
On Fri, May 5, 2017 at 1:51 PM, Petr Jelinek
 wrote:
> Syntax of ALTER command is correct, syntax of the connection string is
> not, you are probably getting errors in log from the replication worker.
>
> We could check validity of the connection string though to complain
> immediately like we do in CREATE.

Well if CREATE checks the validity, surely it's a bug if ALTER doesn't
do the same.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Not getting error if ALTER SUBSCRIPTION syntax is wrong.

2017-05-05 Thread Petr Jelinek
On 05/05/17 14:40, tushar wrote:
> Hi,
> 
> While testing 'logical replication' against v10 , i encountered one
> issue where data stop migrating after ALTER PUBLICATION.
> 
> X Server
> \\ Make sure wal_level is set to logical in postgresql.conf file
> \\create table/Insert 1 row -> create table test(n int); insert into test
> values (1);
> \\create publication for all -> create publication pub for ALL TABLES ;
> 
> 
> Y server
> 
> \\ Make sure wal_level is set to logical in postgresql.conf file
> \\create table -> create table test(n int);
> 
> \\create Subscription
> 
> CREATE SUBSCRIPTION sub CONNECTION 'host=localhost dbname=postgres
> port=5432 ' PUBLICATION pub;
> 
> postgres=# select * from test;
>  n
> ---
>  1
> (1 row)
> 
> \\Alter subscription
> postgres=# alter subscription sub connection 'host=localhost
> dbname=postgres PUBLICATION pub';
> ALTER SUBSCRIPTION
> 
> X server
> postgres=# insert into test values (1);
> INSERT 0 1
> postgres=# select * from test;
>  n
> ---
>  1
>  1
> (2 rows)
> 
> Y server
> postgres=# select * from test;
>  n
> ---
>  1
> (1 row)
> 
> I think the syntax of the ALTER SUBSCRIPTION is probably not correct, but
> surprisingly it is not throwing an error.
> 

Syntax of ALTER command is correct, syntax of the connection string is
not, you are probably getting errors in log from the replication worker.

We could check validity of the connection string though to complain
immediately like we do in CREATE.
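(A rough sketch of that kind of early validity check, in Python rather than libpq's C, with only a small illustrative subset of conninfo keywords and none of libpq's quoting rules. In the report above, "PUBLICATION pub" ended up inside the connection string, which such a check would reject:)

```python
# Small subset of libpq conninfo keywords, for illustration only.
KNOWN_CONNINFO_KEYWORDS = {
    "host", "hostaddr", "port", "dbname", "user", "password",
    "connect_timeout", "sslmode", "application_name",
}

def conninfo_is_valid(conninfo):
    """Very rough check of a space-separated keyword=value conninfo
    string: every token must be keyword=value with a known keyword."""
    for token in conninfo.split():
        key, sep, _value = token.partition("=")
        if not sep or key not in KNOWN_CONNINFO_KEYWORDS:
            return False
    return True
```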

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] snapbuild woes

2017-05-05 Thread Petr Jelinek
On 05/05/17 18:18, Andres Freund wrote:
> On 2017-05-05 13:53:16 +0200, Petr Jelinek wrote:
>> On 05/05/17 02:42, Andres Freund wrote:
>>> On 2017-05-04 17:00:04 -0700, Andres Freund wrote:
 Attached is a prototype patch for that.
>>>
>>
>> I am not sure I understand the ABI comment for started_collection_at.
>> What's the ABI issue? The struct is private to snapbuild.c module. Or
>> you want to store it in the ondisk snapshot as well?
> 
> It's stored on-disk already :(

Hmm okay, well then I guess we'll have to store it somewhere inside
running, we should not normally care about that since we only load
CONSISTENT snapshots where running don't matter anymore. Alternatively
we could bump the SNAPBUILD_VERSION but that would make it impossible to
downgrade minor version which is bad (I guess we'll want to do that for
master, but not back-branches).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical replication syntax (was DROP SUBSCRIPTION, query cancellations and slot handling)

2017-05-05 Thread Petr Jelinek
On 02/05/17 15:31, Tom Lane wrote:
> Petr Jelinek  writes:
>> Let me expand, if we don't drop the slot by default when dropping
>> subscription, we'll have a lot of users with dead slots laying around
>> holding back WAL/catalog_xmin, that's really bad. If we do drop by
>> default like now, we need option to not do that, neither RESTRICT, nor
>> CASCADE fit that.
> 
> I'm not sure which part of "you can't have an option in DROP" isn't
> clear to you.  Whatever the default behavior is always has to work,
> because that is what's going to happen during DROP OWNED BY, or
> DROP DATABASE.  If you want more than one behavior during DROP,
> you need to drive that off something represented as a persistent
> property of the object, not off an option in the DROP command.
> 
> I agree that RESTRICT/CASCADE don't fit this, unless the model
> is that the slot is always owned by the subscription, which might
> be too restrictive.
> 

What do you think of attached. I actually did add RESTRICT/CASCADE, in a
way that if there is slot RESTRICT will refuse to drop subscription and
CASCADE will try to drop it. Still all errors.

The new way to not drop slot is to set slot_name to NONE which is new
value that I invented (inspiration from OWNED BY sequences) which
basically disassociates the subscription from slot.

It's slightly less automatic this way but perhaps that's not a bad
thing, at least nobody will drop slots by mistake while we still make
sure that slots are not left over without anybody noticing.

Note that this would need catalog version bump as it removes NOT NULL
constraint from subslotname.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Remove-the-NODROP-SLOT-option-from-DROP-SUBSCRIPTION.patch
Description: binary/octet-stream



Re: [HACKERS] compiler warning with VS 2017

2017-05-05 Thread Tom Lane
Petr Jelinek  writes:
> On 05/05/17 16:56, Tom Lane wrote:
>> So the comment should be something like "if the column is unchanged,
>> we should not attempt to access its value beyond this point.  To
>> help catch any such attempts, set the string to NULL" ?

> Yes that sounds about right. We don't get any data for unchanged TOAST
> columns (that's limitation of logical decoding) so we better not touch them.

OK.  I just made it say "we don't get the value of unchanged columns".

regards, tom lane




Re: [HACKERS] what's up with IDENTIFIER_LOOKUP_EXPR?

2017-05-05 Thread Tom Lane
Robert Haas  writes:
> On Thu, May 4, 2017 at 6:05 PM, Tom Lane  wrote:
>> PLpgSQL_datum is really a symbol table entry.  The conflict against what
>> we mean by "Datum" elsewhere is pretty unfortunate.

> Yeah.  That's particularly bad because datum is a somewhat vague word
> under the best of circumstances (like "thing").  Maybe I'm missing
> something, but isn't it more like a parse tree than a symbol table
> entry?  The symbol table entries seem to be based on
> PLpgSQL_nsitem_type, not PLpgSQL_datum, and they only come in the
> flavors you'd expect to find in a symbol table: label, var, row, rec.

Right, there's this separate name lookup table that maps from names to
PLpgSQL_datums.  I personally wouldn't call the PLpgSQL_datum items
parse trees, since they're mostly pretty flat.  But if you want to
think of them that way I can't stop you.

One other important, and unfortunate, thing is that the PLpgSQL_datums
aren't (usually) read-only --- they not only hold static symbol-table-like
info but also the actual current values, for those datums that correspond
directly to a stored variable value (VAR and REC, but not ROW or RECFIELD
or ARRAYELEM).  This seems messy to me, and it forces a rather expensive
copy step during plpgsql function startup.  I'd like to see those duties
separated, so that the actual r/w state is just arrays of Datum/isnull
that we can trivially initialize during function start.
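(A toy sketch of that separation of duties, in Python rather than plpgsql's C, with invented names: static symbol info shared across calls, and per-call state that is just a pair of trivially initialized arrays instead of a deep copy:)

```python
class VarInfo:
    """Static, per-function symbol information: shared by every call,
    never mutated at run time."""
    def __init__(self, name, typename):
        self.name = name
        self.typename = typename

def start_call(symtab):
    """Per-call runtime state: parallel values/isnull arrays that are
    trivially initialized at function start, instead of copying the
    whole symbol table."""
    values = [None] * len(symtab)
    isnull = [True] * len(symtab)
    return values, isnull
```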

>> So, this representation is great for speed of access and modification
>> of individual fields of the composite variable.  It sucks when you
>> want to assign to the composite as a whole or retrieve its value as
>> a whole, because you have to deconstruct or reconstruct a tuple to
>> do that.  (The REC/RECFIELD approach has approximately the opposite
>> strengths and weaknesses.)  Also, dealing with changes in the named
>> composite type is a complete fail, because we've built its structure
>> into the function's symbol table at parse time.

> It would probably be possible to come up with a representation that
> allowed both things to be efficient,

Yeah, perhaps.  It would be good to take two steps back and reconsider
the whole data structure.

> I'm not volunteering to do the work, though.

It's not at the top of my priority list either, but I'd like to see it
happen sometime.

regards, tom lane




Re: [HACKERS] snapbuild woes

2017-05-05 Thread Andres Freund
On 2017-05-05 13:53:16 +0200, Petr Jelinek wrote:
> On 05/05/17 02:42, Andres Freund wrote:
> > On 2017-05-04 17:00:04 -0700, Andres Freund wrote:
> >> Attached is a prototype patch for that.
> > 
> 
> I am not sure I understand the ABI comment for started_collection_at.
> What's the ABI issue? The struct is private to snapbuild.c module. Or
> you want to store it in the ondisk snapshot as well?

It's stored on-disk already :(


> Otherwise the logic seems to be right on first read, will ponder it a
> bit more

Cool!

Greetings,

Andres Freund




Re: [HACKERS] Row Level Security UPDATE Confusion

2017-05-05 Thread Stephen Frost
Rod, Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Apr 14, 2017 at 9:16 AM, Stephen Frost  wrote:
> > I agreed already up-thread that there's an issue there and will be
> > looking to fix it.  That comment was simply replying to Rod's point that
> > the documentation could also be improved.
> 
> OK, thanks.  The wrap for the next set of minor releases is, according
> to my understanding, scheduled for Monday, so you'd better jump on
> this soon if you're hoping to get a fix out this time around.

The attached patch against master fixes this issue.  Rod, if you get a
chance, would be great for you to check that you no longer see a
difference between the single ALL policy and the split SELECT/UPDATE
policies.

Thanks!

Stephen
diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c
new file mode 100644
index 5c8c0cf..5a2c78b
*** a/src/backend/rewrite/rowsecurity.c
--- b/src/backend/rewrite/rowsecurity.c
*** static void add_with_check_options(Relat
*** 78,84 
  	   List *permissive_policies,
  	   List *restrictive_policies,
  	   List **withCheckOptions,
! 	   bool *hasSubLinks);
  
  static bool check_role_for_policy(ArrayType *policy_roles, Oid user_id);
  
--- 78,85 
  	   List *permissive_policies,
  	   List *restrictive_policies,
  	   List **withCheckOptions,
! 	   bool *hasSubLinks,
! 	   bool force_using);
  
  static bool check_role_for_policy(ArrayType *policy_roles, Oid user_id);
  
*** get_row_security_policies(Query *root, R
*** 272,278 
  			   permissive_policies,
  			   restrictive_policies,
  			   withCheckOptions,
! 			   hasSubLinks);
  
  		/*
  		 * Get and add ALL/SELECT policies, if SELECT rights are required for
--- 273,280 
  			   permissive_policies,
  			   restrictive_policies,
  			   withCheckOptions,
! 			   hasSubLinks,
! 			   false);
  
  		/*
  		 * Get and add ALL/SELECT policies, if SELECT rights are required for
*** get_row_security_policies(Query *root, R
*** 295,301 
     select_permissive_policies,
     select_restrictive_policies,
     withCheckOptions,
!    hasSubLinks);
  		}
  
  		/*
--- 297,304 
     select_permissive_policies,
     select_restrictive_policies,
     withCheckOptions,
!    hasSubLinks,
!    true);
  		}
  
  		/*
*** get_row_security_policies(Query *root, R
*** 324,330 
     conflict_permissive_policies,
     conflict_restrictive_policies,
     withCheckOptions,
!    hasSubLinks);
  
  			/*
  			 * Get and add ALL/SELECT policies, as WCO_RLS_CONFLICT_CHECK WCOs
--- 327,334 
     conflict_permissive_policies,
     conflict_restrictive_policies,
     withCheckOptions,
!    hasSubLinks,
!    true);
  
  			/*
  			 * Get and add ALL/SELECT policies, as WCO_RLS_CONFLICT_CHECK WCOs
*** get_row_security_policies(Query *root, R
*** 346,352 
  	   conflict_select_permissive_policies,
  	   conflict_select_restrictive_policies,
  	   withCheckOptions,
! 	   hasSubLinks);
  			}
  
  			/* Enforce the WITH CHECK clauses of the UPDATE policies */
--- 350,357 
  	   conflict_select_permissive_policies,
  	   conflict_select_restrictive_policies,
  	   withCheckOptions,
! 	   hasSubLinks,
! 	   true);
  			}
  
  			/* Enforce the WITH CHECK clauses of the UPDATE policies */
*** get_row_security_policies(Query *root, R
*** 355,361 
     conflict_permissive_policies,
     conflict_restrictive_policies,
     withCheckOptions,
!    hasSubLinks);
  		}
  	}
  
--- 360,367 
     conflict_permissive_policies,
     conflict_restrictive_policies,
     withCheckOptions,
!    hasSubLinks,
!    false);
  		}
  	}
  
*** add_with_check_options(Relation rel,
*** 659,671 
  	   List *permissive_policies,
  	   List *restrictive_policies,
  	   List **withCheckOptions,
! 	   bool *hasSubLinks)
  {
  	ListCell   *item;
  	List	   *permissive_quals = NIL;
  
  #define QUAL_FOR_WCO(policy) \
! 	( kind != WCO_RLS_CONFLICT_CHECK && \
  	  (policy)->with_check_qual != NULL ? \
  	  (policy)->with_check_qual : (policy)->qual )
  
--- 665,678 
  	   List *permissive_policies,
  	   List *restrictive_policies,
  	   List **withCheckOptions,
! 	   bool *hasSubLinks,
! 	   bool force_using)
  {
  	ListCell   *item;
  	List	   *permissive_quals = NIL;
  
  #define QUAL_FOR_WCO(policy) \
! 	( !force_using && \
  	  (policy)->with_check_qual != NULL ? \
  	  (policy)->with_check_qual : (policy)->qual )
  




Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-05 Thread Robert Haas
On Fri, May 5, 2017 at 11:15 AM, Amit Kapila  wrote:
> I am not saying to rip those changes.  Your most of the mail stresses
> about the 30-second timeout which you have added in the patch to
> detect timeouts during failures (rollback processing).  I have no
> objection to that part of the patch except that still there is a race
> condition around PQsendQuery and you indicate that it is better to
> deal it as a separate patch to which I agree.

OK.

> The point of which I am
> not in total agreement is about the failure other than the time out.
>
> +pgfdw_exec_cleanup_query(PGconn *conn, const char *query)
> {
> ..
> + /* Get the result of the query. */
> + if (pgfdw_get_cleanup_result(conn, endtime, &result))
> + return false;
> +
> + /* Issue a warning if not successful. */
> + if (PQresultStatus(result) != PGRES_COMMAND_OK)
> + {
> + pgfdw_report_error(WARNING, result, conn, true, query);
> + return false;
> + }
> ..
> }
>
> In the above code, if there is a failure due to timeout then it will
> return from first statement (pgfdw_get_cleanup_result), however, if
> there is any other failure, then it will issue a warning and return
> false.  I am not sure if there is a need to return false in the second
> case, because even without that it will work fine (and there is a
> chance that it won't drop the connection), but maybe your point is
> better to be consistent for all kind of error handling in this path.

There are three possible queries that can be executed by
pgfdw_exec_cleanup_query; let's consider them individually.

1. ABORT TRANSACTION.  If ABORT TRANSACTION fails, I think that we
have no alternative but to close the connection.  If we don't, then
the next local connection that tries to use this connection will
probably also fail, because it will find the connection inside an
aborted transaction rather than not in a transaction.  If we do close
the connection, the next transaction will try to reconnect and
everything will be fine.

2. DEALLOCATE ALL.  If DEALLOCATE ALL fails, we do not have to close
the connection, but the reason why we're running that statement in the
first place is because we've potentially lost track of which
statements are prepared on the remote side.  If we don't drop the
connection, we'll still be confused about that.  Reconnecting will fix
it.

3. ROLLBACK TO SAVEPOINT %d; RELEASE SAVEPOINT %d.  If this fails, the
local toplevel transaction is doomed.  It would be wrong to try to
commit on the remote side, because there could be remote changes made
by the aborted subtransaction which must not survive, and the ROLLBACK
TO SAVEPOINT %d command is essential in order for us to get rid of
them, and that has already failed.  So we must eventually abort the
remote transaction, which we can do either by closing the connection
or by trying to execute ABORT TRANSACTION.  The former, however, is
quick, and the latter is very possibly going to involve a long hang,
so closing the connection seems better.

In all of these cases, the failure of one of these statements seems to
me to *probably* mean that the remote side is just dead, in which case
dropping the connection is the only option.  But even if the remote
side is still alive and the statement failed for some other reason --
say somebody changed kwlist.h so that ABORT now has to be spelled
ABURT -- returning true rather than false just causes us to end up
with a remote connection whose state is out of step with the local
server state, and such a connection is not going to be usable.  I
think part of your point is that in case 3, we might still get back to
a usable connection if ABORT TRANSACTION is successfully executed
later, but that's not very likely and, even if it would have happened,
reconnecting instead doesn't really leave us any worse off.
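To make the rule concrete, here is a minimal sketch (illustrative only, not the actual postgres_fdw code) of the caller-side logic being argued for: any cleanup-query failure, whether a timeout or anything else, is treated as grounds to drop the connection.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch only -- not the actual postgres_fdw code.  It
 * models the rule above: a cleanup query that times out or fails for
 * any other reason reports failure, and the caller treats any failure
 * as "drop the connection and reconnect next time".
 */
static bool
cleanup_query_ok(bool timed_out, bool command_ok)
{
	if (timed_out)
		return false;			/* remote side is presumed dead */
	if (!command_ok)
		return false;			/* remote state is now out of step */
	return true;				/* safe to keep the connection */
}
```

Under this sketch, ABORT TRANSACTION, DEALLOCATE ALL, and ROLLBACK TO SAVEPOINT all go through the same path: success keeps the connection, anything else forces a reconnect.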

> No, I am not saying any of those.  I hope the previous point clarifies
> what I want to say.

Apologies, but I still cannot see quite what you are getting at.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what's up with IDENTIFIER_LOOKUP_EXPR?

2017-05-05 Thread Robert Haas
On Thu, May 4, 2017 at 6:05 PM, Tom Lane  wrote:
> PLpgSQL_datum is really a symbol table entry.  The conflict against what
> we mean by "Datum" elsewhere is pretty unfortunate.

Yeah.  That's particularly bad because datum is a somewhat vague word
under the best of circumstances (like "thing").  Maybe I'm missing
something, but isn't it more like a parse tree than a symbol table
entry?  The symbol table entries seem to be based on
PLpgSQL_nsitem_type, not PLpgSQL_datum, and they only come in the
flavors you'd expect to find in a symbol table: label, var, row, rec.
The datums on the other hand seem to consist of every kind of thing
that PLpgSQL might be expected to interpret itself, either as an
rvalue or as an lvalue.

> ROW - this is where it gets fun.  A ROW is effectively a variable
> of a possibly-anonymous composite type, and it is defined by a list
> (in its own datum) of links to PLpgSQL_datums representing the
> individual columns.  Typically the member datums would be VARs
> but that's not the only possibility.
>
> As I mentioned earlier, the case that ROW is actually well adapted
> for is multiple targets in INTO and similar constructs.  For example,
> if you have
>
> SELECT ...blah blah... INTO a,b,c
>
> then the target of the PLpgSQL_stmt_execsql is represented as a single
> ROW datum whose members are the datums for a, b, and c.  That's totally
> determined by the text of the function and can't change under us.
>
> However ... somebody thought it'd be cute to use the ROW infrastructure
> for variables of named composite types, too.  So if you have
>
> DECLARE foo some_composite_type;
>
> then the name "foo" refers to a ROW datum, and the plpgsql compiler
> generates additional anonymous VAR datums, one for each declared column
> in some_composite_type, which become the members of the ROW datum.
> The runtime representation is actually that each field value is stored
> separately in its datum, as though it were an independent VAR.  Field
> references "foo.col1" are not compiled into RECFIELD datums; we just look
> up the appropriate member datum during compile and make the expression
> tree point to that datum directly.

Ugh.

> So, this representation is great for speed of access and modification
> of individual fields of the composite variable.  It sucks when you
> want to assign to the composite as a whole or retrieve its value as
> a whole, because you have to deconstruct or reconstruct a tuple to
> do that.  (The REC/RECFIELD approach has approximately the opposite
> strengths and weaknesses.)  Also, dealing with changes in the named
> composite type is a complete fail, because we've built its structure
> into the function's symbol table at parse time.

It would probably be possible to come up with a representation that
allowed both things to be efficient, the way a slot can contain either
a Datum/isnull array or a heap tuple or both.  Call the resulting data
structure a record slot.  foo.col1 could retain the original column
name and also a cache containing a pointer to the record slot and a
column offset within the slot, so that it can say e.g.
assign_record_slot_column(slot, col, val).  But it could also register
the reference with the slot, so that if the slot structure changes,
the cached slot and column offset (or at least the column offset) are
cleared and get recomputed on next access.

I'm not volunteering to do the work, though.
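For what it's worth, the invalidation scheme described above could be sketched roughly like this; every name here (RecordSlot, FieldRef, and so on) is hypothetical and nothing below is existing PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the "record slot" idea: a field reference
 * caches its column offset together with the slot's generation number;
 * changing the slot's row type bumps the generation, so every cached
 * offset is recomputed on next access instead of pointing at a stale
 * column.
 */
typedef struct RecordSlot
{
	int			generation;		/* bumped when the row type changes */
	int			natts;
	int			values[8];		/* stand-in for per-column datums */
} RecordSlot;

typedef struct FieldRef
{
	RecordSlot *slot;
	int			cached_offset;	/* valid only while generations match */
	int			cached_gen;
} FieldRef;

/* Stand-in for a real lookup of the column's current position. */
static int
lookup_offset(RecordSlot *slot, int current_offset)
{
	(void) slot;
	return current_offset;
}

static int
field_get(FieldRef *ref, int current_offset)
{
	if (ref->cached_gen != ref->slot->generation)
	{
		/* cache miss: recompute the offset, remember the generation */
		ref->cached_offset = lookup_offset(ref->slot, current_offset);
		ref->cached_gen = ref->slot->generation;
	}
	return ref->slot->values[ref->cached_offset];
}

static void
slot_change_rowtype(RecordSlot *slot)
{
	slot->generation++;			/* invalidates all cached FieldRefs */
}
```

The generation counter gives the cheap path (offset already cached) while still handling composite-type changes lazily, much as a TupleTableSlot can carry either a Datum array or a heap tuple.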

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-05 Thread Amit Kapila
On Fri, May 5, 2017 at 7:44 PM, Robert Haas  wrote:
> On Fri, May 5, 2017 at 1:01 AM, Amit Kapila  wrote:
>> I have tried to verify above point and it seems to me in such a
>> situation the transaction status will be either PQTRANS_INTRANS or
>> PQTRANS_INERROR.  I have tried to mimic "out of memory" failure by
>> making PQmakeEmptyPGresult return NULL (malloc failure).  AFAIU, it
>> will make the state as PQTRANS_IDLE only when the statement is
>> executed successfully.
>
> Hrm.  OK, you might be right, but still I don't see why having the 30
> second timeout is bad.  People don't want a transaction rollback to
> hang for a long period of time waiting for ABORT TRANSACTION to run if
> we could've just dropped the connection to get the same effect.  Sure,
> the next transaction will then have to reconnect, but that takes a few
> tens of milliseconds; if you wait for one extra second to avoid that
> outcome, you come out behind even if you do manage to avoid the
> reconnect.  Now for a subtransaction you could argue that it's worth
> being more patient, because forcing a toplevel abort sucks.  But what
> are the chances that, after failing to respond to a ROLLBACK; RELEASE
> SAVEPOINT within 30 seconds, the remote side is going to wake up again
> and start behaving normally?  I'd argue that it's not terribly likely
> and most people aren't going to want to wait for multiple minutes to
> see whether it does, but I suppose we could impose the 30 second
> timeout only at the toplevel and let subtransactions wait however long
> the operating system takes to time out.
>
>>>> Also, does it matter if pgfdw_subxact_callback fails
>>>> during execution of "ROLLBACK TO SAVEPOINT", because anyway user needs
>>>> to execute rollback to savepoint command before proceeding and that
>>>> should be good enough?
>>>
>>> I don't understand this.  If pgfdw_subxact_callback doesn't roll the
>>> remote side back to the savepoint, how is it going to get done later?
>>> There's no code in postgres_fdw other than that code to issue the
>>> rollback.
>>>
>> It is done via toplevel transaction ABORT.  I think how it works is
>> after "ROLLBACK TO SAVEPOINT" failure, any next execution is going to
>> fail.  Let us say if we issue Commit after failure, it will try to
>> execute RELEASE SAVEPOINT which will fail and lead to toplevel
>> transaction ABORT.  Now if the toplevel transaction ABORT also fails,
>> it will drop the connection.
>
> Mmmph.  I guess that would work, but I think it's better to track it
> explicitly.  I tried this (bar is a foreign table):
>
> begin;
> select * from bar;
> savepoint s1;
> select * from bar;
> savepoint s2;
> select * from bar;
> savepoint s3;
> select * from bar;
> commit;
>
> On the remote side, the commit statement produces this:
>
> 2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s4
> 2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s3
> 2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s2
> 2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: COMMIT TRANSACTION
>
> But that's obviously silly.  Somebody could easily come along and
> optimize that by getting rid of the RELEASE SAVEPOINT commands which
> are completely unnecessary if we're going to commit the toplevel
> transaction anyway, and then the argument that you're making wouldn't
> be true any more.  It seems smart to me to explicitly track whether
> the remote side is known to be in a bad state rather than relying on
> the future sequence of commands to be such that we'll get the right
> result anyway.
>

Sure, tracking it explicitly might be a better way to deal with it,
but always dropping the connection in case of error might not be the
best thing to do.

>> Agreed, in such a situation toplevel transaction abort is the only way
>> out.  However, it seems to me that will anyway happen in current code
>> even if we don't try to force it via abort_cleanup_failure.  I
>> understand that it might be better to force it as you have done in the
>> patch, but not sure if it is good to drop the connection where
>> previously it could have done without it (for ex. if the first
>> statement Rollback To Savepoint fails, but ABORT succeeds).  I feel it
>> is worth considering whether such a behavior difference is okay as you
>> have proposed to backpatch it.
>
> Well, I think the whole point of this patch is that if one command on
> a connection fails, the chances of future commands succeeding are
> pretty poor.  It's not impossible, but it's not very likely either.
> To recap, the problem at present is that if network connectivity is
> interrupted and the statement is then killed by statement_timeout or a
> local cancel request, we try to send a cancel request to the remote
> side, which takes a while to time out.  Even if that fails, we then
> send an ABORT TRANSACTION, hoping against hope that it will succeed
> 

Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade

2017-05-05 Thread Peter Eisentraut
On 5/2/17 21:44, Noah Misch wrote:
>> I wonder if we should have an --no-subscriptions option, now that they
>> are dumped by default, just like we have --no-blobs, --no-owner,
>> --no-password, --no-privileges, --no-acl, --no-tablespaces, and
>> --no-security-labels.  It seems like there is probably a fairly large
>> use case for excluding subscriptions even if you have sufficient
>> permissions to dump them.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.

I consider this item low priority and don't plan to work on it before
all the other open items under logical replication are addressed.

(Here, working on it would include thinking further about whether it is
necessary at all or what alternatives might look like.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] compiler warning with VS 2017

2017-05-05 Thread Petr Jelinek
On 05/05/17 16:56, Tom Lane wrote:
> Petr Jelinek  writes:
>> On 05/05/17 06:50, Tom Lane wrote:
>>> Actually, looking around a bit there, it's not even clear why
>>> we should be booby-trapping the value of an unchanged column in
>>> the first place.  So I'd say that not only is the code dubious
>>> but the comment is inadequate too.
> 
>> Hmm, as far as I can recollect this is just leftover debugging code that
>> was intended to help ensure that we are checking the "changed"
>> everywhere we are supposed to (since I changed handling of these
>> structured quite a bit during development). Should be changed to NULL,
>> that's what we usually do in this type of situation.
> 
> So the comment should be something like "if the column is unchanged,
> we should not attempt to access its value beyond this point.  To
> help catch any such attempts, set the string to NULL" ?
> 

Yes, that sounds about right. We don't get any data for unchanged TOAST
columns (that's a limitation of logical decoding), so we had better not touch them.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] compiler warning with VS 2017

2017-05-05 Thread Tom Lane
Petr Jelinek  writes:
> On 05/05/17 06:50, Tom Lane wrote:
>> Actually, looking around a bit there, it's not even clear why
>> we should be booby-trapping the value of an unchanged column in
>> the first place.  So I'd say that not only is the code dubious
>> but the comment is inadequate too.

> Hmm, as far as I can recollect this is just leftover debugging code that
> was intended to help ensure that we are checking the "changed"
> everywhere we are supposed to (since I changed handling of these
> structures quite a bit during development). Should be changed to NULL,
> that's what we usually do in this type of situation.

So the comment should be something like "if the column is unchanged,
we should not attempt to access its value beyond this point.  To
help catch any such attempts, set the string to NULL" ?

regards, tom lane




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-05-05 Thread Peter Eisentraut
On 5/5/17 01:26, Michael Paquier wrote:
> The only code path doing HOT-pruning and generating WAL is
> heap_page_prune(). Do you think that we need to worry about FPWs as
> well?
> 
> Attached is an updated patch, which also forbids the run of any
> replication commands when the stopping state is reached.

I have committed this without the HOT pruning change.  That can be
considered separately, and I think it could use another round of
thinking about it.

I will move the open item to "Older Bugs" now, because the user
experience regression, so to speak, in version 10 has been addressed.

(This could be a backpatching candidate, but I am not planning on it for
next week's releases in any case.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Why type coercion is not performed for parameters?

2017-05-05 Thread Tom Lane
Konstantin Knizhnik  writes:
> So my question is whether it is possible to use the same rule for type 
> coercion of parameters as for constant?

It's not possible, I think, and even if it is, we would almost certainly
reject a patch that tried to do it.

There are four different ways to spell type coercion of a literal:

cast('foo' as typename)
'foo'::typename
typename_thats_a_simple_identifier('foo')
typename 'foo'

The last of those is a huge syntactic PITA because it's so close to
being ambiguous against other constructs.  If you tried to allow
anything but a string literal there, it almost certainly would be
ambiguous, resulting in bison failures.  Even if you managed to
wedge it into the grammar today, I for one would vote to reject the
patch because of the near certainty that it would result in syntactic
conflicts further down the line.

The others are better for your purposes, because at least syntactically
they allow either a literal or something else as the subject of the
coercion.  But there's still an important point I think you're missing,
which is that even though these syntaxes look like type coercion (that
is, run-time conversion of values from one type to another), they are
not that when applied to a literal string.  Instead they represent
initial assignment of a type to the literal; so they feed the string
to the type's typinput function and then produce a Const with a resolved
type, not a type-coercion expression node.

Params are sort of a mess because depending on parser context, an attempt
to coerce them might result in either a runtime type coercion, or a
decision that a previously-indeterminate-type Param is now of a known
type.  The latter bears some similarities to assignment of a type to
an unknown literal, but it's not the same thing.

The code you are looking at in func_get_detail() can handle the situation
where the argument is a literal, because it knows what coerce_type() will
do in that case.  However, it does not know what coerce_type() would do
with a Param, and it can't readily find out because that information is
hidden behind a parser-hook API.  As noted in the comments in
func_get_detail, we *must not* return FUNCDETAIL_COERCION unless we know
that coerce_type will succeed, and we do not know that for the case you
are concerned with.

I could imagine extending the parser hook API, in the direction of adding
a function that can be asked "if we were to call p_coerce_param_hook on
this Param, would that be interpreted as a type assignment?".  But that
seems ugly: it's only squishily defined, and it would require a bunch of
places to supply additional hook code.

Given that you're not going to get anywhere with the "typename $1"
syntax, I don't see much point in complicating the parser hook API
to resolve the third case.  You need to think of a different way
to approach what you're trying to do.  Personally I'd think about
replacing the entire literal-with-cast construct with a Param having
already-known type.

regards, tom lane




Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-05 Thread Robert Haas
On Fri, May 5, 2017 at 1:01 AM, Amit Kapila  wrote:
> I have tried to verify above point and it seems to me in such a
> situation the transaction status will be either PQTRANS_INTRANS or
> PQTRANS_INERROR.  I have tried to mimic "out of memory" failure by
> making PQmakeEmptyPGresult return NULL (malloc failure).  AFAIU, it
> will make the state as PQTRANS_IDLE only when the statement is
> executed successfully.

Hrm.  OK, you might be right, but still I don't see why having the 30
second timeout is bad.  People don't want a transaction rollback to
hang for a long period of time waiting for ABORT TRANSACTION to run if
we could've just dropped the connection to get the same effect.  Sure,
the next transaction will then have to reconnect, but that takes a few
tens of milliseconds; if you wait for one extra second to avoid that
outcome, you come out behind even if you do manage to avoid the
reconnect.  Now for a subtransaction you could argue that it's worth
being more patient, because forcing a toplevel abort sucks.  But what
are the chances that, after failing to respond to a ROLLBACK; RELEASE
SAVEPOINT within 30 seconds, the remote side is going to wake up again
and start behaving normally?  I'd argue that it's not terribly likely
and most people aren't going to want to wait for multiple minutes to
see whether it does, but I suppose we could impose the 30 second
timeout only at the toplevel and let subtransactions wait however long
the operating system takes to time out.

>>> Also, does it matter if pgfdw_subxact_callback fails
>>> during execution of "ROLLBACK TO SAVEPOINT", because anyway user needs
>>> to execute rollback to savepoint command before proceeding and that
>>> should be good enough?
>>
>> I don't understand this.  If pgfdw_subxact_callback doesn't roll the
>> remote side back to the savepoint, how is it going to get done later?
>> There's no code in postgres_fdw other than that code to issue the
>> rollback.
>>
> It is done via toplevel transaction ABORT.  I think how it works is
> after "ROLLBACK TO SAVEPOINT" failure, any next execution is going to
> fail.  Let us say if we issue Commit after failure, it will try to
> execute RELEASE SAVEPOINT which will fail and lead to toplevel
> transaction ABORT.  Now if the toplevel transaction ABORT also fails,
> it will drop the connection.

Mmmph.  I guess that would work, but I think it's better to track it
explicitly.  I tried this (bar is a foreign table):

begin;
select * from bar;
savepoint s1;
select * from bar;
savepoint s2;
select * from bar;
savepoint s3;
select * from bar;
commit;

On the remote side, the commit statement produces this:

2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s4
2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s3
2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: RELEASE SAVEPOINT s2
2017-05-05 09:28:52.597 EDT [80263] LOG:  statement: COMMIT TRANSACTION

But that's obviously silly.  Somebody could easily come along and
optimize that by getting rid of the RELEASE SAVEPOINT commands which
are completely unnecessary if we're going to commit the toplevel
transaction anyway, and then the argument that you're making wouldn't
be true any more.  It seems smart to me to explicitly track whether
the remote side is known to be in a bad state rather than relying on
the future sequence of commands to be such that we'll get the right
result anyway.
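A minimal sketch of what "track it explicitly" could look like; ConnEntry and these helpers are invented for illustration and are not postgres_fdw's actual structures.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative only.  The point is to record, on the connection entry
 * itself, that some command already failed, rather than relying on the
 * later command sequence happening to fail too.
 */
typedef struct ConnEntry
{
	bool		have_error;		/* a command on this connection failed */
} ConnEntry;

static bool
run_remote_command(ConnEntry *entry, bool command_succeeds)
{
	if (!command_succeeds)
		entry->have_error = true;
	return command_succeeds;
}

/* At transaction end, a marked connection is dropped and rebuilt. */
static bool
must_reconnect(const ConnEntry *entry)
{
	return entry->have_error;
}
```

With a flag like this, optimizing away the redundant RELEASE SAVEPOINT commands at commit time could not accidentally hide an earlier ROLLBACK TO SAVEPOINT failure.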

> Agreed, in such a situation toplevel transaction abort is the only way
> out.  However, it seems to me that will anyway happen in current code
> even if we don't try to force it via abort_cleanup_failure.  I
> understand that it might be better to force it as you have done in the
> patch, but not sure if it is good to drop the connection where
> previously it could have done without it (for ex. if the first
> statement Rollback To Savepoint fails, but ABORT succeeds).  I feel it
> is worth considering whether such a behavior difference is okay as you
> have proposed to backpatch it.

Well, I think the whole point of this patch is that if one command on
a connection fails, the chances of future commands succeeding are
pretty poor.  It's not impossible, but it's not very likely either.
To recap, the problem at present is that if network connectivity is
interrupted and the statement is then killed by statement_timeout or a
local cancel request, we try to send a cancel request to the remote
side, which takes a while to time out.  Even if that fails, we then
send an ABORT TRANSACTION, hoping against hope that it will succeed
despite the failure of the cancel request.  The user ends up waiting
for the cancel request to time out and then for the ABORT TRANSACTION
to time out, which takes 15-16 minutes no matter how you set the
keepalive settings and no matter how many times you hit ^C on the
local side.

Now, if you want to allow for the possibility that some future
operation might succeed never mind past 

Re: [HACKERS] Patch - Tcl 8.6 version support for PostgreSQL

2017-05-05 Thread Alvaro Herrera
Paresh More wrote:
> Hello Alvaro,
> 
> I applied the patch which you mentioned and tried compiling  postgresql
> server for windows 32 and 64 and the results are:-
> 
> 1) postgresql server(96) with tcl version 8.5 is compiling fine with out
> any issue.
> 2) postgresql server(10) snapshot with tcl version 8.6 is also
> compiling fine without any issue.

Great, will push.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] May cause infinite loop when initializing rel-cache contains partitioned table

2017-05-05 Thread 甄明洋
The rel-cache initialization function is RelationCacheInitializePhase3, in the
src/backend/utils/cache/relcache.c file.
When initializing the partition description information, we forget to check
whether the partition key or descriptor is still NULL.
Therefore, after the loop restarts, the partition information will be
initialized again, resulting in an infinite loop.
Code:
/*
 * Reload partition key and descriptor for a partitioned table.
 */
if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
    RelationBuildPartitionKey(relation);
    Assert(relation->rd_partkey != NULL);

    RelationBuildPartitionDesc(relation);
    Assert(relation->rd_partdesc != NULL);

    restart = true;
}
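For illustration, here is a simplified model of that loop body with the guard the report suggests is missing; RelationSim and the helper are stand-ins, not the real relcache code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model only; any real fix belongs in
 * RelationCacheInitializePhase3.  The guard on rd_partkey/rd_partdesc
 * is the missing check: with it, a relation whose partition info is
 * already built no longer requests another restart, so the loop can
 * terminate.
 */
typedef struct RelationSim
{
	bool		is_partitioned;
	void	   *rd_partkey;
	void	   *rd_partdesc;
} RelationSim;

static int	built;				/* dummy target for the pointers */

static bool
maybe_build_partition_info(RelationSim *rel)
{
	bool		restart = false;

	if (rel->is_partitioned &&
		(rel->rd_partkey == NULL || rel->rd_partdesc == NULL))
	{
		rel->rd_partkey = &built;	/* RelationBuildPartitionKey() */
		rel->rd_partdesc = &built;	/* RelationBuildPartitionDesc() */
		restart = true;
	}
	return restart;
}
```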

Re: [HACKERS] CTE inlining

2017-05-05 Thread David Rowley
On 5 May 2017 at 14:04, Tom Lane  wrote:
> Craig Ringer  writes:
>> We're carefully maintaining this bizarre cognitive dissonance where we
>> justify the need for using this as a planner hint at the same time as
>> denying that we have a hint. That makes it hard to make progress here.
>> I think there's fear that we're setting some kind of precedent by
>> admitting what we already have.
>
> I think you're overstating the case.  It's clear that there's a
> significant subset of CTE functionality where there has to be an
> optimization fence.  The initial implementation basically took the
> easy way out by deeming *all* CTEs to be optimization fences.  Maybe
> we shouldn't have documented that behavior, but we did.  Now we're
> arguing about how much of a compatibility break it'd be to change that
> planner behavior.  I don't see any particular cognitive dissonance here,
> just disagreements about the extent to which backwards compatibility is
> more important than better query optimization.

How about we get the ball rolling on this in v10 and pull that part
out of the docs? If anything, that'll buy us a bit more wiggle room to
change this in v11.

I've attached a proposed patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


doc_caution_about_cte_changes_in_the_future.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] password_encryption, default and 'plain' support

2017-05-05 Thread Michael Paquier
On Thu, May 4, 2017 at 8:37 PM, Heikki Linnakangas  wrote:
> On 05/03/2017 08:40 PM, Tom Lane wrote:
>>
>> The other question I can think to ask is what will happen during
>> pg_upgrade, given an existing installation with one or more passwords
>> stored plain.  If the answer is "silently convert to MD5", I'd be
>> good with that.
>
>
> Yes, it will silently convert to MD5. That happened even on earlier
> versions, if you had password_encryption=on in the new cluster (which was
> the default).
>
> I'm planning to go ahead with the attached patch for this (removing
> password_encryption='plain' support, but keeping the default as 'md5').

The patch attached does not apply on HEAD at 499ae5f; the regression
tests are conflicting.

+This option is obsolete but still accepted for backwards
+compatibility.
Isn't that incorrect English? It seems to me that this should be
non-plural, as in "for backward compatibility".

The comment at the top of check_password() in passwordcheck.c does not
mention scram, you may want to update that.

+   /*
+    * We never store passwords in plaintext, so this
+    * shouldn't happen.
+    */
    break;
An error here is overthinking?

 -- consistency of password entries
-SET password_encryption = 'plain';
-CREATE ROLE regress_passwd1 PASSWORD 'role_pwd1';
 SET password_encryption = 'md5';
Nit: this is skipping directly to role number 2.
-- 
Michael




[HACKERS] Not getting error if ALTER SUBSCRIPTION syntax is wrong.

2017-05-05 Thread tushar

Hi,

While testing logical replication against v10, I encountered one
issue where data stops migrating after ALTER SUBSCRIPTION.


X Server
\\ Make sure wal_level is set to logical in postgresql.conf file
\\create table/Insert 1 row -> create table test(n int); insert into
test values (1);

\\create publication for all -> create publication pub for ALL TABLES ;


Y server

\\ Make sure wal_level is set to logical in postgresql.conf file
\\create table -> create table test(n int);

\\create Subscription

CREATE SUBSCRIPTION sub CONNECTION 'host=localhost dbname=postgres 
port=5432 ' PUBLICATION pub;


postgres=# select * from test;
 n
---
 1
(1 row)

\\Alter subscription
postgres=# alter subscription sub connection 'host=localhost 
dbname=postgres PUBLICATION pub';

ALTER SUBSCRIPTION

X server
postgres=# insert into test values (1);
INSERT 0 1
postgres=# select * from test;
 n
---
 1
 1
(2 rows)

Y server
postgres=# select * from test;
 n
---
 1
(1 row)

I think the syntax of ALTER SUBSCRIPTION is probably not correct, but
surprisingly it is not throwing an error.
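For what it's worth, my own reading of the statement (an observation, not a
confirmed diagnosis): the closing quote comes after "pub", so "PUBLICATION pub"
ends up *inside* the quoted connection string, and the command parses as a
plain CONNECTION change with a bogus conninfo rather than failing with a
syntax error. Something like the following separate statements would
presumably have been intended (a hedged sketch; exact ALTER SUBSCRIPTION
syntax per the v10 docs):

```sql
-- What was typed: "PUBLICATION pub" is part of the quoted string.
ALTER SUBSCRIPTION sub CONNECTION 'host=localhost dbname=postgres PUBLICATION pub';

-- Presumably intended: change the connection, then the publication list.
ALTER SUBSCRIPTION sub CONNECTION 'host=localhost dbname=postgres port=5432';
ALTER SUBSCRIPTION sub SET PUBLICATION pub;
```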


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [HACKERS] [patch] Build pgoutput with MSVC

2017-05-05 Thread Petr Jelinek
On 05/05/17 12:10, Magnus Hagander wrote:
> 
> On Fri, May 5, 2017 at 11:58 AM, Michael Paquier
> > wrote:
> 
> On Fri, May 5, 2017 at 6:10 PM, MauMau  > wrote:
> > The pgoutput is not built with MSVC.  The attached patch fixes this.
> > I confirmed that a few INSERTs were replicated correctly.
> >
> > Should I add this matter in the PostgreSQL 10 Open Items page?
> 
> Yes, with Peter as committer and Petr as owner.
> 
> +my $pgoutput = $solution->AddProject(
> +'pgoutput', 'dll', '',
> +'src/backend/replication/pgoutput');
> +$pgoutput->AddReference($postgres);
> Yup, that's correct.
> 
> You have forgotten to update clean.bat, which should clean up
> pgoutput.dll.
> 
> 
> If that's all that's required, I'll just go ahead and commit it right
> away, including the clean.bat.
> 
> I think the problem with clean.bat isn't cleaning up pgoutput.dll --
> that one goes in a different directory. But it does need to clean up the
> win32ver.rc file that gets dropped there automatically.
> 
> The attached patch itself seems broken (it has some sort of byte order
> marker at the beginning, but removing that still breaks with "patch
> unexpectedly ends in middle of line patch:  Only garbage was found
> in the patch input.". But I can just copy/paste it manually :)
> 

Thanks for fixing this, I admit I have no idea how our windows build
system works.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PG 10 release notes

2017-05-05 Thread Petr Jelinek
On 05/05/17 02:46, Bruce Momjian wrote:
> On Thu, May  4, 2017 at 05:09:40PM -0700, Andres Freund wrote:
 I would not in any way refer to logical decoding as being only a proof
 of concept, even before logical replication.
>>>
>>> The community ships a reliable logical _encoding_, and a test logical
>>> _decoding_.
>>
>> Yes, so what?  What you said is "I didn't think logical decoding was
>> really more than a proof-of-concept until now", which is plainly wrong,
>> given I know a significant number of users using it in production.  Some
>> of them are well known & large enterprises, and it's used to enable
>> critical things.
> 
> I am getting tired of saying this.  When I am writing the release notes,
> I am trying to figure out how it affects our shipped code, and the only
> "decoding" I know of is test_decoding.  My message was this:

I actually think the main misunderstanding here comes from interpreting
the test_decoding name. The test_decoding plugin does not decode anything,
and there are no external "decoders". The decoding happens in core, which
just provides a C API for plugins to consume the decoded info (i.e., heap
tuples and similar). The test_decoding plugin *tests* the decoding API,
and the external projects use the decoding API as well, but they don't
really decode anything; their role is mostly filtering and deciding the
wire protocol.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PG 10 release notes

2017-05-05 Thread Petr Jelinek
On 05/05/17 03:26, Andres Freund wrote:
> On 2017-05-04 18:23:38 -0700, Peter Geoghegan wrote:
>> On Thu, May 4, 2017 at 6:08 PM, Robert Haas  wrote:
 E.g. to power amazon's data migration service (yes, that scares me).
>>>
>>> If you recall, I did predict prior to commit that test_decoding would
>>> get put into production use regardless of the name.
>>
>> I thought you were completely wrong when you said that. Lesson
>> learned, I suppose.
> 
> At least I was a bit more optimistic that we'd get a decent json plugin
> into care soon-ish.  While there was some initial work towards that, it
> unfortunately stalled at some point.
> 
> Euler, are you perchance interested in working towards that? ;)
> 

I am happy to review any work in this area BTW.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] compiler warning with VS 2017

2017-05-05 Thread Petr Jelinek
On 05/05/17 06:50, Tom Lane wrote:
> Haribabu Kommi  writes:
>> I am getting a compiler warning when I build the latest HEAD PostgreSQL with
>> visual studio 2017.
>> The code at the line is,
>> tuple->values[i] = (char *) (Size)0xdeadbeef; /* make bad usage more obvious 
>> */
> 
> Yeah, you're not the first to complain about this.  To my mind that
> coding is not pretty, not cute, and not portable: there's not even
> a good reason to believe that dereferencing the pointer would result
> in a crash.  Perhaps the author can explain to us why this is better
> than just assigning NULL.
> 
> Actually, looking around a bit there, it's not even clear why
> we should be booby-trapping the value of an unchanged column in
> the first place.  So I'd say that not only is the code dubious
> but the comment is inadequate too.

Hmm, as far as I can recollect this is just leftover debugging code that
was intended to help ensure that we are checking the "changed" flag
everywhere we are supposed to (since I changed the handling of these
structures quite a bit during development). It should be changed to NULL;
that's what we usually do in this type of situation.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] snapbuild woes

2017-05-05 Thread Petr Jelinek
On 05/05/17 02:42, Andres Freund wrote:
> On 2017-05-04 17:00:04 -0700, Andres Freund wrote:
>> Attached is a prototype patch for that.
> 

I am not sure I understand the ABI comment for started_collection_at.
What's the ABI issue? The struct is private to the snapbuild.c module. Or
do you want to store it in the on-disk snapshot as well?

As for a better name for it, what about something like oldest_full_xact?

Otherwise the logic seems right on first read; I will ponder it a
bit more.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Duplicate usage of tablespace location?

2017-05-05 Thread Neha Khatri
As Kyotaro-san pointed out, commit 22817041 started allowing creation
of multiple "tablespace version directories" in the same location. However,
the original purpose of that commit was to allow that only for upgrade
purposes. So a couple of points:
- The commit violated the requirement of emptiness of the tablespace
location directory.
(Though it is still prevented to create multiple tablespaces belonging
to one server in the same location.)
- The commit did not document this change in specification.

Probably it was not anticipated at that time that a user could create
tablespaces for different server versions at the same location.

Now that this behaviour has been present in the field for a while, there is
a likelihood of systems with tablespaces for two different versions in the
same location. To avoid the problem reported in [1] for such systems, here
are a couple of alternative approaches:

1. Allow creation of multiple tablespaces in a single location for different
server versions, but not the same version (exception).
a) Also allow this capability in utilities like pg_basebackup( and others
that update tablespaces) .
b) Update the documentation about this specification change.

I don't see this breaking any backwards compatibility.

2. Retain the current base rule of creating Tablespaces i.e. "The location
must be an existing, empty directory". This means:
a) For the future release, have a strict directory emptiness check while
creating new tablespace.
b) Only during upgrade, allow creation of multiple tablespaces in the same
location.
c) Document the fact that only during upgrade the system would create
multiple tablespaces in same location.
d) Provide the flexibility to change the location of an existing tablespace,
something like:
ALTER TABLESPACE tblspcname SET LOCATION '/path/to/newlocation'
[where newlocation is an existing empty directory]

With the altered location of a tablespace it should be possible to perform
the pg_basebackup successfully.
I noticed some solutions for moving PostgreSQL tablespaces on the internet,
but some are slow and others cause incompatibility with tools like pgAdmin. I
am not able to find any discussion about moving tablespace location in the
mailing lists either, so I am not sure if there is already any conclusion
about supporting or not supporting ALTER TABLESPACE ... SET LOCATION.
To me, the first approach above looks like providing more independence to
the user in the choice of tablespace location. Also, it is not clear why the
directory emptiness rule was introduced in the first place. Any insight
on that would be useful.

Regards,
Neha

[1]https://www.postgresql.org/message-id/2008148.rxBNyNRHPZ@peanuts2

Cheers,
Neha

On Fri, Apr 7, 2017 at 11:02 AM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> I don't mean that this is the only or best way to go.
>
> I apologize for the possible lack of explanation.
>
> At Thu, 06 Apr 2017 12:03:51 -0400, Tom Lane  wrote in
> <21084.1491494...@sss.pgh.pa.us>
> > Kyotaro HORIGUCHI  writes:
> > > I noticed by the following report, PostgreSQL can share the same
> > > directory as tablespaces of two servers with different
> > > pg-versions.
> >
> > > https://www.postgresql.org/message-id/2008148.rxBNyNRHPZ@peanuts2
> >
> > > 8.4 checked that the tablespace location is empty, but from 9.0,
> > > the check is replaced with creating a PG_PGVER_CATVER
> > > subdirectory. This works for multiple servers with the same
> > > version, but don't for servers with different versions.
> >
> > Please explain why you think it doesn't work.  This patch seems to
> > be reverting an intentional behavioral change, and you haven't
>
> I understand that the change is for in-place upgrade, not for
> sharing a tablespace directory between two versions of PostgreSQL
> servers. It actually rejects the second server with the same
> version to come. If this is correct, it doesn't seem right to
> accept the second server of the different version.
>
> If we allow sharing of the directory, theoretically we can allow
> the same between the same version of servers by adding system
> identifier in the subdirectory name.
>
>
> > really explained why we'd want to.  It certainly doesn't look like
> > it addresses the referenced complaint about pg_basebackup behavior.
>
> My point is: is the directory for a newly created tablespace really
> required to be literally empty or not?
>
> Practically it doesn't need to be empty; successful creation of the
> PG_VER_CATVER directory is enough, as the current implementation
> shows. If we take this way, the documentation and pg_basebackup
> should be changed and the problem will be resolved as a result.
>
> https://www.postgresql.org/docs/9.6/static/manage-ag-tablespaces.html
>
> - The location must be an existing, empty directory that is owned
> - by the PostgreSQL operating system user. All objects subsequently
> - created within the tablespace will be stored in files underneath
> - this 

Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c

2017-05-05 Thread Petr Jelinek
On 03/05/17 13:23, Erik Rijkers wrote:
> On 2017-05-03 08:17, Petr Jelinek wrote:
>> On 02/05/17 20:43, Robert Haas wrote:
>>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
> 
 code path that calls CommitTransactionCommand() should have one, no?
>>>
>>> Is there anything left to be committed here?
>>>
>>
>> Afaics the fix was not committed. Peter wanted more comprehensive fix
>> which didn't happen. I think something like attached should do the job.
> 
> I'm running my pgbench-over-logical-replication test in chunks of 15
> minutes, with different pgbench -c (num clients) and -s (scale) values.
> 
> With this patch (and nothing else)  on top of master (8f8b9be51fd7 to be
> precise):
> 
>> fix-statistics-reporting-in-logical-replication-work.patch
> 
> logical replication is still often failing (as expected, I suppose; it
> seems to be because of "initial snapshot too large") but indeed I do not see

Yes that's different thing that we've been discussing a bit in snapbuild
woes thread.

> the 'TRAP: FailedAssertion in pgstat.c' anymore.
> 
> (If there is any other configuration of patches worth testing please let
> me know)
> 

Thanks, so the patch works.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] snapbuild woes

2017-05-05 Thread Petr Jelinek
On 05/05/17 02:00, Andres Freund wrote:
> Hi,
> 
> On 2017-05-02 08:55:53 +0200, Petr Jelinek wrote:
>> Aah, now I understand we talked about slightly different things, I
>> considered the running thing to be first step towards tracking aborted
>> txes everywhere.
>> I think
>> we'll have to revisit tracking of aborted transactions in PG11 then
>> though because of the 'snapshot too large' issue when exporting, at
>> least I don't see any other way to fix that.
> 
> FWIW, that seems unnecessary - we can just check for that using the
> clog.  Should be very simple to check for aborted xacts when exporting
> the snapshot (like 2 lines + comments).  That should address your
> concern, right?

Right, because there isn't a practical difference between a running and an
aborted transaction for us, so we don't mind if the abort has happened in
the future. Yeah, having the export code do the check seems quite simple.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Why type coercion is not performed for parameters?

2017-05-05 Thread Konstantin Knizhnik



On 05.05.2017 13:29, Marko Tiikkaja wrote:
> But you know that the type of the literal "10" is int. If you're
> throwing that information away, surely that's a bug in your code.

Yes, in the case of an integer literal I can easily determine the parameter
type. But in the case of a string literal I have to set the UNKNOWNOID type;
otherwise a lot of queries will not work.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Why type coercion is not performed for parameters?

2017-05-05 Thread Marko Tiikkaja
On Fri, May 5, 2017 at 10:58 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> If I evaluate expression typename('literal'), then type coercion is
> performed and the function is successfully resolved, i.e.
>
> SELECT regnamespace('"pg_catalog"');
>
> But if I want to prepare this query, I get the error:
>
> postgres=#  prepare foo as SELECT regnamespace($1);
> ERROR:  function regnamespace(unknown) does not exist
> LINE 1: prepare foo as SELECT regnamespace($1);
>
> Certainly, I can explicitly specify parameter type:
>
> prepare foo (text) as SELECT regnamespace($1);
>
> and it will work. But it is not always possible.
>

There are other similar examples which have even bigger issues, such as
now() - interval '6 hours'.  now() - interval $1  won't even parse.


> Why do I need it? I want to implement autoprepare.
> My original intention was to let parse_analyze_varparams infer the types of
> parameters from the context.
> But that is not always possible and sometimes leads to different behavior of
> the query.
> For example if the query:
>
>  select count(*) from test_range_gist where ir @> 10;
>
> is replaced with
>
>  select count(*) from test_range_gist where ir @> $1;
>
> then the type of the parameter will be int4range rather than int, which
> corresponds to a different operator.
>

But you know that the type of the literal "10" is int.  If you're throwing
that information away, surely that's a bug in your code.


.m


Re: [HACKERS] [patch] Build pgoutput with MSVC

2017-05-05 Thread Magnus Hagander
On Fri, May 5, 2017 at 11:58 AM, Michael Paquier 
wrote:

> On Fri, May 5, 2017 at 6:10 PM, MauMau  wrote:
> > The pgoutput is not built with MSVC.  The attached patch fixes this.
> > I confirmed that a few INSERTs were replicated correctly.
> >
> > Should I add this matter in the PostgreSQL 10 Open Items page?
>
> Yes, with Peter as committer and Petr as owner.
>
> +my $pgoutput = $solution->AddProject(
> +'pgoutput', 'dll', '',
> +'src/backend/replication/pgoutput');
> +$pgoutput->AddReference($postgres);
> Yup, that's correct.
>
> You have forgotten to update clean.bat, which should clean up pgoutput.dll.
>

If that's all that's required, I'll just go ahead and commit it right away,
including the clean.bat.

I think the problem with clean.bat isn't cleaning up pgoutput.dll -- that
one goes in a different directory. But it does need to clean up the
win32ver.rc file that gets dropped there automatically.

The attached patch itself seems broken (it has some sort of byte order
marker at the beginning, but removing that still breaks with "patch
unexpectedly ends in middle of line patch:  Only garbage was found in
the patch input.". But I can just copy/paste it manually :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] [patch] Build pgoutput with MSVC

2017-05-05 Thread Michael Paquier
On Fri, May 5, 2017 at 6:10 PM, MauMau  wrote:
> The pgoutput is not built with MSVC.  The attached patch fixes this.
> I confirmed that a few INSERTs were replicated correctly.
>
> Should I add this matter in the PostgreSQL 10 Open Items page?

Yes, with Peter as committer and Petr as owner.

+my $pgoutput = $solution->AddProject(
+'pgoutput', 'dll', '',
+'src/backend/replication/pgoutput');
+$pgoutput->AddReference($postgres);
Yup, that's correct.

You have forgotten to update clean.bat, which should clean up pgoutput.dll.
-- 
Michael




[HACKERS] [patch] Build pgoutput with MSVC

2017-05-05 Thread MauMau
Hello,

I tried logical replication on Windows, but it failed like this:

postgres=# create subscription mysub connection 'host=localhost
port=5433 user=tuna dbname=postgres' publication mypub with (nocopy
data);
NOTICE:  synchronized table states
ERROR:  could not create replication slot "mysub": ERROR:  could not
access file "pgoutput": No such file or directory
postgres=#

The pgoutput is not built with MSVC.  The attached patch fixes this.
I confirmed that a few INSERTs were replicated correctly.

Should I add this matter in the PostgreSQL 10 Open Items page?

Regards
MauMau


msvc_build_pgoutput.patch
Description: Binary data



Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-05-05 Thread Michael Paquier
On Fri, May 5, 2017 at 5:33 PM, Pavan Deolasee  wrote:
>
>
> On Fri, May 5, 2017 at 10:56 AM, Michael Paquier 
> wrote:
>>
>> On Wed, May 3, 2017 at 12:25 AM, Peter Eisentraut
>>
>>
>> >>> Can we prevent HOT pruning during logical decoding?
>> >>
>> >> It does not sound much difficult to do, couldn't you just make it a
>> >> no-op with am_walsender?
>> >
>> > That's my hope.
>>
>> The only code path doing HOT-pruning and generating WAL is
>> heap_page_prune(). Do you think that we need to worry about FPWs as
>> well?
>
>
> IMO the check should go inside heap_page_prune_opt(). Do we need to worry
> about wal_log_hints or checksums producing WAL because of hint bit updates?
> While I haven't read the thread, I am assuming if HOT pruning can happen,
> surely hint bits can get set too.

Yeah, that's what I am worrying about as well. Experts of logical
decoding will correct me, but it seems to me that we have to cover all
the cases where heap scans can generate WAL.
-- 
Michael




[HACKERS] Why type coercion is not performed for parameters?

2017-05-05 Thread Konstantin Knizhnik

Hi hackers,

If I evaluate the expression typename('literal'), then type coercion is
performed and the function is successfully resolved, i.e.


SELECT regnamespace('"pg_catalog"');

But if I want to prepare this query, I get the error:

postgres=#  prepare foo as SELECT regnamespace($1);
ERROR:  function regnamespace(unknown) does not exist
LINE 1: prepare foo as SELECT regnamespace($1);

Certainly, I can explicitly specify parameter type:

prepare foo (text) as SELECT regnamespace($1);

and it will work. But it is not always possible.

Actually, the coerce_type function can handle parameters normally,
but func_get_detail allows coercion only for constants:


        if (sourceType == UNKNOWNOID && IsA(arg1, Const))
        {
            /* always treat typename('literal') as coercion */
            iscoercion = true;
        }

If this condition is changed to:

        if (sourceType == UNKNOWNOID && (IsA(arg1, Const) || IsA(arg1, Param)))


then the example above will normally work.

Why do I need it? I want to implement autoprepare.
My original intention was to let parse_analyze_varparams infer the types
of parameters from the context.
But that is not always possible and sometimes leads to different behavior
of the query.

For example if the query:

 select count(*) from test_range_gist where ir @> 10;

is replaced with

 select count(*) from test_range_gist where ir @> $1;

then the type of the parameter will be int4range rather than int, which
corresponds to a different operator.
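The two operators involved can be checked directly in psql (a quick sketch;
assumes a stock PostgreSQL installation with the built-in range operators):

```sql
-- With an int argument, anyrange @> anyelement (containment of a value)
-- is chosen:
SELECT int4range(1, 20) @> 10;               -- true

-- Once the parameter is typed as int4range, anyrange @> anyrange
-- (containment of a range) is chosen instead:
SELECT int4range(1, 20) @> int4range(5, 6);  -- true
```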


This is why I now infer the parameter type from the literal value. But in
this case I get errors in parse_analyze_varparams, which is not able to
resolve some functions.
The fix in func_get_detail solves the problem and doesn't cause
any new issues: all regression tests pass.


So my question is whether it is possible to use the same rule for type
coercion of parameters as for constants.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Patch - Tcl 8.6 version support for PostgreSQL

2017-05-05 Thread Paresh More
Hello Alvaro,

I applied the patch you mentioned and tried compiling the PostgreSQL
server for Windows 32-bit and 64-bit; the results are:

1) The PostgreSQL server (9.6) with Tcl version 8.5 compiles fine without
any issue.
2) The PostgreSQL server (10) snapshot with Tcl version 8.6 also
compiles fine without any issue.






On Thu, May 4, 2017 at 8:26 PM, Dave Page  wrote:

>
>
> On Thu, May 4, 2017 at 3:54 PM, Tom Lane  wrote:
>
>> Alvaro Herrera  writes:
>> > Something like the (untested) attached perhaps?
>>
>> Looks plausible, I'm not in a position to test though.
>
>
> Sandeep/Paresh - can you test please?
>
> --
> Dave Page
> Blog: http://pgsnake.blogspot.com
> Twitter: @pgsnake
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 

Thanks & Regards

*Paresh More*


Pune, India.
Cell :  +919922000564 |  www.enterprisedb.com


Re: [HACKERS] Error message on missing SCRAM authentication with older clients

2017-05-05 Thread Heikki Linnakangas

On 05/05/2017 11:26 AM, Magnus Hagander wrote:

On Fri, May 5, 2017 at 10:19 AM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:


psql: SCRAM authentication requires libpq version 10 or above


Sounds good.


+1.


Ok, committed. Thanks!

- Heikki





Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-05-05 Thread Pavan Deolasee
On Fri, May 5, 2017 at 10:56 AM, Michael Paquier 
wrote:

> On Wed, May 3, 2017 at 12:25 AM, Peter Eisentraut
>
>
> >>> Can we prevent HOT pruning during logical decoding?
> >>
> >> It does not sound much difficult to do, couldn't you just make it a
> >> no-op with am_walsender?
> >
> > That's my hope.
>
> The only code path doing HOT-pruning and generating WAL is
> heap_page_prune(). Do you think that we need to worry about FPWs as
> well?
>

IMO the check should go inside heap_page_prune_opt(). Do we need to worry
about wal_log_hints or checksums producing WAL because of hint bit updates?
While I haven't read the thread, I am assuming if HOT pruning can happen,
surely hint bits can get set too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Error message on missing SCRAM authentication with older clients

2017-05-05 Thread Magnus Hagander
On Fri, May 5, 2017 at 10:19 AM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> Hi Heikki,
>
> > psql: SCRAM authentication requires libpq version 10 or above
>
> Sounds good.
>
>
+1.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Error message on missing SCRAM authentication with older clients

2017-05-05 Thread Aleksander Alekseev
Hi Heikki,

> psql: SCRAM authentication requires libpq version 10 or above

Sounds good.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] password_encryption, default and 'plain' support

2017-05-05 Thread Magnus Hagander
On Fri, May 5, 2017 at 9:38 AM, Albe Laurenz 
wrote:

> Tom Lane wrote:
> > Robert Haas  writes:
> >> On Wed, May 3, 2017 at 7:31 AM, Heikki Linnakangas 
> wrote:
> >>> So, I propose that we remove support for password_encryption='plain' in
> >>> PostgreSQL 10. If you try to do that, you'll get an error.
>
> >> I have no idea how widely used that option is.
>
> > Is it possible that there are still client libraries that don't support
> > password encryption at all?  If so, are we willing to break them?
> > I'd say "yes" but it's worth thinking about.
>
> We have one application that has been reduced to "password" authentication
> ever since "crypt" authentication was removed, because they implemented the
> line protocol rather than using libpq and never bothered to move to "md5".
>
> But then, it might be a good idea to break this application, because that
> would force the vendor to implement something that is not a
> blatant security problem.
>

It might. But I'm pretty sure the suggestion does not include removing the
"password" authentication type; that one will still exist. This is just
about password *storage*.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-05-05 Thread Dmitriy Sarafannikov
Amit, thanks for comments!

> 1.
> +#define InitNonVacuumableSnapshot(snapshotdata)  \
> + do { \
> + (snapshotdata).satisfies = HeapTupleSatisfiesNonVacuumable; \
> + (snapshotdata).xmin = RecentGlobalDataXmin; \
> + } while(0)
> +
> 
> Can you explain and add comments on why you think RecentGlobalDataXmin is
> the right value to use here?  As of now, it seems to be used only for
> pruning non-catalog tables.

Can you explain to me what value for xmin should be used there?

> 2.
> +bool
> +HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
> + Buffer buffer)
> +{
> + return HeapTupleSatisfiesVacuum(htup, snapshot->xmin, buffer)
> + != HEAPTUPLE_DEAD;
> +}
> +
> 
> Add comments on top of this function and for the sake of consistency
> update the file header as well (Summary of visibility functions:)

Yes, I will add comments and send a new patch.




Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-05-05 Thread Amit Kapila
On Thu, May 4, 2017 at 9:42 PM, Dmitriy Sarafannikov
 wrote:
>
>> Maybe we need another type of snapshot that would accept any
>> non-vacuumable tuple.  I really don't want SnapshotAny semantics here,
>> but a tuple that was live more recently than the xmin horizon seems
>> like it's acceptable enough.  HeapTupleSatisfiesVacuum already
>> implements the right behavior, but we don't have a Snapshot-style
>> interface for it.
>
>
> I have tried to implement this new type of snapshot that accepts any
> non-vacuumable tuples.
> We have tried this patch in our load environment. And it has smoothed out
> and reduced magnitude of the cpu usage peaks.
> But this snapshot does not solve the problem completely.
>
> Patch is attached.

1.
+#define InitNonVacuumableSnapshot(snapshotdata)  \
+ do { \
+ (snapshotdata).satisfies = HeapTupleSatisfiesNonVacuumable; \
+ (snapshotdata).xmin = RecentGlobalDataXmin; \
+ } while(0)
+

Can you explain and add comments why you think RecentGlobalDataXmin is
the right value to use here?  As of now, it seems to be only used for
pruning non-catalog tables.

2.
+bool
+HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer)
+{
+ return HeapTupleSatisfiesVacuum(htup, snapshot->xmin, buffer)
+ != HEAPTUPLE_DEAD;
+}
+

Add comments on top of this function and for the sake of consistency
update the file header as well (Summary of visibility functions:)

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] password_encryption, default and 'plain' support

2017-05-05 Thread Albe Laurenz
Tom Lane wrote:
> Robert Haas  writes:
>> On Wed, May 3, 2017 at 7:31 AM, Heikki Linnakangas  wrote:
>>> So, I propose that we remove support for password_encryption='plain' in
>>> PostgreSQL 10. If you try to do that, you'll get an error.

>> I have no idea how widely used that option is.

> Is it possible that there are still client libraries that don't support
> password encryption at all?  If so, are we willing to break them?
> I'd say "yes" but it's worth thinking about.

We have one application that has been reduced to "password" authentication
ever since "crypt" authentication was removed, because they implemented the
line protocol rather than using libpq and never bothered to move to "md5".

But then, it might be a good idea to break this application, because that
would force the vendor to implement something that is not a
blatant security problem.

Yours,
Laurenz Albe



Re: [HACKERS] Change GetLastImportantRecPtr's definition? (was: Skip checkpoints, archiving on idle systems.)

2017-05-05 Thread Amit Kapila
On Fri, May 5, 2017 at 11:57 AM, Andres Freund  wrote:
> Hi,
>
> On 2017-05-05 11:50:12 +0530, Amit Kapila wrote:
>> I see that EndPos can be updated in one of the cases after releasing
>> the lock (refer below code). Won't that matter?
>
> I can't see how it'd in the cases that'd matter for
> GetLastImportantRecPtr() - but it'd probably be good to note it in the
> comment.
>

I think it should matter for any record which is not tagged as
XLOG_MARK_UNIMPORTANT, but maybe I am missing something, in which case
the comment should be fine.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] CTE inlining

2017-05-05 Thread Albe Laurenz
Thomas Kellerer wrote:
>> 1) we switch unmarked CTEs as inlineable by default in pg11.
>
> +1 from me for option 1

+1 from me as well, FWIW.

Yours,
Laurenz Albe



Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-05 Thread Jeevan Ladhe
Hi Rahila,

I am not able to add a new partition if the default partition is further
partitioned with a default partition.

Consider example below:

postgres=# CREATE TABLE test ( a int, b int, c int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE test_p1 PARTITION OF test FOR VALUES IN(4, 5, 6, 7,
8);
CREATE TABLE
postgres=# CREATE TABLE test_pd PARTITION OF test DEFAULT PARTITION BY
LIST(b);
CREATE TABLE
postgres=# CREATE TABLE test_pd_pd PARTITION OF test_pd DEFAULT;
CREATE TABLE
postgres=# INSERT INTO test VALUES (20, 24, 12);
INSERT 0 1
*postgres=# CREATE TABLE test_p2 PARTITION OF test FOR VALUES IN(15);*
*ERROR:  could not open file "base/12335/16420": No such file or directory*


Thanks,
Jeevan Ladhe

On Fri, May 5, 2017 at 11:55 AM, Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Rahila,
>
> pg_restore is failing for default partition, dump file still storing old
> syntax of default partition.
>
> create table lpd (a int, b int, c varchar) partition by list(a);
> create table lpd_d partition of lpd DEFAULT;
>
> create database bkp owner 'edb';
> grant all on DATABASE bkp to edb;
>
> --take plain dump of existing database
> \! ./pg_dump -f lpd_test.sql -Fp -d postgres
>
> --restore plain backup to new database bkp
> \! ./psql -f lpd_test.sql -d bkp
>
> psql:lpd_test.sql:63: ERROR:  syntax error at or near "DEFAULT"
> LINE 2: FOR VALUES IN (DEFAULT);
>^
>
>
> vi lpd_test.sql
>
> --
> -- Name: lpd; Type: TABLE; Schema: public; Owner: edb
> --
>
> CREATE TABLE lpd (
> a integer,
> b integer,
> c character varying
> )
> PARTITION BY LIST (a);
>
>
> ALTER TABLE lpd OWNER TO edb;
>
> --
> -- Name: lpd_d; Type: TABLE; Schema: public; Owner: edb
> --
>
> CREATE TABLE lpd_d PARTITION OF lpd
> FOR VALUES IN (DEFAULT);
>
>
> ALTER TABLE lpd_d OWNER TO edb;
>
>
> Thanks,
> Rajkumar
>


Re: [HACKERS] Some thoughts about SCRAM implementation

2017-05-05 Thread Heikki Linnakangas

On 04/10/2017 09:28 PM, Álvaro Hernández Tortosa wrote:

On 10/04/17 13:02, Heikki Linnakangas wrote:

On 04/10/2017 12:39 PM, Álvaro Hernández Tortosa wrote:

* The nonce length is not specified by the RFC. I see typical
implementations use 24 chars for the client and 18 for the server.
Current code uses 10. I think it should not hurt making it at least 16
or 18.


Wouldn't hurt, I guess. IIRC I checked some other implementations,
when I picked 10, but I don't remember which ones anymore. Got a
reference for 24/18?


 The first reference is the RFC example itself (non-mandatory, of
course). But then I saw that many implementations followed this. As a quick example, GNU SASL
defines:

#define SNONCE_ENTROPY_BYTES 18
https://www.gnu.org/software/gsasl/coverage/lib/scram/server.c.gcov.frameset.html


Ok, I bumped up the nonce lengths to 18 raw bytes. Thanks!

- Heikki





Re: [HACKERS] Authentication tests, and plain 'password' authentication with a SCRAM verifier

2017-05-05 Thread Heikki Linnakangas

On 03/14/2017 09:25 PM, Heikki Linnakangas wrote:

On 03/14/2017 09:02 PM, Jeff Janes wrote:

The message returned to the client for the wrong password differs between
pg_hba-set scram and pg_hba-set md5/password methods.  Is that OK?

psql: error received from server in SASL exchange: invalid-proof

psql: FATAL:  password authentication failed for user "test"


Ah yeah, I was on the fence on that one. Currently, the server returns
the invalid-proof error to the client, as defined in RFC5802. That
results in that error message from libpq. Alternatively, the server
could elog(FATAL), like the other authentication mechanisms do, with the
same message. The RFC allows that behavior too but returning the
invalid-proof error code is potentially more friendly to 3rd party SCRAM
implementations.

One option would be to recognize the "invalid-proof" message in libpq,
and construct a more informative error message in libpq. Could use the
same wording, "password authentication failed", but it would behave
differently wrt. translation, at least.


I went ahead and changed the backend code to not send the 
"invalid-proof" error. That seemed like the easiest fix for this. You 
now get the same "password authentication failed" error as with MD5 and 
plain password authentication.


- Heikki





Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c

2017-05-05 Thread Noah Misch
On Wed, May 03, 2017 at 01:23:19PM +0200, Erik Rijkers wrote:
> On 2017-05-03 08:17, Petr Jelinek wrote:
> >On 02/05/17 20:43, Robert Haas wrote:
> >>On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
> 
> >>>code path that calls CommitTransactionCommand() should have one, no?
> >>
> >>Is there anything left to be committed here?
> >>
> >
> >Afaics the fix was not committed. Peter wanted more comprehensive fix
> >which didn't happen. I think something like attached should do the job.
> 
> I'm running my pgbench-over-logical-replication test in chunks of 15 minutes,
> with different pgbench -c (num clients) and -s (scale) values.
> 
> With this patch (and nothing else)  on top of master (8f8b9be51fd7 to be
> precise):
> 
> >fix-statistics-reporting-in-logical-replication-work.patch
> 
> logical replication is still often failing (as expected, I suppose; it seems
> because of "initial snapshot too large") but indeed I do not see the 'TRAP:
> FailedAssertion in pgstat.c' anymore.
> 
> (If there is any other configuration of patches worth testing please let me
> know)

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Logical replication ApplyContext bloat

2017-05-05 Thread Noah Misch
On Wed, May 03, 2017 at 12:02:41PM +0300, Stas Kelvich wrote:
> > On 20 Apr 2017, at 17:01, Dilip Kumar  wrote:
> > On Thu, Apr 20, 2017 at 7:04 PM, Stas Kelvich  
> > wrote:
> >> Thanks for noting.
> >> 
> >> Added short description of ApplyContext and ApplyMessageContext to README.
> > 
> > Typo
> > 
> > /analysys/analysis
> > 
> 
> Fixed, thanks.
> 
> Added this to open items list.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Time based lag tracking for logical replication

2017-05-05 Thread Noah Misch
On Wed, May 03, 2017 at 08:28:53AM +0200, Simon Riggs wrote:
> On 23 April 2017 at 01:10, Petr Jelinek  wrote:
> > Hi,
> >
> > The time based lag tracking commit [1] added interface for logging
> > progress of replication so that we can report lag as time interval
> > instead of just bytes. But the patch didn't contain patch for the
> > builtin logical replication.
> >
> > So I wrote something that implements this. I didn't like all that much
> > the API layering in terms of exporting the walsender's LagTrackerWrite()
> > for use by plugin directly. Normally output plugin does not have to care
> > if it's running under walsender or not, it uses abstracted write
> > interface for that which can be implemented in various ways (that's how
> > we implement SQL interface to logical decoding after all). So I decided
> > to add another function to the logical decoding write api called
> > update_progress and call that one from the output plugin. The walsender
> > then implements that new API to call the LagTrackerWrite() while the SQL
> > interface just does not implement it at all. This seems like cleaner way
> > of doing it.
> >
> > Thoughts?
> 
> Agree cleaner.
> 
> I don't see any pacing or comments about it, nor handling of
> intermediate messages while we process a large transaction.
> 
> I'll look at adding some pacing code in WalSndUpdateProgress()

[Action required within three days.]

Simon, I understand, from an annotation on the open items list, that you have
volunteered to own this item.  Please observe the policy on open item
ownership[1] and send a status update within three calendar days of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Change GetLastImportantRecPtr's definition? (was: Skip checkpoints, archiving on idle systems.)

2017-05-05 Thread Andres Freund
Hi,

On 2017-05-05 11:50:12 +0530, Amit Kapila wrote:
> I see that EndPos can be updated in one of the cases after releasing
> the lock (refer below code). Won't that matter?

I can't see how it'd in the cases that'd matter for
GetLastImportantRecPtr() - but it'd probably be good to note it in the
comment.

Thanks,

Andres




Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-05 Thread Rajkumar Raghuwanshi
Hi Rahila,

pg_restore is failing for default partition, dump file still storing old
syntax of default partition.

create table lpd (a int, b int, c varchar) partition by list(a);
create table lpd_d partition of lpd DEFAULT;

create database bkp owner 'edb';
grant all on DATABASE bkp to edb;

--take plain dump of existing database
\! ./pg_dump -f lpd_test.sql -Fp -d postgres

--restore plain backup to new database bkp
\! ./psql -f lpd_test.sql -d bkp

psql:lpd_test.sql:63: ERROR:  syntax error at or near "DEFAULT"
LINE 2: FOR VALUES IN (DEFAULT);
   ^


vi lpd_test.sql

--
-- Name: lpd; Type: TABLE; Schema: public; Owner: edb
--

CREATE TABLE lpd (
a integer,
b integer,
c character varying
)
PARTITION BY LIST (a);


ALTER TABLE lpd OWNER TO edb;

--
-- Name: lpd_d; Type: TABLE; Schema: public; Owner: edb
--

CREATE TABLE lpd_d PARTITION OF lpd
FOR VALUES IN (DEFAULT);


ALTER TABLE lpd_d OWNER TO edb;


Thanks,
Rajkumar


Re: [HACKERS] Change GetLastImportantRecPtr's definition? (was: Skip checkpoints, archiving on idle systems.)

2017-05-05 Thread Amit Kapila
On Fri, May 5, 2017 at 11:43 AM, Andres Freund  wrote:
> On 2017-05-05 11:04:14 +0530, Amit Kapila wrote:
>> On Fri, May 5, 2017 at 6:54 AM, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2016-12-22 19:33:30 +, Andres Freund wrote:
>> >> Skip checkpoints, archiving on idle systems.
>> >
>> > As part of an independent bugfix I noticed that Michael & I appear to
>> > have introduced an off-by-one here. A few locations do comparisons like:
>> > /*
>> >  * Only log if enough time has passed and interesting records 
>> > have
>> >  * been inserted since the last snapshot.
>> >  */
>> > if (now >= timeout &&
>> > last_snapshot_lsn < GetLastImportantRecPtr())
>> > {
>> > last_snapshot_lsn = LogStandbySnapshot();
>> > ...
>> >
>> > which looks reasonable on its face.  But LogStandbySnapshot (via 
>> > XLogInsert())
>> >  * Returns XLOG pointer to end of record (beginning of next record).
>> >  * This can be used as LSN for data pages affected by the logged action.
>> >  * (LSN is the XLOG point up to which the XLOG must be flushed to disk
>> >  * before the data page can be written out.  This implements the basic
>> >  * WAL rule "write the log before the data".)
>> >
>> > and GetLastImportantRecPtr
>> >  * GetLastImportantRecPtr -- Returns the LSN of the last important record
>> >  * inserted. All records not explicitly marked as unimportant are 
>> > considered
>> >  * important.
>> >
>> > which means that we'll e.g. not notice if there's exactly a *single* WAL
>> > record since the last logged snapshot (and likely similar in the other
>> > users of GetLastImportantRecPtr()), because XLogInsert() will return
>> > where the next record will most of the time be inserted, and
>> > GetLastImportantRecPtr() returns the beginning of said record.
>> >
>> > This is trivially fixable by replacing < with <=.  But I wonder if the
>> > better fix would be to redefine GetLastImportantRecPtr() to point to the
>> > end of the record, too?
>> >
>>
>> If you think it is straightforward to note the end of the record, then
>> that sounds like a sensible thing to do.  However, note that we remember the
>> position based on lockno, and the lock is released before we can determine
>> the EndPos.
>
> I'm not sure I'm following:
>
> XLogRecPtr
> XLogInsertRecord(XLogRecData *rdata,
>  XLogRecPtr fpw_lsn,
>  uint8 flags)
> {
> ...
> /*
>  * Unless record is flagged as not important, update LSN of 
> last
>  * important record in the current slot. When holding all 
> locks, just
>  * update the first one.
>  */
> if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
> {
> int lockno = holdingAllLocks ? 0 : MyLockNo;
>
> WALInsertLocks[lockno].l.lastImportantAt = StartPos;
> }
>
> is the relevant bit - what prevents us from just using EndPos instead?
>

I see that EndPos can be updated in one of the cases after releasing
the lock (refer below code). Won't that matter?

/*
* Even though we reserved the rest of the segment for us, which is
* reflected in EndPos, we return a pointer to just the end of the
* xlog-switch record.
*/

if (inserted)
{
EndPos = StartPos + SizeOfXLogRecord;
if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
{
if (EndPos % XLOG_SEG_SIZE == EndPos % XLOG_BLCKSZ)
EndPos += SizeOfXLogLongPHD;
else
EndPos += SizeOfXLogShortPHD;
}
}




-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Change GetLastImportantRecPtr's definition? (was: Skip checkpoints, archiving on idle systems.)

2017-05-05 Thread Andres Freund
On 2017-05-05 11:04:14 +0530, Amit Kapila wrote:
> On Fri, May 5, 2017 at 6:54 AM, Andres Freund  wrote:
> > Hi,
> >
> > On 2016-12-22 19:33:30 +, Andres Freund wrote:
> >> Skip checkpoints, archiving on idle systems.
> >
> > As part of an independent bugfix I noticed that Michael & I appear to
> > have introduced an off-by-one here. A few locations do comparisons like:
> > /*
> >  * Only log if enough time has passed and interesting records 
> > have
> >  * been inserted since the last snapshot.
> >  */
> > if (now >= timeout &&
> > last_snapshot_lsn < GetLastImportantRecPtr())
> > {
> > last_snapshot_lsn = LogStandbySnapshot();
> > ...
> >
> > which looks reasonable on its face.  But LogStandbySnapshot (via 
> > XLogInsert())
> >  * Returns XLOG pointer to end of record (beginning of next record).
> >  * This can be used as LSN for data pages affected by the logged action.
> >  * (LSN is the XLOG point up to which the XLOG must be flushed to disk
> >  * before the data page can be written out.  This implements the basic
> >  * WAL rule "write the log before the data".)
> >
> > and GetLastImportantRecPtr
> >  * GetLastImportantRecPtr -- Returns the LSN of the last important record
> >  * inserted. All records not explicitly marked as unimportant are considered
> >  * important.
> >
> > which means that we'll e.g. not notice if there's exactly a *single* WAL
> > record since the last logged snapshot (and likely similar in the other
> > users of GetLastImportantRecPtr()), because XLogInsert() will return
> > where the next record will most of the time be inserted, and
> > GetLastImportantRecPtr() returns the beginning of said record.
> >
> > This is trivially fixable by replacing < with <=.  But I wonder if the
> > better fix would be to redefine GetLastImportantRecPtr() to point to the
> > end of the record, too?
> >
> 
> If you think it is straightforward to note the end of the record, then
> that sounds like a sensible thing to do.  However, note that we remember the
> position based on lockno, and the lock is released before we can determine
> the EndPos.

I'm not sure I'm following:

XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
 XLogRecPtr fpw_lsn,
 uint8 flags)
{
...
/*
 * Unless record is flagged as not important, update LSN of last
 * important record in the current slot. When holding all 
locks, just
 * update the first one.
 */
if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
{
int lockno = holdingAllLocks ? 0 : MyLockNo;

WALInsertLocks[lockno].l.lastImportantAt = StartPos;
}

is the relevant bit - what prevents us from just using EndPos instead?

- Andres




Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-05 Thread Amit Kapila
On Fri, May 5, 2017 at 7:07 AM, Andres Freund  wrote:
> On 2017-05-02 15:13:58 -0400, Robert Haas wrote:
>> On Tue, Apr 18, 2017 at 2:48 AM, Amit Khandekar  
>> wrote:
>> The main things that keeps this from being a crippling issue right now
>> is the fact that we tend not to use that many parallel workers in the
>> first place.  We're trying to scale a query that would otherwise use 1
>> process out to 3 or 5 processes, and so the contention effects, in
>> many cases, are not too bad.  Multiple people (including David Rowley
>> as well as folks here at EnterpriseDB) have demonstrated that for
>> certain queries, we can actually use a lot more workers and everything
>> works great.  The problem is that for other queries, using a lot of
>> workers works terribly.  The planner doesn't know how to figure out
>> which it'll be - and honestly, I don't either.
>
> Have those benchmarks, even in a very informal form, been shared /
> collected / referenced centrally?
>

The numbers have been posted on the parallel seq. scan thread and more
formally shared in a PGCon presentation ([1], refer to slide 15).

> I'd be very interested to know where
> the different contention points are. Possibilities:
>
> - in non-resident workloads: too much concurrent IOs submitted, leading
>   to overly much random IO
> - internal contention in the the parallel node, e.g. parallel seqscan
>

I think one of the points of scaling/contention is tuple
communication.  This is what shows up in perf profiles, and we (one of
my colleagues is working on it) are already working on some ways to
improve it, but I don't think we can get anywhere near linear
scaling by improving that alone.


[1] - https://www.pgcon.org/2015/schedule/events/785.en.html

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-05-05 Thread Amit Khandekar
On 5 May 2017 at 07:50, David Rowley  wrote:
>  On 3 May 2017 at 07:13, Robert Haas  wrote:
>> It is of course possible that the Parallel Seq Scan could run into
>> contention problems if the number of workers is large, but in my
>> experience there are bigger problems here.  The non-parallel Seq Scan
>> can also contend -- not of course over the shared mutex because there
>> isn't one, but over access to the blocks themselves.  Every one of
>> those blocks has a content lock and a buffer header and so on, and
>> having multiple processes accessing those things at the same time
>> scales well, but not perfectly.  The Hash node can also contend: if
>> the hash join spills to disk, you've got multiple processes reading
>> and writing to the temp directory at the same time and, of course,
>> that can be worse than just one process doing it -- sometimes much
>> worse.  It can also be better, depending on how much I/O gets
>> generated and how much I/O bandwidth you have.
>
> Yeah, I did get some time to look over the contention in Parallel Seq
> Scan a while back and I discovered that on the machine that I was
> testing on, the lock obtained in heap_parallelscan_nextpage() was
> causing workers to have to wait for other workers to fetch their next
> task to work on.
>
> I ended up writing the attached (which I'd not intended to post until
> some time closer to when the doors open for PG11). At the moment it's
> basically just a test patch to see how it affects things when we give
> workers a bit more to do before they come back to look for more work.
> In this case, I've just given them 10 pages to work on, instead of the
> 1 that's allocated in 9.6 and v10.
>
> A quick test on a pretty large table on a large machine shows:
>
> Unpatched:
>
> postgres=# select count(*) from a;
>count
> 
>  187400
> (1 row)
>
> Time: 5211.485 ms (00:05.211)
>
> Patched:
>
> postgres=# select count(*) from a;
>count
> 
>  187400
> (1 row)
>
> Time: 2523.983 ms (00:02.524)

The result is quite impressive.

>
> So it seems worth looking into. "a" was just a table with a single int
> column. I'm unsure as yet if there are more gains to be had for tables
> with wider tuples. I do suspect the patch will be a bigger win in
> those cases, since there's less work to do for each page, e.g less
> advance aggregate calls, so likely they'll be looking for their next
> page a bit sooner.
>
> Now I'm not going to pretend that this patch is ready for the
> prime-time. I've not yet worked out how to properly report sync-scan
> locations without risking reporting later pages after reporting the
> end of the scan. What I have at the moment could cause a report to be
> missed if SYNC_SCAN_REPORT_INTERVAL is not divisible by the batch
> size. I'm also not sure how batching like this affect read-aheads, but
> at least the numbers above speak for something. Although none of the
> pages in this case came from disk.
>
> I'd had thoughts that the 10 pages wouldn't be constant, but the
> batching size would depend on the size of the relation to be scanned.
> I'd had rough ideas to just try to make about 1 million batches. Something
> like batch_pages = Max(parallel_scan->phs_nblocks / 100, 1); so
> that we only take more than 1 page if there's some decent amount to
> process. We don't want to make the batches too big as we might end up
> having to wait on slow workers at the end of a scan.

I was wondering: if we keep increasing the batch size, that might
lead to I/O contention. I mean, the higher the batch size, the higher
is the chance of causing more random I/O, because all workers would be
accessing disk blocks far away from each other in parallel. So there
might be a trade-off here. (It's another matter that I/O contention
testing needs to be done, in general, for many scenarios.)

I believe there are certain parallel scans (parallel bitmap heap scan?)
where the logic to go to the next block consumes time, so there would
consequently be more waits.

What if we supply each worker with a sequence of blocks to be
scanned, rather than a range of blocks? Each worker would have a list
of the next few blocks, say:
w1 : 1, 5, 9, 13
w2 : 2, 6, 10, 14
w3 : 3, 7, 11, 15.
w4 : .

Maybe the leader worker would do the accounting and store the
instructions for each of the workers at individual locations in shared
memory, so there won't be any contention while accessing them.

This may be simple/applicable for a sequential scan, but not for other
scans, for some of which this may not even be possible. But basically I
was thinking of a way to tackle shared memory contention as
well as random I/O.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

