Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-11-15 Thread Etsuro Fujita

On 2016/11/16 13:10, Ashutosh Bapat wrote:

On Wed, Nov 16, 2016 at 8:25 AM, Etsuro Fujita
 wrote:

On 2016/11/15 19:04, Rushabh Lathia wrote:

Your latest patch doesn't apply cleanly on the master branch.



Did you apply the patch set in [1] (postgres-fdw-subquery-support-v4.patch
and postgres-fdw-phv-pushdown-v4.patch in this order) before applying the
latest patch?



I don't see any reason why DML/UPDATE pushdown should depend upon
subquery deparsing or at least the PHV patch. Combined together they can
help in more cases, but even without those patches we will be able to
push down more stuff. Probably we should just restrict push-down to the
cases where the above patches are not needed. That makes reviews easier.
Once those patches get committed, we may add more functionality depending
upon the status of this patch. Does that make sense?


OK, I'll extract from the patch the minimal part that wouldn't depend on 
the two patches.


Best regards,
Etsuro Fujita






Re: [HACKERS] Document how to set up TAP tests for Perl 5.8.8

2016-11-15 Thread Noah Misch
On Wed, Nov 16, 2016 at 12:48:03PM +0800, Craig Ringer wrote:
> --- a/src/test/perl/README
> +++ b/src/test/perl/README
> @@ -64,3 +64,20 @@ For available PostgreSQL-specific test methods and some 
> example tests read the
>  perldoc for the test modules, e.g.:
>  
>  perldoc src/test/perl/PostgresNode.pm
> +
> +Required Perl
> +-
> +
> +Tests must run on perl 5.8.8 and newer. perlbrew is a good way to obtain

Tests must run on Perl 5.8.0 and newer.




Re: [HACKERS] Adding the optional clause 'AS' in CREATE TRIGGER

2016-11-15 Thread Okano, Naoki
> But in any case it would be a serious mistake to do this without first 
> implementing CREATE OR REPLACE TRIGGER.  I think that's an entirely separate 
> proposal and you would be well advised to treat it as such.
I see. There are more problems than I expected...
Let me start with the 'OR REPLACE' clause.

At least, adding only the 'OR REPLACE' clause has the following advantages:
* It enables users to redefine a trigger in a single command.
* The trigger can always be referenced while it is being redefined.
  # That is not the case when using separate 'DROP' and 'CREATE' commands.
  This is useful when users change the function or the condition of the
  'WHEN' clause.

Regards,
Okano Naoki
Fujitsu




Re: [HACKERS] pgsql: Add putenv support for msvcrt from Visual Studio 2013

2016-11-15 Thread Noah Misch
On Tue, Apr 26, 2016 at 07:39:29PM +0200, Christian Ullrich wrote:
> * Christian Ullrich wrote:
> 
> >wrong even without considering the debug/release split. If we load a
> >compiled extension built with a CRT we have not seen yet, _after_ the
> >first call to pgwin32_putenv(), that module's CRT's view of its
> >environment will be frozen because we will never attempt to update it.
> 
> Four patches attached:
> 
> master --- 0001 --- 0002 --- 0003
>  \
>   `- 0004
> 
> 0001 is just some various fixes to set the stage.

Patch 1 looks good, except that it should be three patches:

- cosmetic parts: change whitespace and s/0/NULL/
- remove CloseHandle() call
- probe for debug CRT modules, not just release CRT modules

Please split along those boundaries.  I plan to back-patch all of that.  I've
seen some gettext builds mysteriously ignore "SET lc_messages = ..."; ignoring
debug CRTs could have been the cause.

> I tested this with a project
> () that contains two DLLs:

That's a pithy test; thanks for assembling it.

> Even with patch 0004, there is still a race condition between the main
> thread and a theoretical additional thread created by some other component
> that unloads some CRT between GetProcAddress() and the _putenv() call, but
> that is hardly fixable.

I think you can fix it by abandoning GetModuleHandle() in favor of
GetModuleHandleEx() + GetProcAddress() + FreeLibrary().  I recommend also
moving the SetEnvironmentVariable() call before the putenv calls.  That way,
if a CRT loads while pgwin32_putenv() is executing, the newly-loaded CRT will
always have the latest value.  (I'm assuming CRTs populate their environment
from GetEnvironmentStrings(), because I can't think of an alternative.)
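
Roughly, the shape I have in mind is something like the following.  This is
only a sketch to illustrate the pinning idea; the CRT module names and the
helper name are illustrative, not the actual pgwin32_putenv() code:

#include <windows.h>
#include <stdio.h>

/* Illustrative list; the real code probes many CRT versions, debug and release. */
static const char *const crt_modules[] = {"msvcrt", "msvcr120", "msvcr120d"};

static void
putenv_all_crts(const char *name, const char *value)
{
	char		setting[1024];
	size_t		i;

	/*
	 * Update the Win32 environment block first, so a CRT that loads while we
	 * are iterating below still copies the latest value at startup.
	 */
	SetEnvironmentVariableA(name, value);

	snprintf(setting, sizeof(setting), "%s=%s", name, value);

	for (i = 0; i < sizeof(crt_modules) / sizeof(crt_modules[0]); i++)
	{
		HMODULE		hmodule;

		/*
		 * GetModuleHandleExA() with default flags increments the module's
		 * reference count, so the CRT cannot be unloaded between the
		 * GetProcAddress() lookup and the _putenv() call; FreeLibrary()
		 * releases the pin afterwards.
		 */
		if (GetModuleHandleExA(0, crt_modules[i], &hmodule))
		{
			typedef int (__cdecl * PUTENVPROC) (const char *);
			PUTENVPROC	putenvFunc;

			putenvFunc = (PUTENVPROC) GetProcAddress(hmodule, "_putenv");
			if (putenvFunc != NULL)
				putenvFunc(setting);
			FreeLibrary(hmodule);
		}
	}
}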

As a separate patch, I am inclined to remove the "#ifdef _MSC_VER" directive,
activating its enclosed code under all compilers.  A MinGW-built postgres.exe
has the same need to update all CRTs.

> The fact that master looks like it does means that there have not been many
> (or any) complaints about missing cross-module environment variables. If
> nobody ever needs to see a variable set elsewhere, we have a very simple
> solution: Why don't we simply throw out the whole #ifdef _MSC_VER block?

pgwin32_putenv() originated, in commit 0154345, to make "SET lc_messages =
..." effective when gettext uses a different CRT from postgres.exe.  I suspect
it also makes krb_server_keyfile effective when GSS uses a different CRT.
Those are achievements worth keeping.  I'm not surprised about the lack of
complaints, because environment variables don't often change after backend
startup.  Here are some ways one could notice the difference between master
and patches 2+3 or 2+4:

- Use shared_preload_libraries to load a module that reads LC_CTYPE or
  LC_COLLATE.  CheckMyDatabase() sets those variables subsequent to
  process_shared_preload_libraries().

- Load, at any time, a module that reads LC_MESSAGES.  There's little reason
  to read that variable outside of gettext.  A module could use a gettext DLL
  other than the postgres.exe gettext DLL, but such a module would need to
  work around pg_bindtextdomain() always using the postgres.exe gettext.

- Load, at any time, a module that itself changes environment variables, other
  than LC_MESSAGES, after backend startup.  I suspect PL/Python suffices.

Those are plausible scenarios, but they're sufficiently specialized that
problems could lie unnoticed or undiagnosed for years.  I lean against
back-patching anything from patches 2, 3 or 4.

> There is another possible fix, ugly as sin, if we really need to keep the
> whole environment update machinery *and* cannot do the full loop each time.
> Patch 0003 pins each CRT when we see it for the first time.
> GET_MODULE_HANDLE_EX_FLAG_PIN is documented as "The module stays loaded
> until the process is terminated, no matter how many times FreeLibrary is
> called", so the unload race cannot occur anymore.

I prefer the simplicity of abandoning the cache (patch 4), if it performs
decently.  Would you compare the performance of patch 1, patches 1+2+3, and
patches 1+2+4?  This should measure the right thing (after substituting
@libdir@ for your environment):

CREATE FUNCTION putenv(text)
   RETURNS void
   AS '@libdir@/regress.dll', 'regress_putenv'
   LANGUAGE C STRICT;
\timing on
SELECT count(putenv('foo=' || n)) FROM generate_series(1,100) t(n);

(I'm interested for the sake of backend startup time.  I recall nine putenv()
in every backend startup, seven in main() and two in CheckMyDatabase().)

Thanks,
nm




Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Tatsuo Ishii
> JDBC is sending "show transaction_read_only" to find whether it is master
> or not.
> Victor's patch also started with it, but later it was transformed into
> pg_is_in_recovery
> by him as it appeared more appropriate to identify the master / slave.

In my understanding, pg_is_in_recovery() returns true if it's a standby
node. However, even if it returns something other than true, the server
is not necessarily a primary: a server that is not configured as a
streaming replication primary at all also returns something other than
true.

So if your intention is finding a primary, I am not sure if
pg_is_in_recovery() is the best solution.
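
For illustration only, the two probes being compared look roughly like this
from libpq (just a sketch, not part of the patch):

#include <string.h>
#include <stdbool.h>
#include <libpq-fe.h>

/* true if the server reports it is in recovery (i.e. a standby) */
static bool
server_in_recovery(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "SELECT pg_is_in_recovery()");
	bool		in_recovery = false;

	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		in_recovery = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
	PQclear(res);
	return in_recovery;
}

/* true if the session is writable, which is what JDBC tests */
static bool
server_is_writable(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "SHOW transaction_read_only");
	bool		writable = false;

	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		writable = (strcmp(PQgetvalue(res, 0, 0), "off") == 0);
	PQclear(res);
	return writable;
}

A false result from the first function does not prove the server is a usable
primary, and the second one also reflects default_transaction_read_only, so
neither check alone answers the question.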

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




[HACKERS] regression tests fails

2016-11-15 Thread Pavel Stehule
Hi

I have a recurring problem with the regression tests

master, Fedora 25,

running on port 50848 with PID 5548
== creating database "regression" ==
CREATE DATABASE
ALTER DATABASE
== running regression test queries==
test ddl  ... ok
test xact ... ok
test rewrite  ... ok
test toast... ok
test permissions  ... ok
test decoding_in_xact ... ok
test decoding_into_rel... ok
test binary   ... ok
test prepared ... ok
test replorigin   ... ok
test time ... ok
test messages ... ok
test spill... FAILED
== shutting down postmaster   ==

The result is in an unstable order.

Regards

Pavel


regression.diffs
Description: Binary data



Re: [HACKERS] [PATCH] Logical decoding timeline following take II

2016-11-15 Thread Craig Ringer
On 16 November 2016 at 12:44, Craig Ringer  wrote:

> Despite that, I've attached a revised version of the current approach
> incorporating fixes for the issues you mentioned. It's preceded by the
> patch to add an --endpos option to pg_recvlogical so that we can
> properly test the walsender interface too.

I didn't rebase the patch that made the timeline following tests use
the recvlogical support in PostgresNode.pm. Now attached.

Even if timeline following isn't accepted as-is, I'd greatly
appreciate inclusion of the first two patches as they add basic
coverage of pg_recvlogical and a helper to make using it in tests
simple and reliable.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From b1266ad1adba8619bf43d4297d1ed6392e302198 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 1 Sep 2016 12:37:40 +0800
Subject: [PATCH 1/3] Add an optional --endpos LSN argument to pg_recvlogical
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

pg_recvlogical usually just runs until cancelled or until the upstream
server disconnects. For some purposes, especially testing, it's useful
to have the ability to stop receiving at a specified LSN without having
to parse the output and deal with buffering issues, etc.

Add a --endpos parameter that takes the LSN at which no further
messages should be written and receiving should stop.

Craig Ringer, Álvaro Herrera
---
 doc/src/sgml/ref/pg_recvlogical.sgml   |  34 
 src/bin/pg_basebackup/pg_recvlogical.c | 145 +
 2 files changed, 164 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml
index b35881f..d066ce8 100644
--- a/doc/src/sgml/ref/pg_recvlogical.sgml
+++ b/doc/src/sgml/ref/pg_recvlogical.sgml
@@ -38,6 +38,14 @@ PostgreSQL documentation
constraints as , plus those for logical
replication (see ).
   
+
+  
+   pg_recvlogical has no equivalent to the logical decoding
+   SQL interface's peek and get modes. It sends replay confirmations for
+   data lazily as it receives it and on clean exit. To examine pending data on
+a slot without consuming it, use
+   pg_logical_slot_peek_changes.
+  
  
 
  
@@ -155,6 +163,32 @@ PostgreSQL documentation
  
 
  
+  -E lsn
+  --endpos=lsn
+  
+   
+In --start mode, automatically stop replication
+and exit with normal exit status 0 when receiving reaches the
+specified LSN.  If specified when not in --start
+mode, an error is raised.
+   
+
+   
+If there's a record with LSN exactly equal to lsn,
+the record will be output.
+   
+
+   
+The --endpos option is not aware of transaction
+boundaries and may truncate output partway through a transaction.
+Any partially output transaction will not be consumed and will be
+replayed again when the slot is next read from. Individual messages
+are never truncated.
+   
+  
+ 
+
+ 
   --if-not-exists
   

diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index cb5f989..c700edf 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -40,6 +40,7 @@ static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
 static int	fsync_interval = 10 * 1000; /* 10 sec = default */
 static XLogRecPtr startpos = InvalidXLogRecPtr;
+static XLogRecPtr endpos = InvalidXLogRecPtr;
 static bool do_create_slot = false;
 static bool slot_exists_ok = false;
 static bool do_start_slot = false;
@@ -63,6 +64,9 @@ static XLogRecPtr output_fsync_lsn = InvalidXLogRecPtr;
 static void usage(void);
 static void StreamLogicalLog(void);
 static void disconnect_and_exit(int code);
+static bool flushAndSendFeedback(PGconn *conn, TimestampTz *now);
+static void prepareToTerminate(PGconn *conn, XLogRecPtr endpos,
+   bool keepalive, XLogRecPtr lsn);
 
 static void
 usage(void)
@@ -81,6 +85,7 @@ usage(void)
 			 " time between fsyncs to the output file (default: %d)\n"), (fsync_interval / 1000));
 	printf(_("  --if-not-existsdo not error if slot already exists when creating a slot\n"));
 	printf(_("  -I, --startpos=LSN where in an existing slot should the streaming start\n"));
+	printf(_("  -E, --endpos=LSN   exit after receiving the specified LSN\n"));
 	printf(_("  -n, --no-loop  do not loop on connection lost\n"));
 	printf(_("  -o, --option=NAME[=VALUE]\n"
 			 " pass option NAME with optional value VALUE to the\n"
@@ -281,6 +286,7 @@ StreamLogicalLog(void)
 		int			bytes_written;
 		int64		now;
 		int			hdr_len;
+		XLogRecPtr	cur_record_lsn = InvalidXLogRecPtr;
 
 		if (copybuf != NULL)
 		{
@@ -454,6 +460,7 @@ StreamLogicalLog(void)
 			int			po

[HACKERS] Document how to set up TAP tests for Perl 5.8.8

2016-11-15 Thread Craig Ringer
I found it a real pain to set up Perl 5.8.8 to re-check recent TAP
changes against it. It turns out there's a much easier way than
messing with VMs or manually doing a source install, so I documented
it to save others the future pain.

It'd be nice to commit this at least to 9.6 and v10. Trivial one-file
docs patch. There's no README to patch in 9.4 or 9.5.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 4ea6e7bff66ac2026bbca130bd036618b4012c35 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 15 Nov 2016 15:45:41 +0800
Subject: [PATCH] Document that perl 5.8.8 is required

---
 src/test/perl/README | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/test/perl/README b/src/test/perl/README
index cfb45a1..f15e4f0 100644
--- a/src/test/perl/README
+++ b/src/test/perl/README
@@ -64,3 +64,20 @@ For available PostgreSQL-specific test methods and some example tests read the
 perldoc for the test modules, e.g.:
 
 perldoc src/test/perl/PostgresNode.pm
+
+Required Perl
+-
+
+Tests must run on perl 5.8.8 and newer. perlbrew is a good way to obtain
+such a Perl; see https://metacpan.org/pod/distribution/App-perlbrew/bin/perlbrew .
+Just install and
+
+perlbrew --force install 5.8.8
+perlbrew use 5.8.8
+perlbrew install-cpanm
+cpanm install IPC::Run
+
+then re-run configure to ensure the correct Perl is used when running tests. To verify
+that the right Perl was found:
+
+grep ^PERL= config.log
-- 
2.5.5




Re: [HACKERS] [PATCH] Logical decoding timeline following take II

2016-11-15 Thread Craig Ringer
On 12 November 2016 at 23:07, Andres Freund  wrote:
> On 2016-10-24 17:49:13 +0200, Petr Jelinek wrote:
>> + * Determine which timeline to read an xlog page from and set the
>> + * XLogReaderState's state->currTLI to that timeline ID.
>
> "XLogReaderState's state->currTLI" - the state's a bit redundant.

Thanks for taking a look at the patch.

Agreed re above, fixed.



You know, having returned to this work after a long break doing other
things, it's clear that so much of the same work is being done in
XLogSendPhysical(...) and walsender.c's XLogRead(...) that maybe it is
worth facing the required refactoring so that we can use that in
logical decoding from both the walsender and the SQL interface.

The approach I originally took focused above all else on being
minimally intrusive, rather than clean and clear. Maybe it's better to
tidy this up instead.

Instead of coupling timeline following in logical decoding to the
XLogReader struct and keeping logic that effectively duplicates the
physical walsender's, move the walsender.c globals

static TimeLineID sendTimeLine = 0;
static TimeLineID sendTimeLineNextTLI = 0;
static bool sendTimeLineIsHistoric = false;
static XLogRecPtr sendTimeLineValidUpto = InvalidXLogRecPtr;

into a new struct in timeline.h and move the logic into timeline.c, so
we stop relying on so many magic globals in the walsender. Don't keep
this state as a global; instead, init it in StartReplication and pass
it as a param to WalSndLoop. For logical decoding from the walsender,
store the walsender's copy in the XLogReaderState. For logical
decoding from SQL, set it up in pg_logical_slot_get_changes_guts(...)
and again store it in XLogReaderState.
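
Something like this, as a rough sketch (names are placeholders, nothing is
settled):

/* timeline.h -- sketch only; fields mirror the current walsender globals */
typedef struct SendTimeLineState
{
	TimeLineID	currTLI;	/* timeline currently being read/sent */
	TimeLineID	nextTLI;	/* successor timeline, if currTLI is historic */
	bool		isHistoric;	/* true once currTLI is no longer the latest */
	XLogRecPtr	validUpto;	/* end of currTLI's validity when historic */
} SendTimeLineState;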

In the process, finally unify the two XLogRead(...) functions in
xlogutils.c and walsender.c and merge walsender's
logical_read_xlog_page(...) with xlogutils'
logical_read_xlog_page(...). Sure, we can't rely on a normal
backend's latch being set when WAL is written like we can for the
walsender, but do we have to duplicate everything else? We can always
add a miscadmin.h var like IsWALSender and test that to see if we can
expect to have our latch set when new WAL comes in, and adjust our
latch wait timeout interval appropriately.

The downside is of course that it touches physical replication, unlike
the current approach which avoids touching anything that could affect
physical replication at the cost of increasing the duplication between
physical and logical replication logic.

>> + * The caller must also make sure it doesn't read past the current redo 
>> pointer
>> + * so it doesn't fail to notice that the current timeline became historical.
>> + */
>
> Not sure what that means? The redo pointer usually menas the "logical
> start" of the last checkpoint, but I don't see how thta could be meant
> here?

What I was trying to say was the current replay position on a standby. Amended.

>> +static void
>> +XLogReadDetermineTimeline(XLogReaderState *state, XLogRecPtr wantPage, 
>> uint32 wantLength)
>> +{
>> + const XLogRecPtr lastReadPage = state->readSegNo * XLogSegSize + 
>> state->readOff;
>> +
>> + elog(DEBUG4, "Determining timeline for read at %X/%X+%X",
>> + (uint32)(wantPage>>32), (uint32)wantPage, wantLength);
>
> Doing this on every single read from a page seems quite verbose.   It's
> also (like most or all the following debug elogs) violating the casing
> prescribed by the message style guidelines.

Agreed, it's unnecessary now. It's the sort of thing I'd want to keep
if Pg had fine-grained, module-level log control, but we don't.

>> + /*
>> +  * If we're reading from the current timeline, it hasn't become 
>> historical
>> +  * and the page we're reading is after the last page read, we can again
>> +  * just carry on. (Seeking backwards requires a check to make sure the 
>> older
>> +  * page isn't on a prior timeline).
>> +  */
>
> How can ThisTimeLineID become historical?

It can't, right now. Though I admit I didn't realise that at the time.
Presently ThisTimeLineID is only set in the walsender by
GetStandbyFlushRecPtr as called by IdentifySystem, and in
StartReplication, but not afterwards for logical replication, so
logical rep won't notice timeline transitions when on a standby unless
a decoding session is restarted. We don't support decoding sessions in
recovery, so it can't happen yet.

It will, when we're on a cascading standby and the upstream is
promoted, once we support logical decoding on standby. As part of that
we'll need to maintain the timeline in the walsender in logical
decoding like we do in physical, and limit logical decoding to the
currently replayed position with something like walsender's
GetStandbyFlushRecPtr(). But usable from the SQL interface too, of
course.

I'm currently implementing logical decoding on standby on top of this
and the catalog_xmin feedback patch, and in the process will be adding
tests for logical decoding on a physical replica where its upstr

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Mithun Cy
>
> > So I am tempted to just
> > hold my nose and hard-code the SQL as JDBC is presumably already
> doing.


JDBC is sending "show transaction_read_only" to find whether it is master
or not.
Victor's patch also started with it, but later it was transformed into
pg_is_in_recovery
by him as it appeared more appropriate to identify the master / slave.

ConnectionFactoryImpl.java:685

+ private boolean isMaster(QueryExecutor queryExecutor, Logger logger)
+     throws SQLException, IOException {
+   byte[][] results = SetupQueryRunner.run(queryExecutor,
+       "show transaction_read_only", true);
+   String value = queryExecutor.getEncoding().decode(results[0]);
+   return value.equalsIgnoreCase("off");
+ }

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-11-15 Thread Ashutosh Bapat
On Wed, Nov 16, 2016 at 8:25 AM, Etsuro Fujita
 wrote:
> On 2016/11/15 19:04, Rushabh Lathia wrote:
>>
>> Thanks Fujita-san for working on this. I've signed up to review this
>> patch.
>
>
> Thank you for reviewing the patch!
>
>> Your latest patch doesn't apply cleanly on the master branch.
>
>
> Did you apply the patch set in [1] (postgres-fdw-subquery-support-v4.patch
> and postgres-fdw-phv-pushdown-v4.patch in this order) before applying the
> latest patch?
>

I don't see any reason why DML/UPDATE pushdown should depend upon
subquery deparsing or at least the PHV patch. Combined together they can
help in more cases, but even without those patches we will be able to
push down more stuff. Probably we should just restrict push-down to the
cases where the above patches are not needed. That makes reviews easier.
Once those patches get committed, we may add more functionality depending
upon the status of this patch. Does that make sense?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-11-15 Thread Masahiko Sawada
On Mon, Nov 14, 2016 at 5:39 PM, Masahiko Sawada  wrote:
> On Tue, Nov 8, 2016 at 10:12 PM, Michael Paquier
>  wrote:
>> On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada  
>> wrote:
>>> On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
>>>  wrote:
 +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
 +   return SyncRepGetSyncStandbysPriority(am_sync);
 +   else /* SYNC_REP_QUORUM */
 +   return SyncRepGetSyncStandbysQuorum(am_sync);
 Both routines share the same logic to detect if a WAL sender can be
 selected as a candidate for sync evaluation or not, still per the
 selection they do I agree that it is better to keep them as separate.

 +   /* In quroum method, all sync standby priorities are always 1 */
 +   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
 +   priority = 1;
 Honestly I don't understand why you are enforcing that. Priority can
 be important for users willing to switch from ANY to FIRST to have a
 look immediately at what are the standbys that would become sync or
 potential.
>>>
>>> I thought that since all standbys appearing in s_s_names list are
>>> treated equally in quorum method, these standbys should have same
>>> priority.
>>> If these standby have different sync_priority, it looks like that
>>> master server replicates to standby server based on priority.
>>
>> No actually, because we know that they are a quorum set, and that they
>> work in the same set. The concept of priorities has no real meaning
>> for quorum as there is no ordering of the elements. Another, perhaps
>> cleaner idea may be to mark the field as NULL actually.
>
> We know that but I'm concerned it might confuse the user.
> If these priorities are the same, it can obviously imply that all of
> the standby listed in s_s_names are handled equally.
>
 It is not possible to guess from how many standbys this needs to wait
 for. One idea would be to mark the sync_state not as "quorum", but
 "quorum-N", or just add a new column to indicate how many in the set
 need to give a commit confirmation.
>>>
>>> As Simon suggested before, we could support another feature that
>>> allows the client to control the quorum number.
>>> Considering adding that feature, I thought it's better to have and
>>> control that information as a GUC parameter.
>>> Thought?
>>
>> Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry
>> is not that much legitimate, users could just look at s_s_names to
>> guess how many in the set a commit needs to wait for.
>
> It would be PGC_USRSET similar to synchronous_commit. The user can
> specify it in statement level.
>
>> +
>> +FIRST and ANY are case-insensitive word
>> +and the standby name having these words are must be double-quoted.
>> +
>> s/word/words/.
>>
>> +FIRST and ANY specify the method of
>> +that how master server controls the standby servers.
>> A little bit hard to understand, I would suggest:
>> FIRST and ANY specify the method used by the master to control the
>> standby servers.
>>
>> +The keyword FIRST, coupled with an integer
>> +number N higher-priority standbys and makes transaction commit
>> +when their WAL records are received.
>> This is unclear to me. Here is a correction:
>> The keyword FIRST, coupled with an integer N, makes transaction commit
>> wait until WAL records are received from the N standbys with higher
>> priority number.
>>
>> +synchronous_standby_names. For example, a setting
>> +of ANY 3 (s1, s2, s3, s4) makes transaction commits
>> +wait until receiving receipts from at least any three standbys
>> +of four listed servers s1, s2, 
>> s3,
>> This could just mention WAL records instead of "receipts".
>>
>> Instead of saying "an integer number N", we could use num_sync.
>>
>> + If FIRST or ANY are not specified,
>> this parameter
>> + behaves as ANY. Note that this grammar is
>> incompatible with
>> + PostgresSQL 9.6, where no keyword specified
>> is equivalent
>> + as if FIRST was specified. In short, there is no
>> real need to
>> + specify num_sync as 
>> this
>> + behavior does not have changed, as well as it is not
>> necessary to mention
>> + pre-9.6 versions are the multi-sync grammar has been added in 9.6.
>> This paragraph could be reworked, say:
>> if FIRST or ANY are not specified this parameter behaves as if ANY is
>> used. Note that this grammar is incompatible with PostgreSQL 9.6 which
>> is the first version supporting multiple standbys with synchronous
>> replication, where no such keyword FIRST or ANY can be used. Note that
>> the grammar behaves as if FIRST is used, which is incompatible with
>> the post-9.6 behavior.
>>
>> + Synchronous state of this standby server. quorum-N
>> + , where N is the number of synchronous standbys that transactions
>> + need to 

Re: [HACKERS] WIP: About CMake v2

2016-11-15 Thread Mark Kirkwood
Yeah, there seem to be a lot of these. Looking through them, almost all
concern the addition of a piece of code to wrap putenv, e.g.:


--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1350,7 +1350,7 @@ exec_command(const char *cmd,
char   *newval;

newval = psprintf("%s=%s", envvar, envval);
-   putenv(newval);
+   pg_putenv_proxy(newval);
success = true;

/*

Here pg_putenv_proxy either calls putenv or pgwin32_putenv (the latter
on Windows, I'd guess). I wonder if this could have been avoided, since
the original code already handles this sort of thing. There are also some
minor - and not immediately obvious - changes to a number of macros in
various includes... If I'm feeling keen I'll experiment to see how far I
can get without any source changes at all.



regards


Mark


On 09/11/16 08:37, Peter Eisentraut wrote:


There are a number of changes in .[ch] and .pl files that are unclear
and not explained.  Please explain them.  You can also submit separate
preliminary patches if you need to do some refactoring.  Ultimately, I
would expect this patch not to require C code changes.







Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-11-15 Thread Etsuro Fujita

On 2016/11/15 19:04, Rushabh Lathia wrote:

Thanks Fujita-san for working on this. I've signed up to review this patch.


Thank you for reviewing the patch!


Your latest patch doesn't apply cleanly on the master branch.


Did you apply the patch set in [1] 
(postgres-fdw-subquery-support-v4.patch and 
postgres-fdw-phv-pushdown-v4.patch in this order) before applying the 
latest patch?


Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/11eafd10-d3f8-ac8a-b642-b0e65037c76b%40lab.ntt.co.jp







Re: [HACKERS] WIP: About CMake v2

2016-11-15 Thread Mark Kirkwood
I had a bit of a play around to see if I could get this building properly
again with v10 (i.e. master). I've attached a minimal *incremental* patch
that needs to be applied after v1. This gets everything under the src
tree building (it adds the new file_utils.c, reorders some link libs,
and removes some duplicates).


I haven't made any changes with respect to it trying to detect and build
everything. One added nit: unless I'm missing something, there appears to
be no way to stop it from trying to build all the contrib modules... so an
option to enable/disable their build is needed.



To make it display what options there are I use:

$ mkdir build; cd build ; cmake .. -LH


And to do a build that works:

$ cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/pgsql/10 -DWITH_PERL=OFF 
-DWITH_OPENSSL=OFF -DWITH_PYTHON=OFF



With reference to Tom's comments, I'm thinking that these should all
default to 'OFF', and an additional variable WITH_CONTRIB (or similar)
should default to OFF too.



regards


Mark



On 09/11/16 08:37, Peter Eisentraut wrote:

On 9/17/16 1:21 PM, Yury Zhuravlev wrote:

Now, I published the first version of the patch.

I tried this out.  Because of some file moves in initdb and
pg_basebackup, the build fails:

[ 74%] Linking C executable initdb
  Undefined symbols for architecture x86_64:
"_fsync_pgdata", referenced from:
_main in initdb.c.o
  ld: symbol(s) not found for architecture x86_64
  collect2: error: ld returned 1 exit status
  make[2]: *** [src/bin/initdb/CMakeFiles/initdb.dir/build.make:177:
src/bin/initdb/initdb] Error 1
  make[1]: *** [CMakeFiles/Makefile2:2893:
src/bin/initdb/CMakeFiles/initdb.dir/all] Error 2
  make: *** [Makefile:128: all] Error 2

Please submit an updated patch.

I suggest you use git format-patch to produce patches.  This is easier
to apply, especially when there are a lot of new files involved.  Also
use the git facilities to check for whitespace errors.

Please supply some documentation, such as

- what are the basic commands
- how to set include/library paths, choose configure options
- how to set CFLAGS
- how to see raw build commands
- what are the targets for all/world/check/docs etc.
- explain directory structure

I suggest for now you could put this into a README.cmake file in your
patch.  We don't need to commit it that way, but it would help in the
meantime.

When I run cmake without options, it seems to do opportunistic feature
checking.  For example, it turns on OpenSSL or Python support if it can
find it, otherwise it turns it off.  We need this to be deterministic.
Without options, choose the basic feature set, require all other
features to be turned on explicitly, fail if they can't be found.
Whatever the Autoconf-based build does now has been fairly deliberately
tuned, so there should be very little reason to deviate from that.

The Python check appears to be picking up pieces from two different
Python installations:

  -- Found PythonInterp: /usr/local/bin/python (found version "2.7.12")
  -- Found PythonLibs: /usr/lib/libpython2.7.dylib (found version "2.7.10")

The check results otherwise look OK, but I'm a bit confused about the
order.  It checks for some functions before all the header files are
checked for.  Is that intentional?

There are a number of changes in .[ch] and .pl files that are unclear
and not explained.  Please explain them.  You can also submit separate
preliminary patches if you need to do some refactoring.  Ultimately, I
would expect this patch not to require C code changes.



diff --git a/src/bin/pg_archivecleanup/CMakeLists.txt b/src/bin/pg_archivecleanup/CMakeLists.txt
index 7c79ac3..06d7be4 100644
--- a/src/bin/pg_archivecleanup/CMakeLists.txt
+++ b/src/bin/pg_archivecleanup/CMakeLists.txt
@@ -7,9 +7,8 @@ include_directories(BEFORE
 add_executable(pg_archivecleanup pg_archivecleanup.c)
 
 target_link_libraries(pg_archivecleanup
-	port
-#	pq
 	pgcommon
+	port
 	${M_LIB}
 )
 
diff --git a/src/bin/pg_basebackup/CMakeLists.txt b/src/bin/pg_basebackup/CMakeLists.txt
index a985e08..a9bdd66 100644
--- a/src/bin/pg_basebackup/CMakeLists.txt
+++ b/src/bin/pg_basebackup/CMakeLists.txt
@@ -12,9 +12,12 @@ endif()
 add_library(common_basebackup STATIC
 	receivelog.c
 	streamutil.c
+	walmethods.c
 )
 
 target_link_libraries(common_basebackup
+	pgfeutils
+	pgcommon
 	port
 	pq
 )
@@ -29,12 +32,8 @@ foreach(loop_var IN ITEMS ${basebackup_list})
 	add_executable(${loop_var} ${loop_var}.c)
 
 	target_link_libraries(${loop_var}
-		pgfeutils
-		port
-		pq
-		pgcommon
-		${M_LIB}
 		common_basebackup
+		${M_LIB}
 	)
 	CMAKE_SET_TARGET_FOLDER(${loop_var} bin)
 	if(ZLIB_FOUND)
diff --git a/src/common/CMakeLists.txt b/src/common/CMakeLists.txt
index 7a24165..9338482 100644
--- a/src/common/CMakeLists.txt
+++ b/src/common/CMakeLists.txt
@@ -18,6 +18,7 @@ set(pgcommon_srv_SRCS
 set(pgcommon_SRCS
 	${pgcommon_srv_SRCS}
 	fe_memutils.c
+	file_utils.c
 	restricted_token.c
 	${PROJECT_SOURCE_DIR}/src/include/parser/gram.h
 )


[HACKERS] Use of pg_proc.probin is legal?

2016-11-15 Thread Kohei KaiGai
Hello,

I have a question, as per the subject line.

https://www.postgresql.org/docs/devel/static/catalog-pg-proc.html
According to the documentation, the purpose of pg_proc.probin is
described as follows:
| Additional information about how to invoke the function. Again, the
interpretation is language-specific.

On the other hand, interpret_AS_clause() raises an ereport if an SQL
function tries to use probin for any language other than C. Is it
illegal for other languages to use the probin field to store something
useful?

In my case, the PL/CUDA language allows defining an SQL function with a
CUDA code block. It saves the raw CUDA source code in pg_proc.prosrc and
its intermediate representation in pg_proc.probin; the latter is
constructed automatically by the validator callback of the language
handler.

I noticed the problematic scenario because pg_dump generates a CREATE
FUNCTION statement with two AS arguments; pg_restore then tries to run
the statement but fails.

The solution seems to me to be one of the following:
1. Allow CREATE FUNCTION with two AS arguments. If the language does not
support it, probin is simply ignored.
2. pg_dump does not dump pg_proc.probin if the language is not C.
3. Update the documentation to say that probin must not be provided for
languages other than C.

Thanks,
-- 
KaiGai Kohei 




Re: [HACKERS] Dynamic shared memory areas

2016-11-15 Thread Robert Haas
On Thu, Nov 10, 2016 at 12:37 AM, Thomas Munro
 wrote:
> On Tue, Nov 1, 2016 at 5:06 PM, Thomas Munro
>  wrote:
>> On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro
>>  wrote:
>>> [dsa-v3.patch]
>>
>> Here is a new version which just adds CLOBBER_FREED_MEMORY support to 
>> dsa_free.
>
> Here is a new version that fixes a bug I discovered in freepage.c today.
>
> Details:  When dsa_free decides to give back a whole superblock back
> to the free page manager for a segment with FreePageManagerPut, and
> there was already exactly one span of exactly one free page in that
> segment, and the span being 'put' is not adjacent to that existing
> free page, then the singleton format must be converted to a btree with
> the existing page as root and the newly put span as the sole leaf.
> But in that special case we forgot to add the newly put span to the
> appropriate free list.  Not only did we lose track of it, but a future
> call to FreePageManagerPut might try to merge it with another adjacent
> span, which will try to manipulate the freelist that it expects it to
> be in and blow up.  The fix is just to add a call to
> FreePagePushSpanLeader in this corner case before the early return.

Since a lot of the design of this patch is mine - from my earlier work
on sb_alloc - I don't expect to have a ton of objections to it. And
I'd like to get it committed, because other parallelism work depends
on it (bitmap heap scan and parallel hash join in particular), and
because it's got other uses as well.  However, I don't want to be
perceived as slamming my code (or that of my colleagues) into the tree
without due opportunity for other people to provide feedback, so if
anyone has questions, comments, concerns, or review to offer, please
do.

I think we should develop versions of this that (1) allocate from the
main shared memory segment and (2) allocate from backend-private
memory.  Per my previous benchmarking results, allocating from
backend-private memory would be a substantial win for tuplesort.c
because this allocator is substantially more memory-efficient for
large memory contexts than aset.c, and Tomas Vondra tested it out and
found that it is also faster for logical decoding than the approach he
proposed.  Perhaps that's not an argument for holding up his proposed
patches for that problem, but I think it IS a good argument for
pressing forward with a backend-private version of this allocator.
I'm not saying that should be part of the initial commit of this code,
but I think it's a good direction to pursue.

One question that we need to resolve is where the code should live in
the source tree.  When I wrote the original patches upon which this
work was based, I think that I put all the code in
src/backend/utils/mmgr, since it's all memory-management code.  In
this patch, Thomas left the free page manager code there, but put the
allocator itself in src/backend/storage/ipc.  There's a certain logic
to that because dynamic shared areas (dsa.c) sit right next to dynamic
shared memory (dsm.c) but it feels a bit schizophrenic to have half of
the code in storage/ipc and the other half in utils/mmgr.  I guess my
view is that utils/mmgr is a better fit, because I think that this is
basically memory management code that happens to use shared memory,
rather than basically IPC that happens to be an allocator.  If we
decide that this stuff goes in storage/ipc then that's basically
saying that everything that uses dynamic shared memory is going to end
up in that directory, which seems like a mess.  The fact that the
free-page manager, at least, could be used for allocations not based
on DSM strengthens that argument in my view. Other opinions?

The #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P hack in relptr.h, for
which I believe I'm responsible, is ugly.  There is probably a
compiler out there that has __typeof__ but not
__builtin_types_compatible_p, and we could cater to that by adding a
separate configure test for __typeof__.  A little browsing of the
documentation on at https://gcc.gnu.org/onlinedocs/ seems to suggest
that __builtin_types_compatible_p didn't exist before GCC 3.1, but
__typeof__ seems to be there even in 2.95.3.  That's not very
interesting, honestly, because 3.1 came out in 2002, but there might
be non-GCC compilers that implement __typeof__ but not
__builtin_types_compatible_p.  I am inclined not to worry about this
unless somebody feels otherwise, but it's not beautiful.
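
For anyone who hasn't looked at relptr.h, the core idea is simple enough to
sketch in a few lines.  This is just an illustration of the concept, not the
actual macros, and it leaves out the type checking that
__builtin_types_compatible_p/__typeof__ provide:

#include <stddef.h>

/*
 * A relative pointer stores an offset from a base address rather than an
 * absolute address, so the value stays meaningful in every backend even
 * though the DSM segment may be mapped at a different address in each one.
 */
typedef struct
{
	size_t		relptr_off;		/* 0 is treated as NULL */
} relptr_example;

#define relptr_example_store(base, rp, ptr) \
	((rp).relptr_off = ((ptr) == NULL ? 0 : \
		(size_t) ((char *) (ptr) - (char *) (base))))

#define relptr_example_access(base, rp) \
	((rp).relptr_off == 0 ? NULL : \
		(void *) ((char *) (base) + (rp).relptr_off))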

I wonder if it wouldn't be a good idea to allow the dsa_area_control
to be stored wherever instead of insisting that it's got to be located
inside the first DSM segment backing the pool itself.  For example,
you could make dsa_create_dynamic() take a third argument which is a
pointer to enough space for a dsa_area_control, and it would
initialize it in place.  Then you could make dsa_attach_dynamic() take
a pointer to that same structure instead of taking a dsa_handle.
Actually, I think dsa_handle goes away: it becomes the caller's
responsibility to

Re: [HACKERS] Re: Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-15 Thread Tom Lane
Greg Stark  writes:
> Just throwing this out there

> It would be neat if the file format was precisely a tab or comma
> separated file suitable for loading into the appropriate table with
> COPY or loading into a spreadsheet.

Actually, I'd say that's a very accurate description of what we DO NOT
want.  That has all of the disadvantages of the DATA-line format, and
more besides, namely that we don't even get any macro-substitution-like
abilities.  The right way to think about this, IMO, is that we want to
abstract the representation as much as we easily can.  We definitely
need a concept of default values for omitted columns, and we need at least
some limited ability to insert symbolic values that will be resolved at
compile or initdb time (see FLOAT8PASSBYVAL and PGUID for existing
examples in that line).  And we want symbolic representations for OID
references, whether the associated column is declared as reg-something
or just a plain OID column.  (I don't want to end up having to invent
a reg-something type for every system catalog, so the idea that was
mentioned upthread of having that be driven off the declared column
type seems like a nonstarter to me, even if we were willing to take
the compatibility hit of changing the declared column types of a lot
of system catalog columns.)

Now, some of that could be gotten by brute force in a CSV-type file
format, but I do not see how replacing

DATA(insert OID = 1242 (  boolin   PGNSP PGUID 12 1 0 0 0 f f f 
f t f i s 1 0 16 "2275" _null_ _null_ _null_ _null_ _null_ boolin _null_ _null_ 
_null_ ));

with

1242,boolin,PGNSP,PGUID,internal,1,0,0,0,f,f,f,f,t,f,i,s,1,0,bool,{cstring},,boolin,,,

is any real improvement --- it certainly isn't making it any more readily
editable --- and replacing most of those fields with some spelling of
"default" isn't much better.

I follow the point about wishing that you could do bulk transformations in
some kind of SQL environment, but I think that direction leads to the same
sort of fruitless discussions we've had about adopting tooling for images
in the SGML docs.  Namely that any tooling you do like that is probably
going to have a hard time producing reproducible reductions to text form,
which is going to create issues when reviewing and when tracking git
changes.  I think our reference representation needs to be what's in git,
not some theoretically-equivalent form in a database somewhere.

regards, tom lane




[HACKERS] Re: Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-15 Thread Greg Stark
On Tue, Nov 15, 2016 at 4:50 PM, Robert Haas  wrote:
> On Sun, Nov 13, 2016 at 9:48 AM, Andrew Dunstan  wrote:
>> I'm not convinced the line prefix part is necessary, though. What I'm
>> thinking of is something like this:
>>
>> PROCDATA( oid=1242 name=boolin isstrict=t volatile=i parallel=s nargs=1
>> rettype=bool argtypes="cstring" src=boolin );
>
> I liked Tom's format a lot better.  If we put this in a separate file
> rather than in the header, which I favor, the PROCDATA stuff is just
> noise.  On the other hand, having the name as the first thing on the
> line seems *excellent* for readability.


Just throwing this out there

It would be neat if the file format was precisely a tab or comma
separated file suitable for loading into the appropriate table with
COPY or loading into a spreadsheet. Then we might be able to maintain
it by editing the table using SQL updates and/or other tools without
having to teach them a particular input format.

The trick would then be to have a preprocessing step in the build
which loaded the CSV/TSV files into hash tables and replaced all the
strings or other tokens with OIDs and magic values.


-- 
greg




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-11-15 Thread Michael Paquier
On Tue, Nov 15, 2016 at 12:40 PM, Robert Haas  wrote:
> On Tue, Nov 15, 2016 at 2:24 PM, Michael Paquier
>  wrote:
>> How do you plug in that with OpenSSL? Are you suggesting to use a set
>> of undef definitions in the new header in the same way as pgcrypto is
>> doing, which is rather ugly? Because that's what the deal is about in
>> this patch.
>
> Perhaps that justifies renaming them -- although I would think the
> fact that they are static would prevent conflicts -- but why reorder
> them and change variable names?

Yeah... Perhaps I should not have done that, which was just for
consistency's sake, and even if the new reordering makes more sense
actually...
-- 
Michael




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-15 Thread Karl O. Pinc
On Mon, 7 Nov 2016 23:29:28 +0100
Gilles Darold  wrote:

> Here is the v13 of the patch,

Just to keep things up to date...  Attached are 3 re-worked patches
that I foresee submitting along with the pg_current_logfile patch.
They apply on top of v13.

(I still have one more I'm working on to ensure that current_logfiles
always has valid content should an attempt to open it returns ENFILE
or EMFILE.)


patch_pg_current_logfile-v13.diff.writeonce

Writes the current_logfiles file only once when rotating both syslog
and csvlog logfiles. 

This patch pushes bool types onto the stack instead of using int.
This is a practice which is not optimal given traditional instruction
sets and compilers.  I used bool types for clarity.  If the
compiler does not optimize the problem away, it won't impact
performance anyway because the code is not run that often.
If you find this ugly, change the bools to ints.

I believe this patch also fixes a bug where, when both syslog and
csvlog are on and the syslog is rotated due to size but the csvlog is
not rotated (for whatever reason), then the syslog filename fails to
be updated in the current_logfiles file content.


patch_pg_current_logfile-v13.diff.abstract_guc_part1

Creates symbols for some GUC values which appear in multiple
places in the code.


patch_pg_current_logfile-v13.diff.abstract_guc_part2

Uses symbols for some GUC values in the pg_current_logfile
patch.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v13.diff.writeonce
Description: Binary data


patch_pg_current_logfile-v13.diff.abstract_guc_part1
Description: Binary data


patch_pg_current_logfile-v13.diff.abstract_guc_part2
Description: Binary data



Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-15 Thread Karl O. Pinc
On Sat, 12 Nov 2016 12:59:46 -0600
"Karl O. Pinc"  wrote:

> On Mon, 7 Nov 2016 23:29:28 +0100
> Gilles Darold  wrote:
> > Here is the v13 of the patch,  

> >   - Do not write current_logfiles when log_collector is activated
> > but log_destination doesn't contained stderr or csvlog. This was
> > creating an empty file that can confuse the user.  
> 
> Good catch.

I don't see a good reason to go past 80 characters on a line here.
Patch attached to be applied on top of v13.

Whether to write current_logfiles only when there's a stderr or csvlog
seems dependent upon whether the current_logfiles file is for human
or program use.  If for humans, don't write it unless it has content.
If for programs, why make the program always have to check to see
if the file exists before reading it?  Failing to check is just one
more cause of bugs.

A casual read of the v13 code does not show me that the current_logfiles
file is deleted when the config file is changed so that stderr and
csvlog are no longer output and the user sends a SIGHUP.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v13.diff.80char
Description: Binary data



Re: [HACKERS] Something is broken about connection startup

2016-11-15 Thread Tom Lane
Robert Haas  writes:
> On Mon, Nov 14, 2016 at 4:42 PM, Tom Lane  wrote:
>> ... I'm inclined to hold my nose and stick a call into step (1) of the
>> main loop instead.

> Seems like a good idea.

>> Also, wherever we end up putting those calls, is it worth providing a
>> variant invalidation function that only kills the catalog snapshot when
>> it's the only one outstanding?  (If it isn't, the transaction snapshot
>> should be older, so there's no chance of advancing our xmin by killing
>> it.)  In principle this would save some catalog snapshot rebuilds for
>> inside-a-transaction-block cases, but I'm not sure it's worth sweating
>> that when we're doing client message exchange anyway.

> I think that would be a fairly worthwhile thing to do.

>> Lastly, I find myself disliking the separate CatalogSnapshotStale flag
>> variable.  The other special snapshots in snapmgr.c are managed by setting
>> the pointer to NULL when it's not valid, so I wonder why CatalogSnapshot
>> wasn't done that way.  Since this patch is touching almost every use of
>> that flag already, it wouldn't take much to switch it over.

> I think I had some reason why I did it that way, but I don't think it
> was anything important, so I don't object to you revising it.

I've pushed a patch incorporating those changes.  I can no longer
reproduce piculet's failure on branches back to 9.4.  Still need to
look at the secondary issues I mentioned upthread.

regards, tom lane




Re: [HACKERS] Parallel execution and prepared statements

2016-11-15 Thread Tobias Bussmann
As the patch in [1], which targeted the execution of the plan in ExecutePlan
depending on the destination, was declined, I hacked around a bit to find
another way to use parallel mode with SQL prepared statements while disabling
parallel execution in the case of a non-read-only execution. For this I used
the already present test for an existing intoClause in ExecuteQuery to set the
parallelModeNeeded flag of the prepared statement. This results in a
non-parallel execution of the parallel plan, just as we see with a non-zero
fetch count used with the extended query protocol. Although this patch seems
to work in my tests, I'm by no means confident that this is a proper way of
handling the situation in question.

Best 
Tobias 

[1] 
https://www.postgresql.org/message-id/CAA4eK1KxiYm8F9Pe9xvqzoZocK43w%3DTRPUNHZpe_iOjF%3Dr%2B_Vw%40mail.gmail.com



prepared_stmt_execute_parallel_query.patch
Description: Binary data






Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-11-15 Thread Robert Haas
On Tue, Nov 15, 2016 at 2:24 PM, Michael Paquier
 wrote:
> How do you plug in that with OpenSSL? Are you suggesting to use a set
> of undef definitions in the new header in the same way as pgcrypto is
> doing, which is rather ugly? Because that's what the deal is about in
> this patch.

Perhaps that justifies renaming them -- although I would think the
fact that they are static would prevent conflicts -- but why reorder
them and change variable names?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SERIALIZABLE on standby servers

2016-11-15 Thread Thomas Munro
On Tue, Nov 8, 2016 at 5:56 PM, Thomas Munro
 wrote:
> Here is an experimental WIP patch to allow SERIALIZABLE READ ONLY
> DEFERRABLE transactions on standby servers without serialisation
> anomalies, based loosely on an old email from Kevin Grittner[1].  I'm
> not sure how far this is from what he had in mind or whether I've
> misunderstood something fundamental here, but I hope this can at least
> serve as a starting point and we can try to get something into
> Postgres 10.

While out walking I realised what was wrong with that.  It's going to
take me a while to find the time to get back to this, so I figured I
should share this realisation in case anyone else is interested in the
topic.

The problem is that it determines snapshot safety in
PreCommit_CheckForSerializationFailure, and then races other backends
to XactLogCommitRecord.  It could determine that a hypothetical
snapshot taken after this commit is safe, but then other activity
resulting in a hypothetical snapshot of unknown safety could happen
and be logged before we record our determination in the log.

One solution could be to serialise XactLogCommitRecord for SSI
transactions using SerializableXactHashLock, and determine
hypothetical snapshot safety at the same time, so that commit replay
order matches safety determination order.  But it would suck to add
another point of lock contention to SSI commits.  Another solution
could be to have recovery on the standby detect tokens (CSNs
incremented by PreCommit_CheckForSerializationFailure) arriving out of
order, but I don't know what exactly it should do about that when it
is detected: you shouldn't respect an out-of-order claim of safety,
but then what should you wait for?  Perhaps if the last replayed
commit record before that was marked SNAPSHOT_SAFE then it's OK to
leave it that way, and if it was marked SNAPSHOT_SAFETY_UNKNOWN then
you have to wait for that one to be resolved by a follow-up snapshot
safety message and then rinse-and-repeat (take a new snapshot etc).  I
think that might work, but it seems strange to allow random races on
the primary to create extra delays on the standby.  Perhaps there is
some much simpler way to do all this that I'm missing.

Another detail is that standbys that start up from a checkpoint and
don't see any SSI transactions commit don't yet have any snapshot
safety information, but defaulting to assuming that this point is safe
doesn't seem right, so I suspect it needs to be in checkpoints.

Attached is a tidied up version which doesn't try to address the above
problems yet.  When time permits I'll come back to this.

-- 
Thomas Munro
http://www.enterprisedb.com


ssi-standby-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Brad DeJong
Magnus wrote:
> Just to be clear, you're suggesting 'One or more rows may have already been 
> removed from "%s"?

Perhaps just 'This query attempted to access a page in "%s" that was modified 
after the snapshot was acquired.'



Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Alvaro Herrera
Magnus Hagander wrote:
> On Tue, Nov 15, 2016 at 8:22 PM, Robert Haas  wrote:
> 
> > On Tue, Nov 15, 2016 at 2:21 PM, Kevin Grittner  wrote:

> > > That particular language would be misleading.  All we know about
> > > the page is that it was modified since the referencing (old)
> > > snapshot was taken.  We don't know in what way it was
> > > modified, so we must assume that it *might* have been pruned of
> > > rows that the snapshot should still be able to see.
> >
> > Oh, yeah.  So maybe "may have already been removed".
> 
> Just to be clear, you're suggesting 'One or more rows may have already been
> removed from "%s"?

Focusing on the relation itself for a second, I think the name should be
schema-qualified.  What about using errtable()?

Can this happen for relation types other than tables, say materialized
views?  (Your suggested wording omits relation type so it wouldn't be
affected, but it's worth considering I think.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

2016-11-15 Thread Andres Freund
Hi Dilip,

thanks for noticing that one.

On 2016-11-09 21:17:22 +0530, Dilip Kumar wrote:
> While testing bitmap performance, I have observed that some of the
> TPCH queries taking huge time in BitmapIndexScan node, when there are
> lossy pages.

It's not really related to lossy pages, it's just that due to deletions
/ insertions a lot more "shapes" of the hashtable are hit.

I suspect that this is with parallelism disabled? Without that the query
ends up using a parallel sequential scan for me.


I've a working fix for this, and for a similar issue Robert found. I'm
still playing around with it, but basically the fix is to make the
growth policy a bit more adaptive.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Magnus Hagander
On Tue, Nov 15, 2016 at 8:22 PM, Robert Haas  wrote:

> On Tue, Nov 15, 2016 at 2:21 PM, Kevin Grittner  wrote:
> > On Tue, Nov 15, 2016 at 12:43 PM, Robert Haas 
> wrote:
> >
> >> I think it would be better not to include either the snapshot or the
> >> block number, and just find some way to reword the error message so
> >> that it mentions which relation was involved without implying that all
> >> access to the relation would necessarily fail.  For example:
> >>
> >> ERROR: snapshot too old
> >> DETAIL: One or more rows required by this query have already been
> >> removed from "%s".
> >
> > That particular language would be misleading.  All we know about
> > the page is that it was modified since the referencing (old)
> > snapshot was taken.  We don't know in what way it was
> > modified, so we must assume that it *might* have been pruned of
> > rows that the snapshot should still be able to see.
>
> Oh, yeah.  So maybe "may have already been removed".
>

Just to be clear, you're suggesting 'One or more rows may have already been
removed from "%s"?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-11-15 Thread Michael Paquier
On Tue, Nov 15, 2016 at 10:40 AM, Robert Haas  wrote:
> On Fri, Nov 4, 2016 at 11:58 AM, Peter Eisentraut
>  wrote:
>> The organization of these patches makes sense to me.
>>
>> On 10/20/16 1:14 AM, Michael Paquier wrote:
>>> - 0001, moving all the SHA2 functions to src/common/ and introducing a
>>> PG-like interface. No actual changes here.
>>
>> That's probably alright, although the patch contains a lot more changes
>> than I would imagine for a simple file move.  I'll still have to review
>> that in detail.
>
> Even with git diff -M, reviewing 0001 is very difficult.  It does
> things that are considerably in excess of what is needed to move these
> files from point A to point B, such as:
>
> - Renaming static functions to have a "pg" prefix.
> - Changing the order of the functions in the file.
> - Renaming an argument called "context" to "cxt".
>
> I think that is a bad plan.  I think we should insist that 0001
> content itself with a minimal move of the files changing no more than
> is absolutely necessary.  If refactoring is needed, those changes can
> be submitted separately, which will be much easier to review.  My
> preliminary judgement is that most of this change is pointless and
> should be reverted.

How do you plug in that with OpenSSL? Are you suggesting to use a set
of undef definitions in the new header in the same way as pgcrypto is
doing, which is rather ugly? Because that's what the deal is about in
this patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Robert Haas
On Tue, Nov 15, 2016 at 2:21 PM, Kevin Grittner  wrote:
> On Tue, Nov 15, 2016 at 12:43 PM, Robert Haas  wrote:
>
>> I think it would be better not to include either the snapshot or the
>> block number, and just find some way to reword the error message so
>> that it mentions which relation was involved without implying that all
>> access to the relation would necessarily fail.  For example:
>>
>> ERROR: snapshot too old
>> DETAIL: One or more rows required by this query have already been
>> removed from "%s".
>
> That particular language would be misleading.  All we know about
> the page is that it was modified since the referencing (old)
> snapshot was taken.  We don't know in what way it was
> modified, so we must assume that it *might* have been pruned of
> rows that the snapshot should still be able to see.

Oh, yeah.  So maybe "may have already been removed".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-11-15 Thread Robert Haas
On Tue, Nov 15, 2016 at 5:30 AM, Amit Langote
 wrote:
> On 2016/11/11 19:30, Amit Langote wrote:
>>
>> Attached updated patches.
>
> Here is the latest version of the patches with some fixes along with those
> mentioned below (mostly in 0003):
>
> - Fixed the logic to skip the attach partition validation scan such that
>   it won't skip scanning a list partition *that doesn't accept NULLs* if
>   the partition key column is not set NOT NULL (it similarly doesn't skip
>   scanning a range partition if either of the partition key columns is not
>   set NOT NULL, because a range partition key cannot contain NULLs at all)
>
> - Added some more regression tests for ATTACH PARTITION command
>
> - Some fixes to documentation and source code comments

Have you done any performance testing on the tuple routing code?
Suppose we insert a million (or 10 million) tuples into an
unpartitioned table, a table with 10 partitions, a table with 100
partitions, a table with 1000 partitions, and a table that is
partitioned into 10 partitions each of which has 10 subpartitions.
Ideally, the partitioned cases would run almost as fast as the
unpartitioned case, but probably there will be some overhead.
However, it would be useful to know how much.  Also, it would be
useful to set up the same cases with inheritance using a PL/pgsql ON
INSERT trigger for tuple routing and compare.  Hopefully the tuple
routing code is far faster than a trigger, but we should make sure
that's the case and look for optimizations if not.  Also, it would be
useful to know how much slower the tuple-mapping-required case is than
the no-tuple-mapping-required case.
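
To make the comparison concrete, here is a minimal sketch of the
trigger-based routing baseline (all object names are made up; the
declaratively partitioned case would use the new syntax from these patches
with the same boundaries):

CREATE TABLE parent (id int, payload text);
CREATE TABLE child_1 (CHECK (id <  500000)) INHERITS (parent);
CREATE TABLE child_2 (CHECK (id >= 500000)) INHERITS (parent);

CREATE FUNCTION parent_insert_route() RETURNS trigger AS $$
BEGIN
    IF NEW.id < 500000 THEN
        INSERT INTO child_1 VALUES (NEW.*);
    ELSE
        INSERT INTO child_2 VALUES (NEW.*);
    END IF;
    RETURN NULL;   -- the row is not stored in the parent itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_route BEFORE INSERT ON parent
    FOR EACH ROW EXECUTE PROCEDURE parent_insert_route();

-- time this against the same INSERT into an unpartitioned table and into
-- the declaratively partitioned equivalent
INSERT INTO parent SELECT g, 'x' FROM generate_series(1, 1000000) g;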

I think the comments in some of the later patches could use some work
yet.  For example, in 0007, FormPartitionKeyDatum()'s header comment
is largely uninformative, get_partition_for_tuple()'s header comment
doesn't explain what the return value means in the non-failure case,
and ExecFindPartition() doesn't have a header comment at all.

The number of places in these patches where you end up reopening a
hopefully-already-locked relation with NoLock (or sometimes with
AccessShareLock) is worrying to me.  I think that's a coding pattern
we should be seeking to avoid; every one of those is not only a hazard
(what if we reach that point in the code without a lock?) but a
possible performance issue (we have to look up the OID in the
backend-private hash table; and if you are passing AccessShareLock
then you might also hit the lock manager which could be slow or create
deadlock hazards).  It would be much better to pass the Relation
around rather than the OID whenever possible.

Also, in 0006:

- I doubt that PartitionTreeNodeData's header comment will survive
contact with pgindent.

In 0007:

- oid_is_foreign_table could/should do a syscache lookup instead of
opening the heap relation.  But actually I think you could drop it
altogether and use get_rel_relkind().

- The XXX in EndCopy will need to be fixed somehow.

- I suspect that many of the pfree's in this patch are pointless
because the contexts in which those allocations are performed will be
reset or deleted shortly afterwards anyway.  Only pfree data that
might otherwise live substantially longer than we want, because pfree
consumes some cycles.

- The code in make_modifytable() to swap out the rte_array for a fake
one looks like an unacceptable kludge.  I don't know offhand what a
better design would look like, but what you've got is really ugly.

- I don't understand how it can be right for get_partition_for_tuple()
to be emitting an error that says "range partition key contains null".
A defective partition key should be detected at partition creation
time, not in the middle of tuple routing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Kevin Grittner
On Tue, Nov 15, 2016 at 12:43 PM, Robert Haas  wrote:

> I think it would be better not to include either the snapshot or the
> block number, and just find some way to reword the error message so
> that it mentions which relation was involved without implying that all
> access to the relation would necessarily fail.  For example:
>
> ERROR: snapshot too old
> DETAIL: One or more rows required by this query have already been
> removed from "%s".

That particular language would be misleading.  All we know about
the page is that it was modified since the referencing (old)
snapshot was taken.  We don't know in what way it was
modified, so we must assume that it *might* have been pruned of
rows that the snapshot should still be able to see.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Magnus Hagander
On Tue, Nov 15, 2016 at 7:43 PM, Robert Haas  wrote:

> On Tue, Nov 15, 2016 at 1:30 PM, Magnus Hagander 
> wrote:
> > On Tue, Nov 15, 2016 at 7:27 PM, Robert Haas 
> wrote:
> >>
> >> On Tue, Nov 15, 2016 at 1:18 PM, Magnus Hagander 
> >> wrote:
> >> > Is there value in showing which snapshot as well? Something like:
> >> > DETAIL: snapshot  is too old to access relation 
> >>
> >> IIUC, the granularity is per-block, not per-relation, so that might be
> >> misleading.
> >
> > Does it help to let the user know which number? I'm not really sure what
> I'd
> > do with that information, whereas knowing the relation would be very
> useful.
> >
> > But we could certainly say "snapshot  is too old to access block
> 
> > of relation ".
>
> I think it would be better not to include either the snapshot or the
> block number, and just find some way to reword the error message so
> that it mentions which relation was involved without implying that all
> access to the relation would necessarily fail.  For example:
>
> ERROR: snapshot too old
> DETAIL: One or more rows required by this query have already been
> removed from "%s".
>
>
Sounds good to me. I've only really found use for the relation name so far
:)

How about the attached?

And if ppl are OK with it, thoughts on backpatching to 9.6?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 58b0a97..04e6450 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4336,5 +4336,6 @@ TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
 		&& (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
 		ereport(ERROR,
 (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
- errmsg("snapshot too old")));
+ errmsg("snapshot too old"),
+ errdetail("One or more rows required by this query have already been removed from \"%s\"", RelationGetRelationName(relation;
 }
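
For illustration, with old_snapshot_threshold in use the patched error would
come out something like this (the relation name is made up):

ERROR:  snapshot too old
DETAIL:  One or more rows required by this query have already been removed from "accounts"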

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Kevin Grittner
On Tue, Nov 15, 2016 at 12:45 PM, Tom Lane  wrote:
> Magnus Hagander  writes:
>> Is there value in showing which snapshot as well? Something like:
>> DETAIL: snapshot  is too old to access relation 
>
> Snapshots don't have names, and I can't think of a useful way of
> identifying them to users.

The time that the snapshot was taken might be useful; I can't think
of anything else that would be.  The main thing would be to include
the relation, with an indication that it's not necessarily true
that access to *any* part of the relation would be a problem.
Mentioning the block number would tend to support that idea, and
might be of some use.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Tom Lane
Magnus Hagander  writes:
> Is there value in showing which snapshot as well? Something like:
> DETAIL: snapshot  is too old to access relation 

Snapshots don't have names, and I can't think of a useful way of
identifying them to users.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Robert Haas
On Tue, Nov 15, 2016 at 1:30 PM, Magnus Hagander  wrote:
> On Tue, Nov 15, 2016 at 7:27 PM, Robert Haas  wrote:
>>
>> On Tue, Nov 15, 2016 at 1:18 PM, Magnus Hagander 
>> wrote:
>> > Is there value in showing which snapshot as well? Something like:
>> > DETAIL: snapshot  is too old to access relation 
>>
>> IIUC, the granularity is per-block, not per-relation, so that might be
>> misleading.
>
> Does it help to let the user know which number? I'm not really sure what I'd
> do with that information, whereas knowing the relation would be very useful.
>
> But we could certainly say "snapshot  is too old to access block 
> of relation ".

I think it would be better not to include either the snapshot or the
block number, and just find some way to reword the error message so
that it mentions which relation was involved without implying that all
access to the relation would necessarily fail.  For example:

ERROR: snapshot too old
DETAIL: One or more rows required by this query have already been
removed from "%s".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-11-15 Thread Robert Haas
On Fri, Nov 4, 2016 at 11:58 AM, Peter Eisentraut
 wrote:
> The organization of these patches makes sense to me.
>
> On 10/20/16 1:14 AM, Michael Paquier wrote:
>> - 0001, moving all the SHA2 functions to src/common/ and introducing a
>> PG-like interface. No actual changes here.
>
> That's probably alright, although the patch contains a lot more changes
> than I would imagine for a simple file move.  I'll still have to review
> that in detail.

Even with git diff -M, reviewing 0001 is very difficult.  It does
things that are considerably in excess of what is needed to move these
files from point A to point B, such as:

- Renaming static functions to have a "pg" prefix.
- Changing the order of the functions in the file.
- Renaming an argument called "context" to "cxt".

I think that is a bad plan.  I think we should insist that 0001
content itself with a minimal move of the files changing no more than
is absolutely necessary.  If refactoring is needed, those changes can
be submitted separately, which will be much easier to review.  My
preliminary judgement is that most of this change is pointless and
should be reverted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Brad DeJong
On Tue, Nov 15, 2016 at 12:20 PM Magnus Hagander wrote:
> Is there value in showing the snapshot as well?

I don't think so. Knowing the relname lets you look at your report/job and 
figure out if the access to that relation can be moved. Having the exact 
snapshot version isn't going to change that. And if you're doing auto-retry, 
you probably only care if it is happening on the same relation consistently.


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Magnus Hagander
On Tue, Nov 15, 2016 at 7:27 PM, Robert Haas  wrote:

> On Tue, Nov 15, 2016 at 1:18 PM, Magnus Hagander 
> wrote:
> > Is there value in showing which snapshot as well? Something like:
> > DETAIL: snapshot  is too old to access relation 
>
> IIUC, the granularity is per-block, not per-relation, so that might be
> misleading.
>

Does it help to let the user know which number? I'm not really sure what
I'd do with that information, whereas knowing the relation would be very
useful.

But we could certainly say "snapshot  is too old to access block 
of relation ".

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Robert Haas
On Tue, Nov 15, 2016 at 12:53 PM, Catalin Iacob  wrote:
> On Tue, Nov 15, 2016 at 5:58 PM, Robert Haas  wrote:
>> On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera
>>  wrote:
>>> I would rather come up with something that works in both cases that we
>>> can extend internally later, say pg_is_primary_node() or something like
>>> that instead; and we implement it initially by returning the inverse of
>>> pg_is_in_recovery() for replicated non-logical flocks, while we figure
>>> out what to do in logical replication.  Otherwise it will be harder to
>>> change later if we embed it in libpq, and we may be forced into
>>> supporting nonsensical situations such as having pg_is_in_recovery()
>>> return true for logical replication primary nodes.
>>
>> I don't think we'll be backed into a corner like that, because we can
>> always make this contingent on server version.  libpq will have that
>> available.
>
> But even with server version checking code, that code will be inside
> libpq so there will be old libpq versions in the field that won't know
> the proper query to send to new server versions.

Good point.  pg_is_writable_node() sounds good, then, and we can still
send pg_is_in_recovery() if we're connected to a pre-10 version.
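
For reference, the probe currently hardcoded in the patch is simply the
following, which reports true on a physical standby and false on a primary,
and so says nothing useful about roles in a logical replication setup:

SELECT pg_is_in_recovery();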

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Robert Haas
On Tue, Nov 15, 2016 at 1:18 PM, Magnus Hagander  wrote:
> Is there value in showing which snapshot as well? Something like:
> DETAIL: snapshot  is too old to access relation 

IIUC, the granularity is per-block, not per-relation, so that might be
misleading.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Brad DeJong
On Tue, Nov 15, 2016 at 10:30 AM, Tom Lane wrote:
> 
> Kevin Grittner  writes:
> > On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander 
> wrote:
> >> Is there a reason why we don't log which relation triggered the
> >> snapshot too old error when it happens?
> 
> > I would probably not want to mess with the text of the error itself,
> > in case any client-side software bases recovery on that rather than
> > the SQLSTATE value;
> 
> Any such code is broken on its face because of localization.
> Perhaps including the relname in the main message would make it unduly
> long, but if not I'd vote for doing it that way.

+1 for relname in the main message.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Magnus Hagander
On Tue, Nov 15, 2016 at 5:29 PM, Tom Lane  wrote:

> Kevin Grittner  writes:
> > On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander 
> wrote:
> >> Is there a reason why we don't log which relation triggered the
> snapshot too
> >> old error when it happens?
>
> > I would probably not want to mess with the text of the error
> > itself, in case any client-side software bases recovery on that
> > rather than the SQLSTATE value;
>
> Any such code is broken on its face because of localization.
> Perhaps including the relname in the main message would make it
> unduly long, but if not I'd vote for doing it that way.
>
>
Agreed.

Is there value in showing which snapshot as well? Something like:
DETAIL: snapshot  is too old to access relation 

Putting both those into the main message will probably make it too long.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Alvaro Herrera
Catalin Iacob wrote:
> On Tue, Nov 15, 2016 at 3:42 PM, Alvaro Herrera
>  wrote:
> > FWIW I'm not sure "primary" is the right term here either.  I think what
> > we really want to know is whether the node can accept writes; maybe
> > "pg_is_writable_node".
> 
> This made me think of another complication: what about cascading
> replication where the middle node is both a master and a slave? You'd
> probably not want to fall back to the middle node even though it is a
> master for its downstream.

Exactly my thought, hence the above suggestion.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw and defaults

2016-11-15 Thread Andrew Dunstan



On 11/15/2016 10:49 AM, Tom Lane wrote:

Andrew Dunstan  writes:

I know we've discussed this before, but I have just had the unpleasant
experience of trying to get around the difficulty of inserting into a
foreign table with a serial field, surely a common enough scenario that
we should try to deal with it better. The solution of using a local
sequence really doesn't work, as there might be multiple users of the
table, as there will be in my scenario. I opted instead to get a value
from the foreign sequence explicitly before inserting, but that's pretty
ugly. So I am wondering (without having looked at all closely at it) if
we could set an option to tell the FDW that we want the foreign default
to be used instead of a local one. Is the difficulty that we don't know
if a value has been explicitly supplied or not? Maybe we could have some
magic value that we could use instead ('foreign_default'?). I'm just
throwing out ideas here, but this is really a wart that could well do
with attention.

I'm not awake enough to recall the previous discussions of remote
default-value insertion in any detail, but they were extensive, and
no one has proposed solutions to the problems we hit.  Please consult
the archives.



I will look back further, But I see in 2013 Stephen said this:


At first blush, with 'simple' writable views, perhaps that can just be a
view definition on the remote side which doesn't include that column and
therefore that column won't be sent to the remote side explicitly, but
the view, running on the remote, would turn around and pick up the
default value for any fields which aren't in the view definition when
inserting into the table underneath.



and you replied


Yeah, I think the possibility of such a workaround was one of the
reasons we decided it was okay to support only locally-computed
defaults for now.



The trouble in my case is I actually need to know the serial column 
value, so using a view that hides it doesn't work.
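
For the record, the explicit-fetch workaround can look roughly like this (a
sketch using dblink; the connection string and sequence name are made up):

SELECT t.id
  FROM dblink('host=remotehost dbname=app',
              $$SELECT nextval('my_table_id_seq')$$) AS t(id bigint);
-- then supply that id explicitly in the INSERT into the foreign table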




cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Catalin Iacob
On Tue, Nov 15, 2016 at 3:42 PM, Alvaro Herrera
 wrote:
> FWIW I'm not sure "primary" is the right term here either.  I think what
> we really want to know is whether the node can accept writes; maybe
> "pg_is_writable_node".

This made me think of another complication: what about cascading
replication where the middle node is both a master and a slave? You'd
probably not want to fall back to the middle node even though it is a
master for its downstream.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Catalin Iacob
On Tue, Nov 15, 2016 at 5:58 PM, Robert Haas  wrote:
> On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera
>  wrote:
>> I would rather come up with something that works in both cases that we
>> can extend internally later, say pg_is_primary_node() or something like
>> that instead; and we implement it initially by returning the inverse of
>> pg_is_in_recovery() for replicated non-logical flocks, while we figure
>> out what to do in logical replication.  Otherwise it will be harder to
>> change later if we embed it in libpq, and we may be forced into
>> supporting nonsensical situations such as having pg_is_in_recovery()
>> return true for logical replication primary nodes.
>
> I don't think we'll be backed into a corner like that, because we can
> always make this contingent on server version.  libpq will have that
> available.

But even with server version checking code, that code will be inside
libpq so there will be old libpq versions in the field that won't know
the proper query to send to new server versions.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-11-15 Thread Tobias Bussmann
Thanks, Laurenz, for your help with debugging this topic. I was preparing a 
message to the list myself and found an earlier discussion [1] on the topic. If 
I understand it correctly, the issue is with CREATE TABLE ... AS EXECUTE.
So it seems parallel execution of prepared statements was intentionally 
disabled in 7bea19d. However, what is missing is at least a mention of that 
current limitation in the 9.6 docs at "When Can Parallel Query Be Used?" [2] 

Best regards,
Tobias

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoaxyXVr6WPDvPQduQpFhD9VRWExXU7axhDpJ7jZBvqxfQ%40mail.gmail.com
[2] 
https://www.postgresql.org/docs/current/static/when-can-parallel-query-be-used.html
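
A quick way to see the behavior being described (a sketch; the table name is
made up, and it assumes a table big enough that a parallel plan would win):

SET max_parallel_workers_per_gather = 2;
PREPARE q AS SELECT count(*) FROM big_table;
EXPLAIN (COSTS OFF) EXECUTE q;  -- SQL-level PREPARE: no Gather node on 9.6
-- the same query prepared via the extended protocol ("Parse") message can
-- get a parallel plan, which is the inconsistency discussed here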

> On 15.11.2016 at 16:41, Albe Laurenz wrote:
> 
> Tobias Bussmann has discovered an oddity with prepared statements.
> 
> Parallel scan is used with prepared statements, but only if they have
> been created with protocol V3 "Parse".
> If a prepared statement has been prepared with the SQL statement PREPARE,
> it will never use a parallel scan.
> 
> I guess that is an oversight in commit 57a6a72b, right?
> PrepareQuery in commands/prepare.c should call CompleteCachedPlan
> with cursor options CURSOR_OPT_PARALLEL_OK, just like
> exec_prepare_message in tcop/postgres.c does.
> 
> The attached patch fixes the problem for me.
> 
> Yours,
> Laurenz Albe
> <0001-Consider-parallel-plans-for-statements-prepared-with.patch>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-15 Thread Michael Paquier
On Tue, Nov 15, 2016 at 7:50 AM, Kuntal Ghosh
 wrote:
> I've modified the guc parameter name as wal_consistency_check (little
> hesitant for a participle in suffix :) ). Also, updated the sgml and
> variable name accordingly.

The changes look good to me.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Robert Haas
On Tue, Nov 15, 2016 at 9:42 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy  
>> wrote:
>> > I have not tested with logical replication. Currently we identify the
>> > primary to connect based on result of "SELECT pg_is_in_recovery()". So I
>> > think it works. Do you want me test a particular setup?
>>
>> If logical replication is in use, none of the servers involved would
>> be in recovery.  I'm not sure what command would need to be used to
>> assess whether we've got a master or a standby, but probably not that
>> one.  This gets at one of my earlier complaints about this part of the
>> functionality, which is that hardcoding that particular SQL statement
>> into libpq seems like a giant hack.  However, I'm not sure what to do
>> about it.  The functionality is clearly useful, because JDBC has it,
>> and Victor proposed this patch to add it to libpq, and - totally
>> independently of any of that - EnterpriseDB has a customer who has
>> requested libpq support for this as well.  So I am tempted to just
>> hold my nose and hard-code the SQL as JDBC is presumably already
>> doing.  If we figure out what the equivalent for logical replication
>> would be we can add something to cover that case, too.  It's ugly, but
>> I don't have a better idea, and I think there's value in being
>> compatible with what JDBC has already done (even if it's not what we
>> would have chosen to do tabula rasa).
>
> I would rather come up with something that works in both cases that we
> can extend internally later, say pg_is_primary_node() or something like
> that instead; and we implement it initially by returning the inverse of
> pg_is_in_recovery() for replicated non-logical flocks, while we figure
> out what to do in logical replication.  Otherwise it will be harder to
> change later if we embed it in libpq, and we may be forced into
> supporting nonsensical situations such as having pg_is_in_recovery()
> return true for logical replication primary nodes.

I don't think we'll be backed into a corner like that, because we can
always make this contingent on server version.  libpq will have that
available.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-15 Thread Robert Haas
On Sun, Nov 13, 2016 at 9:48 AM, Andrew Dunstan  wrote:
> I'm not convinced the line prefix part is necessary, though. What I'm
> thinking of is something like this:
>
> PROCDATA( oid=1242 name=boolin isstrict=t volatile=i parallel=s nargs=1
> rettype=bool argtypes="cstring" src=boolin );

I liked Tom's format a lot better.  If we put this in a separate file
rather than in the header, which I favor, the PROCDATA stuff is just
noise.  On the other hand, having the name as the first thing on the
line seems *excellent* for readability.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql \setfileref

2016-11-15 Thread Pavel Stehule
2016-11-15 17:39 GMT+01:00 Daniel Verite :

> Corey Huinker wrote:
>
> > I am not asking for this feature now, but do you see any barriers to
> later
> > adding x/xq/xb/xbq equivalents for executing a shell command?
>
> For text contents, we already have this (pasted right from the doc):
>
> testdb=> \set content `cat my_file.txt`
> testdb=> INSERT INTO my_table VALUES (:'content');
>
> Maybe we might look at why it doesn't work for binary string and fix that?
>
> I can think of three reasons:
>
> - psql uses C-style '\0'-terminated strings, implying no nul byte in
> variables.
> That could potentially be fixed by tweaking the data structures and
> accessors.
>
> - the backtick operator trims ending '\n' from the result of the command
> (like the shell), but we could add a variant that doesn't have this
> behavior.
>
> - we don't have "interpolate as binary", an operator that will
> essentially apply PQescapeByteaConn rather than PQescapeStringConn.
>
> For example, I'd suggest this syntax:
>
> -- slurp a file into a variable, with no translation whatsoever
> \set content ``cat my_binary_file``
>
> -- interpolate as binary (backquotes instead of quotes)
> INSERT INTO my_table VALUES(:`content`);
>
> If we had something like this, would we still need filerefs? I feel like
> the point of filerefs is mainly to work around lack of support for
> binary in variables, but maybe supporting the latter directly would
> be an easier change to accept.
>

I dropped the fileref concept - see Tom's objection. I know I could hack
``, but I would rather not do it. It is used for shell (system) calls, and that
functionality can depend on the OS in use. The proposed content commands can be
extended further, and they do not depend on the OS. Another issue with your
proposal is that tab completion (for file paths) would be hard to support.

Please try to test my latest patch.

Regards

Pavel



>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: [HACKERS] Logical Replication WIP

2016-11-15 Thread Robert Haas
On Sun, Nov 13, 2016 at 4:21 AM, Andres Freund  wrote:
> It can be extended from what core provides, for extended versions of
> replication solutions, for one. I presume publications/subscriptions
> aren't only going to be used by built-in code.

Hmm, I would not have presumed that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql \setfileref

2016-11-15 Thread Daniel Verite
Corey Huinker wrote:

> I am not asking for this feature now, but do you see any barriers to later
> adding x/xq/xb/xbq equivalents for executing a shell command?

For text contents, we already have this (pasted right from the doc):

testdb=> \set content `cat my_file.txt`
testdb=> INSERT INTO my_table VALUES (:'content');

Maybe we might look at why it doesn't work for binary string and fix that?

I can think of three reasons:

- psql uses C-style '\0'-terminated strings, implying no nul byte in variables.
That could potentially be fixed by tweaking the data structures and
accessors.

- the backtick operator trims ending '\n' from the result of the command
(like the shell), but we could add a variant that doesn't have this behavior.

- we don't have "interpolate as binary", an operator that will
essentially apply PQescapeByteaConn rather than PQescapeStringConn.

For example, I'd suggest this syntax:

-- slurp a file into a variable, with no translation whatsoever
\set content ``cat my_binary_file``

-- interpolate as binary (backquotes instead of quotes)
INSERT INTO my_table VALUES(:`content`);

If we had something like this, would we still need filerefs? I feel like
the point of filerefs is mainly to work around lack of support for
binary in variables, but maybe supporting the latter directly would
be an easier change to accept.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Tom Lane
Kevin Grittner  writes:
> On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander  wrote:
>> Is there a reason why we don't log which relation triggered the snapshot too
>> old error when it happens?

> I would probably not want to mess with the text of the error
> itself, in case any client-side software bases recovery on that
> rather than the SQLSTATE value;

Any such code is broken on its face because of localization.
Perhaps including the relname in the main message would make it
unduly long, but if not I'd vote for doing it that way.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something is broken about connection startup

2016-11-15 Thread Robert Haas
On Mon, Nov 14, 2016 at 4:42 PM, Tom Lane  wrote:
>> No, I'm not at all proposing to assume that.  But I may be willing to
>> assume that "we don't hold a CatalogSnapshot between Bind and Execute
>> unless we're also holding a transaction snapshot".  I need to do a bit
>> more research to see if that's actually true, though.
>
> Turns out it's not true.
>
> I still don't much like having the main loop in PostgresMain know about
> this hack, so I experimented with putting the InvalidateCatalogSnapshot()
> calls into places in postgres.c that were already dealing with transaction
> commit/abort or snapshot management.  I ended up needing five such calls
> (as in the attached patch), which is just about equally ugly.  So at this
> point I'm inclined to hold my nose and stick a call into step (1) of the
> main loop instead.

Seems like a good idea.

> Also, wherever we end up putting those calls, is it worth providing a
> variant invalidation function that only kills the catalog snapshot when
> it's the only one outstanding?  (If it isn't, the transaction snapshot
> should be older, so there's no chance of advancing our xmin by killing
> it.)  In principle this would save some catalog snapshot rebuilds for
> inside-a-transaction-block cases, but I'm not sure it's worth sweating
> that when we're doing client message exchange anyway.

I think that would be a fairly worthwhile thing to do.

> Lastly, I find myself disliking the separate CatalogSnapshotStale flag
> variable.  The other special snapshots in snapmgr.c are managed by setting
> the pointer to NULL when it's not valid, so I wonder why CatalogSnapshot
> wasn't done that way.  Since this patch is touching almost every use of
> that flag already, it wouldn't take much to switch it over.

I think I had some reason why I did it that way, but I don't think it
was anything important, so I don't object to you revising it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-15 Thread Robert Haas
On Sun, Nov 13, 2016 at 4:18 AM, Andres Freund  wrote:
> On 2016-11-08 18:18:01 -0500, Tom Lane wrote:
>> I think this might be better addressed by adding something to backup.sgml
>> pointing out that you'd better fsync or sync your backups before assuming
>> that they can't be lost.
>
> How does a normal user do that? I don't think there's any cross-platform
> advice we can give, and even on *nix the answer basically is 'sync;
> sync;' which is a pretty big hammer, and might be completely
> unacceptable on a busy server.

Yeah, that's a pretty fair point.  I see the point of this patch
pretty clearly but somehow it makes me nervous anyway.  I'm not sure
there's any better alternative to what's being proposed, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql \setfileref

2016-11-15 Thread Pavel Stehule
Hi

2016-11-13 19:46 GMT+01:00 Pavel Stehule :

> Hi
>
> I am sending an initial implementation of psql content commands. This
> design is a reaction to Tom's objections against psql fileref variables.
> This design is cleaner, more explicit and more practical - the import can
> be done in one step.
>
> Now supported commands are:
>
> r - read file without any modification
> rq - read file, escape as literal and use outer quotes
> rb - read binary file - transform to hex code string
> rbq - read binary file, transform to hex code string and use outer quotes
>
> Usage:
>
> create table testt(a xml);
> insert into test values( {rq /home/pavel/.local/share/rhythmbox/rhythmdb.xml}
> );
> \set xxx {r /home/pavel/.local/share/rhythmbox/rhythmdb.xml}
>
> This patch is a demo of this design - one part is redundant - I'll clean it
> up in the next iteration.
>
> Regards
>
>
here is cleaned patch

* the behavior is much more psql-like - an error is shown only when the
command is exactly detected - errors relate only to the processed files - the
behavior is similar to psql variables - we don't raise an error when the
variable doesn't exist.
* no more duplicate code
* some basic doc
* some basic regress tests

Regards

Pavel





> Pavel
>
>
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 2410bee..141df6f 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2991,6 +2991,57 @@ testdb=> \setenv LESS -imx4F
   
  
 
+  
+Content Commands
+
+
+ These commands inject some content into the processed query.
+
+
+testdb=> CREATE TABLE my_table(id int, image bytea);
+testdb=> INSERT INTO my_table VALUES(1, {rbq ~/avatar.gif } ); 
+
+
+ 
+  
+{r filename}
+
+
+ Returns content of file.
+
+
+  
+
+  
+{rb filename}
+
+
+ Returns content of binary file encoded to hex code.
+
+
+  
+
+  
+{rq filename}
+
+
+ Returns content of file, escaped and quoted as string literal
+
+
+  
+
+  
+{rbq filename}
+
+
+ Returns content of binary file encoded to hex code and quoted.
+
+
+  
+ 
+
+  
+
  
   Advanced Features
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index a7fdd8a..43f699f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -462,6 +462,7 @@ static void setalarm(int seconds);
 /* callback functions for our flex lexer */
 static const PsqlScanCallbacks pgbench_callbacks = {
 	NULL,		/* don't need get_variable functionality */
+	NULL,		/* don't need eval_content_command functionality */
 	pgbench_error
 };
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a7789df..de7fecd 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifndef WIN32
 #include /* for write() */
 #else
@@ -168,6 +169,211 @@ psql_get_variable(const char *varname, bool escape, bool as_ident)
 	return result;
 }
 
+#define READFILE_NONE				(0)
+#define READFILE_NOESCAPE			(1 << 0)
+#define READFILE_ESCAPE_TEXT		(1 << 1)
+#define READFILE_ESCAPE_BINARY		(1 << 2)
+#define READFILE_QUOTE				(1 << 3)
+
+/*
+ * file-content-fetching callback for read file content commands.
+ */
+static char *
+get_file_content(char *fname, int mode)
+{
+	FILE	   *fd;
+	char	   *result = NULL;
+
+	expand_tilde(&fname);
+	canonicalize_path(fname);
+
+	fd = fopen(fname, PG_BINARY_R);
+	if (fd)
+	{
+		struct stat		fst;
+
+		if (fstat(fileno(fd), &fst) != -1)
+		{
+			if (S_ISREG(fst.st_mode))
+			{
+if (fst.st_size <= ((int64) 1024) * 1024 * 1024)
+{
+	size_t				size;
+	PQExpBufferData		raw_data;
+	char				buf[512];
+
+	initPQExpBuffer(&raw_data);
+
+	while ((size = fread(buf, 1, sizeof(buf), fd)) > 0)
+		appendBinaryPQExpBuffer(&raw_data, buf, size);
+
+	if (!ferror(fd) && !(PQExpBufferDataBroken(raw_data)))
+	{
+		if (!(mode & READFILE_NOESCAPE))
+		{
+			if (mode & READFILE_ESCAPE_BINARY)
+			{
+unsigned char	   *escaped_value;
+size_t				escaped_size;
+
+escaped_value = PQescapeByteaConn(pset.db,
+		(const unsigned char *) raw_data.data, raw_data.len,
+  &escaped_size);
+
+if (escaped_value)
+{
+	if (mode & READFILE_QUOTE)
+	{
+		PQExpBufferData		resultbuf;
+
+		initPQExpBuffer(&resultbuf);
+
+		appendPQExpBufferChar(&resultbuf, '\'');
+		appendBinaryPQExpBuffer(&resultbuf,
+(const char *) escaped_value, escaped_size - 1);
+		appendPQExpBufferChar(&resultbuf, '\'');
+		PQfreemem(escaped_value);
+
+		if (PQExpBufferDataBroken(resultbuf))
+			psql_error("out of memory\n");
+		else
+			result = resultbuf.data;
+	}
+	else
+

Re: [HACKERS] Snapshot too old logging

2016-11-15 Thread Kevin Grittner
On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander  wrote:
> Is there a reason why we don't log which relation triggered the snapshot too
> old error when it happens?
>
> Since we do have Relation available in TestForOldSnapshot_impl(), shouldn't
> we be able to include it?
>
> Having that in the log message would be very helpful, but is there a reason
> why it wouldn't work?

I would probably not want to mess with the text of the error
itself, in case any client-side software bases recovery on that
rather than the SQLSTATE value; but including relation (and maybe
the timestamp on the snapshot?) in detail or context portions seems
like a great idea.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] jsonb_delete with arrays

2016-11-15 Thread Magnus Hagander
Attached is an implementation of jsonb_delete that, instead of taking a single
key to remove, accepts an array of keys (it still does just keys, so it's
using the - operator; it's not like the path delete function that also
takes an array, but uses a different operator).

In some simple testing while working through a real-world use case where we
needed to delete 7 keys from jsonb data, it shows approximately a 9x
speedup over calling the - operator multiple times. I'm guessing that's because we
copy a lot less and don't have to re-traverse the structure.
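
To illustrate the intended usage (a sketch, assuming the patch is applied):

SELECT '{"a": 1, "b": 2, "c": 3, "d": 4}'::jsonb - ARRAY['a', 'c', 'd'];
-- result: {"b": 2}
-- chaining the existing single-key operator gives the same result, but
-- copies and re-traverses the jsonb once per key:
SELECT '{"a": 1, "b": 2, "c": 3, "d": 4}'::jsonb - 'a' - 'c' - 'd';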

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 17ee4e4..a7c92d6 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3438,6 +3438,92 @@ jsonb_delete(PG_FUNCTION_ARGS)
 }
 
 /*
+ * SQL function jsonb_delete (jsonb, text[])
+ *
+ * return a copy of the jsonb with the indicated items
+ * removed.
+ */
+Datum
+jsonb_delete_array(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	ArrayType  *keys = PG_GETARG_ARRAYTYPE_P(1);
+	Datum	   *keys_elems;
+	bool	   *keys_nulls;
+	int			keys_len;
+	JsonbParseState *state = NULL;
+	JsonbIterator *it;
+	JsonbValue	v,
+			   *res = NULL;
+	bool		skipNested = false;
+	JsonbIteratorToken r;
+
+	if (ARR_NDIM(keys) > 1)
+		ereport(ERROR,
+(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+ errmsg("wrong number of array subscripts")));
+
+	if (JB_ROOT_IS_SCALAR(in))
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot delete from scalar")));
+
+	if (JB_ROOT_COUNT(in) == 0)
+		PG_RETURN_JSONB(in);
+
+	deconstruct_array(keys, TEXTOID, -1, false, 'i',
+	  &keys_elems, &keys_nulls, &keys_len);
+
+	if (keys_len == 0)
+		PG_RETURN_JSONB(in);
+
+	it = JsonbIteratorInit(&in->root);
+
+	while ((r = JsonbIteratorNext(&it, &v, skipNested)) != 0)
+	{
+		skipNested = true;
+
+		if ((r == WJB_ELEM || r == WJB_KEY) && v.type == jbvString)
+		{
+			int			i;
+			bool		found = false;
+
+			for (i = 0; i < keys_len; i++)
+			{
+char	   *keyptr;
+int			keylen;
+
+if (keys_nulls[i])
+	continue;
+
+keyptr = VARDATA_ANY(keys_elems[i]);
+keylen = VARSIZE_ANY_EXHDR(keys_elems[i]);
+if (keylen == v.val.string.len &&
+	memcmp(keyptr, v.val.string.val, keylen) == 0)
+{
+	found = true;
+	break;
+}
+			}
+			if (found)
+			{
+/* skip corresponding value as well */
+if (r == WJB_KEY)
+	JsonbIteratorNext(&it, &v, true);
+
+continue;
+			}
+		}
+
+		res = pushJsonbValue(&state, r, r < WJB_BEGIN_ARRAY ? &v : NULL);
+	}
+
+	Assert(res != NULL);
+
+	PG_RETURN_JSONB(JsonbValueToJsonb(res));
+}
+
+/*
  * SQL function jsonb_delete (jsonb, int)
  *
  * return a copy of the jsonb with the indicated item
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 26fa618..347b2e1 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -1820,6 +1820,8 @@ DATA(insert OID = 3284 (  "||"	   PGNSP PGUID b f f 3802 3802 3802 0 0 jsonb_con
 DESCR("concatenate");
 DATA(insert OID = 3285 (  "-"	   PGNSP PGUID b f f 3802 25 3802 0 0 3302 - - ));
 DESCR("delete object field");
+DATA(insert OID = 3344 (  "-"  PGNSP PGUID b f f 3802 1009 3802 0 0 3343 - -));
+DESCR("delete object fields");
 DATA(insert OID = 3286 (  "-"	   PGNSP PGUID b f f 3802 23 3802 0 0 3303 - - ));
 DESCR("delete array element");
 DATA(insert OID = 3287 (  "#-"	   PGNSP PGUID b f f 3802 1009 3802 0 0 jsonb_delete_path - - ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 17ec71d..80c9ddc 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4895,6 +4895,7 @@ DESCR("GIN support");
 DATA(insert OID = 3301 (  jsonb_concat	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3802 "3802 3802" _null_ _null_ _null_ _null_ _null_ jsonb_concat _null_ _null_ _null_ ));
 DATA(insert OID = 3302 (  jsonb_delete	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3802 "3802 25" _null_ _null_ _null_ _null_ _null_ jsonb_delete _null_ _null_ _null_ ));
 DATA(insert OID = 3303 (  jsonb_delete	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3802 "3802 23" _null_ _null_ _null_ _null_ _null_ jsonb_delete_idx _null_ _null_ _null_ ));
+DATA(insert OID = 3343 ( jsonb_delete  PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3802 "3802 1009" _null_ _null_ _null_ _null_ _null_ jsonb_delete_array _null_ _null_ _null_ ));
 DATA(insert OID = 3304 (  jsonb_delete_pathPGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 3802 "3802 1009" _null_ _null_ _null_ _null_ _null_ jsonb_delete_path _null_ _null_ _null_ ));
 DATA(insert OID = 3305 (  jsonb_setPGNSP PGUID 12 1 0 0 0 f f f f t f i s 4 0 3802 "3802 1009 3802 16" _null_ _null_ _null_ _null_ _null_ jsonb_set _null_ _null_ _null_ ));
 DESCR("Set part of a jsonb");
diff --git a/src/include/utils/jsonb.h b/src/include/utils

Re: [HACKERS] WAL consistency check facility

2016-11-15 Thread Kuntal Ghosh
On Tue, Nov 15, 2016 at 7:23 PM, Robert Haas  wrote:
> On Sat, Nov 12, 2016 at 10:06 PM, Peter Eisentraut
>  wrote:
>> On 11/9/16 11:55 PM, Michael Paquier wrote:
>>> + 
>>> +  wal_consistency (string)
>>> +  
>>> +   wal_consistency configuration 
>>> parameter
>>> +  
>>> +  
>>> +  
>>> +   
>>> +This parameter is used to check the consistency of WAL records, 
>>> i.e,
>>> +whether the WAL records are inserted and applied correctly. When
>>> +wal_consistency is enabled for a WAL record, it
>>> +stores a full-page image along with the record. When a full-page 
>>> image
>>> +arrives during redo, it compares against the current page to check 
>>> whether
>>> +both are consistent.
>>> +   
>>
>> Could we name this something like wal_consistency_checking?
>>
>> Otherwise it sounds like you use this to select the amount of
>> consistency in the WAL (similar to, say, wal_level).
>
> +1.  I like that name.
I've modified the guc parameter name as wal_consistency_check (little
hesitant for a participle in suffix :) ). Also, updated the sgml and
variable name accordingly.
FYI, regression test will fail because of an inconsistency in brin
page. I've already submitted a patch for that. Following is the thread
for the same:
https://www.postgresql.org/message-id/flat/CAGz5QCJ%3D00UQjScSEFbV%3D0qO5ShTZB9WWz_Fm7%2BWd83zPs9Geg%40mail.gmail.com#CAGz5QCJ=00UQjScSEFbV=0qo5shtzb9wwz_fm7+wd83zps9...@mail.gmail.com
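
For anyone trying the patch, enabling the check is just a GUC setting; a
sketch (the value list shown is an assumption, check the patch for the exact
accepted values):

# postgresql.conf
wal_consistency_check = 'heap,btree'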
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


walconsistency_v15.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw and defaults

2016-11-15 Thread Tom Lane
Andrew Dunstan  writes:
> I know we've discussed this before, but I have just had the unpleasant 
> experience of trying to get around the difficulty of inserting into a 
> foreign table with a serial field, surely a common enough scenario that 
> we should try to deal with it better. The solution of using a local 
> sequence really doesn't work, as there might be multiple users of the 
> table, as there will be in my scenario. I opted instead to get a value 
> from the foreign sequence explicitly before inserting, but that's pretty 
> ugly. So I am wondering (without having looked at all closely at it) if 
> we could set an option to tell the FDW that we want the foreign default 
> to be used instead of a local one. Is the difficulty that we don't know 
> if a value has been explicitly supplied or not? Maybe we could have some 
> magic value that we could use instead ('foreign_default'?). I'm just 
> throwing out ideas here, but this is really a wart that could well do 
> with attention.

I'm not awake enough to recall the previous discussions of remote
default-value insertion in any detail, but they were extensive, and
no one has proposed solutions to the problems we hit.  Please consult
the archives.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql \setfileref

2016-11-15 Thread Pavel Stehule
2016-11-15 16:41 GMT+01:00 Corey Huinker :

>
> On Sun, Nov 13, 2016 at 1:46 PM, Pavel Stehule 
> wrote:
>
>> r - read file without any modification
>> rq - read file, escape as literal and use outer quotes
>> rb - read binary file - transform to hex code string
>> rbq - read binary file, transform to hex code string and use outer quotes
>>
>> Usage:
>
>
> I am not asking for this feature now, but do you see any barriers to later
> adding x/xq/xb/xbq equivalents for executing a shell command?
>

Any other new commands can be added simply - this approach is really generic

Regards

Pavel


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-11-15 Thread Robert Haas
On Wed, Nov 9, 2016 at 10:18 PM, Peter Geoghegan  wrote:
>> Maybe another way of putting this is that, while there's clearly a
>> benefit to having some kind of a cap, it's appropriate to pick a large
>> value, such as 500.  Having no cap at all risks creating many extra
>> tapes that just waste memory, and also risks an unduly
>> cache-inefficient final merge.  Reining that in makes sense.
>> However, we can't rein it in too far or we'll create slow polyphase
>> merges in cases that are reasonably likely to occur in real life.
>
> I completely agree with your analysis.

Cool.  BTW, my run with 10 tapes completed in 10696528.377 ms
(02:58:16.528) - i.e. almost 3 minutes slower than with no tape limit.
Building runs took 4260.16 s, and the final merge pass began at
8239.12 s.  That's certainly better than I expected, and it seems to
show that even if the number of tapes is grossly inadequate for the
number of runs, you can still make up most of the time that you lose
to I/O with improved cache efficiency -- at least under favorable
circumstances.  Of course, on many systems I/O bandwidth will be a
scarce resource, so that argument can be overdone -- and even if not,
a 10-tape sort version takes FAR longer to deliver the first tuple.

I also tried this out with work_mem = 512MB.  Doubling work_mem
reduces the number of runs enough that we don't get a polyphase merge
in any case.  With no limit on tapes:

2016-11-10 11:24:45 UTC [54042] LOG:  switching to external sort with
1873 tapes: CPU: user: 11.34 s, system: 0.48 s, elapsed: 12.13 s
2016-11-10 12:36:22 UTC [54042] LOG:  finished writing run 308 to tape
307: CPU: user: 4096.63 s, system: 156.88 s, elapsed: 4309.66 s
2016-11-10 12:36:22 UTC [54042] LOG:  using 516563 KB of memory for
read buffers among 308 input tapes
2016-11-10 12:36:30 UTC [54042] LOG:  performsort done (except 308-way
final merge): CPU: user: 4097.75 s, system: 157.24 s, elapsed: 4317.76
s
2016-11-10 13:54:07 UTC [54042] LOG:  external sort ended, 6255577
disk blocks used: CPU: user: 8638.72 s, system: 177.42 s, elapsed:
8974.44 s

With a max_sort_tapes = 501:

2016-11-10 14:23:50 UTC [54042] LOG:  switching to external sort with
502 tapes: CPU: user: 10.99 s, system: 0.54 s, elapsed: 11.57 s
2016-11-10 15:36:47 UTC [54042] LOG:  finished writing run 278 to tape
277: CPU: user: 4190.31 s, system: 155.33 s, elapsed: 4388.86 s
2016-11-10 15:36:47 UTC [54042] LOG:  using 517313 KB of memory for
read buffers among 278 input tapes
2016-11-10 15:36:54 UTC [54042] LOG:  performsort done (except 278-way
final merge): CPU: user: 4191.36 s, system: 155.68 s, elapsed: 4395.66
s
2016-11-10 16:53:39 UTC [54042] LOG:  external sort ended, 6255699
disk blocks used: CPU: user: 8673.07 s, system: 175.93 s, elapsed:
9000.80 s

0.3% slower with the tape limit, but that might be noise.  Even if
not, it seems pretty silly to create 1873 tapes when we only need
~300.

At work_mem = 2GB:

2016-11-10 18:08:00 UTC [54042] LOG:  switching to external sort with
7490 tapes: CPU: user: 44.28 s, system: 1.99 s, elapsed: 46.33 s
2016-11-10 19:23:06 UTC [54042] LOG:  finished writing run 77 to tape
76: CPU: user: 4342.10 s, system: 156.21 s, elapsed: 4551.95 s
2016-11-10 19:23:06 UTC [54042] LOG:  using 2095202 KB of memory for
read buffers among 77 input tapes
2016-11-10 19:23:12 UTC [54042] LOG:  performsort done (except 77-way
final merge): CPU: user: 4343.36 s, system: 157.07 s, elapsed: 4558.79
s
2016-11-10 20:24:24 UTC [54042] LOG:  external sort ended, 6255946
disk blocks used: CPU: user: 7894.71 s, system: 176.36 s, elapsed:
8230.13 s

At work_mem = 2GB, max_sort_tapes = 501:

2016-11-10 21:28:23 UTC [54042] LOG:  switching to external sort with
502 tapes: CPU: user: 44.09 s, system: 1.94 s, elapsed: 46.07 s
2016-11-10 22:42:28 UTC [54042] LOG:  finished writing run 68 to tape
67: CPU: user: 4278.49 s, system: 154.39 s, elapsed: 4490.25 s
2016-11-10 22:42:28 UTC [54042] LOG:  using 2095427 KB of memory for
read buffers among 68 input tapes
2016-11-10 22:42:34 UTC [54042] LOG:  performsort done (except 68-way
final merge): CPU: user: 4279.60 s, system: 155.21 s, elapsed: 4496.83
s
2016-11-10 23:42:10 UTC [54042] LOG:  external sort ended, 6255983
disk blocks used: CPU: user: 7733.98 s, system: 173.85 s, elapsed:
8072.55 s

Roughly 2% faster.  Maybe still noise, but less likely.  7490 tapes
certainly seems over the top.

At work_mem = 8GB:

2016-11-14 19:17:28 UTC [54042] LOG:  switching to external sort with
29960 tapes: CPU: user: 183.80 s, system: 7.71 s, elapsed: 191.61 s
2016-11-14 20:32:02 UTC [54042] LOG:  finished writing run 20 to tape
19: CPU: user: 4431.44 s, system: 176.82 s, elapsed: 4665.16 s
2016-11-14 20:32:02 UTC [54042] LOG:  using 8388083 KB of memory for
read buffers among 20 input tapes
2016-11-14 20:32:26 UTC [54042] LOG:  performsort done (except 20-way
final merge): CPU: user: 4432.99 s, system: 181.29 s, elapsed: 4689.52
s
2016-11-14 21:30:56 UTC [54042] LOG:  external sort ended, 6256

Re: [HACKERS] proposal: psql \setfileref

2016-11-15 Thread Corey Huinker
On Sun, Nov 13, 2016 at 1:46 PM, Pavel Stehule 
wrote:

> r - read file without any modification
> rq - read file, escape as literal and use outer quotes
> rb - read binary file - transform to hex code string
> rbq - read binary file, transform to hex code string and use outer quotes
>
> Usage:


I am not asking for this feature now, but do you see any barriers to later
adding x/xq/xb/xbq equivalents for executing a shell command?


[HACKERS] Parallel execution and prepared statements

2016-11-15 Thread Albe Laurenz
Tobias Bussmann has discovered an oddity with prepared statements.

Parallel scan is used with prepared statements, but only if they have
been created with protocol V3 "Parse".
If a prepared statement has been prepared with the SQL statement PREPARE,
it will never use a parallel scan.

I guess that is an oversight in commit 57a6a72b, right?
PrepareQuery in commands/prepare.c should call CompleteCachedPlan
with cursor options CURSOR_OPT_PARALLEL_OK, just like
exec_prepare_message in tcop/postgres.c does.

The attached patch fixes the problem for me.
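To make that concrete, the change essentially amounts to passing
CURSOR_OPT_PARALLEL_OK where PrepareQuery currently passes the default
cursor options (0, if I recall the code correctly). Roughly, shown from
memory rather than copied from the patch, so the surrounding argument list
may differ slightly:

  /* In PrepareQuery() (commands/prepare.c): */
  CompleteCachedPlan(plansource,
                     query_list,
                     NULL,
                     argtypes,
                     nargs,
                     NULL,
                     NULL,
                     CURSOR_OPT_PARALLEL_OK,  /* was 0, the default options */
                     true);                   /* fixed result */

This mirrors what exec_prepare_message() in tcop/postgres.c already does
for statements prepared through the protocol-level Parse message.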

Yours,
Laurenz Albe


0001-Consider-parallel-plans-for-statements-prepared-with.patch
Description: 0001-Consider-parallel-plans-for-statements-prepared-with.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Danger of automatic connection reset in psql

2016-11-15 Thread Jim Nasby

On 11/14/16 5:41 AM, Oleksandr Shulgin wrote:

Automatic connection reset is a nice feature for server development,
IMO.  Is it really useful for anything else is a good question.


I use it all the time for application development; my rebuild script 
will forcibly kick everyone out to re-create the database. I put that in 
because I invariably end up with a random psql sitting somewhere that I 
don't want to track down.


What currently stinks though is if the connection is dead and the next 
command I run is a \i, psql just dies instead of re-connecting. It'd be 
nice if before reading the script it checked connection status and 
attempted a reconnect.



At least an option to control that behavior seems like a good idea,
maybe even set it to 'no reconnect' by default, so that people who
really use it can make conscious choice about enabling it in their
.psqlrc or elsewhere.


+1, I don't think it needs to be the default.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Physical append-only tables

2016-11-15 Thread Magnus Hagander
On Mon, Nov 14, 2016 at 9:43 PM, Greg Stark  wrote:

> On Sun, Nov 13, 2016 at 3:45 PM, Magnus Hagander 
> wrote:
> > For a scenario like this, would it make sense to have an option that
> could
> > be set on an individual table making it physical append only? Basically
> > VACUUM would run as normal and clean up the old space when rows are
> deleted
> > back in history, but when new space is needed for a row the system would
> > never look at the old blocks, and only append to the end.
>
> I don't think "appending" is the right way to think about this. It
> happens to address the problem but only accidentally and only
> partially. More generally what you have is two different kinds of data
> with two different access patterns and storage requirements in the
> same table. They're logically similar but have different practical
> requirements.
>
> If there was some way to teach the database that your table is made of
> two different types of data and how to distinguish the two types then
> when the update occurs it could move the row to the right section of
> storage... This might be something the new partitioning could handle
> or it might need something more low-level and implicit.
>

Agreed, though in the cases I've looked at this has not been a static
thing. Some of it might be driven off dynamic data somewhere else, some of
it may be "this data has to be deleted for regulatory reasons". That can
show up on a case-by-case basis. I don't think the partitioning can really
be *that* flexible, though it might be able to pick up some of the issues.

The problem with the partitioning is also that it only works if you can
ensure the partitioning key is in every query, which often isn't possible
in these scenarios.



> That said, I don't think the "maintain clustering a bit better using
> BRIN" is a bad idea. It's just the bit about turning a table
> append-only to deal with update-once data that I think is overreach.
>

In the use-cases I've had it's really the DELETE that's the problem. In all
those cases UPDATEs only happen on fairly recent data, so it doesn't really
screw with the BRIN. It's DELETEs of old data that's the big issue.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] A bug in UCS_to_most.pl

2016-11-15 Thread Robert Haas
On Tue, Nov 8, 2016 at 3:56 AM, Kyotaro HORIGUCHI
 wrote:
> I was surprised to find that
> src/backend/utils/mb/Unicode/Makefile got changed (3a47c70) to
> have a download feature for character conversion authority files. It
> seems fine according to the previous discussion, as the comment
> there says.
>
> So, I did 'make maintainer-clean; make' on it but got the
> following error.
>
>> Hash %filename missing the % in argument 1 of keys() at UCS_to_most.pl line 51.
>> '/usr/bin/perl' UCS_to_most.pl
>
> What is more surprising is that it has been there since
> 2006. Maybe Heikki unintentionally fixed that in the patch in the
> thread, or older perls might not have complained about it (my
> perl is 5.16).
>
> Anyway, the attached small patch fixes it.

It's quite amazing to me that that ever worked for anyone.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy  wrote:

> > I have not tested with logical replication. Currently we identify the
> > primary to connect based on result of "SELECT pg_is_in_recovery()". So I
> > think it works. Do you want me test a particular setup?
> 
> If logical replication is in use, none of the servers involved would
> be in recovery.  I'm not sure what command would need to be used to
> assess whether we've got a master or a standby, but probably not that
> one.  This gets at one of my earlier complaints about this part of the
> functionality, which is that hardcoding that particular SQL statement
> into libpq seems like a giant hack.  However, I'm not sure what to do
> about it.  The functionality is clearly useful, because JDBC has it,
> and Victor proposed this patch to add it to libpq, and - totally
> independently of any of that - EnterpriseDB has a customer who has
> requested libpq support for this as well.  So I am tempted to just
> hold my nose and hard-code the SQL as JDBC is presumably already
> doing.  If we figure out what the equivalent for logical replication
> would be we can add something to cover that case, too.  It's ugly, but
> I don't have a better idea, and I think there's value in being
> compatible with what JDBC has already done (even if it's not what we
> would have chosen to do tabula rasa).

I would rather come up with something that works in both cases that we
can extend internally later, say pg_is_primary_node() or something like
that instead; and we implement it initially by returning the inverse of
pg_is_in_recovery() for replicated non-logical flocks, while we figure
out what to do in logical replication.  Otherwise it will be harder to
change later if we embed it in libpq, and we may be forced into
supporting nonsensical situations such as having pg_is_in_recovery()
return true for logical replication primary nodes.

FWIW I'm not sure "primary" is the right term here either.  I think what
we really want to know is whether the node can accept writes; maybe
"pg_is_writable_node".

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Robert Haas
On Mon, Nov 14, 2016 at 8:09 PM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Mithun Cy
>> Thanks, my concern is suppose you have 3 server in cluster A(new version),
>> B(new version), C(old version). If we implement as above only new servers
>> will send ParameterStatus message to indicate what type of server we are
>> connected. Server C will not send same. So we will not be able to use new
>> feature "failover to new master" for such a kind of cluster.
>
> No, the streaming replication requires the same major release for all member 
> servers, so there's no concern about the mixed-version cluster.

True, but there is a concern about a newer libpq connecting to older
servers.  If we mimic what JDBC is already doing, we'll be compatible
and you'll be able to use this feature with a v10 libpq without
worrying about whether the target server is also v10.  If we invent
something new on the server side, then you'll need to be sure you have
both a v10 libpq and v10 server.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Robert Haas
On Mon, Nov 14, 2016 at 2:38 AM, Mithun Cy  wrote:
> On Fri, Nov 11, 2016 at 7:33 PM, Peter Eisentraut
>  wrote:
>> We tend to use the term "primary" instead of "master".
>
> Thanks, I will use primary instead of master in my next patch.
>
>>Will this work with logical replication?
>
> I have not tested with logical replication. Currently we identify the
> primary to connect based on result of "SELECT pg_is_in_recovery()". So I
> think it works. Do you want me test a particular setup?

If logical replication is in use, none of the servers involved would
be in recovery.  I'm not sure what command would need to be used to
assess whether we've got a master or a standby, but probably not that
one.  This gets at one of my earlier complaints about this part of the
functionality, which is that hardcoding that particular SQL statement
into libpq seems like a giant hack.  However, I'm not sure what to do
about it.  The functionality is clearly useful, because JDBC has it,
and Victor proposed this patch to add it to libpq, and - totally
independently of any of that - EnterpriseDB has a customer who has
requested libpq support for this as well.  So I am tempted to just
hold my nose and hard-code the SQL as JDBC is presumably already
doing.  If we figure out what the equivalent for logical replication
would be we can add something to cover that case, too.  It's ugly, but
I don't have a better idea, and I think there's value in being
compatible with what JDBC has already done (even if it's not what we
would have chosen to do tabula rasa).
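To make the "hardcoded SQL" part concrete, what the client side effectively
has to do after connecting is something along these lines (illustrative
only; the helper function and its error handling are made up here, but the
libpq calls are the standard ones):

  #include <stdbool.h>
  #include <string.h>
  #include <libpq-fe.h>

  /*
   * Probe whether a freshly opened connection points at a server that
   * can accept writes, using the query discussed above.
   */
  static bool
  connection_is_writable(PGconn *conn)
  {
      PGresult   *res;
      bool        writable = false;

      res = PQexec(conn, "SELECT pg_is_in_recovery()");
      if (PQresultStatus(res) == PGRES_TUPLES_OK &&
          PQntuples(res) == 1 &&
          strcmp(PQgetvalue(res, 0, 0), "f") == 0)
          writable = true;

      PQclear(res);
      return writable;
  }

With multiple hosts in the connection string, libpq would try each one in
turn and keep the first connection for which this probe succeeds when the
caller has asked for a writable server.  The point above stands, though:
with logical replication none of the servers involved are in recovery, so
this probe cannot distinguish the node you actually want.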

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-11-15 Thread Amit Kapila
On Tue, Nov 15, 2016 at 7:51 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
>> It looks like the code in 9.3 or later version uses the recptr as the target
>> segment location
>> (targetSegmentPtr) whereas 9.2 uses recptr as beginning of segment (readOff
>> = 0;).  If above understanding is right then it will set different values
>> for latestPagePtr in 9.2 and 9.3 onwards code.
>>
>
> In 9.2, the relevant variable is not recptr but recaddr.  recaddr in 9.2 and 
> recptr in later releases point to the beginning of a page just read, which is 
> not always the beginning of the segment (targetSegmentPtr).
>

I think it is the beginning of the segment (aka the first page of the
segment); even the comment indicates the same.

/*
* Whenever switching to a new WAL segment, we read the first page of
* the file and validate its header, even if that's not where the
* target record is. ...
..
*/

However, on looking at the code again, it seems that part of the code
behaves similarly for both 9.2 and 9.3.


..Because node3 found a WAL
!  * record fragment at the end of segment 10, it expects to find the
!  * remaining fragment at the beginning of WAL segment 11 streamed from
!  * node2. But there was a fragment of a different WAL record, because
!  * node2 overwrote a different WAL record at the end of segment 10 across
!  * to 11.

How does node3 ensure that the fragment of WAL in segment 11 is
different?  Isn't it possible that when node2 overwrites the last
record in WAL segment 10, it writes a record of slightly different
contents but which is of the same size as an original record in WAL
segment 10?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-11-15 Thread Robert Haas
On Sat, Nov 12, 2016 at 10:06 PM, Peter Eisentraut
 wrote:
> On 11/9/16 11:55 PM, Michael Paquier wrote:
>> + 
>> +  wal_consistency (string)
>> +  
>> +   wal_consistency configuration 
>> parameter
>> +  
>> +  
>> +  
>> +   
>> +This parameter is used to check the consistency of WAL records, i.e,
>> +whether the WAL records are inserted and applied correctly. When
>> +wal_consistency is enabled for a WAL record, it
>> +stores a full-page image along with the record. When a full-page 
>> image
>> +arrives during redo, it compares against the current page to check 
>> whether
>> +both are consistent.
>> +   
>
> Could we name this something like wal_consistency_checking?
>
> Otherwise it sounds like you use this to select the amount of
> consistency in the WAL (similar to, say, wal_level).

+1.  I like that name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgres_fdw and defaults

2016-11-15 Thread Andrew Dunstan


I know we've discussed this before, but I have just had the unpleasant 
experience of trying to get around the difficulty of inserting into a 
foreign table with a serial field, surely a common enough scenario that 
we should try to deal with it better. The solution of using a local 
sequence really doesn't work, as there might be multiple users of the 
table, as there will be in my scenario. I opted instead to get a value 
from the foreign sequence explicitly before inserting, but that's pretty 
ugly. So I am wondering (without having looked at all closely at it) if 
we could set an option to tell the FDW that we want the foreign default 
to be used instead of a local one. Is the difficulty that we don't know 
if a value has been explicitly supplied or not? Maybe we could have some 
magic value that we could use instead ('foreign_default'?). I'm just 
throwing out ideas here, but this is really a wart that could well do 
with attention.



cheers


andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-15 Thread Amit Kapila
On Tue, Nov 15, 2016 at 9:27 AM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Mon, 14 Nov 2016 16:53:35 +0530, Amit Kapila  
> wrote in
>> >
>>
>> The progress parameter is used not only for checkpoint activity but by
>> bgwriter as well for logging standby snapshot.  If you want to keep
>> this record under no_progress category (which I don't endorse), then
>> it might be better to add a comment, so that it will be easier for the
>> readers of this code to understand the reason.
>
> I rather agree to that. But how detailed it should be?
>
>>LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks)
>>{
>>...
>>   XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
>>   /* Needs XLOG_SKIP_PROGRESS because called from LogStandbySnapshot */
>>   XLogSetFlags(XLOG_SKIP_PROGRESS);
>
> or
>
>>   /*
>>* Needs XLOG_SKIP_PROGRESS because called from LogStandbySnapshot.
>>* See the comment for LogCurrentRunningXact for the detail.
>>*/
>
> or more detiled?
>

I think referring to a place where we have explained why skipping XLOG
progress is okay for this or related WAL records (like comments for
struct WALInsertLock)  will be more suitable.  Also, maybe it is worth
mentioning that this code will skip updating XLOG progress even when
we want to log AccessExclusiveLocks for operations other than a
snapshot.


> The term "WAL activity' is used in the comment for
> GetProgressRecPtr. Its meaning is not clear but not well
> defined. Might need a bit detailed explanation about that or "WAL
> activity tracking". What do you think about this?
>

I would have written it as below:

GetProgressRecPtr -- Returns the WAL progress.  WAL progress is
determined by scanning each WAL insertion lock, directly taking the
lightweight lock associated with it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-15 Thread Etsuro Fujita

On 2016/11/11 20:50, Etsuro Fujita wrote:

On 2016/11/11 19:22, Ashutosh Bapat wrote:

The patch looks in good shape now. Here are some comments. I have also
made several changes to comments correcting grammar, typos, style and
at few places logic. Let me know if the patch looks good.



OK, will look into that.


Done.  +1 for the changes you made, except for a few things. (1) You added 
the following comments to deparseSelectSql:


+   /*
+* For a non-base relation, use the input tlist. If a base 
relation is
+* being deparsed as a subquery, use input tlist, if the caller 
has passed
+* one. The column aliases of the subquery are crafted based on 
the input
+* tlist. If tlist is NIL, there will not be any expression 
referring to
+* any of the columns of the base relation being deparsed. Hence 
it doesn't
+* matter whether the base relation is being deparsed as 
subquery or not.

+*/

The last sentence seems confusing to me.  My understanding is: if a base 
relation has tlist=NIL, then the base relation *must* be deparsed as 
ordinary, not as a subquery.  Maybe I'm missing something, though.  (I 
left that as-is, but I think we need to reword that to be more clear, at 
least.)


(2) You added the following comments to deparseRangeTblRef:

> +  * If make_subquery is true, deparse the relation as a subquery. 
Otherwise,

> +  * deparse it as relation name with alias.

The second sentence seems confusing to me, too, because it looks like 
the relation being deparsed is assumed to be a base relation, but the 
relation can be a join relation, which might join base relations, lower 
join relations, and/or lower subqueries.  So, I modified the sentence a bit.


(3) I don't think we need this in isSubqueryExpr, so I removed it from 
the patch:


+   /* Keep compiler happy. */
+   return false;

Also, I revised comments you added a little bit.


I guess, below code
+   if (!fpinfo->subquery_rels)
+   return false;
can be changed to
if (!bms_is_member(node->varno, fpinfo->subquery_rels))
return false;
Also the return values from the recursive calls to isSubqueryExpr()
can be
returned as is. I have included this change in the patch.



Will look into that too.


Done.  That's a good idea!


deparse.c seems to be using capitalized names for functions which
actually deparse something and a non-capitalized form for helper
functions.



From that perspective the attached patch renames isSubqueryExpr
as is_subquery_var() and getSubselectAliasInfo() as
get_alias_id_for_var(). Actually both these functions accept a Var
node but somehow their names refer to expr.


OK, I changed isSubqueryExpr to is_subquery_expr; I kept the reference to 
expr because I think we would soon extend that function so that it can 
handle PHVs, as I said upthread.  For getSubselectAliasInfo, I changed 
the name to get_subselect_alias_id, because (1) the word "alias" seems 
general and (2) the "for_var" part would make the name a bit long.



This patch is using make_tlist_from_pathtarget() to create tlist to be
deparsed but search in RelOptInfo::reltarget::exprs for a given Var.
As long as the relations deparsed do not carry expressions, this might
work, but it will certainly break once we start deparsing relations
with expressions since the upper most query's tlist contains only
Vars. Instead, we should probably, create tlist and save it in fpinfo
and use it later for searching (tlist_member()?). Possibly use using
build_tlist_to_deparse(), to create the tlist similar so that
targetlist list creation logic is same for all the relations being
deparsed. I haven't included this change in the patch.



Seems like a good idea.  Will revise.


Done.  I modified the patch as proposed; create the tlist by 
build_tlist_to_deparse in foreign_join_ok, if needed, and search the 
tlist by tlist_member.  I also added a new member "tlist" to 
PgFdwRelationInfo to save the tlist created in foreign_join_ok.


Another idea on the "tlist" member would be to save a tlist created for 
EXPLAIN into that member in estimate_path_cost_size, so that we don't 
need to generate the tlist again in postgresGetForeignPlan, when 
use_remote_estimate=true.  But I'd like to leave that for another patch.


Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 109,114  typedef struct deparse_expr_cxt
--- 109,116 
  /* Handy macro to add relation name qualification */
  #define ADD_REL_QUALIFIER(buf, varno)	\
  		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+ #define SS_TAB_ALIAS_PREFIX	"s"
+ #define SS_COL_ALIAS_PREFIX	"c"
  
  /*
   * Functions to determine whether an expression can be evaluated safely on
***
*** 167,172  static void appendConditions(List *exprs, deparse_expr_cxt *context);
--- 169,180 
  static void deparseFromExprForRel(Str

Re: [HACKERS] Improvements in psql hooks for variables

2016-11-15 Thread Daniel Verite
  Hi,

I'm attaching v3 of the patch with error reporting for
FETCH_COUNT as mentioned upthread, and rebased
on the most recent master.

> I was suggesting change on the lines of extending generic_boolean_hook to
> include enum(part in enum_hooks which calls ParseVariableBool) and
> integer parsing as well.

Well, generic_boolean_hook() is meant to change this, for instance:

  static void
  on_error_stop_hook(const char *newval)
  {
 pset.on_error_stop = ParseVariableBool(newval, "ON_ERROR_STOP");
  }

into that:

  static bool
  on_error_stop_hook(const char *newval)
  {
 return generic_boolean_hook(newval, "ON_ERROR_STOP",
   &pset.on_error_stop);
   }

with the goal that the assignment does not occur if "newval" is bogus.
The change is really minimal.
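For reference, generic_boolean_hook() itself is just the obvious helper,
roughly like this (assuming the three-argument ParseVariableBool() from the
patch; the exact code in v3 may differ slightly):

  static bool
  generic_boolean_hook(const char *newval, const char *varname, bool *variable)
  {
      bool    valid;
      bool    value = ParseVariableBool(newval, varname, &valid);

      /* only assign if the user-supplied string parsed as a boolean */
      if (valid)
          *variable = value;
      return valid;       /* ParseVariableBool printed a msg if needed */
  }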

When we're dealing with enum-or-bool variables, such as for instance
ECHO_HIDDEN, the patch replaces this:

  static void
  echo_hidden_hook(const char *newval)
  {
if (newval == NULL)
  pset.echo_hidden = PSQL_ECHO_HIDDEN_OFF;
else if (pg_strcasecmp(newval, "noexec") == 0)
  pset.echo_hidden = PSQL_ECHO_HIDDEN_NOEXEC;
else if (ParseVariableBool(newval, "ECHO_HIDDEN"))
  pset.echo_hidden = PSQL_ECHO_HIDDEN_ON;
else  /* ParseVariableBool printed msg if needed */
  pset.echo_hidden = PSQL_ECHO_HIDDEN_OFF;
  }

with that:

  static bool
  echo_hidden_hook(const char *newval)
  {
if (newval == NULL)
  pset.echo_hidden = PSQL_ECHO_HIDDEN_OFF;
else if (pg_strcasecmp(newval, "noexec") == 0)
  pset.echo_hidden = PSQL_ECHO_HIDDEN_NOEXEC;
else
{
  bool isvalid;
  bool val = ParseVariableBool(newval, "ECHO_HIDDEN", &isvalid);
  if (!isvalid)
return false; /* ParseVariableBool printed msg */
  pset.echo_hidden = val ? PSQL_ECHO_HIDDEN_ON : PSQL_ECHO_HIDDEN_OFF;
}
return true;
  }

The goal being again to reject a bogus assignment, as opposed to replacing
it with any hardwired value.
Code-wise, we can't call generic_boolean_hook() here because we need
to assign a non-boolean specific value after having parsed the ON/OFF
user-supplied string.

More generally, it turns out that the majority of hooks are concerned
by this patch, as they parse user-supplied values, but there
are 4 distinct categories of variables:

1- purely ON/OFF vars:
   AUTOCOMMIT, ON_ERROR_STOP, QUIET, SINGLELINE, SINGLESTEP

2- ON/OFF mixed with enum values:
  ECHO_HIDDEN, ON_ERROR_ROLLBACK

3- purely enum values:
  COMP_KEYWORD_CASE, HISTCONTROL, ECHO, VERBOSITY, SHOW_CONTEXT

4- numeric values:
  FETCH_COUNT

If you suggest that the patch should refactor the implementation
of hooks for case #2, only two hooks are concerned and they consist
of non-mergeable enum-specific code interleaved with generic code,
so I don't foresee any gain in fusioning. I have the same opinion about
merging any of #1, #2, #3, #4 together.
But feel free to post code implementing your idea if you disagree,
maybe I just don't figure out what you have in mind.
For case #3, these hooks clearly follow a common pattern, but I also
don't see any benefit in an opportunistic rewrite given the nature of
the functions.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index a9a2fdb..61b2304 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -254,7 +254,7 @@ exec_command(const char *cmd,
if (opt1 != NULL && strncmp(opt1, prefix, sizeof(prefix) - 1) == 0)
{
reuse_previous =
-   ParseVariableBool(opt1 + sizeof(prefix) - 1, prefix) ?
+   ParseVariableBool(opt1 + sizeof(prefix) - 1, prefix, NULL) ?
TRI_YES : TRI_NO;
 
free(opt1);
@@ -1548,7 +1548,7 @@ exec_command(const char *cmd,

 OT_NORMAL, NULL, false);
 
if (opt)
-   pset.timing = ParseVariableBool(opt, "\\timing");
+   pset.timing = ParseVariableBool(opt, "\\timing", NULL);
else
pset.timing = !pset.timing;
if (!pset.quiet)
@@ -2660,7 +2660,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
if (value && pg_strcasecmp(value, "auto") == 0)
popt->topt.expanded = 2;
else if (value)
-   popt->topt.expanded = ParseVariableBool(value, param);
+   popt->topt.expanded = ParseVariableBool(value, param, NULL);
else
popt->topt.expanded = !popt->topt.expanded;
}
@@ -2669,7 +2669,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
else if (strcmp(param, "numericlocale") ==

Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-15 Thread Amit Kapila
On Tue, Nov 15, 2016 at 7:14 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
>> Okay, not a problem.  However, I am not sure the results in this thread
>> are sufficient proof as for read-only tests, there is no noticeable win
>> by increasing shared buffers and read-write tests seems to be quite short
>> (60 seconds) to rely on it.
>
> I think the reason why increasing shared_buffers didn't give better 
> performance for read-only tests than you expect is that the relation files 
> are cached in the filesystem cache.  The purpose of this verification is to 
> know that the effective upper limit is not 512MB (which is too small now), 
> and I think the purpose is achieved.  There may be another threshold, say 
> 32GB or 128GB, over which the performance degrades due to PostgreSQL 
> implementation, but that's another topic which also applies to other OSes.
>

If we don't get any benefit from increasing shared_buffers on
Windows, then what advantage do you see in recommending a higher value?

> How about 3 minutes for read-write tests?  How long do you typically run?
>

I generally run it for 20 to 30 minutes for read-write tests.  Also, to
ensure consistent data, please consider changing the following parameters
in postgresql.conf:
checkpoint_timeout = 35 minutes or so, min_wal_size = 5GB or so,
max_wal_size = 20GB or so, and checkpoint_completion_target = 0.9.

Apart from the above, make sure to run a manual checkpoint (CHECKPOINT
command) after each test.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-11-15 Thread Rushabh Lathia
Thanks Fujita-san for working on this. I've signed up to review this patch.

Your latest patch doesn't apply cleanly on the master branch.

patching file contrib/postgres_fdw/deparse.c
6 out of 17 hunks FAILED -- saving rejects to file
contrib/postgres_fdw/deparse.c.rej
patching file contrib/postgres_fdw/expected/postgres_fdw.out
2 out of 2 hunks FAILED -- saving rejects to file
contrib/postgres_fdw/expected/postgres_fdw.out.rej
patching file contrib/postgres_fdw/postgres_fdw.c
2 out of 22 hunks FAILED -- saving rejects to file
contrib/postgres_fdw/postgres_fdw.c.rej
patching file contrib/postgres_fdw/postgres_fdw.h
1 out of 3 hunks FAILED -- saving rejects to file
contrib/postgres_fdw/postgres_fdw.h.rej

Please share a patch which applies cleanly on the master branch.



On Fri, Nov 11, 2016 at 5:00 PM, Etsuro Fujita 
wrote:

> On 2016/09/08 19:55, Etsuro Fujita wrote:
>
>> On 2016/09/07 13:21, Ashutosh Bapat wrote:
>>
>>> * with the patch:
>>> postgres=# explain verbose delete from ft1 using ft2 where ft1.a =
>>> ft2.a;
>>>  QUERY PLAN
>>>
>>> 
>>> -
>>>
>>>  Delete on public.ft1  (cost=100.00..102.04 rows=1 width=38)
>>>->  Foreign Delete  (cost=100.00..102.04 rows=1 width=38)
>>>  Remote SQL: DELETE FROM public.t1 r1 USING (SELECT ROW(a,
>>> b), a FROM public.t2) ss1(c1, c2) WHERE ((r1.a = ss1.c2))
>>> (3 rows)
>>>
>>
>> The underlying scan on t2 requires ROW(a,b) for locking the row for
>>> update/share. But clearly it's not required if the full query is being
>>> pushed down.
>>>
>>
> Is there a way we can detect that ROW(a,b) is useless
>>> column (not used anywhere in the other parts of the query like
>>> RETURNING, DELETE clause etc.) and eliminate it?
>>>
>>
> I don't have a clear solution for that yet, but I'll try to remove that
>> in the next version.
>>
>
> Similarly for a, it's
>>> part of the targetlist of the underlying scan so that the WHERE clause
>>> can be applied on it. But it's not needed if we are pushing down the
>>> query. If we eliminate the targetlist of the query, we could construct a
>>> remote query without having subquery in it, making it more readable.
>>>
>>
> Will try to do so also.
>>
>
> I addressed this by improving the deparse logic so that a remote query for
> performing an UPDATE/DELETE on a join directly on the remote can be created
> as proposed if possible.  Attached is an updated version of the patch,
> which is created on top of the patch set [1].  The patch is still WIP (ie,
> needs more comments and regression tests, at least), but any comments would
> be gratefully appreciated.
>
> Best regards,
> Etsuro Fujita
>
> [1] https://www.postgresql.org/message-id/11eafd10-d3f8-ac8a-b64
> 2-b0e65037c76b%40lab.ntt.co.jp
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Rushabh Lathia


Re: [HACKERS] Declarative partitioning - another take

2016-11-15 Thread Amit Langote
On 2016/11/11 20:49, Ashutosh Bapat wrote:
> I have not looked at the latest set of patches, but in the version
> that I have we create one composite type for every partition. This
> means that if there are thousand partitions, there will be thousand
> identical entries in pg_type. Since all the partitions share the same
> definition (by syntax), it doesn't make sense to add so many identical
> entries. Moreover, in set_append_rel_size(), while translating the
> expressions from parent to child, we add a ConvertRowtypeExpr instead
> of whole-row reference if reltype of the parent and child do not match
> (adjust_appendrel_attrs_mutator())
>if (appinfo->parent_reltype != appinfo->child_reltype)
> {
> ConvertRowtypeExpr *r = makeNode(ConvertRowtypeExpr);
> 
> I guess, we should set reltype of child to that of parent for
> declarative partitions.

Thanks for the suggestion.  I agree that partitions having the same
reltype as the parent will help a number of cases including the one you
mentioned, but I haven't yet considered how invasive such a change will be.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Snapshot too old logging

2016-11-15 Thread Magnus Hagander
Is there a reason why we don't log which relation triggered the snapshot
too old error when it happens?

Since we do have Relation available in TestForOldSnapshot_impl(), shouldn't
we be able to include it?

Having that in the log message would be very helpful, but is there a reason
why it wouldn't work?
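Naively it looks like a small addition to the ereport() there, something
like the following untested sketch (I haven't checked whether anything
about the relation is unsafe to touch at that point):

  /* in TestForOldSnapshot_impl(), hypothetical: */
  ereport(ERROR,
          (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
           errmsg("snapshot too old"),
           errdetail("The error was triggered while accessing relation \"%s\".",
                     RelationGetRelationName(relation))));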

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH] Generic type subscription

2016-11-15 Thread Aleksander Alekseev
Hello.

I took a look at the latest -v4 patch. I would like to note that this
patch breaks backward compatibility. For instance, the sr_plan extension[1]
stops compiling with errors like:

```
serialize.c:38:2: error: unknown type name ‘ArrayRef’
  JsonbValue *ArrayRef_ser(const ArrayRef *node, JsonbParseState *state,
bool sub_object);
  ^
serialize.c:913:2: error: unknown type name ‘ArrayRef’
  JsonbValue *ArrayRef_ser(const ArrayRef *node, JsonbParseState *state,
bool sub_object)
  ^
In file included from sr_plan.h:4:0,
 from serialize.c:1:

...

```

Other extensions could be affected as well. I'm not saying that it's a
fatal drawback, but it's definitely something you should be aware of.

I personally strongly believe that we shouldn't break extensions between
major releases, or that we should at least make it trivial to properly
rewrite them. Unfortunately, that's not the case currently.

[1] https://github.com/postgrespro/sr_plan

-- 
Best regards,
Aleksander Alekseev

