Re: [HACKERS] Measuring replay lag

2017-03-04 Thread Simon Riggs
On 1 March 2017 at 10:47, Thomas Munro  wrote:

>>> I added a fourth case 'overwhelm.png' which you might find
>>> interesting.  It's essentially like one 'burst' followed by a 100% idle
>>> primary.  The primary stops sending new WAL around 50 seconds in and
>>> then there is no autovacuum, nothing happening at all.  The standby
>>> is still replaying its backlog of WAL at first, but is sending back
>>> replies only every 10 seconds (because no WAL arriving so no other
>>> reason to send replies except status message timeout, which could be
>>> lowered).  So we see some big steps, and then we finally see it
>>> flat-line around 60 seconds because there is still no new WAL so we
>>> keep showing the last measured lag.  If new WAL is flushed it will pop
>>> back to 0ish, but until then its last known measurement is ~14
>>> seconds, which I don't think is technically wrong.
>>
>> If I understand what you're saying, "14 secs" would not be seen as the
>> correct answer by our users when the delay is now zero.
>>
>> Solving that is where the keepalives need to come into play. If no new
>> WAL, send a keepalive and track the lag on that.
>
> Hmm.  Currently it works strictly with measurements of real WAL write,
> flush and apply times.  I rather like the simplicity of that
> definition of the lag numbers, and the fact that they move only as a
> result of genuine measured WAL activity.  A keepalive message is never
> written, flushed or applied, so if we had special cases here to show
> constant 0 or measure keepalive round-trip time when we hit the end of
> known WAL or something like that, the reported lag times for those
> three operations wouldn't be true.  In any real database cluster there
> is real WAL being generated all the time, so after a big backlog is
> finally processed by a standby the "14 secs" won't linger for very
> long, and during the time when you see that, it really is the last
> true measured lag time.
>
> I do see why a new user trying this feature for the first time might
> expect it to show a lag of 0 just as soon as sent LSN =
> write/flush/apply LSN or something like that, but after some
> reflection I suspect that it isn't useful information, and it would be
> smoke and mirrors rather than real data.

Perhaps I am misunderstanding the way it works.

If the lag was 14 secs the last time WAL was generated, nothing occurs
for 2 hours afterwards AND all changes have been successfully applied,
then it should not continue to show 14 secs for the next 2 hours.

IMHO the lag time should drop to zero in a reasonable time and stay at
zero for those 2 hours because there is no current lag.

If we want to show historical lag data, I'm supportive of the idea,
but we must report an accurate current value when the system is busy
and when the system is quiet.
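
For example (purely illustrative, and assuming the patch ends up exposing
the three measured operations as interval columns of pg_stat_replication;
the column names below are assumptions, not committed names), a monitoring
query should show near-zero or NULL lag during those quiet 2 hours rather
than a stale 14 seconds:

SELECT application_name,
       write_lag,    -- assumed name for the measured write lag
       flush_lag,    -- assumed name for the measured flush lag
       replay_lag    -- assumed name for the measured apply lag
FROM pg_stat_replication;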

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-03-04 Thread Robins Tharakan
On 5 March 2017 at 18:00, Simon Riggs  wrote:

> I'm looking to commit the patch version I posted, so I would like your
> comments that it does continue to solve the problems you raised.
>

​Thanks Simon, for confirming.

Yes, the updated patch does solve the problem.

-
robins


Re: [HACKERS] Measuring replay lag

2017-03-04 Thread Simon Riggs
On 1 March 2017 at 10:47, Thomas Munro  wrote:
> On Fri, Feb 24, 2017 at 9:05 AM, Simon Riggs  wrote:
>> On 21 February 2017 at 21:38, Thomas Munro
>>  wrote:
>>> However, I think a call like LagTrackerWrite(SendRqstPtr,
>>> GetCurrentTimestamp()) needs to go into XLogSendLogical, to mirror
>>> what happens in XLogSendPhysical.  I'm not sure about that.
>>
>> Me neither, but I think we need this for both physical and logical.
>>
>> Same use case graphs for both, I think. There might be issues with
>> the way LSNs work for logical.
>
> This seems to be problematic.  Logical peers report LSN changes for
> all three operations (write, flush, commit) only on commit.  I suppose
> that might work OK for synchronous replication, but it makes it a bit
> difficult to get lag measurements that don't look really strange and
> sawtoothy when you have long transactions, and overlapping
> transactions might interfere with the measurements in odd ways.  I
> wonder if the way LSNs are reported by logical rep would need to be
> changed first.  I need to study this some more and would be grateful
> for ideas from any of the logical rep people.

I have no doubt there are problems with the nature of logical
replication that affect this. Those things are not the problem of this
patch but that doesn't push everything away.

What we want from this patch is something that works for both, as much
as that is possible.

With that in mind, this patch should be able to provide sensible lag
measurements from a simple case like logical replication of a standard
pgbench run. If that highlights problems with this patch then we can
fix them here.

Thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dropping partitioned tables without CASCADE

2017-03-04 Thread Simon Riggs
On 27 February 2017 at 02:38, Amit Langote
 wrote:
> On 2017/02/26 5:30, Simon Riggs wrote:
>> On 23 February 2017 at 16:33, Simon Riggs  wrote:
>>
>>>  I'll be happy to review
>>
>> Patch looks OK so far, but fails on a partition that has partitions,
>> probably because of the way we test relkind in the call to
>> StoreCatalogInheritance1().
>
> Thanks for the review.
>
> I could not reproduce the failure you are seeing; could you perhaps share
> the failing test case?

I used a slight modification of the case mentioned in the docs. I
confirm this fails repeatably for me on current HEAD.

CREATE TABLE cities (
 city_id  bigserial not null,
 name text not null,
 population   bigint
) PARTITION BY LIST (left(lower(name), 1));

CREATE TABLE cities_ab
PARTITION OF cities (
  CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('a', 'b')
PARTITION BY RANGE (population);

drop table cities;
ERROR:  cannot drop table cities because other objects depend on it
DETAIL:  table cities_ab depends on table cities
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

I notice also that
  \d+ 
does not show which partitions have subpartitions.
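
(As a stopgap, a catalog query along these lines can show which partitions
are themselves partitioned; this is only a sketch against the existing
catalogs, not a proposal for the \d+ output.)

SELECT c.relname AS partition_name,
       p.partrelid IS NOT NULL AS is_subpartitioned
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
LEFT JOIN pg_partitioned_table p ON p.partrelid = c.oid
WHERE i.inhparent = 'cities'::regclass;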

I'm worried that these things illustrate something about the catalog
representation that we may need to improve, but I don't have anything
concrete to say on that at present.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-03-04 Thread Masahiko Sawada
On Sun, Mar 5, 2017 at 12:14 PM, David Steele  wrote:
> On 3/4/17 9:08 PM, Masahiko Sawada wrote:
>> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas  wrote:
>>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada  
>>> wrote:
>>>> Yes, it's taking time to update the logic and measurement, but it's
>>>> coming along. Also I'm working on changing deadlock detection. Will
>>>> post a new patch and measurement results.
>>>
>>> I think that we should push this patch out to v11.  I think there are
>>> too many issues here to address in the limited time we have remaining
>>> this cycle, and I believe that if we try to get them all solved in the
>>> next few weeks we're likely to end up getting backed into some choices
>>> by time pressure that we may later regret bitterly.  Since I created
>>> the deadlock issues that this patch is facing, I'm willing to try to
>>> help solve them, but I think it's going to require considerable and
>>> delicate surgery, and I don't think doing that under time pressure is
>>> a good idea.
>>>
>>> From a fairness point of view, a patch that's not in reviewable shape
>>> on March 1st should really be pushed out, and we're several days past
>>> that.
>>>
>>
>> Agreed. There is surely still room to discuss the design, and it will
>> take a long time. It's good to push this out to CF 2017-07.
>> Thank you for the comment.
>
> I have marked this patch "Returned with Feedback."  Of course you are
> welcome to submit this patch to the 2017-07 CF, or whenever you feel it
> is ready.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-03-04 Thread Simon Riggs
On 28 February 2017 at 17:49, Simon Riggs  wrote:

> I've edited the stated reason for the patch on the CF app, so its
> clearer as to why this might be acceptable.

Robins,

I'm looking to commit the patch version I posted, so I would like your
comments that it does continue to solve the problems you raised.

Thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-03-04 Thread David Steele
On 3/4/17 9:08 PM, Masahiko Sawada wrote:
> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas  wrote:
>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada  
>> wrote:
>>> Yes, it's taking time to update the logic and measurement, but it's
>>> coming along. Also I'm working on changing deadlock detection. Will
>>> post a new patch and measurement results.
>>
>> I think that we should push this patch out to v11.  I think there are
>> too many issues here to address in the limited time we have remaining
>> this cycle, and I believe that if we try to get them all solved in the
>> next few weeks we're likely to end up getting backed into some choices
>> by time pressure that we may later regret bitterly.  Since I created
>> the deadlock issues that this patch is facing, I'm willing to try to
>> help solve them, but I think it's going to require considerable and
>> delicate surgery, and I don't think doing that under time pressure is
>> a good idea.
>>
>> From a fairness point of view, a patch that's not in reviewable shape
>> on March 1st should really be pushed out, and we're several days past
>> that.
>>
> 
> Agreed. There is surely still room to discuss the design, and it will
> take a long time. It's good to push this out to CF 2017-07.
> Thank you for the comment.

I have marked this patch "Returned with Feedback."  Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK

2017-03-04 Thread Masahiko Sawada
On Sat, Mar 4, 2017 at 1:46 PM, Peter Eisentraut
 wrote:
> On 3/3/17 13:58, Petr Jelinek wrote:
>> On 23/02/17 08:24, Masahiko Sawada wrote:
>>> Attached updated version patches. Please review these.
>>>
>>
>> This version looks good to me, I'd only change the
>>
>>> +PreventTransactionChain(isTopLevel, "CREATE SUBSCRIPTION CREATE SLOT");
>>
>> to "CREATE SUBSCRIPTION ... CREATE SLOT" as that's afaik how we do it
>> for other commands (and same with DROP).
>
> I have committed fixes for these issues.

Thanks!

>
> I didn't like the syntax change in DROP SUBSCRIPTION, so I have just
> fixed the parsing of the existing syntax.  We can discuss syntax changes
> separately.

Understood.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-03-04 Thread Masahiko Sawada
On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas  wrote:
> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada  wrote:
>> Yes, it's taking time to update the logic and measurement, but it's
>> coming along. Also I'm working on changing deadlock detection. Will
>> post a new patch and measurement results.
>
> I think that we should push this patch out to v11.  I think there are
> too many issues here to address in the limited time we have remaining
> this cycle, and I believe that if we try to get them all solved in the
> next few weeks we're likely to end up getting backed into some choices
> by time pressure that we may later regret bitterly.  Since I created
> the deadlock issues that this patch is facing, I'm willing to try to
> help solve them, but I think it's going to require considerable and
> delicate surgery, and I don't think doing that under time pressure is
> a good idea.
>
> From a fairness point of view, a patch that's not in reviewable shape
> on March 1st should really be pushed out, and we're several days past
> that.
>

Agreed. There is surely still room to discuss the design, and it will
take a long time. It's good to push this out to CF 2017-07.
Thank you for the comment.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mat views stats

2017-03-04 Thread Jim Mlodgenski
On Wed, Mar 1, 2017 at 8:39 PM, Michael Paquier 
wrote:

> On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski  wrote:
> >
> > On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas  wrote:
> >>
> >> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby  wrote:
> >> > Certainly easier, but I don't think it'd be better. Matviews really
> >> > aren't the same thing as tables. Off-hand (without reviewing the
> >> > patch), update and delete counts certainly wouldn't make any sense.
> >> > "Insert" counts might, in as much as it's how many rows have been
> >> > added by refreshes. You'd want a refresh count too.
> >>
> >> Regular REFRESH truncates the view and repopulates it, but REFRESH
> >> CONCURRENTLY does inserts, updates, and deletes as needed to adjust
> >> the contents, so the counters that make sense for regular tables are
> >> also sensible here.
> >>
> >
> > After digging into things further, just making refresh report the stats
> > for what it is basically doing simplifies and solves it, and it is
> > something we can back patch if that's the consensus. See the attached
> > patch.
>
> This is unhappy:
> $ git diff master --check
> src/backend/commands/matview.c:155: indent with spaces.
> +uint64  processed = 0;
>
> +/*
> + * Send the stats to mimic what we are essentially doing.
> + * A truncate and insert
> + */
> This sentence is unfinished.
>
> There is also no need to report the number of inserts if WITH NO DATA is
> used.
>


Here is the cleaned-up patch:
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a18c917..94a69dd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -30,6 +30,7 @@
 #include "executor/spi.h"
 #include "miscadmin.h"
 #include "parser/parse_relation.h"
+#include "pgstat.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/lmgr.h"
 #include "storage/smgr.h"
@@ -59,7 +60,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty
 static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
-static void refresh_matview_datafill(DestReceiver *dest, Query *query,
+static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
 		 const char *queryString);
 
 static char *make_temptable_name_n(char *tempname, int n);
@@ -151,6 +152,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	uint64  processed = 0;
 	ObjectAddress address;
 
 	/* Determine strength of lock needed. */
@@ -322,7 +324,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 
 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString);
+		processed = refresh_matview_datafill(dest, dataQuery, queryString);
 
 	heap_close(matviewRel, NoLock);
 
@@ -345,8 +347,18 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 		Assert(matview_maintenance_depth == old_depth);
 	}
 	else
+	{
 		refresh_by_heap_swap(matviewOid, OIDNewHeap, relpersistence);
 
+		/*
+		 * Send the stats to mimic what we are essentially doing. Swapping the heap
+		 * is equivalent to truncating the relation and inserting the new data.
+		 */
+		pgstat_count_truncate(matviewRel);
+		if (!stmt->skipData)
+			pgstat_count_heap_insert(matviewRel, processed);
+	}
+
 	/* Roll back any GUC changes */
 	AtEOXact_GUC(false, save_nestlevel);
 
@@ -361,7 +373,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 /*
  * refresh_matview_datafill
  */
-static void
+static uint64
 refresh_matview_datafill(DestReceiver *dest, Query *query,
 		 const char *queryString)
 {
@@ -369,6 +381,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	PlannedStmt *plan;
 	QueryDesc  *queryDesc;
 	Query	   *copied_query;
+	uint64 processed;
 
 	/* Lock and rewrite, using a copy to preserve the original query. */
 	copied_query = copyObject(query);
@@ -406,6 +419,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	/* run the plan */
 	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
 
+	processed = queryDesc->estate->es_processed;
+
 	/* and clean up */
 	ExecutorFinish(queryDesc);
 	ExecutorEnd(queryDesc);
@@ -413,6 +428,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	FreeQueryDesc(queryDesc);
 
 	PopActiveSnapshot();
+
+	return processed;
 }
 
 DestReceiver *
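
As a quick way to observe what the patch is meant to produce (sketch only;
"my_matview" is a made-up name, and the statistics collector reports
asynchronously, so allow a moment before the counters update):

REFRESH MATERIALIZED VIEW my_matview;

SELECT relname, n_tup_ins, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'my_matview';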

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 6:00 AM, Stephen Frost  wrote:
>> It is, but I was using that with index size, not table size. I can
>> change it to be table size, based on what you said. But the workMem
>> related cap, which probably won't end up being applied all that often
>> in practice, *should* still do something with projected index size,
>> since that really is what we're sorting, which could be very different
>> (e.g. with partial indexes).
>
> Isn't that always going to be very different, unless you're creating a
> single index across every column in the table..?  Or perhaps I've
> misunderstood what you're comparing as being 'very different' in your
> last sentence.

I mean: though a primary key index or similar is smaller than the
table by maybe 5X, they are still generally within an order of
magnitude. Given that the number of workers is determined at
logarithmic intervals, it may not actually matter that much whether
the scaling is based on heap size (input size) or index size (output
size), at a very high level. Despite a 5X difference. I'm referring to
the initial determination of the number of workers to be used, based
on the scan the parallel CREATE INDEX has to do. So, I'm happy to go
along with Robert's suggestion for V9, and have this number determined
based on heap input size rather than index output size. It's good to
be consistent with what we do for parallel seq scan (care about input
size), and it probably won't change things by much anyway. This is
generally the number that the cost model will end up going with, in
practice.

However, we then need to consider that since maintenance_work_mem is
doled out as maintenance_work_mem/nworkers slices for parallel CREATE
INDEX, there is a sensitivity to how much memory is left per worker as
workers are added. This clearly needs to be based on projected/estimated
index size (output size), since that is what is being sorted, and
because partial indexes imply that the size of the index could be
*vastly* less than heap input size with still-sensible use of the
feature. This will be applied as a cap on the first number.

So, I agree with Robert that we should actually use heap size for the
main, initial determination of # of workers to use, but we still need
to estimate the size of the final index [1], to let the cost model cap
the initial determination when maintenance_work_mem is just too low.
(This cap will rarely be applied in practice, as I said.)

[1] 
https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect
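
To make the two-step shape of that concrete, here is a rough standalone
sketch of the heuristic as described above; the function name, growth
factor and memory floor are all illustrative and are not the patch's
actual code:

static int
create_index_workers_sketch(long heap_pages, long est_index_pages,
                            int max_workers, long maintenance_work_mem_kb)
{
    int     workers = 0;
    long    threshold_pages = 1024;     /* assumed starting scan-size threshold */

    /* Step 1: scale workers logarithmically with heap (input) size,
     * as parallel sequential scan does. */
    while (workers < max_workers && heap_pages >= threshold_pages)
    {
        workers++;
        threshold_pages *= 3;           /* illustrative growth factor */
    }

    /* Step 2: cap based on projected index (output) size, so that each
     * participant's slice of maintenance_work_mem remains a useful
     * fraction of what actually has to be sorted (8kB pages assumed). */
    while (workers > 0 &&
           maintenance_work_mem_kb / (workers + 1) < (est_index_pages * 8) / 32)
        workers--;                      /* the "/ 32" ratio is arbitrary here */

    return workers;
}
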
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY

2017-03-04 Thread Tom Lane
I wrote:
> Andrew Dunstan  writes:
>> On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
>>> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE
>>> -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core
>>> dumps with these stack traces. The platform is Amazon Linux.

>> I have replicated this on a couple of other platforms (Fedora, FreeBSD)
>> and back to 9.5. The same failure doesn't happen with buildfarm runs on
>> earlier branches, although possibly they don't have the same set of tests.

> well, the problem in rebuild_relation() seems pretty blatant:

I fixed that, and the basic regression tests no longer crash outright with
these settings, but I do see half a dozen errors that all seem to be in
RLS-related tests.  They all look like something is trying to access an
already-closed relcache entry, much like the problem in
rebuild_relation().  But I have no time to look closer for the next
several days.  Stephen, I think this is your turf anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY

2017-03-04 Thread Tom Lane
Andrew Dunstan  writes:
> On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
>> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE
>> -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core
>> dumps with these stack traces. The platform is Amazon Linux.

> I have replicated this on a couple of other platforms (Fedora, FreeBSD)
> and back to 9.5. The same failure doesn't happen with buildfarm runs on
> earlier branches, although possibly they don't have the same set of tests.

well, the problem in rebuild_relation() seems pretty blatant:

/* Close relcache entry, but keep lock until transaction commit */
heap_close(OldHeap, NoLock);

/* Create the transient table that will receive the re-ordered data */
OIDNewHeap = make_new_heap(tableOid, tableSpace,
   OldHeap->rd_rel->relpersistence,
   ^^^
   AccessExclusiveLock);

There are two such references after the heap_close.  I don't know that
those are the only bugs, but this reference is certainly the proximate
cause of the crash I'm seeing.

Will push a fix in a little bit.
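
(The fix is presumably just to grab what we need from the relcache entry
before closing it, along these lines; sketch only, the committed version
may differ in detail:)

    char        relpersistence = OldHeap->rd_rel->relpersistence;

    /* Close relcache entry, but keep lock until transaction commit */
    heap_close(OldHeap, NoLock);

    /* Create the transient table that will receive the re-ordered data */
    OIDNewHeap = make_new_heap(tableOid, tableSpace,
                               relpersistence,
                               AccessExclusiveLock);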

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-04 Thread Andres Freund
Hi,

On 2017-03-04 11:02:14 -0500, Tom Lane wrote:
> But speaking of ambiguity: isn't it possible for $n symbols to appear in
> pg_stat_statements already?

Indeed.

> I think it is, both from extended-protocol
> client queries and from SPI commands, which would mean that the proposal
> as it stands is not fixing the ambiguity problem at all.  So yes, we need
> another idea.

I think using $ to signal a parameter is still a good idea, as people
kinda have to know that one anyway.  Maybe one of "$:n" "$-n"?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 8:02 AM, Tom Lane  wrote:
>> Perhaps there could be a choice of behaviors.  Even if we all agreed
>> that parameter notation was better in theory, there's something to be
>> said for maintaining backward compatibility, or having an option to do
>> so.
>
> Meh ... we've generally regretted it when we "solved" a backwards
> compatibility problem by introducing a GUC that changes query semantics.
> I'm inclined to think we should either do it or not.

In my opinion, we expose query id (and dbid, and userid) as the
canonical identifier for each pg_stat_statements entry, and have done
so for some time. That's the stable API -- not query text. I'm aware
of cases where query text was used as an identifier, but that ended up
being hashed anyway.

Query text is just for human consumption. I'd be in favor of a change
that makes it easier to copy and paste a query, to run EXPLAIN and so
on. Lukas probably realizes that there are no guarantees that the
query text that appears in pg_stat_statements will even appear as
normalized in all cases. The "sticky entry" stuff is intended to
maximize the chances of that happening, but it's still generally quite
possible (e.g. pg_stat_statements never swaps constants in a query
like "SELECT 5, pg_stat_statements_reset()"). This means that we
cannot really say that this buys us a machine-readable query text
format, at least not without adding some fairly messy caveats.
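
For illustration, a consumer that wants a stable handle should key on
(userid, dbid, queryid) and treat the query text as display-only, e.g.:

SELECT userid, dbid, queryid, calls, total_time, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 5;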

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-04 Thread Peter Eisentraut
On 3/4/17 12:39, Tomas Vondra wrote:
>> Can we have a test case for page_checksum(), or is that too difficult to
>> get running deterministically?
> 
> I'm not sure it can be made deterministic. Certainly not on heap pages, 
> because then it'd be susceptible to xmin/xmax changes, but we have other 
> bits that change even on index pages (say pd_lsn).
> 
> So I'm afraid that's not going to fly.

Then just run it and throw away the result.  See sql/page.sql for some
examples.
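
For instance, something along these lines (assuming the signature proposed
in the patch, and a pre-existing table "test1") exercises the code path
without depending on the non-deterministic checksum value:

SELECT page_checksum(get_raw_page('test1', 0), 0) IS NOT NULL
    AS checksum_computed;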

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Change in "policy" on dump ordering?

2017-03-04 Thread Peter Eisentraut
On 3/1/17 08:36, Peter Eisentraut wrote:
> On 2/22/17 18:24, Jim Nasby wrote:
>>> Yes, by that logic matview refresh should always be last.
>>
>> Patches for head attached.
>>
>> RLS was the first item added after DO_REFRESH_MATVIEW, which was added 
>> in 9.5. So if we want to treat this as a bug, they'd need to be patched 
>> as well, which is a simple matter of swapping 33 and 34.
> 
> I wonder whether we should emphasize this change by assigning
> DO_REFRESH_MATVIEW a higher number, like 100?

Since there wasn't any interest in that idea, I have committed Jim's
patch as is.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-04 Thread Tomas Vondra

On 03/04/2017 02:08 PM, Peter Eisentraut wrote:
> On 3/3/17 09:03, Tomas Vondra wrote:
>> Damn. In my defense, the patch was originally created for an older
>> PostgreSQL version (to investigate an issue on a production system),
>> which used that approach to building values. Should have noticed it,
>> though.
>>
>> Attached is v2, fixing both issues.
>
> Can we have a test case for page_checksum(), or is that too difficult to
> get running deterministically?

I'm not sure it can be made deterministic. Certainly not on heap pages,
because then it'd be susceptible to xmin/xmax changes, but we have other
bits that change even on index pages (say pd_lsn).

So I'm afraid that's not going to fly.

> Also, perhaps page_checksum() doesn't need to be superuser-only, but
> I can see arguments for keeping it that way for consistency.

Yes, I'll change that.

It won't have much impact in practice, because all functions providing
the page data (get_raw_page etc.) do require superuser. But you can
still input the page as a bytea directly, and it's good practice not to
add unnecessary superuser checks.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-03-04 Thread Andres Freund


On March 4, 2017 1:16:56 AM PST, Robert Haas  wrote:
>Maybe.  But it looks to me like this patch is going to have
>considerably more than its share of user-visible warts, and I'm not
>very excited about that.  I feel like what we ought to be doing is
>keeping the index OID the same and changing the relfilenode to point
>to a newly-created one, and I attribute our failure to make that
>design work thus far to insufficiently aggressive hacking.

We literally spent years and a lot of emails waiting for that to happen. Users 
now hack up solutions like this in userspace, obviously a bad solution.

I agree it'd be nicer not to have this, but not having the feature at all
is a lot worse than this wart.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Parallel seq. plan is not coming against inheritance or partition table

2017-03-04 Thread Ashutosh Sharma
Hi All,

From the following git commit onwards, parallel seq scan is never
selected for inheritance or partitioned tables.


commit 51ee6f3160d2e1515ed6197594bda67eb99dc2cc
Author: Robert Haas 
Date:   Wed Feb 15 13:37:24 2017 -0500

Replace min_parallel_relation_size with two new GUCs.



Steps to reproduce:
==
create table t1 (a integer);

create table t1_1 (check (a >=1  and a < 100)) inherits (t1);
create table t1_2 (check (a >= 100 and a < 200)) inherits (t1);

insert into t1_1 select generate_series(1, 90);
insert into t1_2 select generate_series(100, 190);

analyze t1;
analyze t1_1;
analyze t1_2;

explain analyze select * from t1 where a < 5 OR a > 195;

EXPLAIN ANALYZE output:

1) Prior to "Replace min_parallel_relation_size with two new GUCs" commit,

postgres=# explain analyze select * from t1 where a < 5 OR a > 195;
  QUERY PLAN
---
 Gather  (cost=1000.00..25094.71 rows=48787 width=4) (actual
time=0.431..184.264 rows=4 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Append  (cost=0.00..19216.01 rows=20328 width=4) (actual
time=0.025..167.465 rows=1 loops=3)
 ->  Parallel Seq Scan on t1  (cost=0.00..0.00 rows=1 width=4)
(actual time=0.001..0.001 rows=0 loops=3)
   Filter: ((a < 5) OR (a > 195))
 ->  Parallel Seq Scan on t1_1  (cost=0.00..9608.00 rows=20252
width=4) (actual time=0.023..76.644 rows=1 loops=3)
   Filter: ((a < 5) OR (a > 195))
   Rows Removed by Filter: 283334
 ->  Parallel Seq Scan on t1_2  (cost=0.00..9608.01 rows=75
width=4) (actual time=89.505..89.505 rows=0 loops=3)
   Filter: ((a < 5) OR (a > 195))
   Rows Removed by Filter: 30
 Planning time: 0.343 ms
 Execution time: 188.624 ms
(14 rows)

2) From "Replace min_parallel_relation_size with two new GUCs" commit onwards,

postgres=# explain analyze select * from t1 where a < 5 OR a > 195;
QUERY PLAN
--
 Append  (cost=0.00..34966.01 rows=50546 width=4) (actual
time=0.021..375.747 rows=4 loops=1)
   ->  Seq Scan on t1  (cost=0.00..0.00 rows=1 width=4) (actual
time=0.004..0.004 rows=0 loops=1)
 Filter: ((a < 5) OR (a > 195))
   ->  Seq Scan on t1_1  (cost=0.00..17483.00 rows=50365 width=4)
(actual time=0.016..198.393 rows=4 loops=1)
 Filter: ((a < 5) OR (a > 195))
 Rows Removed by Filter: 850001
   ->  Seq Scan on t1_2  (cost=0.00..17483.01 rows=180 width=4)
(actual time=173.310..173.310 rows=0 loops=1)
 Filter: ((a < 5) OR (a > 195))
 Rows Removed by Filter: 91
 Planning time: 0.812 ms
 Execution time: 377.831 ms
(11 rows)

RCA:

>From "Replace min_parallel_relation_size with two new GUCs" commit
onwards, we are not assigning parallel workers for the child rel with
zero heap pages. This means we won't be able to create a partial
append path as this requires all the child rels within an Append Node
to have a partial path. Please check the following code snippet from
set_append_rel_pathlist().

/* Same idea, but for a partial plan. */
if (childrel->partial_pathlist != NIL)
partial_subpaths = accumulate_append_subpath(partial_subpaths,
   linitial(childrel->partial_pathlist));
else
partial_subpaths_valid = false;

.
.

/*
 * Consider an append of partial unordered, unparameterized partial paths.
 */
if (partial_subpaths_valid)
{
...
...

/* Generate a partial append path. */
appendpath = create_append_path(rel, partial_subpaths, NULL,
parallel_workers);
add_partial_path(rel, (Path *) appendpath);
}


In short, no Gather path would be generated if a baserel with an
Append node contains any child rel without a partial path. This means
that just because one childRel with zero heap pages didn't get parallel
workers, the other childRels that were good candidates for a Parallel
Seq Scan had to fall back to a normal seq scan, which could be costlier.

Fix:

Attached is the patch that fixes this issue.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


assign_par_workers_for_empty_childRel_v1.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2017

2017-03-04 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 2, 2017 at 3:45 AM, Jim Nasby  wrote:
>> On 2/27/17 4:52 PM, Thomas Munro wrote:
>>> By the way, that page claims that PostgreSQL runs on Irix and Tru64,
>>> which hasn't been true for a few years.

>> There could be a GSoC project to add support for those back in... ;P

> ...  Whether it's also
> useful to try to support running the system on unobtainable operating
> systems is less clear to me.

I seriously doubt that we'd take patches to run on non-mainstream OSes
without a concomitant promise to support buildfarm animals running such
OSes for the foreseeable future.  Without that we don't know if the
patches still work even a week after they're committed.  We killed the
above-mentioned OSes mainly for lack of any such animals, IIRC.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-04 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 1, 2017 at 8:21 PM, Peter Eisentraut
>  wrote:
>> On 2/28/17 20:01, Lukas Fittl wrote:
>>> I'd like to propose changing the replacement character from ? to instead
>>> be a parameter (like $1).

>> Hmm, I think this could confuse people into thinking that the queries
>> displayed were in fact prepared queries.

> Perhaps there could be a choice of behaviors.  Even if we all agreed
> that parameter notation was better in theory, there's something to be
> said for maintaining backward compatibility, or having an option to do
> so.

Meh ... we've generally regretted it when we "solved" a backwards
compatibility problem by introducing a GUC that changes query semantics.
I'm inclined to think we should either do it or not.

My own vote would probably be for "not", because I haven't seen a case
made why it's important to be able to automatically distinguish a
constant-substitution marker from a "?" operator.

On the other hand, it seems like arguing for backwards compatibility here
is a bit contradictory, because that would only matter if you think there
*are* people trying to automatically parse the output of
pg_stat_statements in that much detail.  And if there are, they would
likely appreciate it becoming less ambiguous.

But speaking of ambiguity: isn't it possible for $n symbols to appear in
pg_stat_statements already?  I think it is, both from extended-protocol
client queries and from SPI commands, which would mean that the proposal
as it stands is not fixing the ambiguity problem at all.  So yes, we need
another idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-03-04 Thread Tom Lane
"Karl O. Pinc"  writes:
> On Sat, 4 Mar 2017 14:26:54 +0530
> Robert Haas  wrote:
>> On Fri, Mar 3, 2017 at 7:21 PM, Karl O. Pinc  wrote:
>>> But if the code does not exit the loop then
>>> before calling elog(ERROR), shouldn't it
>>> also call FreeFile(fd)?  

>> Hmm.  Normally error recovery cleans up opened files, but since
>> logfile_open() directly calls fopen(), that's not going to work here.
>> So that does seem to be a problem.

> Should something different be done to open the file or is it
> enough to call FreeFile(fd) to clean up properly?

I would say that in at least 99.44% of cases, if you think you need to
do some cleanup action immediately before an elog(ERROR), that means
You're Doing It Wrong.  That can only be a safe coding pattern in a
segment of code in which no called function does, *or ever will*, throw
elog(ERROR) itself.  Straight-line code with no reason ever to call
anything else might qualify, but otherwise you're taking too much risk
of current or future breakage.  You need some mechanism that will ensure
cleanup after any elog call anywhere, and the backend environment offers
lots of such mechanisms.

Without having actually looked at this patch, I would say that if it added
a direct call of fopen() to backend-side code, that was already the wrong
thing.  Almost always, AllocateFile() would be a better choice, not only
because it's tied into transaction abort, but also because it knows how to
release virtual FDs in event of ENFILE/EMFILE errors.  If there is some
convincing reason why you shouldn't use AllocateFile(), then a safe
cleanup pattern would be to have the fclose() in a PG_CATCH stanza.
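
That pattern looks roughly like this (sketch only; the function and its
file handling here are hypothetical, not the patch's actual code):

static void
read_logfile_sketch(const char *path)
{
    FILE       *fd = fopen(path, "r");

    if (fd == NULL)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));

    PG_TRY();
    {
        /* ... parsing work that may itself throw elog(ERROR) ... */
    }
    PG_CATCH();
    {
        fclose(fd);
        PG_RE_THROW();          /* propagate the original error */
    }
    PG_END_TRY();

    fclose(fd);
}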

(FWIW, I don't particularly agree with Michael's objection to "break"
after elog(ERROR).  Our traditional coding style is to write such things
anyway, so as to avoid drawing complaints from compilers that don't know
that elog(ERROR) doesn't return.  You could argue that that's an obsolete
reason, but I don't think I want to see it done some places and not
others.  Consistency of coding style is a good thing.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-03-04 Thread Karl O. Pinc
On Sat, 4 Mar 2017 14:26:54 +0530
Robert Haas  wrote:

> On Fri, Mar 3, 2017 at 7:21 PM, Karl O. Pinc  wrote:
> > But if the code does not exit the loop then
> > before calling elog(ERROR), shouldn't it
> > also call FreeFile(fd)?  
> 
> Hmm.  Normally error recovery cleans up opened files, but since
> logfile_open() directly calls fopen(), that's not going to work here.
> So that does seem to be a problem.

Should something different be done to open the file or is it
enough to call FreeFile(fd) to clean up properly?

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wait events for disk I/O

2017-03-04 Thread Amit Kapila
On Mon, Feb 20, 2017 at 4:04 PM, Rushabh Lathia
 wrote:
>
> My colleague Rahila reported a compilation issue with
> the patch. The issue only appeared when we did a clean
> build on the branch.
>
> Fixed the same in the latest version of the patch.
>

Few assorted comments:

1.
+
+ WriteBuffile
+ Waiting during buffile read operation.
+

Operation name and definition are not matching.


2.
+FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info)
 {
 #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
  int returnCode;
@@ -1527,8 +1527,10 @@ FilePrefetch(File file, off_t offset, int amount)
  if (returnCode < 0)
  return returnCode;

+ pgstat_report_wait_start(wait_event_info);
  returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
POSIX_FADV_WILLNEED);
+ pgstat_report_wait_end();

AFAIK, this call is non-blocking and will just initiate a read, so do
you think we should record a wait event for such a call?

3.
- written = FileWrite(src->vfd, waldata_start, len);
+ written = FileWrite(src->vfd, waldata_start, len,
+ WAIT_EVENT_WRITE_REWRITE_DATA_BLOCK);
  if (written != len)
  ereport(ERROR,
  (errcode_for_file_access(),
@@ -957,7 +960,7 @@ logical_end_heap_rewrite(RewriteState state)
 hash_seq_init(&seq_status, state->rs_logical_mappings);
 while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != NULL)
  {
- if (FileSync(src->vfd) != 0)
+ if (FileSync(src->vfd, WAIT_EVENT_SYNC_REWRITE_DATA_BLOCK) != 0)


Do we want to consider recording wait event for both write and sync?
It seems to me OS-level writes are relatively cheap and sync calls are
expensive, so shouldn't we just log sync calls?  I see similar usage
at multiple places in the patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY

2017-03-04 Thread Andrew Dunstan


On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE
> -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core
> dumps with these stack traces. The platform is Amazon Linux.
>


I have replicated this on a couple of other platforms (Fedora, FreeBSD)
and back to 9.5. The same failure doesn't happen with buildfarm runs on
earlier branches, although possibly they don't have the same set of tests.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-04 Thread David Steele
Hi Robert,

On 3/4/17 1:58 AM, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 9:07 AM, David Steele  wrote:
>> On 2/28/17 10:22 PM, Robert Haas wrote:
>>> On Tue, Feb 28, 2017 at 6:22 AM, David Steele  wrote:
>> I'm not sure that's the case.  It seems like it should lock just as
>> multiple backends would now.  One process would succeed and the others
>> would error.  Maybe I'm missing something?
>
> Hm, any errors happening in the workers would be reported to the
> leader, meaning that even if one worker succeeded to run
> pg_start_backup() it would be reported as an error at the end to the
> client, no? By marking the exclusive function restricted we get sure
> that it is just the leader that fails or succeeds.

>>>> Good point, and it strengthens the argument beyond, "it just seems right."
>>>
>>> I think the argument should be based on whether or not the function
>>> depends on backend-private state that will not be synchronized.
>>> That's the definition of what makes something parallel-restricted or
>>> not.
>>
>> Absolutely.  Yesterday was a long day so I may have (perhaps) become a
>> bit flippant.
>>
>>> It looks like pg_start_backup() and pg_stop_backup() depend on the
>>> backend-private global variable nonexclusive_backup_running, so they
>>> should be parallel-restricted.
>>
>> Agreed.
> 
> How about a separately-committable patch that just does that, and then
> a main patch that applies on top of it?

Yes, that makes sense.  Attached are two patches as requested:

01 - Just marks pg_stop_backup() variants as parallel restricted
02 - Add the wait_for_archive param to pg_stop_backup().

These apply cleanly on 272adf4.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 438378d..b17a236 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /* yyyymmddN */
-#define CATALOG_VERSION_NO 201703032
+#define CATALOG_VERSION_NO 201703041
 
 #endif
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0c8b5c6..ec4aedb 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3141,9 +3141,9 @@ DATA(insert OID = 2096 ( pg_terminate_backend 
PGNSP PGUID 12 1 0 0 0 f f f f t
 DESCR("terminate a server process");
 DATA(insert OID = 2172 ( pg_start_backup   PGNSP PGUID 12 1 0 0 0 
f f f f t f v r 3 0 3220 "25 16 16" _null_ _null_ _null_ _null_ _null_ 
pg_start_backup _null_ _null_ _null_ ));
 DESCR("prepare for taking an online backup");
-DATA(insert OID = 2173 ( pg_stop_backupPGNSP PGUID 12 
1 0 0 0 f f f f t f v s 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ 
pg_stop_backup _null_ _null_ _null_ ));
+DATA(insert OID = 2173 ( pg_stop_backupPGNSP PGUID 12 
1 0 0 0 f f f f t f v r 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ 
pg_stop_backup _null_ _null_ _null_ ));
 DESCR("finish taking an online backup");
-DATA(insert OID = 2739 ( pg_stop_backupPGNSP PGUID 12 
1 1 0 0 f f f f t t v s 1 0 2249 "16" "{16,3220,25,25}" "{i,o,o,o}" 
"{exclusive,lsn,labelfile,spcmapfile}" _null_ _null_ pg_stop_backup_v2 _null_ 
_null_ _null_ ));
+DATA(insert OID = 2739 ( pg_stop_backupPGNSP PGUID 12 
1 1 0 0 f f f f t t v r 1 0 2249 "16" "{16,3220,25,25}" "{i,o,o,o}" 
"{exclusive,lsn,labelfile,spcmapfile}" _null_ _null_ pg_stop_backup_v2 _null_ 
_null_ _null_ ));
 DESCR("finish taking an online backup");
 DATA(insert OID = 3813 ( pg_is_in_backup   PGNSP PGUID 12 1 0 0 0 
f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_is_in_backup 
_null_ _null_ _null_ ));
 DESCR("true if server is in online backup");
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9e084ad..ba6f030 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18098,7 +18098,7 @@ SELECT set_config('log_statement_stats', 'off', false);
   
   

-pg_stop_backup(exclusive 
boolean)
+pg_stop_backup(exclusive 
boolean , wait_for_archive boolean 
)
 
setof record
Finish performing exclusive or non-exclusive on-line backup 
(restricted to superusers by default, but other users can be granted EXECUTE to 
run the function)
@@ -18182,7 +18182,13 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned
 in the result of the function, and should be written to files in the
-backup (and not in the data directory).
+backup (and not in the data directory).  There is an optional second
+parameter of type boolean.  If false, the pg_stop_backup
+will return immediately after the backup is completed 

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Stephen Frost
Peter,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas  wrote:
> > If the result of
> > compute_parallel_workers() based on min_parallel_table_scan_size is
> > smaller, then use that value instead.  I must be confused, because I
> > actually though that was the exact algorithm you were describing, and
> > it sounded good to me.
> 
> It is, but I was using that with index size, not table size. I can
> change it to be table size, based on what you said. But the workMem
> related cap, which probably won't end up being applied all that often
> in practice, *should* still do something with projected index size,
> since that really is what we're sorting, which could be very different
> (e.g. with partial indexes).

Isn't that always going to be very different, unless you're creating a
single index across every column in the table..?  Or perhaps I've
misunderstood what you're comparing as being 'very different' in your
last sentence.

Thanks!

Stephen




Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans

2017-03-04 Thread Julian Markwort
Alright, for the next version of this patch I'll look into standard
deviation (an implementation of Welford's algorithm already exists in
pg_stat_statements).


On 3/4/17 14:18, Peter Eisentraut wrote:
> The other problem is that this measures execution time, which can vary
> for reasons other than plan.  I would have expected that the cost
> numbers are tracked somehow.

I've already thought of tracking specific parts of the explanation, like
the cost numbers, instead of the whole string. I'll think of something,
but if anybody has any bright ideas in the meantime, I'd gladly listen
to them.

> There is also the issue of generic vs specific plans, which this
> approach might be papering over.

Would you be so kind as to elaborate a little bit on this? I'm not sure
if I understand this correctly. This patch only tracks specific plans,
yes. The initial idea was that there might be some edge-cases that are
not apparent when looking at generalized plans or queries.


kind regards
Julian


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans

2017-03-04 Thread David Steele
Hi Julian,

On 3/4/17 8:41 AM, Julian Markwort wrote:
>> Any improvements would be greatly welcome by the admin
>> community, I'm sure.
> That's good to hear - on the other hand, I really appreciate the opinion
> of admins on this idea!
>> However, this is a rather large change which could be destabilizing to
>> the many users of this extension.
> I'm fully aware of that, which is why there are already several switches
> in place so you can keep using the existing functionality without
> compromises or added complexity.
> At the same time, I'm always open to suggestions regarding the reduction
> of complexity and probably more importantly the reduction of disk IO.
>> I recommend moving this patch to the 2017-07 CF.
> Since I do not have very much time for this at the moment I'll be
> looking forward to discussions on this patch in the next commitfest!

Since some concerns were raised about the implementation, I have instead
marked this "Returned with Feedback".  I encourage you to continue
developing the patch with the community and resubmit into the
appropriate CF when it is ready.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-04 Thread Peter Eisentraut
On 3/3/17 09:03, Tomas Vondra wrote:
> Damn. In my defense, the patch was originally created for an older 
> PostgreSQL version (to investigate issue on a production system), which 
> used that approach to building values. Should have noticed it, though.
> 
> Attached is v2, fixing both issues.

Can we have a test case for page_checksum(), or is that too difficult to
get running deterministically?

Also, perhaps page_checksum() doesn't need to be superuser-only, but I
can see arguments for keeping it that way for consistency.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans

2017-03-04 Thread Julian Markwort

> Any improvements would be greatly welcome by the admin
> community, I'm sure.

That's good to hear - on the other hand, I really appreciate the opinion
of admins on this idea!

> However, this is a rather large change which could be destabilizing to
> the many users of this extension.

I'm fully aware of that, which is why there are already several switches
in place so you can keep using the existing functionality without
compromises or added complexity.
At the same time, I'm always open to suggestions regarding the reduction
of complexity and probably more importantly the reduction of disk IO.

> I recommend moving this patch to the 2017-07 CF.

Since I do not have very much time for this at the moment I'll be
looking forward to discussions on this patch in the next commitfest!


kind regards
Julian


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-03-04 Thread David Steele
On 3/4/17 8:33 AM, Peter Eisentraut wrote:
> On 3/3/17 16:16, David Steele wrote:
>> While this looks like it could be a really significant performance
>> improvement, I think the above demonstrates that it needs a lot of work.
>>  I know this is not new to the 2017-03 CF but it doesn't seem enough
>> progress has been made since posting to allow it to be committed in time
>> for v10.
>>
>> I recommend moving this patch to the 2017-07 CF.
> 
> I think the patch that was in 2017-01 was given some feedback that put
> the fundamental approach in question, which the author appeared to agree
> with.  So I don't know why this patch appeared in this CF at all.

Then it sounds like it should be marked RWF.  Haribabu can resubmit when
there's a new candidate patch.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-03-04 Thread Peter Eisentraut
On 3/3/17 16:16, David Steele wrote:
> While this looks like it could be a really significant performance
> improvement, I think the above demonstrates that it needs a lot of work.
>  I know this is not new to the 2017-03 CF but it doesn't seem enough
> progress has been made since posting to allow it to be committed in time
> for v10.
> 
> I recommend moving this patch to the 2017-07 CF.

I think the patch that was in 2017-01 was given some feedback that put
the fundamental approach in question, which the author appeared to agree
with.  So I don't know why this patch appeared in this CF at all.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans

2017-03-04 Thread Peter Eisentraut
On 1/25/17 12:43, Simon Riggs wrote:
> On 25 January 2017 at 17:34, Julian Markwort
>  wrote:
> 
>> Analogous to this, a bad_plan is saved when the time has been exceeded by a
>> factor greater than 1.1.
> ...and the plan differs?
>
> Probably best to use some stat math to calculate deviation, rather than a
> fixed %.

Yeah, it seems to me too that this needs a bit deeper analysis.  I
don't see offhand why a 10% deviation in execution time would be a
reasonable threshold for "good" or "bad".  A deviation approach like you
allude to would be better.
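
As an illustration of such a deviation-based check (a sketch only, not
code from the patch; the struct and function names are hypothetical), a
running mean and standard deviation per query could be maintained with
Welford's algorithm, and a plan flagged as "bad" only when an execution
lands well outside the usual spread:

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct PlanTimeStats
    {
        uint64_t    n;      /* executions seen so far */
        double      mean;   /* running mean of execution time, in ms */
        double      m2;     /* sum of squared deviations from the mean */
    } PlanTimeStats;

    static bool
    exec_time_is_outlier(PlanTimeStats *st, double total_time_ms)
    {
        double      delta = total_time_ms - st->mean;
        double      stddev;

        /* Welford's online update of mean and variance */
        st->n++;
        st->mean += delta / st->n;
        st->m2 += delta * (total_time_ms - st->mean);

        if (st->n < 10)
            return false;       /* not enough samples to judge yet */

        stddev = sqrt(st->m2 / (st->n - 1));
        return fabs(total_time_ms - st->mean) > 2.0 * stddev;
    }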

The other problem is that this measures execution time, which can vary
for reasons other than the plan.  I would have expected the cost numbers
to be tracked somehow.

There is also the issue of generic vs specific plans, which this
approach might be papering over.

Needs more thought.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump, pg_dumpall and data durability

2017-03-04 Thread Michael Paquier
On Sat, Mar 4, 2017 at 3:08 PM, Robert Haas  wrote:
> On Thu, Mar 2, 2017 at 5:02 AM, Michael Paquier
>  wrote:
>> On Thu, Mar 2, 2017 at 2:26 AM, David Steele  wrote:
>>> This patch is in need of a committer.  Any takers?
>>> I didn't see a lot of enthusiasm from committers on the thread
>>
>> Stephen at least has showed interest.
>>
>>> so if nobody picks it up by the end of the CF I'm going to mark the patch 
>>> RWF.
>>
>> Yes, that makes sense. If no committer is willing to have a look at
>> code-level and/or has room for this patch then it is as good as
>> doomed. Except for bug fixes, I have a couple of other patches that
>> are piling up so they would likely get the same treatment. There is
>> nothing really we can do about that.
>
> Before we reach the question of whether committers have time to look
> at this, we should first consider the question of whether it has
> achieved consensus.  I'll try to summarize the votes:
>
> Tom Lane: premise pretty dubious
> Robert Haas: do we even want this?
> Peter Eisentraut: I had voiced a similar concern [to Robert's] previously
> Albe Laurenz: I think we should have that
> Andres Freund: [Tom's counterproposal won't work]
> Robert Haas: [Andres has a good point, still nervous] I'm not sure
> there's any better alternative to what's being proposed, though.
> Fujii Masao: One idea is to provide the utility or extension which
> fsync's the specified files and directories
>
> Here's an attempt to translate those words into numerical votes.  As
> per my usual practice, I will count the patch author as +1:
>
> Michael Paquier: +1
> Tom Lane: -1
> Peter Eisentraut: -1
> Albe Laurenz: +1
> Andres Freund: +1
> Robert Haas: +0.5
> Fujii Masao: -0.5
>
> So, my interpretation is that, out of 7 votes, we have -2.5 and +3.5.
> Perhaps that is a consensus for proceeding, but if so it's a pretty
> marginal one.  I think the next step for this patch is to consider why
> we shouldn't, in lieu of what's proposed here, add a pg_fsync utility
> that fsyncs a file or recursively fsyncs a directory, ship that, and
> let people use it on their pg_dump files and/or base backups if they
> wish.  I am not altogether convinced that's a better option, but I am
> also not altogether convinced that it's worse.  Also, if anyone else
> wishes to vote, or if anyone to whom I've attributed a vote wishes to
> assign a numerical value to their vote other than the one I've
> assigned, please feel free.

Not completely exact; this message should be counted as well:
https://www.postgresql.org/message-id/20170123050248.go18...@tamriel.snowman.net
If I interpret it correctly, Stephen Frost can be counted as either +1
or +0.5.
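
For what it is worth, the pg_fsync-style utility floated above would
amount to roughly the following (a plain POSIX sketch, not the proposed
patch; real code would need better error reporting and would have to
cope with platforms where directories cannot be opened for fsync):

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* fsync a single file, or recurse into a directory and fsync its contents */
    static int
    fsync_recursively(const char *path)
    {
        struct stat st;
        int         fd;

        if (lstat(path, &st) < 0)
            return -1;

        if (S_ISDIR(st.st_mode))
        {
            DIR            *dir = opendir(path);
            struct dirent  *de;
            char            child[4096];

            if (dir == NULL)
                return -1;
            while ((de = readdir(dir)) != NULL)
            {
                if (strcmp(de->d_name, ".") == 0 ||
                    strcmp(de->d_name, "..") == 0)
                    continue;
                snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
                (void) fsync_recursively(child);
            }
            closedir(dir);
        }

        /* flush the file itself, or the directory entry, last */
        fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        if (fsync(fd) < 0)
        {
            close(fd);
            return -1;
        }
        close(fd);
        return 0;
    }
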
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-04 Thread Amit Kapila
On Sat, Mar 4, 2017 at 2:29 PM, Robert Haas  wrote:
> On Fri, Mar 3, 2017 at 9:44 AM, Amit Kapila  wrote:
>> On Tue, Feb 28, 2017 at 11:06 AM, Kuntal Ghosh
>>  wrote:
>>> Hello everyone,
>>>
>>> I've attached a patch which implements WAL consistency checking for
>>> hash indexes. This feature is going to be useful for developing and
>>> testing of WAL logging for hash index.
>>>
>>
>> I think it is better if you base your patch on (Microvacuum support
>> for hash index - https://commitfest.postgresql.org/13/835/).
>
> I'd rather have this based on top of the WAL logging patch, and have
> any subsequent patches that tinker with the WAL logging include the
> necessary consistency checking changes also.
>

Fair point.  Since the other patch was proposed before this one, I
thought it might be better to tackle the changes related to that patch
here.  However, changing the microvacuum patch (or any other patch) to
account for what it needs to mask sounds sensible.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-04 Thread Amit Kapila
On Sat, Mar 4, 2017 at 8:55 AM, Peter Geoghegan  wrote:
> On Fri, Mar 3, 2017 at 6:58 PM, Amit Kapila  wrote:
>> You are right that we don't want the number of unclaimed-by-FSM
>> recyclable pages to grow forever, but I think that won't happen with
>> this patch.  As soon as there are more deletions (in heap), in the
>> next vacuum cycle, the pages will be reclaimed by lazy_vacuum_index().
>
> Right.
>
>>> (Thinks about it some more...)
>>>
>>> Unfortunately, I just saw a whole new problem with this patch:
>>> _bt_page_recyclable() is the one place in the B-Tree AM where we stash
>>> an XID.
>>>
>>
>> Can you be more specific to tell which code exactly you are referring here?
>
> I meant that we stash an XID when a B-Tree page is deleted, used to
> determine when it's finally safe to to recycle the page by comparing
> it to RecentGlobalXmin (recyling will happen during the next VACUUM).
> We can't immediately recycle a page, because an existing index scan
> might land on the page while following a stale pointer.
>
> _bt_page_recyclable(), which checks if recycling is safe (no possible
> index scan can land of dead page), is a pretty small function. I'm
> concerned about what happens when
> pg_class.relfrozenxid/pg_database.datfrozenxid are advanced past
> opaque->btpo.xact.
>

Are you talking about the pg_class.relfrozenxid of the index or of the
table?  IIUC, for indexes it will be InvalidTransactionId, and for
tables I think it will be updated with or without the patch.

> While I can't see this explained anywhere, I'm
> pretty sure that that's supposed to be impossible, which this patch
> changes.
>

What makes you think the patch will allow pg_class.relfrozenxid to be
advanced past opaque->btpo.xact, which was previously not possible?
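
For readers following along, the recyclability test under discussion
compares the XID stashed at page deletion against RecentGlobalXmin,
roughly along these lines (paraphrased from the nbtree code of that era,
not quoted exactly):

    static bool
    _bt_page_recyclable(Page page)
    {
        BTPageOpaque opaque;

        /* An all-zeroes page can be reused immediately. */
        if (PageIsNew(page))
            return true;

        /*
         * Otherwise, recycle only if the page is deleted and the XID stashed
         * at deletion time is old enough that no still-running scan could
         * land on the page through a stale pointer.
         */
        opaque = (BTPageOpaque) PageGetSpecialPointer(page);
        if (P_ISDELETED(opaque) &&
            TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
            return true;

        return false;
    }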


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Print correct startup cost for the group aggregate.

2017-03-04 Thread Robert Haas
On Thu, Mar 2, 2017 at 6:48 PM, Ashutosh Bapat
 wrote:
> On Thu, Mar 2, 2017 at 6:06 PM, Rushabh Lathia  
> wrote:
>> While reading through the cost_agg() I found that startup cost for the
>> group aggregate is not correctly assigned. Due to this explain plan is
>> not printing the correct startup cost.
>>
>> Without patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>>  QUERY PLAN
>> -
>>  GroupAggregate  (cost=81634.33..85102.04 rows=198155 width=12)
>>Group Key: aid
>>->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>>  Sort Key: aid
>>  ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155
>> width=8)
>>Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>> With patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>>  QUERY PLAN
>> -
>>  GroupAggregate  (cost=82129.72..85102.04 rows=198155 width=12)
>>Group Key: aid
>>->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>>  Sort Key: aid
>>  ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155
>> width=8)
>>Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>
> The reason why startup_cost = input_startup_cost and not
> input_total_cost for aggregation by sorting is that we don't need the whole
> input before the Group/Agg plan can produce the first row. But I think
> setting startup_cost = input_startup_cost is also not exactly correct.
> Before the plan can produce one row, it has to transit through all the
> rows belonging to the group to which the first row belongs. On an
> average it has to scan (total number of rows)/(number of groups)
> before producing the first aggregated row. startup_cost will be
> input_startup_cost + cost to scan (total number of rows)/(number of
> groups) rows + cost of transiting over those many rows. Total cost =
> startup_cost + cost of scanning and transiting through the remaining
> number of input rows.

While that idea has some merit, I think it's inconsistent with current
practice.  cost_seqscan(), for example, doesn't include the cost of
reading the first page in the startup cost, even though that certainly
must be done before returning the first row.  I think there have been
previous discussions of switching over to the practice for which you
are advocating here, but my impression (without researching) is that
the current practice is more like what Rushabh did.
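
To make the alternative concrete, the costing Ashutosh describes would
amount to something like the following (a rough sketch with illustrative
variable names, not actual cost_agg() code):

    /* average rows the aggregate must transit before emitting its first row */
    rows_per_group = input_rows / num_groups;
    per_input_row_cost = (input_total_cost - input_startup_cost) / input_rows;
    per_row_transition_cost = cpu_operator_cost * num_agg_transitions;

    startup_cost = input_startup_cost
        + rows_per_group * (per_input_row_cost + per_row_transition_cost);
    total_cost = startup_cost
        + (input_rows - rows_per_group)
          * (per_input_row_cost + per_row_transition_cost);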

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-03-04 Thread Robert Haas
On Thu, Mar 2, 2017 at 11:48 AM, Andres Freund  wrote:
> On 2017-03-01 19:25:23 -0600, Jim Nasby wrote:
>> On 2/28/17 11:21 AM, Andreas Karlsson wrote:
>> > The only downside I can see to this approach is that we no logner will
>> > able to reindex catalog tables concurrently, but in return it should be
>> > easier to confirm that this approach can be made work.
>>
>> Another downside is any stored regclass fields will become invalid.
>> Admittedly that's a pretty unusual use case, but it'd be nice if there was
>> at least a way to let users fix things during the rename phase (perhaps via
>> an event trigger).
>
> I'm fairly confident that we don't want to invoke event triggers inside
> the CIC code...  I'm also fairly confident that between index oids
> stored somewhere being invalidated - what'd be a realistic use case of
> that - and not having reindex concurrently, just about everyone will
> choose the former.

Maybe.  But it looks to me like this patch is going to have
considerably more than its share of user-visible warts, and I'm not
very excited about that.  I feel like what we ought to be doing is
keeping the index OID the same and changing the relfilenode to point
to a newly-created one, and I attribute our failure to make that
design work thus far to insufficiently aggressive hacking.
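
Very roughly, the shape of that alternative looks like this (every
helper name below is hypothetical; the point is only the ordering of
steps, with the index OID never changing):

    /* 1. build the new index contents into a fresh relfilenode, concurrently */
    new_filenode = allocate_empty_relfilenode(indexrel);
    build_index_into_filenode(indexrel, new_filenode);

    /* 2. atomically repoint pg_class.relfilenode of the existing index */
    old_filenode = swap_index_relfilenode(RelationGetRelid(indexrel),
                                          new_filenode);

    /* 3. drop the old storage once no session can still be using it */
    drop_relfilenode_when_safe(old_filenode);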

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-04 Thread Robert Haas
On Fri, Mar 3, 2017 at 9:44 AM, Amit Kapila  wrote:
> On Tue, Feb 28, 2017 at 11:06 AM, Kuntal Ghosh
>  wrote:
>> Hello everyone,
>>
>> I've attached a patch which implements WAL consistency checking for
>> hash indexes. This feature is going to be useful for developing and
>> testing of WAL logging for hash index.
>>
>
> I think it is better if you base your patch on (Microvacuum support
> for hash index - https://commitfest.postgresql.org/13/835/).

I'd rather have this based on top of the WAL logging patch, and have
any subsequent patches that tinker with the WAL logging include the
necessary consistency checking changes also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas  wrote:
> If you think parallelism isn't worthwhile unless the sort was going to
> be external anyway,

I don't -- that's just when it starts to look like a safe bet that
parallelism is worthwhile. There are quite a few cases where an
external sort is faster than an internal sort these days, actually.

> then it seems like the obvious thing to do is
> divide the projected size of the sort by maintenance_work_mem, round
> down, and cap the number of workers to the result.

I'm sorry, I don't follow.

> If the result of
> compute_parallel_workers() based on min_parallel_table_scan_size is
> smaller, then use that value instead.  I must be confused, because I
> actually though that was the exact algorithm you were describing, and
> it sounded good to me.

It is, but I was using that with index size, not table size. I can
change it to be table size, based on what you said. But the workMem
related cap, which probably won't end up being applied all that often
in practice, *should* still do something with projected index size,
since that really is what we're sorting, which could be very different
(e.g. with partial indexes).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-03-04 Thread Robert Haas
On Fri, Mar 3, 2017 at 7:21 PM, Karl O. Pinc  wrote:
> But if the code does not exit the loop then
> before calling elog(ERROR), shouldn't it
> also call FreeFile(fd)?

Hmm.  Normally error recovery cleans up opened files, but since
logfile_open() directly calls fopen(), that's not going to work here.
So that does seem to be a problem.
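
In other words, the error path would have to release the handle by hand
before raising the error, along these lines (a sketch, not the committed
fix):

    if (fd != NULL)
        fclose(fd);     /* fd came from logfile_open(), i.e. plain fopen() */
    elog(ERROR,
         "missing space character in \"%s\"", LOG_METAINFO_DATAFILE);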

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-03-04 Thread Robert Haas
On Fri, Mar 3, 2017 at 11:54 AM, Michael Paquier
 wrote:
> On Fri, Mar 3, 2017 at 3:18 PM, Robert Haas  wrote:
>> Hopefully I haven't broken anything; please let me know if you
>> encounter any issues.
>
> Reading what has just been committed...
>
> +   /*
> +* No space found, file content is corrupted.  Return NULL to the
> +* caller and inform him on the situation.
> +*/
> +   elog(ERROR,
> +"missing space character in \"%s\"", LOG_METAINFO_DATAFILE);
> +   break;
> There is no need to issue a break after a elog(ERROR).

True, but it's not wrong, either.  We do it all the time.

git grep -A2 elog.*ERROR
/break


The fact that the comment doesn't match the code, though, is wrong.  Oops.

> +* No newlinei found, file content is corrupted.  Return NULL to
> s/newlinei/newline/

That's also a problem, and that comment also refers to returning,
which we don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Sat, Mar 4, 2017 at 2:17 PM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas  wrote:
>> Oh.  But then I don't see why you need min_parallel_anything.  That's
>> just based on an estimate of the amount of data per worker vs.
>> maintenance_work_mem, isn't it?
>
> Yes -- and it's generally a pretty good estimate.
>
> I don't really know what minimum amount of memory to insist workers
> have, which is why I provisionally chose one of those GUCs as the
> threshold.
>
> Any better ideas?

I don't understand how min_parallel_anything is telling you anything
about memory.  It has, in general, nothing to do with that.

If you think parallelism isn't worthwhile unless the sort was going to
be external anyway, then it seems like the obvious thing to do is
divide the projected size of the sort by maintenance_work_mem, round
down, and cap the number of workers to the result.  If the result of
compute_parallel_workers() based on min_parallel_table_scan_size is
smaller, then use that value instead.  I must be confused, because I
actually though that was the exact algorithm you were describing, and
it sounded good to me.
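
Spelled out, the rule described above comes down to something like this
(illustrative only; maintenance_work_mem is in kilobytes, and
scan_workers stands for whatever compute_parallel_workers() returns
based on min_parallel_table_scan_size):

    static int
    cap_index_build_workers(int scan_workers, double projected_sort_bytes)
    {
        /* one worker per maintenance_work_mem-sized chunk, rounded down */
        int     mem_workers = (int) (projected_sort_bytes /
                                     ((double) maintenance_work_mem * 1024.0));

        return (mem_workers < scan_workers) ? mem_workers : scan_workers;
    }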

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-03-04 Thread Robert Haas
On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada  wrote:
> Yes, it's taking a time to update logic and measurement but it's
> coming along. Also I'm working on changing deadlock detection. Will
> post new patch and measurement result.

I think that we should push this patch out to v11.  I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly.  Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.

From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas  wrote:
> Oh.  But then I don't see why you need min_parallel_anything.  That's
> just based on an estimate of the amount of data per worker vs.
> maintenance_work_mem, isn't it?

Yes -- and it's generally a pretty good estimate.

I don't really know what minimum amount of memory to insist workers
have, which is why I provisionally chose one of those GUCs as the
threshold.

Any better ideas?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Sat, Mar 4, 2017 at 2:01 PM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas  wrote:
>>> I guess that the workMem scaling threshold thing could be
>>> min_parallel_index_scan_size, rather than min_parallel_relation_size
>>> (which we now call min_parallel_table_scan_size)?
>>
>> No, it should be based on min_parallel_table_scan_size, because that
>> is the size of the parallel heap scan that will be done as input to
>> the sort.
>
> I'm talking about the extra thing we do to prevent parallelism from
> being used when per-worker workMem is excessively low. That has much
> more to do with projected index size than current heap size.

Oh.  But then I don't see why you need min_parallel_anything.  That's
just based on an estimate of the amount of data per worker vs.
maintenance_work_mem, isn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas  wrote:
>> I guess that the workMem scaling threshold thing could be
>> min_parallel_index_scan_size, rather than min_parallel_relation_size
>> (which we now call min_parallel_table_scan_size)?
>
> No, it should be based on min_parallel_table_scan_size, because that
> is the size of the parallel heap scan that will be done as input to
> the sort.

I'm talking about the extra thing we do to prevent parallelism from
being used when per-worker workMem is excessively low. That has much
more to do with projected index size than current heap size.

I agree with everything else you've said, I think.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] I propose killing PL/Tcl's "modules" infrastructure

2017-03-04 Thread Robert Haas
On Thu, Mar 2, 2017 at 11:29 PM, Tom Lane  wrote:
> After thinking about that for awhile, it seemed like the most useful thing
> to do is to set up an errcontext callback that will be active throughout
> execution of the start_proc.  That will cover both setup failures like
> the above, and errors occurring within the start_proc, which could be
> equally confusing if you think they apply to the function you initially
> tried to call.  v2 patch attached that does it like that.

+1 for that approach.
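
For anyone unfamiliar with the mechanism being approved here: an error
context callback is pushed onto error_context_stack for the duration of
the call and popped afterwards, so any error raised in between gets the
extra CONTEXT line.  The standard pattern looks roughly like this (the
callback and argument names are illustrative, not taken from the patch):

    static void
    start_proc_error_callback(void *arg)
    {
        const char *funcname = (const char *) arg;

        errcontext("while running start_proc function \"%s\"", funcname);
    }

    /* ... around the code that looks up and calls the start_proc ... */
    ErrorContextCallback errcallback;

    errcallback.callback = start_proc_error_callback;
    errcallback.arg = (void *) funcname;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* look up and run the start_proc here */

    error_context_stack = errcallback.previous;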

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Thu, Mar 2, 2017 at 10:38 PM, Peter Geoghegan  wrote:
> I'm glad. This justifies the lack of much of any "veto" on the
> logarithmic scaling. The only thing that can do that is
> max_parallel_workers_maintenance, the storage parameter
> parallel_workers (maybe this isn't a storage parameter in V9), and
> insufficient maintenance_work_mem per worker (as judged by
> min_parallel_relation_size being greater than workMem per worker).
>
> I guess that the workMem scaling threshold thing could be
> min_parallel_index_scan_size, rather than min_parallel_relation_size
> (which we now call min_parallel_table_scan_size)?

No, it should be based on min_parallel_table_scan_size, because that
is the size of the parallel heap scan that will be done as input to
the sort.

>> I think it's totally counter-intuitive that any hypothetical index
>> storage parameter would affect the degree of parallelism involved in
>> creating the index and also the degree of parallelism involved in
>> scanning it.  Whether or not other systems do such crazy things seems
>> to me to beside the point.  I think if CREATE INDEX allows an explicit
>> specification of the degree of parallelism (a decision I would favor)
>> it should have a syntactically separate place for unsaved build
>> options vs. persistent storage parameters.
>
> I can see both sides of it.
>
> On the one hand, it's weird that you might have query performance
> adversely affected by what you thought was a storage parameter that
> only affected the index build. On the other hand, it's useful that you
> retain that as a parameter, because you may want to periodically
> REINDEX, or have a way of ensuring that pg_restore does go on to use
> parallelism, since it generally won't otherwise. (As mentioned
> already, pg_restore does not trust the cost model due to issues with
> the availability of statistics).

If you make the changes I'm proposing above, this parenthetical issue
goes away, because the only statistic you need is the table size,
which is what it is.  As to the rest, I think a bare REINDEX should
just use the cost model as if it were CREATE INDEX, and if you want to
override that behavior, you can do that by explicit syntax.  I see
very little utility for a setting that fixes the number of workers to
be used for future reindexes: there won't be many of them, and it's
kinda confusing.  But even if we decide to have that, I see no
justification at all for conflating it with the number of workers to
be used for a scan, which is something else altogether.

> To be clear, I don't have any strong feelings on all this. I just
> think it's worth pointing out that there are reasons to not do what
> you suggest, that you might want to consider if you haven't already.

I have considered them.  I also acknowledge that other people may view
the situation differently than I do.  I'm just telling you my opinion
on the topic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers