Re: [HACKERS] Error while copying a large file in pg_rewind

2017-07-21 Thread Michael Paquier
On Fri, Jul 21, 2017 at 8:22 AM, Michael Paquier
 wrote:
> On Thu, Jul 20, 2017 at 9:31 PM, Robert Haas  wrote:
>> Still, it can't be worse than the status quo, where instead of int64
>> we're using int and int32, so maybe we ought to back-patch it as-is
>> for now and look at any further cleanup that is needed as a
>> master-only improvement.
>
> Yes. I don't like playing much with the variable types on
> back-branches; as long as the initial number of bytes is large enough
> we will be safe for some time.

Note for the archives: the main issue has been fixed as a46fe6e8, and
the incorrect condition as 063ff921. Thanks Robert!
-- 
Michael




Re: [HACKERS] More race conditions in logical replication

2017-07-21 Thread Alvaro Herrera
I'm back looking into this again, after a rather exhausting week.  I
think I have a better grasp of what is going on in this code now, and it
appears to me that we should change the locking so that active_pid is
protected by ReplicationSlotControlLock instead of the slot's spinlock;
but I haven't analyzed the idea fully yet and I don't have the patch
done yet either.  I'm going to report, hopefully with a fix committed
this time, on Monday at 19:00 CLT.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2017-07-21 Thread Peter Geoghegan
On Wed, Aug 17, 2016 at 7:54 PM, Claudio Freire  wrote:
> The attached patch tries to maintain the initial status of B-Tree
> indexes, which are created with equal-key runs in physical order,
> during the whole life of the B-Tree, and make key-tid pairs
> efficiently searchable in the process.

I don't see an entry for this in the next CF. Do you have a plan for it?

BTW, I did post some remarks on this patch on another thread recently
[1]. Not sure if any of what I said there is news to you at this
point.

[1] 
postgr.es/m/CAH2-Wzn=6Lc3OVA88x=E6SKG72ojNUE6ut6RZAqNnQx-YLcw=q...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-07-18 16:53:43 -0400, Tom Lane wrote:
>> BTW, I don't see why you really need to mess with anything except
>> ExecProcNode?  Surely the other cases such as MultiExecProcNode are
>> not called often enough to justify changing them away from the
>> switch technology.  Yeah, maybe it would be a bit cleaner if they
>> all looked alike ... but if you're trying to make a patch that's
>> as little invasive as possible for v10, I'd suggest converting just
>> ExecProcNode to this style.

> Yea, I was making that statement when not aiming for v10.  Attached is a
> patch that converts just ExecProcNode.

I read through this patch (didn't test it at all yet).

> I dislike having the miscadmin.h include in executor.h - but I don't
> quite see a better alternative.

I seriously, seriously, seriously dislike that.  You practically might as
well put miscadmin.h into postgres.h.  Instead, what do you think of
requiring the individual ExecProcNode functions to perform
CHECK_FOR_INTERRUPTS?  Since they're already responsible for doing that
if they have any long-running internal loops, this doesn't seem like a
modularity violation.  It is a risk for bugs-of-omission, sure, but so
are a lot of other things that the per-node code has to do.

There might be something to be said for handling the chgParam/rescan tests
similarly.  That would reduce the ExecProcNode macro to a triviality,
which would be a good thing for understandability of the code I think.
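
To make that concrete, here is a self-contained toy of the dispatch style
under discussion (all names invented; this is not the actual executor code):

#include <stdio.h>

/* Stand-in for PostgreSQL's CHECK_FOR_INTERRUPTS(). */
static void
check_for_interrupts(void)
{
}

struct PlanState;
typedef const char *(*ExecProcNodeCB) (struct PlanState *);

typedef struct PlanState
{
    ExecProcNodeCB exec_proc;   /* installed once at node-init time */
} PlanState;

static const char *
exec_seq_scan(PlanState *ps)
{
    check_for_interrupts();     /* each method checks, per the suggestion */
    return "a tuple";
}

/* The wrapper shrinks from a big switch to a trivial indirect call. */
static const char *
exec_proc_node(PlanState *ps)
{
    return ps->exec_proc(ps);
}

int
main(void)
{
    PlanState   node = {exec_seq_scan};

    puts(exec_proc_node(&node));
    return 0;
}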

Some other thoughts:

* I think the comments need more work.  Am willing to make a pass over
that if you want.

* In most places, if there's an immediate return-if-trivial-case test,
we check stack depth only after that.  There's no point in checking
and then returning; either you already crashed, or you're at peak
stack so far as this code path is concerned.

* Can we redefine the ExecCustomScan function pointer as type
ExecProcNodeCB, eliminating the unsightly cast in nodeCustom.c?

* The various callers of ExecScan() are pretty inconsistently coded.
I don't care that much whether they use castNode() or just forcibly
cast to ScanState*, but let's make them all look the same.

* I believe the usual term for what these function pointers are is
"methods", not "callbacks".  Certainly we'd call them that if we
were working in C++.

> I still think we should backpatch at least the check_stack_depth() calls
> in ExecInitNode(), ExecEndNode().

No big objection, although I'm not sure it's necessary.

regards, tom lane




Re: [HACKERS] More optimization effort?

2017-07-21 Thread Greg Stark
On 21 July 2017 at 20:00, Tom Lane  wrote:

>> I have, however, decided not to volunteer to be the one who works on
>> that project.
>
> Me either.  Any one of these things would require a *lot* of work in
> order to have a coherent feature that provided useful behavior across
> a bunch of different datatypes.

I had in the past idly thought about whether it would be possible to
link in one of the various general-purpose theorem-proving libraries
and use it to simplify the expressions. But I really have no idea how
much work it would be to teach one about all the properties and
constraints of our existing data types and operators, or, for that
matter, how easy it would be to figure out which theorems we would
want proven in order to be able to use an index.



-- 
greg




Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-21 Thread Stephen Frost
Masahiko, all,

* Masahiko Sawada (sawada.m...@gmail.com) wrote:
> On Tue, Jul 18, 2017 at 1:28 PM, Stephen Frost  wrote:
> > Masahiko, Michael,
> >
> > * Masahiko Sawada (sawada.m...@gmail.com) wrote:
> >> > This is beginning to take shape.
> >>
> >> Sorry, I missed lots of typos in the last patch. All of your comments
> >> are incorporated into the attached latest patch, and I've checked it
> >> for other typos. Please review it.
> >
> > I've taken an initial look through the patch and it looks pretty
> > reasonable.  I need to go over it in more detail and work through
> > testing it myself next.
> >
> > I expect to commit this (with perhaps some minor tweaks primarily around
> > punctuation/wording), barring any issues found, on Wednesday or Thursday
> > of this week.
> 
> I understood. Thank you for looking at this!

I started discussing this with David off-list and he'd like a chance to
review it in a bit more detail (he's just returning from being gone for
a few weeks).  That review will be posted to this thread on Monday, and
I'll wait until then to move forward with the patch.

Next update will be before Tuesday, July 25th, COB.

Thanks!

Stephen




Re: [HACKERS] Locale dependency in new postgres_fdw test

2017-07-21 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> We could stabilize this test result by forcing lc_messages = C in
>> the foreign server options.  However, that would lose regression
>> coverage of situations where the remote locale is different, which
>> doesn't seem like a terribly good thing.

> I don't understand what the benefit is of having a different locale
> setting if there's no way the test could pass except with one very
> specific locale.  In other words, what coverage are we really losing by
> going this route?

What the current situation proves (or at least gives evidence for)
is that postgres_fdw doesn't have hidden dependencies on the remote server
running with the same lc_messages that prevails locally.  As an example
--- admittedly a bit far-fetched, because this shouldn't ever get past
code review --- if postgres_fdw were checking for the presence of specific
strings in error messages from the remote, we could hope that the
buildfarm would catch that.  But if we force the remote lc_messages value
throughout the test, we lose that type of coverage.

regards, tom lane




Re: [HACKERS] Locale dependency in new postgres_fdw test

2017-07-21 Thread Alvaro Herrera
Tom Lane wrote:

> We could stabilize this test result by forcing lc_messages = C in
> the foreign server options.  However, that would lose regression
> coverage of situations where the remote locale is different, which
> doesn't seem like a terribly good thing.

I don't understand what the benefit is of having a different locale
setting if there's no way the test could pass except with one very
specific locale.  In other words, what coverage are we really losing by
going this route?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-21 Thread Julien Rouhaud
On 21/07/2017 13:40, Andres Freund wrote:
> Attached is a
> patch that converts just ExecProcNode. The big change in comparison to
> the earlier patch is that the assignment of the callback is now done in
> the respective ExecInit* routines. As a consequence the ExecProcNode
> callbacks now are static.

Thanks for working on it.  Just in case, I reviewed the patch and didn't
find any issue with it.

-- 
Julien Rouhaud




Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-21 Thread Claudio Freire
On Fri, Jul 21, 2017 at 2:41 PM, Sokolov Yura
 wrote:
>
> My friend noticed that I didn't say why I bother with autovacuum.
> Our customers suffer from table bloat. I've made a synthetic
> bloat test, and started experiments with modifying micro- and
> auto-vacuum. My first attempts were to update the FSM early (both in
> micro- and autovacuum) and to update it up to the root, not only the
> low level.

This FSM thing is probably not a bad idea as well.

We're forced to run regular manual vacuums because for some tables
autovacuum never seems to be enough, no matter how it's configured,
mostly because it gets canceled all the time. These are high-churn,
huge tables, so vacuuming them takes hours or days, and there's always
someone with a conflicting lock at some point who ends up canceling
the autovacuum task.

The above paragraph triggered me to go check, and it seems in those
cases the FSM never gets vacuumed. That's probably not a good thing,
but I don't see how to vacuum the FSM after a cancel. So vacuuming the
FSM from time to time during long-running vacuums seems like a good
idea at this point.
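
A minimal sketch of that, as a fragment in vacuumlazy.c's heap-scan loop
(the batching threshold is invented; FreeSpaceMapVacuum() is the existing
call that propagates free-space info up to the FSM root):

/* every so many heap pages, vacuum the FSM so that already-recorded
 * free space becomes visible even if the autovacuum is canceled later */
#define FSM_VACUUM_EVERY_N_BLOCKS 8192      /* invented threshold */

    if ((blkno % FSM_VACUUM_EVERY_N_BLOCKS) == 0)
        FreeSpaceMapVacuum(onerel);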




[HACKERS] Buildfarm failure and dubious coding in predicate.c

2017-07-21 Thread Tom Lane
Buildfarm member culicidae just showed a transient failure in
the 9.4 branch:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-07-21%2017%3A49%3A37

It's an assert trap, for which the buildfarm helpfully captured a
stack trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7fb8d388a3fa in __GI_abort () at abort.c:89
#2  0x558d34d90814 in ExceptionalCondition 
(conditionName=conditionName@entry=0x558d34df6e2d "!(!found)", 
errorType=errorType@entry=0x558d34dcef3c "FailedAssertion", 
fileName=fileName@entry=0x558d34f19dc0 
"/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/storage/lmgr/predicate.c",
 lineNumber=lineNumber@entry=2023) at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/utils/error/assert.c:54
#3  0x558d34c9374b in RestoreScratchTarget (lockheld=lockheld@entry=1 
'\001') at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/storage/lmgr/predicate.c:2023
#4  0x558d34c966c4 in DropAllPredicateLocksFromTable (transfer=1 '\001', 
relation=relation@entry=0x7fb8d4d3aa18) at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/storage/lmgr/predicate.c:2997
#5  TransferPredicateLocksToHeapRelation 
(relation=relation@entry=0x7fb8d4d3aa18) at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/storage/lmgr/predicate.c:3014
#6  0x558d34ac7a70 in index_drop (indexId=29755, 
concurrent=concurrent@entry=0 '\000') at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/index.c:1516
#7  0x558d34ac00f8 in doDeletion (flags=-1369083928, object=0x558d35c2c03c) 
at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/dependency.c:1125
#8  deleteOneObject (object=0x558d35c2c03c, depRel=depRel@entry=0x7fffae656fe8, 
flags=flags@entry=0) at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/dependency.c:1036
#9  0x558d34ac0545 in deleteObjectsInList 
(targetObjects=targetObjects@entry=0x558d35bae140, 
depRel=depRel@entry=0x7fffae656fe8, flags=flags@entry=0) at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/dependency.c:227
#10 0x558d34ac06c8 in performMultipleDeletions 
(objects=objects@entry=0x558d35badef0, behavior=DROP_CASCADE, 
flags=flags@entry=0) at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/catalog/dependency.c:366
#11 0x558d34b3e2e9 in RemoveObjects (stmt=stmt@entry=0x558d35bf5678) at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/commands/dropcmds.c:134
#12 0x558d34ca61f0 in ExecDropStmt (stmt=stmt@entry=0x558d35bf5678, 
isTopLevel=isTopLevel@entry=1 '\001') at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/tcop/utility.c:1364
#13 0x558d34ca8455 in ProcessUtilitySlow 
(parsetree=parsetree@entry=0x558d35bf5678, 
queryString=queryString@entry=0x558d35bf4b50 "DROP SCHEMA selinto_schema 
CASCADE;", context=context@entry=PROCESS_UTILITY_TOPLEVEL, 
params=params@entry=0x0, dest=dest@entry=0x558d35bf5a20, 
completionTag=completionTag@entry=0x7fffae657710 "") at 
/home/andres/build/buildfarm-culicidae/REL9_4_STABLE/pgsql.build/../pgsql/src/backend/tcop/utility.c:1295

I've been staring at that for a little while, and I can't see any logic
error that would lead to the failure.  Clearly it'd be expected if two
sessions tried to remove/reinsert the "scratch target" concurrently,
but the locking operations should be enough to prevent that.  (Moreover,
if that had happened, you'd have expected an earlier assertion failure
in one or the other of the RemoveScratchTarget calls.)

Plausible explanations at this point seem to be:

1. Cosmic ray bit-flip.
2. There's some bug in the lock infrastructure, allowing two processes
   to acquire an LWLock concurrently.
3. Logic error I'm missing.

Probably it's #3, but what?

And, while I'm looking at this ... isn't this "scratch target" logic
just an ineffective attempt at waving a dead chicken?  It's assuming
that freeing an entry in a shared hash table guarantees that it can
insert another entry.  But that hash table is partitioned, meaning it has
a separate freelist per partition.  So the extra entry only provides a
guarantee that you can insert something into the same partition it's in,
making it useless for this purpose AFAICS.

By the same token, I do not think I believe the nearby assumptions that
deleting one entry from PredicateLockHash guarantees we can insert
another one.  That hash is partitioned as well.
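
To see why, here is a toy model of the per-partition freelists (an
illustration only, not the dynahash code; the partition count and hash
values are invented):

#include <stdio.h>

#define NPARTITIONS 4

/* number of free entries on each partition's freelist */
static int freelist_len[NPARTITIONS];

static int
partition_of(unsigned hash)
{
    return hash % NPARTITIONS;
}

static int
try_insert(unsigned hash)
{
    int p = partition_of(hash);

    if (freelist_len[p] == 0)
        return 0;               /* this partition's freelist is empty */
    freelist_len[p]--;
    return 1;
}

static void
free_entry(unsigned hash)
{
    freelist_len[partition_of(hash)]++;
}

int
main(void)
{
    /* "scratch target" lives in partition 0; the real insert hashes to 1 */
    free_entry(0);                              /* RemoveScratchTarget */
    printf("insert ok? %d\n", try_insert(1));   /* prints 0: no guarantee */
    return 0;
}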

It looks to me like we either need to do a fairly significant rewrite
here, or to give up on making these hashtables partitioned.  Either
one is pretty annoying, considering the very low 

Re: [HACKERS] Syncing sql extension versions with shared library versions

2017-07-21 Thread Andrew Dunstan


On 07/21/2017 04:17 PM, Mat Arye wrote:
> Hi All,
>
> I am developing the TimescaleDB extension for postgres
> (https://github.com/timescale/timescaledb) and have some questions
> about versioning. First of all, I have to say that the versioning
> system on the sql side is wonderful. It's really simple to write
> migrations etc.
>
> However when thinking through the implications of having a database
> cluster with databases having different extension versions installed,
> it was not apparently clear to me how to synchronize the installed
> extension version with a shared library version. For example, if I
> have timescaledb version 0.1.0 in one db and version 0.2.0 in another
> db, I'd like for timescaledb-0.1.0.so and timescaledb-0.2.0.so to be used,
> respectively. (I want to avoid having to keep backwards-compatibility
> for all functions in future shared-libraries). In our case, this is
> further complicated by the fact that we need to preload the shared
> library since we are accessing the planner hooks etc. Below, I'll
> describe some solutions I have been thinking about, but wanted to hear
> if anyone else on this list has already solved this problem and has
> some insight. It is also quite possible I am missing something. 
>
> Issue 1: Preloading the right shared library.
> When preloading libraries (via local_preload_libraries,
> session_preload_libraries, or shared_preload_libraries), it would be nice
> to preload the shared library according to its version. But I looked
> through the code and found no logic for adding version numbers to
> shared library names.
> Solution 1: Set session_preload_libraries on the database via ALTER
> DATABASE SET. This can be done in the sql and the sql
> version-migration scripts can change this value as you change
> extensions versions. I think this would work, but it seems very
> hack-ish and less-than-ideal.
> Solution 2: Create some kind of stub shared-library that will, in
> turn, load another shared library of the correct version. This seems
> like the cleaner approach. Has anybody seen/implemented something like
> this already?
>
> Issue 2: module_pathname
> I believe that for user-defined functions the MODULE_PATHNAME
> substitution will not work since that setting is set once
> per-extension. Thus, for example, the migration scripts that include
> function definitions for older versions would use the latest .so file
> if MODULE_PATHNAME was used in the definition. This would be a problem
> if upgrading to an intermediate (not latest) version.
> Solution: MODULE_PATHNAME cannot be used, and we should build our own
> templating/makefile infrastructure to link files to the
> right-versioned shared library in the CREATE FUNCTION definition. 
>
>



It would be nice if we could teach the load mechanism to expand a
version escape in MODULE_PATHNAME, e.g.

MODULE_PATHNAME = '$libdir/foo-$version'

cheers

andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





[HACKERS] Syncing sql extension versions with shared library versions

2017-07-21 Thread Mat Arye
Hi All,

I am developing the TimescaleDB extension for postgres (
https://github.com/timescale/timescaledb) and have some questions about
versioning. First of all, I have to say that the versioning system on the
sql side is wonderful. It's really simple to write migrations etc.

However when thinking through the implications of having a database cluster
with databases having different extension versions installed, it was not
apparently clear to me how to synchronize the installed extension version
with a shared library version. For example, if I have timescaledb version
0.1.0 in one db and version 0.2.0 in another db, I'd like for
timescaledb-0.1.0.so and timescaledb-0.2.0.so to be used, respectively. (I
want to avoid having to keep backwards-compatibility for all functions in
future shared-libraries). In our case, this is further complicated by the
fact that we need to preload the shared library since we are accessing the
planner hooks etc. Below, I'll describe some solutions I have been thinking
about, but wanted to hear if anyone else on this list has already solved
this problem and has some insight. It is also quite possible I am missing
something.

Issue 1: Preloading the right shared library.
When preloading libraries (via local_preload_libraries,
session_preload_libraries, or shared_preload_libraries), it would be nice to
preload the shared library according to its version. But I looked through
the code and found no logic for adding version numbers to shared library
names.
Solution 1: Set session_preload_libraries on the database via ALTER
DATABASE SET. This can be done in the sql and the sql version-migration
scripts can change this value as you change extensions versions. I think
this would work, but it seems very hack-ish and less-than-ideal.
Solution 2: Create some kind of stub shared-library that will, in turn,
load another shared library of the correct version. This seems like the
cleaner approach. Has anybody seen/implemented something like this already?
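
For what it's worth, here is a minimal sketch of the stub idea, assuming
the stub is itself the preloaded library and picks the real version from a
GUC (the GUC name, default, and file-name scheme are invented; load_file()
is the backend routine behind the LOAD command):

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

void _PG_init(void);

void
_PG_init(void)
{
    char        path[MAXPGPATH];
    /* invented GUC; could be set per-database by the migration scripts */
    const char *version = GetConfigOption("timescaledb.version", true, false);

    if (version == NULL)
        version = "0.1.0";      /* invented default */

    snprintf(path, sizeof(path), "$libdir/timescaledb-%s", version);
    load_file(path, false);     /* loads the real, versioned .so */
}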

Issue 2: module_pathname
I believe that for user-defined functions the MODULE_PATHNAME substitution
will not work since that setting is set once per-extension. Thus, for
example, the migration scripts that include function definitions for older
versions would use the latest .so file if MODULE_PATHNAME was used in the
definition. This would be a problem if upgrading to an intermediate (not
latest) version.
Solution: MODULE_PATHNAME cannot be used, and we should build our own
templating/makefile infrastructure to link files to the right-versioned
shared library in the CREATE FUNCTION definition.

Thanks a lot in advance,
Mat Arye
http://www.timescale.com/


[HACKERS] possible effective_io_concurrency performance regression

2017-07-21 Thread Joshua D. Drake

-hackers,

While updating my Postgres performance curriculum I was doing some
testing with effective_io_concurrency and I may have found a regression.
I am aware that the parameter only works under certain conditions.
However, what I appear to have found is that setting it to anything but
0 causes a regression for (at least BenchmarkSQL TPC-C) workloads.


See here:

Testing with the TPC-style benchmark shows that on local systems,
setting this between 0 and 48 keeps the TPS within noise level. However,
testing also shows that on cloud systems such as Google Cloud Compute,
setting it to anything greater than 0 results in an approximately 10%
degradation in TPS:


Local/GCE    effective_io_concurrency    TPS

GCE          OFF                         47951
GCE          8                           43098
GCE          1                           43233

LOCAL        0                           9939
LOCAL        4                           9960
LOCAL        16                          9955
LOCAL        48                          9958


I was able to reproduce these results pretty consistently. I wonder if
anyone has seen this on EBS? The GCE instance is 16 CPUs, 59GB memory,
and a 240MB/s sustained-rate SSD with 15k IOPS.


Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




Re: [HACKERS] [PATCH] A hook for session start

2017-07-21 Thread Fabrízio de Royes Mello
On Fri, Jul 21, 2017 at 12:19 PM, Fabrízio de Royes Mello
<fabriziome...@gmail.com> wrote:
>
> On Fri, Jul 21, 2017 at 10:58 AM, Yugo Nagata  wrote:
> >
> > On Fri, 21 Jul 2017 10:31:57 -0300
> > Fabrízio de Royes Mello  wrote:
> >
> > > On Fri, Jul 21, 2017 at 9:35 AM, Yugo Nagata  wrote:
> > > >
> > > > On Fri, 21 Jul 2017 09:53:19 +0800
> > > > Craig Ringer  wrote:
> > > >
> > > > > On 21 July 2017 at 08:42, Robert Haas  wrote:
> > > > >
> > > > > > On Thu, Jul 20, 2017 at 8:27 AM, Fabrízio de Royes Mello
> > > > > >  wrote:
> > > > > > > I'm not sure of your real needs, but isn't this material for
> > > > > > > improving Event Triggers?
> > > > > >
> > > > > > I've thought about that, too.  One problem is what to do if the
> > > > > > user hits ^C while the event trigger procedure is running.  If
> > > > > > you respond to that by killing the event trigger and letting
> > > > > > the user issue commands, then the event trigger can't be used
> > > > > > for security or auditing purposes, because the user might
> > > > > > prevent it from doing whatever it's intended to do with a
> > > > > > well-timed interrupt.  If you ignore ^C or make it turn into
> > > > > > FATAL, then a poorly-crafted trigger can lock users out of the
> > > > > > database.  Maybe that's OK.  We could say "well, if you lock
> > > > > > yourself out of the database with your logon trigger, you get
> > > > > > to shut down the database and restart in single-user mode to
> > > > > > recover".
> > > > > >
> > > > > > A hook, as proposed here, is a lot simpler and lacks these
> > > > > > concerns.  Installing C code into the database is intrinsically
> > > > > > risky anywhere, and no more so here than elsewhere.  But it's
> > > > > > also less accessible to the average user.
> > > > >
> > > > > I'd favour the C hook personally.  It's a lot more flexible, and
> > > > > can be used by an extension to implement trigger-like behaviour
> > > > > if anyone wants it, including the extension's choice of error
> > > > > handling decisions.
> > > > >
> > > > > It's also a lot simpler and less intrusive for core, which is
> > > > > nice when we don't have anything compelling destined for core
> > > > > that needs it.  (I want to add a bunch of hooks to the logical
> > > > > replication code in pg11 for similar reasons, so that features
> > > > > like DDL replication can be prototyped as extensions more
> > > > > practically.)
> > >
> > > I agree with you both...
> > >
> > > > > That said, isn't ExecutorStart_hook + ProcessUtility_hook able
> > > > > to serve the same job as a session-start hook, albeit at
> > > > > slightly higher overhead?  You can just test to see if your
> > > > > initial tasks have run yet.
> > > >
> > > > Thank you for your suggestion.  Certainly, we can do a similar job
> > > > using these existing hooks, although they are triggered when the
> > > > first query is executed, not when the session is started.  Now I
> > > > come to think that an additional hook is not needed.
> > >
> > > As Nagata said, the hooks proposed by Craig fire only when the first
> > > query is called, so I don't know how they would work for session
> > > start... are we missing something?
> >
> > Yes, ExecutorStart_hook + ProcessUtility_hook is not strictly the same
> > as a session_start hook.  If a query is issued a long time after the
> > session start, the timing of the hook deviates widely.  That is no
> > problem if we only want to do something once at session start, but it
> > might be a problem if we want to record the timestamp of the session
> > start, for example.
> >
> > > If we're going to add this hook, what about adding a session-end
> > > hook as well?
> >
> > If someone wants the session-start hook, they might want this too.
> >
>
> Well, if someone wants them, here are the patches... I just did a minor
> fix and cleanup of your previous session_start sample and provided both
> samples in the same patch.
>

I made a mistake in the previous patch... the attached three patches are
now in their correct order.
Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b8d860e..7a1fa3b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -160,6 +160,9 @@ static bool RecoveryConflictPending = false;
 static bool RecoveryConflictRetryable = true;
 static ProcSignalReason RecoveryConflictReason;
 
+/* Hook for plugins to get control at start of session */

Re: [HACKERS] More optimization effort?

2017-07-21 Thread Tom Lane
Robert Haas  writes:
> Another very similar (but possibly easier) case is:

> select * from pgbench_accounts where aid = 1.0;

> This will use a sequential scan rather than an index scan, because the
> query optimizer doesn't know that the only integer for which =(int4,
> numeric) will return true is 1.  Therefore it has to scan the whole
> table one row at a time and check, for each one, whether the =
> operator returns true.  It can't cast the constant to an integer
> because the user might have written 1.1 rather than 1.0, in which case
> the cast would fail; but the query should return 0 rows, not ERROR.

> You can imagine fixing this by having some kind of datatype-specific
> knowledge that would replace "aid = 1.0" with "aid = 1" and "aid =
> 1.1" with "false"; it would also have to know that "aid = 99"
> should be changed to "false" because 99 isn't representable as
> int4.

Couple thoughts about that:

* Another conceivable route to a solution is to add "int = numeric"
and friends to the btree opclasses for integers.  I'm not sure if
there is any fundamental reason we've not done that (it's possible
it would fall foul of the requirements about transitivity, not sure).
However, this would only fix things to the extent of allowing an
index scan to occur; it wouldn't help in non-indexed cases, nor would
it know anything about simplifying say "aid = 1.1" to "false".

* The already-existing protransform infrastructure could be used to
address this type of problem; that is, you could imagine attaching
a transform function to numeric_eq that would look for cases like
"integer::numeric = nonintegralconstant" and simplify them accordingly
(a sketch follows this list).
When that feature went in, there was talk of using transforms to
simplify e.g. "variable + 0" or "variable * 1", but nobody's got round
to anything of the sort yet.

* On the other hand, protransform doesn't offer any easy way to apply
similar optimizations to a bunch of different functions/operators.
For instance, if your goal is to simplify "variable + 0", there are
a depressingly large number of "+" operators to write transform
functions for.  Maybe there's no way around duplicate coding for that,
but it looks tedious and bulky.
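
Roughly, the numeric_eq transform mentioned above might look like this
(purely illustrative: the three helper functions are invented, and the
hard parts are hand-waved):

Datum
numeric_eq_transform(PG_FUNCTION_ARGS)
{
    FuncExpr   *expr = castNode(FuncExpr, PG_GETARG_POINTER(0));
    Node       *ret = NULL;
    Node       *left = linitial(expr->args);
    Node       *right = lsecond(expr->args);

    /* all three helpers below are invented for this sketch */
    if (is_int4_var_cast_to_numeric(left) && IsA(right, Const))
    {
        if (numeric_const_is_integral(right))
            ret = build_int4_equality(left, right);   /* "aid = 1" */
        else
            ret = makeBoolConst(false, false);        /* "aid = 1.1" -> false */
    }

    PG_RETURN_POINTER(ret);     /* NULL means "no simplification found" */
}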

> I have, however, decided not to volunteer to be the one who works on
> that project.

Me either.  Any one of these things would require a *lot* of work in
order to have a coherent feature that provided useful behavior across
a bunch of different datatypes.

regards, tom lane




Re: [HACKERS] More optimization effort?

2017-07-21 Thread Robert Haas
On Fri, Jul 21, 2017 at 10:33 AM, Tom Lane  wrote:
> But the bigger picture is that doing something that helps to any
> useful extent would require a really substantial amount of new,
> datatype- and operator-specific knowledge that doesn't exist in the
> system today.  And as Craig noted, applying that knowledge would
> be expensive, even in cases where it failed to help.
>
> So, color me skeptical ...

I agree, but with a caveat.  If somebody felt like doing all of that
work, and either made it cheap enough to justify enabling it by
default or added a controlling GUC, it'd be fine with me.  We've
talked before about having knobs to adjust how hard the optimizer
tries to optimize things, and this would be a good candidate for such
a thing.  The bigger issue from my perspective is that I really doubt
that anybody wants to put the effort into doing something like this in
a principled way.

Another very similar (but possibly easier) case is:

select * from pgbench_accounts where aid = 1.0;

This will use a sequential scan rather than an index scan, because the
query optimizer doesn't know that the only integer for which =(int4,
numeric) will return true is 1.  Therefore it has to scan the whole
table one row at a time and check, for each one, whether the =
operator returns true.  It can't cast the constant to an integer
because the user might have written 1.1 rather than 1.0, in which case
the cast would fail; but the query should return 0 rows, not ERROR.

You can imagine fixing this by having some kind of datatype-specific
knowledge that would replace "aid = 1.0" with "aid = 1" and "aid =
1.1" with "false"; it would also have to know that "aid = 99"
should be changed to "false" because 99 isn't representable as
int4.

I have, however, decided not to volunteer to be the one who works on
that project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Locale dependency in new postgres_fdw test

2017-07-21 Thread Tom Lane
I wrote:
> I had not realized (or forgot) that postgres_fdw allows the remote
> end to run in whatever lc_messages locale is default for the remote
> installation.  It's a bit surprising that none of the existing test
> cases expose any remote-side messages directly, but evidently not.

> We could stabilize this test result by forcing lc_messages = C in
> the foreign server options.  However, that would lose regression
> coverage of situations where the remote locale is different, which
> doesn't seem like a terribly good thing.

It turns out that that way doesn't fix the problem, anyway, because
an lc_messages setting in the startup packet is applied too late
to change the output for the errors we're interested in.  There have
been past discussions about maybe fixing that, but it's certainly
not happening now, much less in back branches.

BTW, I found out while trying to do that that ALTER SERVER does not
accept "options '-c lc_messages=C'" anyway, which surprised me quite
a bit.  The reason turns out to be that libpq labels "options" as a
debug option which causes postgres_fdw to ignore it.  Should we think
about changing that?  Being able to set GUC variables for the remote
session seems useful.
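
From memory, the filter in postgres_fdw's option.c is roughly of this
shape (paraphrased, not the exact code):

    for (lopt = libpq_options; lopt->keyword; lopt++)
    {
        /* skip debug options; this is what hides "options" */
        if (strchr(lopt->dispchar, 'D'))
            continue;
        ...
    }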

> Another option is to temporarily set VERBOSITY to "terse" so that
> the DETAIL is suppressed from the test output.  But then we don't
> really know why the connection failed, so that could mask issues
> that the test case ought to find, too.

So we're stuck with that solution.  The disadvantage of not being
entirely sure why the connection failed could be solved if psql had
some way to report just the SQLSTATE of the last failure.  I think
there's been some discussions about saving error SQLSTATEs into a
special variable, as we do for LASTOID for instance; but it doesn't
look like that's actually been done yet.  We should revisit this
if that feature ever materializes.

regards, tom lane




Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-21 Thread Sokolov Yura

On 2017-07-21 19:32, Robert Haas wrote:
> On Fri, Jul 21, 2017 at 4:19 AM, Sokolov Yura
>  wrote:
>> Probably with an increased ring buffer there is no need to raise
>> vacuum_cost_limit. Will you admit it?
>
> No, I definitely won't admit that.  With default settings autovacuum
> won't write more than ~2.3MB/s if I remember the math correctly, so if
> you've got a 1TB table you're probably going to need a bigger value.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


I've seen an autovacuum process spending >50% of its time in fsync
(with the current ring buffer) (but I used autovacuum_cost_delay=2ms).
An fsync can last up to a second on an HDD if there is concurrent IO.
Even on an SSD, fsync can be really noticeable.

But I agree that for a 1TB table autovacuum_cost_limit should still
be increased, even with a larger ring buffer.


My friend noticed that I didn't say why I bother with autovacuum.
Our customers suffer from table bloat. I've made a synthetic
bloat test, and started experiments with modifying micro- and
auto-vacuum. My first attempts were to update the FSM early (both in
micro- and autovacuum) and to update it up to the root, not only the
low level.

Then I looked at strace of the autovacuum process, and noticed a storm
of fsyncs. I caught backtraces with gdb rooted in fsync, and found that
evicting dirty pages from the small ring buffer is the reason.

After some experiments combining my "early FSM update" with the size of
the ring buffer, I understood that increasing the ring buffer gives most
of the benefit: autovacuum runs faster, and bloat is greatly reduced. In
an extreme case, a 400MB table bloats to 17GB on master, but only to 5GB
with the faster autovacuum.

I used custom scripts, and that is why my statistics are not complete.
Still, I didn't find any performance reduction. In fact, it looks like
the tests with the larger autovacuum ring did more queries per hour than
the tests against master.

I will run pgbench over the weekend, so latencies and percentiles
will be collected.

With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




[HACKERS] Locale dependency in new postgres_fdw test

2017-07-21 Thread Tom Lane
So 8bf58c0d9 immediately blew up in the buildfarm, with eg this
on jaguarundi:

***
*** 130,136 
  ALTER SERVER loopback OPTIONS (SET dbname 'no such database');
  SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should fail
  ERROR:  could not connect to server "loopback"
! DETAIL:  FATAL:  database "no such database" does not exist
  DO $d$
  BEGIN
  EXECUTE $$ALTER SERVER loopback
--- 130,136 
  ALTER SERVER loopback OPTIONS (SET dbname 'no such database');
  SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should fail
  ERROR:  could not connect to server "loopback"
! DETAIL:  FATAL:  Datenbank »no such database« existiert nicht
  DO $d$
  BEGIN
  EXECUTE $$ALTER SERVER loopback
***

I had not realized (or forgot) that postgres_fdw allows the remote
end to run in whatever lc_messages locale is default for the remote
installation.  It's a bit surprising that none of the existing test
cases expose any remote-side messages directly, but evidently not.

We could stabilize this test result by forcing lc_messages = C in
the foreign server options.  However, that would lose regression
coverage of situations where the remote locale is different, which
doesn't seem like a terribly good thing.

Another option is to temporarily set VERBOSITY to "terse" so that
the DETAIL is suppressed from the test output.  But then we don't
really know why the connection failed, so that could mask issues
that the test case ought to find, too.

Maybe set lc_messages = C in the options only for the duration
of these new test cases?

regards, tom lane




Re: [HACKERS] Multi column range partition table

2017-07-21 Thread Dean Rasheed
On 17 July 2017 at 17:37, Dean Rasheed  wrote:
> On 17 July 2017 at 16:34, Robert Haas  wrote:
>> Do you want to own this open item, then?
>>
> OK.
>
> I need to give the patch another read-through, and then I'll aim to
> push it sometime in the next few days.
>

Committed.

Regards,
Dean




Re: [HACKERS] [COMMITTERS] pgsql: Add a Gather executor node.

2017-07-21 Thread Tom Lane
Alvaro Herrera  writes:
> Robert Haas wrote:
>> I think those top-of-file lists are just annoying, and would prefer to
>> rip them out entirely rather than spend time maintaining them.

> +1

To the extent that they're just lists of function names, +1.  Some of
them have some documentation in them, which you'd need to make sure is
duplicated elsewhere or else copy to an appropriate place.

regards, tom lane




Re: [HACKERS] [PATCH] A hook for session start

2017-07-21 Thread Jim Mlodgenski
> > Can a user do anything remotely interesting or useful without hitting
> > either ExecutorStart_hook or ProcessUtility_hook? They can parse
> > queries I guess, but you could just set your hook up in the parser
> > instead. If you hook the parser, all they can do is open an idle
> > session and sit there...
>
> That's an exceedingly weak argument for rejecting this patch.  The
> fact that you can probably hack around the lack of a hook for most
> reasonable use cases is not an argument against having a hook that does
> what people actually want to do.
>
>
When I first saw this thread, my initial thought of a use case was to
prepare some key application queries so they are there and ready to go.
That would need to happen before ExecutorStart_hook or
ProcessUtility_hook if an app just wants to execute the prepared
statement.
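
A minimal sketch of that use case against the proposed hook (the hook name
and type are guessed from the posted patch; the prepared query and table
are hypothetical, and transaction setup/error handling are omitted):

#include "postgres.h"
#include "fmgr.h"
#include "executor/spi.h"

PG_MODULE_MAGIC;

/* Assumed shape of the proposed hook; the real name/type may differ. */
typedef void (*session_start_hook_type) (void);
extern session_start_hook_type session_start_hook;

static void
prepare_app_queries(void)
{
    /* transaction setup and error handling omitted for brevity */
    SPI_connect();
    SPI_execute("PREPARE get_user(int) AS "
                "SELECT * FROM users WHERE id = $1",    /* hypothetical */
                false, 0);
    SPI_finish();
}

void _PG_init(void);

void
_PG_init(void)
{
    session_start_hook = prepare_app_queries;
}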


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-21 Thread Joshua D. Drake

On 07/20/2017 08:58 PM, Joshua D. Drake wrote:
> On 07/19/2017 07:57 PM, Tom Lane wrote:
>> Peter Geoghegan  writes:
>
> Test 1: 55G /srv/main
> TPS: 955
>
> Test 2: 112G /srv/main
> TPS: 531 (Not sure what happened here, long checkpoint?)
>
> Test 3: 109G /srv/main
> TPS: 868
>
> Test 4: 143G
> TPS: 840
>
> Test 5: 154G
> TPS: 722
>
> I am running the query here:
>
> https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index


This query didn't finish after 12 hours. Here is the new set:

                name                 |  setting
-------------------------------------+----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 2
 autovacuum_max_workers              | 12
 autovacuum_multixact_freeze_max_age | 4
 autovacuum_naptime                  | 10
 autovacuum_vacuum_cost_delay        | 0
 autovacuum_vacuum_cost_limit        | 5000
 autovacuum_vacuum_scale_factor      | 0.1
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1

I have only run one test, but it is pretty telling:

Test 1: 60G /srv/main
TPS: 914

Test 2: 92G /srv/main
TPS: Still running

I will post an update after the third or fourth test depending on the 
numbers. I created this instance exactly for these tests so if someone 
wants to poke around I can give access.


Thanks,

JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




Re: [HACKERS] PgFDW connection invalidation by ALTER SERVER/ALTER USER MAPPING

2017-07-21 Thread Tom Lane
Ashutosh Bapat  writes:
> On Fri, Jul 21, 2017 at 10:55 AM, Kyotaro HORIGUCHI
>  wrote:
>> The attached patch differs only in this point.

> +1. The patch looks good to me.

Pushed with a couple additional changes: we'd all missed that the header
comment for GetConnection was obsoleted by this change, and the arguments
for GetSysCacheHashValue really need to be coerced to Datum.  (I think
OID to Datum is the same as what the compiler would do anyway, but best
not to assume that.)
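
For the archives, the coercions in question look roughly like this
(reconstructed from memory; exact struct and field names may differ):

    entry->server_hashvalue =
        GetSysCacheHashValue1(FOREIGNSERVEROID,
                              ObjectIdGetDatum(server->serverid));
    entry->mapping_hashvalue =
        GetSysCacheHashValue1(USERMAPPINGOID,
                              ObjectIdGetDatum(user->umid));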

Back-patching was more exciting than I could wish.  It seems that
before 9.6, we did not have struct UserMapping storing the OID of the
pg_user_mapping row it had been made from.  I changed GetConnection to
re-look-up that row and get the OID.  But that's ugly, and there's a
race condition: if user mappings are being added or deleted meanwhile,
we might locate a per-user mapping when we're really using a PUBLIC
mapping or vice versa, causing the ConnCacheEntry to be labeled with
the wrong hash value so that it might not get invalidated properly later.
Still, it's significantly better than it was, and that corner case seems
unlikely to get hit in practice --- for one thing, you'd have to then
revert the mapping addition/deletion before the ConnCacheEntry would be
found and used again.  I don't want to take the risk of modifying struct
UserMapping in stable branches, so it's hard to see a way to make that
completely bulletproof before 9.6.

regards, tom lane




Re: [HACKERS] [PATCH] A hook for session start

2017-07-21 Thread Robert Haas
On Fri, Jul 21, 2017 at 11:10 AM, Craig Ringer  wrote:
> Don't we have that timestamp already?
>
> What practical use cases are there for acting post-auth that can't wait
> until the user tries to do something?

Have, yes; record, no.

> Can a user do anything remotely interesting or useful without hitting either
> ExecutorStart_hook or ProcessUtility_hook? They can parse queries I guess,
> but you could just set your hook up in the parser instead. If you hook the
> parser, all they can do is open an idle session and sit there...

That's an exceedingly weak argument for rejecting this patch.  The
fact that you can probably hack around the lack of a hook for most
reasonable use cases is not an argument against having a hook that does
what people actually want to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2017-07-21 Thread Marina Polyakova

Hello again!

Here is the third version of the patch for pgbench, thanks to Fabien
Coelho's comments. As in the previous one, transactions with serialization
and deadlock failures are rolled back and retried until they end
successfully or their number of tries reaches the maximum.


Differences from the previous version:
* Some code cleanup :) In particular, the Variables structure for
managing client variables, and only one new TAP test file (as
recommended here [1] and here [2]).
* There's no error if the last transaction in the script is not
completed. But transactions started in previous scripts and/or not
ended in the current script are not rolled back and retried after a
failure. Such a script try is reported as failed because it contains a
failure that was not rolled back and retried.
* Usually the retries and/or failures are printed if they are not equal
to zero. In transaction/aggregation logs the failures are always
printed, and the retries are printed if max_tries is greater than 1.
This keeps the log format consistent during the execution of the
program.


Patch is attached. Any suggestions are welcome!

[1] 
https://www.postgresql.org/message-id/alpine.DEB.2.20.1707121338090.12795%40lancre
[2] 
https://www.postgresql.org/message-id/alpine.DEB.2.20.1707121142300.12795%40lancre


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From 0ee372e93b8d7017563d2f2f55357c39c08a Mon Sep 17 00:00:00 2001
From: Marina Polyakova 
Date: Fri, 21 Jul 2017 17:57:58 +0300
Subject: [PATCH v3] Pgbench Retry transactions with serialization or deadlock
 errors

Now transactions with serialization or deadlock failures can be rolled back and
retried again and again until they end successfully or their number of tries
reaches the maximum. You can set the maximum number of tries by using the
appropriate benchmarking option (--max-tries). The default value is 1. If
there are retries and/or failures, their statistics are printed in the progress
reports, in the transaction / aggregation logs, and at the end with the other
results (overall and for each script). A transaction failure is reported here
only if the last try of the transaction fails. Retries and/or failures are also
printed per command with average latencies if you use the appropriate
benchmarking option (--report-per-command, -r) and the total number of retries
and/or failures is not zero.

Note that transactions started in previous scripts and/or not ended in the
current script are not rolled back and retried after a failure. Such a script
try is reported as failed because it contains a failure that was not rolled
back and retried.
---
 doc/src/sgml/ref/pgbench.sgml  | 240 +-
 src/bin/pgbench/pgbench.c  | 872 +
 .../t/002_serialization_and_deadlock_failures.pl   | 459 +++
 3 files changed, 1412 insertions(+), 159 deletions(-)
 create mode 100644 src/bin/pgbench/t/002_serialization_and_deadlock_failures.pl

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 64b043b..3bbeec5 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -49,6 +49,7 @@
 
 
 transaction type: builtin: TPC-B (sort of)
+transaction maximum tries number: 1
 scaling factor: 10
 query mode: simple
 number of clients: 10
@@ -59,7 +60,7 @@ tps = 85.184871 (including connections establishing)
 tps = 85.296346 (excluding connections establishing)
 
 
-  The first six lines report some of the most important parameter
+  The first seven lines report some of the most important parameter
   settings.  The next line reports the number of transactions completed
   and intended (the latter being just the product of number of clients
   and number of transactions per client); these will be equal unless the run
@@ -436,22 +437,33 @@ pgbench  options  dbname
 Show progress report every sec seconds.  The report
 includes the time since the beginning of the run, the tps since the
 last report, and the transaction latency average and standard
-deviation since the last report.  Under throttling (-R),
-the latency is computed with respect to the transaction scheduled
-start time, not the actual transaction beginning time, thus it also
-includes the average schedule lag time.
+deviation since the last report.  If since the last report there are
+transactions that ended with serialization/deadlock failures they are
+also reported here as failed (see
+ for more information).  Under
+throttling (-R), the latency is computed with respect to the
+transaction scheduled start time, not the actual transaction beginning
+time, thus it also includes the average schedule lag time.  If since the
+last report there're transactions that have been rolled back and retried
+  

Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-21 Thread Robert Haas
On Fri, Jul 21, 2017 at 4:19 AM, Sokolov Yura
 wrote:
> You are one of the leadership. I know it is not your job to test every
> tiny change a schoolboy proposes. But there are a lot of people here who
> wait for your word. Instead of cooling the rush and closing discussions,
> you may just say: "please, someone test it with that particular workload".

I had no intention of cooling the rush or closing discussions.  I was
trying to help you understand what points you needed to address in
order to have a chance of getting this committed.  I feel like I came
into this discussion to try to help you make some progress on this
issue, and instead of appreciating that, you're making me the bad guy.

> When there is no garbage, increasing the autovacuum ring buffer changes
> almost nothing. When there is garbage, the current small ring buffer leads
> to a storm of fsyncs. Frequent fsyncs slow down an HDD a lot, and then the
> HDD isn't capable of satisfying queries and refilling the OS cache. Will
> you admit it?

I haven't tested it, but it sounds believable.

>> I've also run into many customers whose problem was that vacuum ran too
>> slowly, and generally raising vacuum_cost_limit fixes that problem just
>> fine.
>
> Probably with an increased ring buffer there is no need to raise
> vacuum_cost_limit. Will you admit it?

No, I definitely won't admit that.  With default settings autovacuum
won't write more than ~2.3MB/s if I remember the math correctly, so if
you've got a 1TB table you're probably going to need a bigger value.
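
One plausible reconstruction of that math, assuming each page autovacuum
touches costs a miss (10) plus a dirty (20) under the defaults
vacuum_cost_limit = 200 and autovacuum_vacuum_cost_delay = 20ms:

    200 credits per 20 ms            = 10,000 credits/s
    10,000 credits/s / 30 per page  ~=  333 pages/s
    333 pages/s * 8 kB              ~=  2.6 MB/s

which is the same ballpark as the ~2.3MB/s figure.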

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Add a Gather executor node.

2017-07-21 Thread Andres Freund


On July 21, 2017 5:34:32 PM GMT+02:00, Alvaro Herrera 
 wrote:
>Robert Haas wrote:
>
>> I think those top-of-file lists are just annoying, and would prefer to
>> rip them out entirely rather than spend time maintaining them.
>
>+1

I'm quite sympathetic to that view. But I think it's either maintaining
them or ripping them out, not leaving them to bit-rot.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-21 Thread Sokolov Yura

On 2017-07-20 22:51, Tom Lane wrote:
> Robert Haas  writes:
>> I think that's a valid point.  There are also other concerns here -
>> e.g. whether instead of adopting the patch as proposed we ought to (a)
>> use some smaller size, or (b) keep the size as-is but reduce the
>> maximum fraction of shared_buffers that can be consumed, or (c) divide
>> the ring buffer size through by autovacuum_max_workers.  Personally,
>> of those approaches, I favor (b).  I think a 16MB ring buffer is
>> probably just fine if you've got 8GB of shared_buffers but I'm
>> skeptical about it when you've got 128MB of shared_buffers.
>
> WFM.  I agree with *not* dividing the basic ring buffer size by
> autovacuum_max_workers.  If you have allocated more AV workers, I think
> you expect AV to go faster, not for the workers to start fighting among
> themselves.
>
> It might, however, be reasonable for the fraction-of-shared-buffers
> limitation to have something to do with autovacuum_max_workers, so that
> you can't squeeze yourself out of shared_buffers if you set that number
> really high.  IOW, I think the upthread suggestion of
> min(shared_buffers/8/autovacuum_workers, 16MB) is basically the right
> idea, though we could debate the exact constants.
>
> regards, tom lane


Attached version is with min(shared_buffers/8/autovacuum_workers, 16MB).

With regards
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

From 8ebd5e7eb498fdc75fc7b724ace1f6de8fbcf3fd Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Tue, 18 Jul 2017 12:33:33 +0300
Subject: [PATCH] Set total vacuum ring buffer 16MB

Vacuum suffers a lot from the small ring buffer, in the same way the bulk
writer suffered before Tom Lane's fix at 6382448cf96:
> the smaller size resulted in an undesirable decrease in bulk data
> loading speed, due to COPY processing frequently getting blocked
> for WAL flushing.

During discussion it was decided to set it to
min(shared_buffers/8/autovacuum_max_workers, 16MB), so that many autovacuum
workers will not consume a significant part of shared buffers.
---
 src/backend/storage/buffer/freelist.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 9d8ae6ae8e..da83cd155b 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -16,6 +16,7 @@
 #include "postgres.h"
 
 #include "port/atomics.h"
+#include "postmaster/autovacuum.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 #include "storage/proc.h"
@@ -526,6 +527,7 @@ GetAccessStrategy(BufferAccessStrategyType btype)
 {
 	BufferAccessStrategy strategy;
 	int			ring_size;
+	int			n;
 
 	/*
 	 * Select ring size to use.  See buffer/README for rationales.
@@ -541,12 +543,15 @@ GetAccessStrategy(BufferAccessStrategyType btype)
 
 		case BAS_BULKREAD:
 			ring_size = 256 * 1024 / BLCKSZ;
+			n = 1;
 			break;
 		case BAS_BULKWRITE:
 			ring_size = 16 * 1024 * 1024 / BLCKSZ;
+			n = 1;
 			break;
 		case BAS_VACUUM:
-			ring_size = 256 * 1024 / BLCKSZ;
+			ring_size = 16 * 1024 * 1024 / BLCKSZ;
+			n = autovacuum_max_workers;
 			break;
 
 		default:
@@ -556,7 +561,7 @@ GetAccessStrategy(BufferAccessStrategyType btype)
 	}
 
 	/* Make sure ring isn't an undue fraction of shared buffers */
-	ring_size = Min(NBuffers / 8, ring_size);
+	ring_size = Min(NBuffers / 8 / n, ring_size);
 
 	/* Allocate the object and initialize all elements to zeroes */
 	strategy = (BufferAccessStrategy)
-- 
2.11.0




Re: [HACKERS] [COMMITTERS] pgsql: Add a Gather executor node.

2017-07-21 Thread Alvaro Herrera
Robert Haas wrote:

> I think those top-of-file lists are just annoying, and would prefer to
> rip them out entirely rather than spend time maintaining them.

+1

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] A hook for session start

2017-07-21 Thread Fabrízio de Royes Mello
On Fri, Jul 21, 2017 at 10:58 AM, Yugo Nagata  wrote:
>
> On Fri, 21 Jul 2017 10:31:57 -0300
> Fabrízio de Royes Mello  wrote:
>
> > On Fri, Jul 21, 2017 at 9:35 AM, Yugo Nagata  wrote:
> > >
> > > On Fri, 21 Jul 2017 09:53:19 +0800
> > > Craig Ringer  wrote:
> > >
> > > > On 21 July 2017 at 08:42, Robert Haas  wrote:
> > > >
> > > > > On Thu, Jul 20, 2017 at 8:27 AM, Fabrízio de Royes Mello
> > > > >  wrote:
> > > > > > I'm not sure about your real needs, but isn't this material for
> > > > > > improving Event Triggers?
> > > > >
> > > > > I've thought about that, too.  One problem is what to do if the
> > > > > user hits ^C while the event trigger procedure is running.  If you
> > > > > respond to that by killing the event trigger and letting the user
> > > > > issue commands, then the event trigger can't be used for security or
> > > > > auditing purposes because the user might prevent it from doing
> > > > > whatever it's intended to do with a well-timed interrupt.  If you
> > > > > ignore ^C or make it turn into FATAL, then a poorly-crafted trigger
> > > > > can lock users out of the database.  Maybe that's OK.  We could say
> > > > > "well, if you lock yourself out of the database with your logon
> > > > > trigger, you get to shut down the database and restart in single
> > > > > user mode to recover".
> > > > >
> > > > > A hook, as proposed here, is a lot simpler and lacks these concerns.
> > > > > Installing code in C into the database is intrinsically risky
> > > > > anywhere, and not any moreso here than elsewhere.  But it's also
> > > > > less accessible to the average user.
> > > > > 
> > > >
> > > >
> > > > I'd favour the C hook personally. It's a lot more flexible, and can be
> > > > used by an extension to implement trigger-like behaviour if anyone
> > > > wants it, including the extension's choice of error handling decisions.
> > > >
> > > > It's also a lot simpler and less intrusive for core, which is nice
> > > > when we don't have anything compelling destined for core that needs
> > > > it. (I want to add a bunch of hooks in the logical replication code in
> > > > pg11 for similar reasons, so that features like DDL replication can be
> > > > prototyped as extensions more practically.)
> > > >
> >
> > I agree with you both...
> >
> >
> > > > That said, isn't ExecutorStart_hook + ProcessUtility_hook able to
> > > > serve the same job as a session-start hook, albeit at slightly higher
> > > > overhead? You can just test to see if your initial tasks have run yet.
> > >
> > > Thank you for your suggestion. Certainly, we can do a similar job to a
> > > session-start hook using these existing hooks, although these hooks are
> > > triggered when the first query is executed, not when the session is
> > > started. Now I have come to think that an additional hook is not needed.
> > >
> >
> > As Nagata said, the hooks proposed by Craig will happen only when the
> > first query is called, so I don't know how it works for session start...
> > are we missing something?
>
> Yes, ExecutorStart_hook + ProcessUtility_hook is not strictly the same as a
> session_start hook. If a query is issued a long time after the session
> starts, the timing of the hook deviates widely. That is no problem if we
> only want to do something once at session start, but it might be a problem
> if we want to record the timestamp of the session start, for example.
>
> >
> > If we're going to add this hook, what about adding a session end hook also?
>
> If someone wants the session-start hook, they might want this too.
>

Well, if someone wants them, here are the patches... I just did a minor fix
and cleanup in your previous session_start sample and provided both samples
in the same patch.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b8d860e..7a1fa3b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -160,6 +160,9 @@ static bool RecoveryConflictPending = false;
 static bool RecoveryConflictRetryable = true;
 static ProcSignalReason RecoveryConflictReason;
 
+/* Hook for plugins to get control at start of session */
+session_start_hook_type session_start_hook = NULL;
+
 /* 
  *		decls for routines only used in this file
  * 
@@ -3808,6 +3811,9 @@ PostgresMain(int argc, char *argv[],
 	if (!IsUnderPostmaster)
 		PgStartTime = GetCurrentTimestamp();
 
+	if 
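(The attached sample is truncated here in the archive. For illustration
only, an extension might use the proposed hook roughly as below; the hook's
signature is assumed to be a plain void callback, and every name in this
sketch is illustrative rather than part of the patch.)

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"

PG_MODULE_MAGIC;

/* assumed from the patch: typedef void (*session_start_hook_type) (void); */
static session_start_hook_type prev_session_start_hook = NULL;

static void
my_session_start(void)
{
	/* once-per-session work, e.g. audit logging */
	elog(LOG, "session started for user %s",
		 GetUserNameFromId(GetUserId(), false));

	if (prev_session_start_hook)
		prev_session_start_hook();
}

void
_PG_init(void)
{
	prev_session_start_hook = session_start_hook;
	session_start_hook = my_session_start;
}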

Re: [HACKERS] [PATCH] A hook for session start

2017-07-21 Thread Craig Ringer
On 21 Jul. 2017 21:58, "Yugo Nagata"  wrote:

On Fri, 21 Jul 2017 10:31:57 -0300
Fabrízio de Royes Mello  wrote:

> On Fri, Jul 21, 2017 at 9:35 AM, Yugo Nagata  wrote:
> >
> > On Fri, 21 Jul 2017 09:53:19 +0800
> > Craig Ringer  wrote:
> >
> > > On 21 July 2017 at 08:42, Robert Haas  wrote:
> > >
> > > > On Thu, Jul 20, 2017 at 8:27 AM, Fabrízio de Royes Mello
> > > >  wrote:
> > > > > I'm not sure about your real needs, but isn't this material for
> > > > > improving Event Triggers?
> > > >
> > > > I've thought about that, too.  One problem is what to do if the user
> > > > hits ^C while the event trigger procedure is running.  If you respond
> > > > to that by killing the event trigger and letting the user issue
> > > > commands, then the event trigger can't be used for security or
> > > > auditing purposes because the user might prevent it from doing
> > > > whatever it's intended to do with a well-timed interrupt.  If you
> > > > ignore ^C or make it turn into FATAL, then a poorly-crafted trigger
> > > > can lock users out of the database.  Maybe that's OK.  We could say
> > > > "well, if you lock yourself out of the database with your logon
> > > > trigger, you get to shut down the database and restart in single user
> > > > mode to recover".
> > > >
> > > > A hook, as proposed here, is a lot simpler and lacks these concerns.
> > > > Installing code in C into the database is intrinsically risky
> > > > anywhere, and not any moreso here than elsewhere.  But it's also
> > > > less accessible to the average user.
> > > > 
> > >
> > >
> > > I'd favour the C hook personally. It's a lot more flexible, and can be
> > > used by an extension to implement trigger-like behaviour if anyone wants
> > > it, including the extension's choice of error handling decisions.
> > >
> > > It's also a lot simpler and less intrusive for core, which is nice when
> > > we don't have anything compelling destined for core that needs it. (I
> > > want to add a bunch of hooks in the logical replication code in pg11 for
> > > similar reasons, so that features like DDL replication can be prototyped
> > > as extensions more practically.)
> > >
>
> I agree with you both...
>
>
> > > That said, isn't ExecutorStart_hook + ProcessUtility_hook able to serve
> > > the same job as a session-start hook, albeit at slightly higher
> > > overhead? You can just test to see if your initial tasks have run yet.
> >
> > Thank you for your suggestion. Certainly, we can do a similar job to a
> > session-start hook using these existing hooks, although these hooks are
> > triggered when the first query is executed, not when the session is
> > started. Now I have come to think that an additional hook is not needed.
> >
>
> As Nagata said, the hooks proposed by Craig will happen only when the first
> query is called, so I don't know how it works for session start... are we
> missing something?

Yes, ExecutorStart_hook + ProcessUtility_hook is not strictly the same as a
session_start hook. If a query is issued a long time after the session
starts, the timing of the hook deviates widely. That is no problem if we
only want to do something once at session start, but it might be a problem
if we want to record the timestamp of the session start, for example.


Don't we have that timestamp already?

What practical use cases are there for acting post-auth in a way that can't
wait until the user tries to do something?

Can a user do anything remotely interesting or useful without hitting
either ExecutorStart_hook or ProcessUtility_hook? They can parse queries I
guess but you could just set your hook up in the parser instead. If you
hook the parser all they can do is open an idle session and sit there...

So given that you can effectively do it already at the C hook level, if
you're going to do it at all I guess it'd be more interesting to expose a
convenient event trigger for session start, as others suggested upthread,
so that it's easy for DBAs and devs who won't have any idea where to start
writing extensions that register hooks.

But... I think you need a good use case. Such a trigger would have no way
to receive parameters from the user (except custom GUCs) or report any sort
of result other than an error/warning/notice. So what's it going to do that
can't already be decided by pg_hba.conf, pg_authid etc?


Re: [HACKERS] [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

2017-07-21 Thread Greg Stark
On 20 July 2017 at 14:19, Tom Lane  wrote:
> Greg Stark  writes:
>
>> Would it be useful to keep in one of the memory checking assertion builds?
>
> Why?  Code that expects to continue accessing a relcache entry's tupdesc
> after closing the relcache entry is broken, independently of whether it's
> in a debug build or not.

Mea Culpa. I hadn't actually read the code and assumed it would be
sensible to change from something that freed these tupdescs into
something that raised an assertion if they were still unfreed at end
of transaction.

Reading the code I see that it's only there to *avoid* freeing the
tupledesc just in case there's something still using it. If we just
free it then the normal memory checking builds would catch cases where
they're used after being freed.

But what I still don't understand is whether the fact that it still
passes the regression tests means anything. Unless there happened to
be a sinval message at the wrong time the regression tests wouldn't
uncover any problem cases. If it passed the tests on a
CLOBBER_CACHE_ALWAYS build then perhaps that would prove it's safe? Or
perhaps if the columns haven't actually been altered it would still
fail to fail?

-- 
greg




Re: [HACKERS] WIP Patch: Precalculate stable functions, infrastructure v1

2017-07-21 Thread Marina Polyakova

Like for the previous patches, it seems that there is no obvious performance
degradation on regular queries (according to pgbench).


pgbench probably isn't a very good test for this sort of thing - it
only issues very short-running queries where the cost of evaluating
expressions is a relatively small part of the total cost.  Even if
things get worse, I'm not sure if you'd see it.


If there's a mistake, for example, more than one attempt to replace cached
expressions in the whole query tree, the results of the "in buffer" test or
the "mostly cache" test can differ a little.



I'm not sure exactly
how you could construct a test case that could be harmed by this patch
- I guess you'd want to initialize lots of CacheExprs but never make
use of the caching usefully?


As I mentioned in the first letter about this feature, it will be useful
for text search queries such as [1]:


SELECT COUNT(*) FROM messages WHERE body_tsvector @@ 
to_tsquery('postgres');


And I'm not sure that it is logical to precalculate stable and immutable
functions themselves, but not to precalculate expressions that behave like
stable/immutable functions, or to precalculate some types of operators and
not others (ScalarArrayOpExpr, RowCompareExpr). My patch solves the problem
that not all nodes are simplified in eval_const_expressions_mutator (for
example, ScalarArrayOpExpr), and consts of other types now behave more like
ordinary consts (for example, composite types, coerce expressions,
ConvertRowtypeExpr).



It could also be useful to test things like TPC-H to see if you get an
improvement.


I saw the examples of queries in the TPC-H tests. If I'm not wrong, they are
not the target tests for this functionality (nothing will be precalculated).
But it's a good idea to check that there's no performance degradation on
them too.


[1] 
https://www.postgresql.org/message-id/ba261b9fc25dea4069d8ba9a8fcadf35%40postgrespro.ru


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] More optimization effort?

2017-07-21 Thread Tom Lane
Tatsuo Ishii  writes:
> Currently the following query does not use an index:
> test=# explain select * from pgbench_accounts where aid*100 < 1;
> While the following one does use the index.
> test=# explain select * from pgbench_accounts where aid < 1/100;

> Is it worth making our optimizer a little bit smarter, to convert the
> first query into the second form?

That's a rather ambitious definition of "little bit" smarter.

For starters, I'm not sure those queries are even fully equivalent
w.r.t. issues like overflow and roundoff.  While a person might
decide that the transformation is OK anyway, I'm not sure that the
optimizer should be allowed to take such liberties.
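(A concrete illustration of the roundoff hazard: under integer division,
aid < 150/100 becomes aid < 1, which rejects aid = 1, while aid*100 < 150
accepts it; and for large aid, aid*100 can overflow int4 and raise an error
that the rewritten form would never hit.)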

But the bigger picture is that doing something that helps to any
useful extent would require a really substantial amount of new,
datatype- and operator-specific knowledge that doesn't exist in the
system today.  And as Craig noted, applying that knowledge would
be expensive, even in cases where it failed to help.

So, color me skeptical ...

regards, tom lane




Re: [HACKERS] [PATCH] Make ExplainBeginGroup()/ExplainEndGroup() public.

2017-07-21 Thread Hadi Moshayedi
Title should have been "Make ExplainOpenGroup()/ExplainCloseGroup()
public.".

Sorry for the misspelling.


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-21 Thread Joshua D. Drake

On 07/20/2017 11:54 PM, Sokolov Yura wrote:

Have you measured increased vacuum ring buffer?
This will require recompilation, though.


No, not yet. I think we are still in the proving the problem stage.

JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




[HACKERS] [PATCH] Make ExplainBeginGroup()/ExplainEndGroup() public.

2017-07-21 Thread Hadi Moshayedi
Hello,

The attached patch moves declarations of
ExplainBeginGroup()/ExplainEndGroup() from explain.c to explain.h.

This can be useful for extensions that need explain groups in their
custom-scan explain output.

For example, Citus uses groups in its custom explain outputs [1]. But it
achieves it by having a copy of
ExplainBeginGroup()/ExplainEndGroup() in its source code, which is not the
best way.
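(To illustrate, once the declarations are public a custom-scan provider can
group its EXPLAIN output as sketched below; the callback and property names
are made up for the example, only ExplainOpenGroup/ExplainCloseGroup and
ExplainPropertyText come from explain.h.)

#include "postgres.h"
#include "commands/explain.h"
#include "nodes/execnodes.h"

/* hypothetical ExplainCustomScan callback of a custom-scan provider */
static void
my_explain_custom_scan(CustomScanState *node, List *ancestors,
					   ExplainState *es)
{
	/* opens a labeled group that nests correctly in text/XML/JSON/YAML */
	ExplainOpenGroup("Distributed Query", "Distributed Query", true, es);
	ExplainPropertyText("Executor", "Real-Time", es);
	ExplainCloseGroup("Distributed Query", "Distributed Query", true, es);
}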

Please review.

[1]
https://github.com/citusdata/citus/blob/master/src/backend/distributed/planner/multi_explain.c

Thanks,
Hadi
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7648201218..46467e1045 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -124,10 +124,6 @@ static void ExplainCustomChildren(CustomScanState *css,
 	  List *ancestors, ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *value,
 bool numeric, ExplainState *es);
-static void ExplainOpenGroup(const char *objtype, const char *labelname,
- bool labeled, ExplainState *es);
-static void ExplainCloseGroup(const char *objtype, const char *labelname,
-  bool labeled, ExplainState *es);
 static void ExplainDummyGroup(const char *objtype, const char *labelname,
   ExplainState *es);
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
@@ -3213,7 +3209,7 @@ ExplainPropertyBool(const char *qlabel, bool value, ExplainState *es)
  * If labeled is true, the group members will be labeled properties,
  * while if it's false, they'll be unlabeled objects.
  */
-static void
+void
 ExplainOpenGroup(const char *objtype, const char *labelname,
  bool labeled, ExplainState *es)
 {
@@ -3276,7 +3272,7 @@ ExplainOpenGroup(const char *objtype, const char *labelname,
  * Close a group of related objects.
  * Parameters must match the corresponding ExplainOpenGroup call.
  */
-static void
+void
 ExplainCloseGroup(const char *objtype, const char *labelname,
   bool labeled, ExplainState *es)
 {
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 78822b766a..543b2bb0c6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -101,4 +101,9 @@ extern void ExplainPropertyFloat(const char *qlabel, double value, int ndigits,
 extern void ExplainPropertyBool(const char *qlabel, bool value,
 	ExplainState *es);
 
+extern void ExplainOpenGroup(const char *objtype, const char *labelname,
+ bool labeled, ExplainState *es);
+extern void ExplainCloseGroup(const char *objtype, const char *labelname,
+  bool labeled, ExplainState *es);
+
 #endif			/* EXPLAIN_H */



Re: [HACKERS] [PATCH] A hook for session start

2017-07-21 Thread Yugo Nagata
On Fri, 21 Jul 2017 10:31:57 -0300
Fabrízio de Royes Mello  wrote:

> On Fri, Jul 21, 2017 at 9:35 AM, Yugo Nagata  wrote:
> >
> > On Fri, 21 Jul 2017 09:53:19 +0800
> > Craig Ringer  wrote:
> >
> > > On 21 July 2017 at 08:42, Robert Haas  wrote:
> > >
> > > > On Thu, Jul 20, 2017 at 8:27 AM, Fabrízio de Royes Mello
> > > >  wrote:
> > > > > I'm not sure about your real needs, but isn't this material for
> > > > > improving Event Triggers?
> > > >
> > > > I've thought about that, too.  One problem is what to do if the user
> > > > hits ^C while the event trigger procedure is running.  If you respond
> > > > to that by killing the event trigger and letting the user issue
> > > > commands, then the event trigger can't be used for security or
> > > > auditing purposes because the user might prevent it from doing
> > > > whatever it's intended to do with a well-timed interrupt.  If you
> > > > ignore ^C or make it turn into FATAL, then a poorly-crafted trigger
> > > > can lock users out of the database.  Maybe that's OK.  We could say
> > > > "well, if you lock yourself out of the database with your logon
> > > > trigger, you get to shut down the database and restart in single user
> > > > mode to recover".
> > > >
> > > > A hook, as proposed here, is a lot simpler and lacks these concerns.
> > > > Installing code in C into the database is intrinsically risky
> > > > anywhere, and not any moreso here than elsewhere.  But it's also less
> > > > accessible to the average user.
> > > > 
> > >
> > >
> > > I'd favour the C hook personally. It's a lot more flexible, and can be
> > > used by an extension to implement trigger-like behaviour if anyone wants
> > > it, including the extension's choice of error handling decisions.
> > >
> > > It's also a lot simpler and less intrusive for core, which is nice when
> > > we don't have anything compelling destined for core that needs it. (I
> > > want to add a bunch of hooks in the logical replication code in pg11 for
> > > similar reasons, so that features like DDL replication can be prototyped
> > > as extensions more practically.)
> > >
> 
> I agree with you both...
> 
> 
> > > That said, isn't ExecutorStart_hook + ProcessUtility_hook able to serve
> > > the same job as a session-start hook, albeit at slightly higher
> > > overhead? You can just test to see if your initial tasks have run yet.
> >
> > Thank you for your suggestion. Certainly, we can do a similar job to a
> > session-start hook using these existing hooks, although these hooks are
> > triggered when the first query is executed, not when the session is
> > started. Now I have come to think that an additional hook is not needed.
> >
> 
> As Nagata said, the hooks proposed by Craig will happen only when the first
> query is called, so I don't know how it works for session start... are we
> missing something?

Yes, ExecutorStart_hook + ProcessUtility_hook is not strictly the same as a
session_start hook. If a query is issued a long time after the session
starts, the timing of the hook deviates widely. That is no problem if we
only want to do something once at session start, but it might be a problem
if we want to record the timestamp of the session start, for example.

> 
> If we're going to add this hook, what about adding a session end hook also?

If someone wants the session-start hook, they might want this too.

> 
> Regards,
> 
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Timbira: http://www.timbira.com.br
> >> Blog: http://fabriziomello.github.io
> >> Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
> >> Github: http://github.com/fabriziomello


-- 
Yugo Nagata 




Re: [HACKERS] [COMMITTERS] pgsql: Add a Gather executor node.

2017-07-21 Thread Robert Haas
On Fri, Jul 21, 2017 at 7:06 AM, Andres Freund  wrote:
> On 2015-09-30 23:29:30 +, Robert Haas wrote:
>> Add a Gather executor node.
>> ...
>> src/backend/executor/execProcnode.c   |   46 
>
> I just noticed that this added a new execProcnode dispatch routine, but
> didn't add that to the file's header
>
>  *   INTERFACE ROUTINES
>  *  ExecInitNode-   initialize a plan node and 
> its subplans
>  *  ExecProcNode-   get a tuple by executing the 
> plan node
>  *  ExecEndNode -   shut down a plan node 
> and its subplans
>  *

I think those top-of-file lists are just annoying, and would prefer to
rip them out entirely rather than spend time maintaining them.  If you
want to see the list of interface routines, look in the header file.

heapam.c's header, for example, is missing try_relation_open,
relation_openrv_extended, heap_openrv_extended,
heap_beginscan_catalog, heap_beginscan_strat, heap_beginscan_bm,
heap_beginscan_sampling, heap_setscanlimits, heapgetpage,
heap_rescan_set_params, heap_parallelscan_estimate,
heap_parallelscan_initialize, heap_beginscan_parallel,
heap_hot_search_buffer, heap_hot_search, setLastTid,
GetBulkInsertState, FreeBulkInsertState, ReleaseBulkInsertState,
heap_finish_speculative, heap_abort_speculative, heap_lock_tuple,
heap_inplace_update, heap_tuple_needs_freeze,
heap_tuple_needs_eventual_freeze, simple_heap_insert,
simple_heap_update, simple_heap_delete, and heap_update_snapshot,
which means that if this comment was ever correct, it was more than a
decade ago.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-21 Thread Alexander Korotkov
On Wed, Jul 19, 2017 at 11:08 PM, Alvaro Herrera 
wrote:

> I'm not entirely sure what's the best way to deal with the polymorphic
> problem, but on the other hand as Robert says downthread maybe we
> shouldn't be solving it at this stage anyway.  So let's step back a bit,
> get a patch that works for the case where the types match on both sides
> of the FK, then we review that patch; if all is well, we can discuss the
> other problem as a stretch goal.


+1
Regular FK functionality has type restrictions based on btree opfamilies
and implicit casts.  An array FK should not necessarily have the same type
restrictions.  Also, we don't necessarily need to make those restrictions as
soft as possible during this GSoC project.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-07-21 Thread Alexander Korotkov
Hi, Alexey!

On Fri, Jul 21, 2017 at 3:05 PM, Alexey Chernyshov <
a.chernys...@postgrespro.ru> wrote:

> the following patch transfers functionality from the gevel module
> (http://www.sai.msu.su/~megera/wiki/Gevel), which provides functions for
> analyzing GIN and GiST indexes, to pageinspect. Gevel was originally
> designed by Oleg Bartunov and Teodor Sigaev for developers of GiST and
> GIN indexes.
>

It's very good that you've picked up this work!  pageinspect is lacking
this functionality.

Functions added:
>  - gist_stat(text) - shows statistics on GiST Tree
>  - gist_tree(text) - shows GiST tree
>  - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL
>  - gist_print(text) - prints objects stored in GiST tree
>  - spgist_stat(text) - shows statistics on SP-GiST
>  - spgist_print(text) - prints objects stored in index
>  - gin_value_count() - originally gin_stat(text) - prints estimated counts
> for index values
>  - gin_stats() - originally gin_statpage(text) - shows statistics
>  - gin_count_estimate(text, tsquery) - shows the number of indexed rows
> matching the query
>
> Tests are also transferred, and docs for the new functions are added. I ran
> pgindent over the code, but the result is different from what I expected,
> so I am leaving the pgindented one.
> The patch is applicable to commit
> 866f4a7c210857aa342bf901558d170325094dde.
>

As we can see, gevel contains functions which analyze the whole index.
pageinspect is written in another manner: it gives you functionality to
analyze individual pages, tuples and so on.
Thus, we probably shouldn't try to move the gevel functions to pageinspect
"as is".  They should be rewritten in a more granular manner, as the other
pageinspect functions are.  Any other opinions?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] A hook for session start

2017-07-21 Thread Fabrízio de Royes Mello
On Fri, Jul 21, 2017 at 9:35 AM, Yugo Nagata  wrote:
>
> On Fri, 21 Jul 2017 09:53:19 +0800
> Craig Ringer  wrote:
>
> > On 21 July 2017 at 08:42, Robert Haas  wrote:
> >
> > > On Thu, Jul 20, 2017 at 8:27 AM, Fabrízio de Royes Mello
> > >  wrote:
> > > > I'm not sure about your real needs, but isn't this material for
> > > > improving Event Triggers?
> > >
> > > I've thought about that, too.  One problem is what to do if the user
> > > hits ^C while the event trigger procedure is running.  If you respond
> > > to that by killing the event trigger and letting the user issue
> > > commands, then the event trigger can't be used for security or
> > > auditing purposes because the user might prevent it from doing
> > > whatever it's intended to do with a well-timed interrupt.  If you
> > > ignore ^C or make it turn into FATAL, then a poorly-crafted trigger
> > > can lock users out of the database.  Maybe that's OK.  We could say
> > > "well, if you lock yourself out of the database with your logon
> > > trigger, you get to shut down the database and restart in single user
> > > mode to recover".
> > >
> > > A hook, as proposed here, is a lot simpler and lacks these concerns.
> > > Installing code in C into the database is intrinsically risky
> > > anywhere, and not any moreso here than elsewhere.  But it's also less
> > > accessible to the average user.
> > > 
> >
> >
> > I'd favour the C hook personally. It's a lot more flexible, and can be
> > used by an extension to implement trigger-like behaviour if anyone wants
> > it, including the extension's choice of error handling decisions.
> >
> > It's also a lot simpler and less intrusive for core, which is nice when
> > we don't have anything compelling destined for core that needs it. (I
> > want to add a bunch of hooks in the logical replication code in pg11 for
> > similar reasons, so that features like DDL replication can be prototyped
> > as extensions more practically.)
> >

I agree with you both...


> > That said, isn't ExecutorStart_hook + ProcessUtility_hook able to serve
> > the same job as a session-start hook, albeit at slightly higher overhead?
> > You can just test to see if your initial tasks have run yet.
>
> Thank you for your suggestion. Certainly, we can do a similar job to a
> session-start hook using these existing hooks, although these hooks are
> triggered when the first query is executed, not when the session is
> started. Now I have come to think that an additional hook is not needed.
>

As Nagata said, the hooks proposed by Craig will happen only when the first
query is called, so I don't know how it works for session start... are we
missing something?

If we're going to add this hook, what about adding a session end hook also?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Adding -E switch to pg_dumpall

2017-07-21 Thread Michael Paquier
On Thu, Jul 20, 2017 at 11:17 AM, Fabien COELHO  wrote:
> Ok for me. I switched the status to "Ready for committers".

Thanks for the review, Fabien.
-- 
Michael




Re: [HACKERS] [PATCH] Make sure all statistics is sent after a few DML are performed

2017-07-21 Thread Yugo Nagata
On Fri, 21 Jul 2017 04:58:47 -0700
Andres Freund  wrote:

> Hi,
> 
> (please don't top-reply on this list)
> 
> On 2017-07-19 14:04:39 +0900, Yugo Nagata wrote:
> > On Tue, 18 Jul 2017 10:10:49 -0400
> > Tom Lane  wrote:
> > 
> > Thank you for your comments. I understand the problem of my proposal
> > patch.
> 
> Does that mean you're trying to rewrite it in the way that was
> suggested:

Not yet, but I'll try to do it.

> 
> > > > Another,
> > > > pretty half-baked, approach would be to add a procsignal triggering idle
> > > > backends to send stats, and send that to all idle backends when querying
> > > > stats. We could even publish the number of outstanding stats updates in
> > > > PGXACT or such, without any locking, and send it only to those that have
> > > > outstanding ones.
> > > 
> > > If somebody wanted to do the work, that'd be a viable answer IMO.  You'd
> > > really want to not wake backends that have nothing more to send, but
> > > I agree that it'd be possible to advertise that in shared memory.
> 
> or are you planning to just let the issue be?
> 
> - Andres


-- 
Yugo Nagata 




Re: [HACKERS] [PATCH] A hook for session start

2017-07-21 Thread Yugo Nagata
On Fri, 21 Jul 2017 09:53:19 +0800
Craig Ringer  wrote:

> On 21 July 2017 at 08:42, Robert Haas  wrote:
> 
> > On Thu, Jul 20, 2017 at 8:27 AM, Fabrízio de Royes Mello
> >  wrote:
> > > I'm not sure about your real needs, but isn't this material for
> > > improving Event Triggers?
> >
> > I've thought about that, too.  One problem is what to do if the user
> > hits ^C while the event trigger procedure is running.  If you respond
> > to that by killing the event trigger and letting the user issue
> > commands, then the event trigger can't be used for security or
> > auditing purposes because the user might prevent it from doing
> > whatever it's intended to do with a well-timed interrupt.  If you
> > ignore ^C or make it turn into FATAL, then a poorly-crafted trigger
> > can lock users out of the database.  Maybe that's OK.  We could say
> > "well, if you lock yourself out of the database with your logon
> > trigger, you get to shut down the database and restart in single user
> > mode to recover".
> >
> > A hook, as proposed here, is a lot simpler and lacks these concerns.
> > Installing code in C into the database is intrinsically risky
> > anywhere, and not any moreso here than elsewhere.  But it's also less
> > accessible to the average user.
> > 
> 
> 
> I'd favour the C hook personally. It's a lot more flexible, and can be used
> by an extension to implement trigger-like behaviour if anyone wants it,
> including the extension's choice of error handling decisions.
> 
> It's also a lot simpler and less intrusive for core, which is nice when we
> don't have anything compelling destined for core that needs it. (I want to
> add a bunch of hooks in the logical replication code in pg11 for similar
> reasons, so that features like DDL replication can be prototyped as
> extensions more practically.)
> 
> That said, isn't ExecutorStart_hook + ProcessUtility_hook able to serve the
> same job as a session-start hook, albeit at slightly higher overhead? You
> can just test to see if your initial tasks have run yet.

Thank you for your suggestion. Certainly, we can do a similar job to a
session-start hook using these existing hooks, although these hooks are
triggered when the first query is executed, not when the session is started.
Now I have come to think that an additional hook is not needed.
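For what it's worth, a minimal, untested sketch of that emulation with
illustrative names (a real extension would hook ProcessUtility the same way):

#include "postgres.h"
#include "fmgr.h"
#include "executor/executor.h"

PG_MODULE_MAGIC;

static ExecutorStart_hook_type prev_ExecutorStart = NULL;
static bool session_initialized = false;

static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	if (!session_initialized)
	{
		session_initialized = true;
		/* once-per-session initialization would go here */
		elog(LOG, "first query of this session");
	}

	if (prev_ExecutorStart)
		prev_ExecutorStart(queryDesc, eflags);
	else
		standard_ExecutorStart(queryDesc, eflags);
}

void
_PG_init(void)
{
	prev_ExecutorStart = ExecutorStart_hook;
	ExecutorStart_hook = my_ExecutorStart;
}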

Thanks,

> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Yugo Nagata 




Re: [HACKERS] xlogfilename

2017-07-21 Thread Yugo Nagata
On Fri, 21 Jul 2017 10:11:05 +0800
Craig Ringer  wrote:

> On 20 July 2017 at 21:33, Yugo Nagata  wrote:
> 
> > On Thu, 20 Jul 2017 11:02:25 +0200
> > Michael Paquier  wrote:
> >
> > > On Thu, Jul 20, 2017 at 10:58 AM, DEV_OPS  wrote:
> > > > I think you may refer to the function pg_xlogfile_name in
> > > > src/backend/access/transam/xlogfuncs.c; it uses XLogFileName, defined
> > > > in src/include/access/xlog_internal.h
> > > >
> > > > #define XLogFileName(fname, tli, logSegNo)  \
> > > > snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli,   \
> > > >  (uint32) ((logSegNo) / XLogSegmentsPerXLogId), \
> > > >  (uint32) ((logSegNo) % XLogSegmentsPerXLogId))
> > > >
> > > >
> > > > hope it's helpful for you
> > >
> > > The first 8 characters are the timeline number in hexadecimal format.
> > > The next 8 characters indicate a number, in hexadecimal format, which
> > > gets incremented every 256 segments. The last 8 characters indicate
> > > the current segment number, in hexadecimal format.
> >
> > As far as I understand, XLOG is a big logical file of 256 * 16 MB,
> > and this is split into multiple physical files of 16MB which are called
> > WAL segments. The second 8 characters indicate the id of the logical
> > xlog file, and the last 8 characters indicate the sequential number of
> > the segment in this xlog.
> > 
> >
> 
> You missed the timeline ID, which is the first 8 digits.

Yes, I missed this. Thanks.
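(As a quick standalone check of the formula quoted above, assuming the
default 16MB segment size, so XLogSegmentsPerXLogId = 0x100000000 / (16 *
1024 * 1024) = 256:)

#include <stdio.h>

#define MAXFNAMELEN 64
#define XLogSegmentsPerXLogId 256	/* 0x100000000 / 16MB in a default build */

int
main(void)
{
	char		fname[MAXFNAMELEN];
	unsigned int tli = 1;
	unsigned long long logSegNo = 1000;	/* the 1000th 16MB segment */

	snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli,
			 (unsigned int) (logSegNo / XLogSegmentsPerXLogId),
			 (unsigned int) (logSegNo % XLogSegmentsPerXLogId));
	puts(fname);				/* prints 0000000100000003000000E8 */
	return 0;
}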

> 
> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Yugo Nagata 




Re: [HACKERS] [PATCH] Make sure all statistics is sent after a few DML are performed

2017-07-21 Thread Andres Freund
Hi,

(please don't top-reply on this list)

On 2017-07-19 14:04:39 +0900, Yugo Nagata wrote:
> On Tue, 18 Jul 2017 10:10:49 -0400
> Tom Lane  wrote:
> 
> Thank you for your comments. I understand the problem of my proposal
> patch.

Does that mean you're trying to rewrite it in the way that was
suggested:

> > > Another,
> > > pretty half-baked, approach would be to add a procsignal triggering idle
> > > backends to send stats, and send that to all idle backends when querying
> > > stats. We could even publish the number of outstanding stats updates in
> > > PGXACT or such, without any locking, and send it only to those that have
> > > outstanding ones.
> > 
> > If somebody wanted to do the work, that'd be a viable answer IMO.  You'd
> > really want to not wake backends that have nothing more to send, but
> > I agree that it'd be possible to advertise that in shared memory.

or are you planning to just let the issue be?

- Andres




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-21 Thread Alik Khilazhev
Hello!

I realized that I was sending emails as HTML, and the latest patch is not
visible in the archive now. That's why I am attaching it again.

I am sorry for that.



pgbench-zipf-05v.patch
Description: Binary data

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-21 Thread Andres Freund
On 2017-07-18 16:53:43 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > ...  If we were to go this route we'd have to probably move
> > the callback assignment into the ExecInit* routines, and possibly
> > replace the switch in ExecInitNode() with another callback, assigned in
> > make_*, and implement callbacks for ExecShutdown, MultiExecProcNode etc.
> 
> BTW, I don't see why you really need to mess with anything except
> ExecProcNode?  Surely the other cases such as MultiExecProcNode are
> not called often enough to justify changing them away from the
> switch technology.  Yeah, maybe it would be a bit cleaner if they
> all looked alike ... but if you're trying to make a patch that's
> as little invasive as possible for v10, I'd suggest converting just
> ExecProcNode to this style.

Yea, I was making that statement when not aiming for v10.  Attached is a
patch that converts just ExecProcNode. The big change in comparison to
the earlier patch is that the assignment of the callback is now done in
the respective ExecInit* routines. As a consequence the ExecProcNode
callbacks are now static.  I think we should do this fully in v11;
removing dispatch routines like ExecInitNode() is a good idea, because it
moves concerns more towards the nodes themselves, which seems right given
the growth of executor nodes.
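(The shape of the change, heavily simplified and not the actual executor
code: each ExecInit* routine assigns the function pointer once, and
ExecProcNode() turns into a single indirect call instead of a switch over
the node tag.)

typedef struct PlanState PlanState;
typedef struct TupleTableSlot TupleTableSlot;

/* per-node callback, assigned once in the node's ExecInit* routine */
typedef TupleTableSlot *(*ExecProcNodeMtd) (PlanState *pstate);

struct PlanState
{
	ExecProcNodeMtd ExecProcNode;
};

/* replaces the old per-call switch (nodeTag dispatch); the stack-depth
 * check can be done by a wrapper callback on the first invocation only */
static inline TupleTableSlot *
ExecProcNode(PlanState *node)
{
	return node->ExecProcNode(node);
}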

I've also added stack depth checks to ExecEndNode(),
MultiExecProcNode(), ExecShutdownNode(), because it's not guaranteed
that ExecProcNode is called for every node...

I dislike having the miscadmin.h include in executor.h - but I don't
quite see a better alternative.

I want to run this through pgindent before merging, otherwise we'll
presumably end up with a lot of noise.

I still think we should backpatch at least the check_stack_depth() calls
in ExecInitNode(), ExecEndNode().

Greetings,

Andres Freund
>From 120218e4c6aab1c716aab12495c1a466baa8e37a Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 17 Jul 2017 00:33:49 -0700
Subject: [PATCH] Move ExecProcNode from dispatch to callback based model.

This allows us to add stack-depth checks the first time an executor
node is called, and skip that overhead on following calls.
Additionally it yields a nice speedup.

We should move towards that model for further routines, but as this is
required for v10, it seems better to do only what is necessary (which
is already quite large).

Todo: Expand (scope, need in v10) & link.

Author: Andres Freund
Reported-By: Julien Rouhaud
Discussion:
https://postgr.es/m/22833.1490390...@sss.pgh.pa.us
https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0...@dalibo.com
---
 src/backend/executor/execProcnode.c| 257 +++--
 src/backend/executor/nodeAgg.c |   6 +-
 src/backend/executor/nodeAppend.c  |   8 +-
 src/backend/executor/nodeBitmapHeapscan.c  |   7 +-
 src/backend/executor/nodeCtescan.c |   7 +-
 src/backend/executor/nodeCustom.c  |   8 +-
 src/backend/executor/nodeForeignscan.c |   7 +-
 src/backend/executor/nodeFunctionscan.c|   7 +-
 src/backend/executor/nodeGather.c  |   7 +-
 src/backend/executor/nodeGatherMerge.c |   7 +-
 src/backend/executor/nodeGroup.c   |   6 +-
 src/backend/executor/nodeHashjoin.c|   6 +-
 src/backend/executor/nodeIndexonlyscan.c   |   7 +-
 src/backend/executor/nodeIndexscan.c   |   7 +-
 src/backend/executor/nodeLimit.c   |   6 +-
 src/backend/executor/nodeLockRows.c|   6 +-
 src/backend/executor/nodeMaterial.c|   6 +-
 src/backend/executor/nodeMergeAppend.c |   7 +-
 src/backend/executor/nodeMergejoin.c   |   6 +-
 src/backend/executor/nodeModifyTable.c |   6 +-
 src/backend/executor/nodeNamedtuplestorescan.c |   7 +-
 src/backend/executor/nodeNestloop.c|   6 +-
 src/backend/executor/nodeProjectSet.c  |   6 +-
 src/backend/executor/nodeRecursiveunion.c  |   6 +-
 src/backend/executor/nodeResult.c  |   6 +-
 src/backend/executor/nodeSamplescan.c  |   7 +-
 src/backend/executor/nodeSeqscan.c |   7 +-
 src/backend/executor/nodeSetOp.c   |   6 +-
 src/backend/executor/nodeSort.c|   6 +-
 src/backend/executor/nodeSubqueryscan.c|   7 +-
 src/backend/executor/nodeTableFuncscan.c   |   7 +-
 src/backend/executor/nodeTidscan.c |   7 +-
 src/backend/executor/nodeUnique.c  |   6 +-
 src/backend/executor/nodeValuesscan.c  |   7 +-
 src/backend/executor/nodeWindowAgg.c   |   6 +-
 src/backend/executor/nodeWorktablescan.c   |   7 +-
 src/include/executor/executor.h|  28 ++-
 src/include/executor/nodeAgg.h |   1 -
 src/include/executor/nodeAppend.h  |   1 -
 src/include/executor/nodeBitmapHeapscan.h  |   1 -
 

Re: [HACKERS] Small improvement to compactify_tuples

2017-07-21 Thread Sokolov Yura

On 2017-05-17 17:46, Sokolov Yura wrote:

Alvaro Herrera wrote on 2017-05-15 20:13:

As I understand, these patches are logically separate, so putting them
together in a single file isn't such a great idea.  If you don't edit
the patches further, then you're all set because we already have the
previously archived patches.  Next commitfest starts in a few months
yet, and if you feel the need to submit corrected versions in the
meantime, please do submit in separate files.  (Some would even argue
that each should be its own thread, but I don't think that's 
necessary.)


Thank you for the explanation.

I'm adding a new version of the first patch with a minor improvement:
- I added detection of the case when all buckets are trivial
  (i.e. 0 or 1 element); in this case there is no need to sort buckets at all.
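(Presumably the gain comes from the fact that, with at most one element per
bucket, the bucket assignment alone already yields the final order, so the
intra-bucket sorting step can be skipped.)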


I'm attaching a rebased version of the second patch.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres CompanyFrom 3ae5aa0401fc5028cc916d55c4b18e53bf6aa9fb Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 15 May 2017 16:04:14 +0300
Subject: [PATCH 2/2] Simplify PageRepairFragmentation

On the assumption that a page usually doesn't become empty, merge the second
loop body (collecting items with storage) into the first (counting kinds of
items).
---
 src/backend/storage/page/bufpage.c | 47 +++---
 1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index f28f18cff3..77cef51d02 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -497,6 +497,7 @@ sort_itemIds(itemIdSort itemidbase, int nitems)
 
 	if (max == 1)
 		goto end;
+
 	/*
 	 * count[k+2] is start of bucket k, count[k+1] is end of bucket k, and
 	 * count[k+1]-count[k+2] is length of bucket k.
@@ -563,10 +564,11 @@ PageRepairFragmentation(Page page)
 	Offset		pd_special = ((PageHeader) page)->pd_special;
 	ItemId		lp;
 	int			nline,
-nstorage,
 nunused;
 	int			i;
 	Size		totallen;
+	itemIdSortData itemidbase[MaxHeapTuplesPerPage];
+	itemIdSort	itemidptr = itemidbase;
 
 	/*
 	 * It's worth the trouble to be more paranoid here than in most places,
@@ -586,14 +588,26 @@ PageRepairFragmentation(Page page)
 		pd_lower, pd_upper, pd_special)));
 
 	nline = PageGetMaxOffsetNumber(page);
-	nunused = nstorage = 0;
+	nunused = totallen = 0;
 	for (i = FirstOffsetNumber; i <= nline; i++)
 	{
 		lp = PageGetItemId(page, i);
 		if (ItemIdIsUsed(lp))
 		{
 			if (ItemIdHasStorage(lp))
-nstorage++;
+			{
+itemidptr->offsetindex = i - 1;
+itemidptr->itemoff = ItemIdGetOffset(lp);
+if (unlikely(itemidptr->itemoff < (int) pd_upper ||
+			 itemidptr->itemoff >= (int) pd_special))
+	ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			 errmsg("corrupted item pointer: %u",
+	itemidptr->itemoff)));
+itemidptr->alignedlen = MAXALIGN(ItemIdGetLength(lp));
+totallen += itemidptr->alignedlen;
+itemidptr++;
+			}
 		}
 		else
 		{
@@ -603,7 +617,7 @@ PageRepairFragmentation(Page page)
 		}
 	}
 
-	if (nstorage == 0)
+	if (itemidptr == itemidbase)
 	{
 		/* Page is completely empty, so just reset it quickly */
 		((PageHeader) page)->pd_upper = pd_special;
@@ -611,36 +625,13 @@ PageRepairFragmentation(Page page)
 	else
 	{
 		/* Need to compact the page the hard way */
-		itemIdSortData itemidbase[MaxHeapTuplesPerPage];
-		itemIdSort	itemidptr = itemidbase;
-
-		totallen = 0;
-		for (i = 0; i < nline; i++)
-		{
-			lp = PageGetItemId(page, i + 1);
-			if (ItemIdHasStorage(lp))
-			{
-itemidptr->offsetindex = i;
-itemidptr->itemoff = ItemIdGetOffset(lp);
-if (itemidptr->itemoff < (int) pd_upper ||
-	itemidptr->itemoff >= (int) pd_special)
-	ereport(ERROR,
-			(errcode(ERRCODE_DATA_CORRUPTED),
-			 errmsg("corrupted item pointer: %u",
-	itemidptr->itemoff)));
-itemidptr->alignedlen = MAXALIGN(ItemIdGetLength(lp));
-totallen += itemidptr->alignedlen;
-itemidptr++;
-			}
-		}
-
 		if (totallen > (Size) (pd_special - pd_lower))
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
 	 errmsg("corrupted item lengths: total %u, available space %u",
 			(unsigned int) totallen, pd_special - pd_lower)));
 
-		compactify_tuples(itemidbase, nstorage, page);
+		compactify_tuples(itemidbase, (int) (itemidptr - itemidbase), page);
 	}
 
 	/* Set hint bit for PageAddItem */
-- 
2.11.0




Re: [HACKERS] Mishandling of WCO constraints in direct foreign table modification

2017-07-21 Thread Etsuro Fujita

On 2017/07/21 17:18, Kyotaro HORIGUCHI wrote:

At Fri, 21 Jul 2017 12:00:03 +0900, Etsuro Fujita  wrote 
in <15aa9936-9bd8-c9e3-7ca1-394861073...@lab.ntt.co.jp>

Attached is the second version which updated docs in postgres-fdw.sgml
as well.


!no local joins for the query, no row-level local BEFORE or
!AFTER triggers on the target table, and no
!CHECK OPTION constraints from parent views.
!In UPDATE,

It might be a silly question, but is CHECK OPTION a "constraint"?


I mean constraints derived from WITH CHECK OPTIONs specified for parent
views.  We use the words "WITH CHECK OPTION constraints" in comments in
nodeModifyTable.c, so the expression "CHECK OPTION constraints" doesn't
sound that bad to me.  (I used "CHECK OPTION", not "WITH CHECK
OPTION", because we use "CHECK OPTION" a lot more in the documentation
than "WITH CHECK OPTION".)


Best regards,
Etsuro Fujita





Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-07-21 Thread Etsuro Fujita

On 2017/07/20 11:21, Etsuro Fujita wrote:

On 2017/07/19 23:36, Tom Lane wrote:

Please put the responsibility of doing const-expression simplification
in these cases somewhere closer to where the problem is being created.


Would it be reasonable for the FDW to be responsible for doing that
const-simplification if necessary?
There seem to be no objections, so I removed the const-expression
simplification from the patch and added a note to the docs for
AddForeignUpdateTargets.


Attached is an updated version of the patch.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 6924,6929  update bar set f2 = f2 + 100 returning *;
--- 6924,6988 
7 | 277
  (6 rows)
  
+ -- Test that UPDATE/DELETE with inherited target works with row-level triggers
+ CREATE TRIGGER trig_row_before
+ BEFORE UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ CREATE TRIGGER trig_row_after
+ AFTER UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ explain (verbose, costs off)
+ update bar set f2 = f2 + 100;
+   QUERY PLAN  

+ 
--
+  Update on public.bar
+Update on public.bar
+Foreign Update on public.bar2
+  Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1 RETURNING 
f1, f2, f3
+->  Seq Scan on public.bar
+  Output: bar.f1, (bar.f2 + 100), bar.ctid
+->  Foreign Scan on public.bar2
+  Output: bar2.f1, (bar2.f2 + 100), bar2.f3, bar2.ctid, bar2.*
+  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+ (9 rows)
+ 
+ update bar set f2 = f2 + 100;
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (3,333,33),NEW: (3,433,33)
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (4,344,44),NEW: (4,444,44)
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (3,333,33),NEW: (3,433,33)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (4,344,44),NEW: (4,444,44)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
+ explain (verbose, costs off)
+ delete from bar where f2 < 400;
+  QUERY PLAN   
   
+ 
-
+  Delete on public.bar
+Delete on public.bar
+Foreign Delete on public.bar2
+  Remote SQL: DELETE FROM public.loct2 WHERE ctid = $1 RETURNING f1, f2, f3
+->  Seq Scan on public.bar
+  Output: bar.ctid
+  Filter: (bar.f2 < 400)
+->  Foreign Scan on public.bar2
+  Output: bar2.ctid, bar2.*
+  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 WHERE ((f2 < 
400)) FOR UPDATE
+ (10 rows)
+ 
+ delete from bar where f2 < 400;
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW DELETE ON bar2
+ NOTICE:  OLD: (7,377,77)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW DELETE ON bar2
+ NOTICE:  OLD: (7,377,77)
+ -- cleanup
+ DROP TRIGGER trig_row_before ON bar2;
+ DROP TRIGGER trig_row_after ON bar2;
  drop table foo cascade;
  NOTICE:  drop cascades to foreign table foo2
  drop table bar cascade;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***
*** 1609,1614  explain (verbose, costs off)
--- 1609,1634 
  update bar set f2 = f2 + 100 returning *;
  update bar set f2 = f2 + 100 returning *;
  
+ -- Test that UPDATE/DELETE with inherited target works with row-level triggers
+ CREATE TRIGGER trig_row_before
+ BEFORE UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ 
+ CREATE TRIGGER trig_row_after
+ AFTER UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ 
+ explain (verbose, costs off)
+ update bar set f2 = f2 + 100;
+ update bar set f2 = f2 + 100;
+ 
+ explain (verbose, costs off)
+ delete from bar where f2 < 400;
+ delete from bar where f2 < 400;
+ 
+ -- cleanup
+ DROP TRIGGER trig_row_before ON bar2;
+ DROP TRIGGER trig_row_after ON bar2;
  drop table foo cascade;
  drop table bar cascade;
  drop table loct1;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***
*** 428,438  AddForeignUpdateTargets (Query *parsetree,
   Avoid using names matching ctidN,
   wholerow, or
   wholerowN, as the core system can
!  generate junk columns of these names.
  
  
  
!  This function is called in the rewriter, not the planner, so the
   information available 

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-21 Thread Alik Khilazhev
Hmmm. On second thought, maybe one or the other is enough: either restrict
the parameter to values where the approximation is good, or put out clear
documentation about when the approximation is not very good, noting that it
may still be useful even if not precise. So I would be in favor of expanding
the documentation but not restricting the parameter beyond avoiding value 1.0.

I have removed the restriction and expanded the documentation in the attached
patch v5. Also, I have registered the patch in CF 2017-09:
https://commitfest.postgresql.org/14/1206/.

pgbench-zipf-05v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-21 Thread Sokolov Yura

On 2017-07-20 20:59, Robert Haas wrote:

If you want something changed, it's your job to do that testing.


I've been testing for two weeks before I wrote to pgsql-hackers. And I
wrote some highlevel results in first letter.
I haven't noticed transactions slowdown from increased vacuum ring 
buffer.

But:
- my workload could be too synthetic,
- I've measured total performed transactions, and sampled time of random
ones.
So probably my measurement were not exhaustive. And definitely I could
not imagine whole set of interesting workloads. And given every test
run for at least 5 hours (and, in fact, test run on master for 20 hours,
cause autovacuum doesn't finish faster on that synthetic workload)
I will spend at least half-year if I test all possible workloads.

That is why I asked community to test it on workloads people consider
interesting.

I may measure by my self, if some tells me what workload he wants to be
tested.

Two previous discussions of the topic were killed without any evidence
of testing at all, only with theoretical doubts. Is that fair? Why is
"probably it is bad" better than "probably it is good"?

You are one of the project's leaders. I know it is not your job to test
every tiny change a schoolboy proposes. But there are a lot of people
here who wait for your word. Instead of cooling the rush and closing
discussions, you could just say: "please, someone test it with that
particular workload".


I don't think this discussion is really going anywhere unless you are
willing to admit that increasing VACUUM performance could have some
downsides.  If you're not willing to admit that, there's not a lot to
talk about here.


I can admit many things. I've seen how autovacuum drops pgbench
performance from 10000 tps down to 1500 tps because of contention on
CLogControlLock (btw, my LWLock patch improves that to 3000 tps).

But that is not a reason for autovacuum to be intentionally slow. As
Stephen Frost said, that is what vacuum_cost_limit and vacuum_cost_delay
are for. (And, certainly, it is a reason to improve CLOG and SLRU.)
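To be concrete, the throttling I mean is just the standard cost-based
settings; the values below are purely illustrative:

-- Illustrative values only; tune for your own I/O budget.
ALTER SYSTEM SET vacuum_cost_delay = '20ms';
ALTER SYSTEM SET vacuum_cost_limit = 2000;           -- default is 200
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = -1;  -- -1: fall back to vacuum_cost_limit
SELECT pg_reload_conf();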


OK, but I have helped *many* customers whose problem was that vacuum
ran too fast and blew data out of the OS cache causing query response
times to go through the roof.


When there is no garbage, increasing the autovacuum ring buffer changes
almost nothing. When there is garbage, the current small ring buffer
leads to a storm of fsyncs. Frequent fsyncs slow an HDD down a lot, and
then the HDD isn't capable of satisfying queries and refilling the OS
cache. Will you admit that?


I've also run into many customers whose problem was that vacuum ran too
slowly, and generally raising vacuum_cost_limit fixes that problem just
fine.


Probably with an increased ring buffer there is no need to raise
vacuum_cost_limit. Will you admit that?

With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mishandling of WCO constraints in direct foreign table modification

2017-07-21 Thread Kyotaro HORIGUCHI
At Fri, 21 Jul 2017 12:00:03 +0900, Etsuro Fujita  
wrote in <15aa9936-9bd8-c9e3-7ca1-394861073...@lab.ntt.co.jp>
> On 2017/07/21 3:24, Robert Haas wrote:
> > I think that's reasonable.  This should be committed and back-patched
> > to 9.6, right?
> 
> Yeah, because direct modify was introduced in 9.6.
> 
> Attached is the second version which updated docs in postgres-fdw.sgml
> as well.

!no local joins for the query, no row-level local BEFORE or
!AFTER triggers on the target table, and no
!CHECK OPTION constraints from parent views.
!In UPDATE,

Might be a silly question, is CHECK OPTION a "constraint"?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PgFDW connection invalidation by ALTER SERVER/ALTER USER MAPPING

2017-07-21 Thread Ashutosh Bapat
On Fri, Jul 21, 2017 at 10:55 AM, Kyotaro HORIGUCHI
 wrote:
> At Thu, 20 Jul 2017 18:15:42 -0400, Tom Lane  wrote in 
> <18927.1500588...@sss.pgh.pa.us>
>> This seems like overkill.  We can test it reasonably easily within the
>> existing framework, as shown in the attached patch.  I'm also fairly
>
> It checks for a disconnection caused in a single session. I
> thought that its inter-process characteristics were important
> (since I had forgotten that in the previous version), but it is
> reasonable enough if we can rely on the fact that it surely works
> through the invalidation mechanism.
>
> In short, I agree with the test in your patch.
>
>> concerned that what you're showing here would be unstable in the buildfarm
>> as a result of race conditions between the multiple sessions.
>
> Sure. It is what I meant by 'fragile'.
>
>> I made some cosmetic updates to the code patch, as well.
>
> Thank you for leaving the hashvalue stuff and revising the comment.
>
> By the way I mistakenly had left the following code in the
> previous patch.
>
> + /* hashvalue == 0 means a cache reset, must clear all state */
> + if (hashvalue == 0)
> +   entry->invalidated = true;
> + else if ((cacheid == FOREIGNSERVEROID &&
> +   entry->server_hashvalue == hashvalue) ||
> +  (cacheid == USERMAPPINGOID &&
> +   entry->mapping_hashvalue == hashvalue))
> +   entry->invalidated = true;
>
> The reason for the redundancy was that it had used a switch-case in
> the else block just before. However, that is no longer the case, so
> I'd like to change it as follows.
>
> + /* hashvalue == 0 means a cache reset, must clear all state */
> + if ((hashvalue == 0) ||
> + ((cacheid == FOREIGNSERVEROID &&
> +   entry->server_hashvalue == hashvalue) ||
> +  (cacheid == USERMAPPINGOID &&
> +   entry->mapping_hashvalue == hashvalue)))
> +   entry->invalidated = true;
>
> The attached patch differs only in this point.
>

+1. The patch looks good to me.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-07-21 Thread Kyotaro HORIGUCHI
At Fri, 21 Jul 2017 11:39:38 +0530, Neha Sharma  
wrote in 
> Here is the back trace from the core dump attached.
> 
> (gdb) bt
> #0  0x7f4a71424495 in raise () from /lib64/libc.so.6
> #1  0x7f4a71425c75 in abort () from /lib64/libc.so.6
> #2  0x009dc18a in ExceptionalCondition (conditionName=0xa905d0
> "!(TransactionIdPrecedesOrEquals(oldestXact,
> ShmemVariableCache->oldestXid))",
> errorType=0xa9044f "FailedAssertion", fileName=0xa90448 "clog.c",
> lineNumber=683) at assert.c:54
> #3  0x00524215 in TruncateCLOG (oldestXact=150036635,
> oldestxid_datoid=13164) at clog.c:682

In vac_truncate_clog, TruncateCLOG is called before
SetTransactionIdLimit, which advances
ShmemVariableCache->oldestXid. Given that the assertion in
TruncateCLOG is valid, they should be called in reverse order. I
suppose that CLOG files can be safely truncated after advancing
XID limits.

By the way, the attached patch was made with "git diff --patience".

filterdiff converts it into a somewhat wrong shape. Specifically,
the result is missing the addition part of the difference, as in the
second attached patch. I'm not sure which of git (2.9.2), filterdiff
(0.3.3), or myself is doing something wrong...


> #4  0x006a6be8 in vac_truncate_clog (frozenXID=150036635,
> minMulti=1, lastSaneFrozenXid=200562449, lastSaneMinMulti=1) at
> vacuum.c:1197
> #5  0x006a6948 in vac_update_datfrozenxid () at vacuum.c:1063
> #6  0x007ce0a2 in do_autovacuum () at autovacuum.c:2625
> #7  0x007cc987 in AutoVacWorkerMain (argc=0, argv=0x0) at
> autovacuum.c:1715
> #8  0x007cc562 in StartAutoVacWorker () at autovacuum.c:1512
> #9  0x007e2acd in StartAutovacuumWorker () at postmaster.c:5414
> #10 0x007e257e in sigusr1_handler (postgres_signal_arg=10) at
> postmaster.c:5111
> #11 <signal handler called>
> #12 0x7f4a714d3603 in __select_nocancel () from /lib64/libc.so.6
> #13 0x007dde88 in ServerLoop () at postmaster.c:1717
> #14 0x007dd67d in PostmasterMain (argc=3, argv=0x2eb8b00) at
> postmaster.c:1361
> #15 0x0071a218 in main (argc=3, argv=0x2eb8b00) at main.c:228
> (gdb) print ShmemVariableCache->oldestXid
> $3 = 548
> 
> 
> Regards,
> Neha Sharma
> 
> On Fri, Jul 21, 2017 at 11:01 AM, Thomas Munro <
> thomas.mu...@enterprisedb.com> wrote:
> 
> > On Fri, Jul 21, 2017 at 4:16 PM, Neha Sharma
> >  wrote:
> > >
> > > Attached is the core dump file received on PG 10beta2 version.
> >
> > Thanks Neha.  It'd be best to post the back trace and if possible
> > print oldestXact and ShmemVariableCache->oldestXid from the stack
> > frame for TruncateCLOG.
> >
> > The failing assertion in TruncateCLOG() has a comment that says
> > "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
> > calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
> > *after* it calls TruncateCLOG().  What am I missing here?
> >
> > What actually prevents ShmemVariableCache->oldestXid from going
> > backwards anyway?  Suppose there are two or more autovacuum processes
> > that reach vac_truncate_clog() concurrently.  They do a scan of
> > pg_database whose tuples they access without locking through a
> > pointer-to-volatile because they expect concurrent in-place writers,
> > come up with a value for frozenXID, and then arrive at
> > SetTransactionIdLimit() in whatever order and clobber
> > ShmemVariableCache->oldestXid.  What am I missing here?
> >
> > --
> > Thomas Munro
> > http://www.enterprisedb.com
> >

-- 
Kyotaro Horiguchi

NTT Open Source Software Center, Nippon Telegraph and Telephone Corporation
Phone: 03-5860-5115 / Fax: 03-5463-5490
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index faa1812..cd8be92 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1192,13 +1192,6 @@ vac_truncate_clog(TransactionId frozenXID,
 	AdvanceOldestCommitTsXid(frozenXID);
 
 	/*
-	 * Truncate CLOG, multixact and CommitTs to the oldest computed value.
-	 */
-	TruncateCLOG(frozenXID, oldestxid_datoid);
-	TruncateCommitTs(frozenXID);
-	TruncateMultiXact(minMulti, minmulti_datoid);
-
-	/*
 	 * Update the wrap limit for GetNewTransactionId and creation of new
 	 * MultiXactIds.  Note: these functions will also signal the postmaster
 	 * for an(other) autovac cycle if needed.   XXX should we avoid possibly
@@ -1206,6 +1199,14 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	SetTransactionIdLimit(frozenXID, oldestxid_datoid);
 	SetMultiXactIdLimit(minMulti, minmulti_datoid, false);
+
+	/*
+	 * Truncate CLOG, multixact and CommitTs to the oldest computed value
+	 * after advancing xid limits.
+	 */
+	TruncateCLOG(frozenXID, oldestxid_datoid);
+	TruncateCommitTs(frozenXID);
+	TruncateMultiXact(minMulti, minmulti_datoid);
 }
 
 
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
***
*** 1192,1204  vac_truncate_clog(TransactionId 

Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-21 Thread Sokolov Yura

On 2017-07-21 06:58, Joshua D. Drake wrote:

On 07/19/2017 07:57 PM, Tom Lane wrote:

Peter Geoghegan  writes:

My argument for the importance of index bloat to the more general
bloat problem is simple: any bloat that accumulates, that cannot be
cleaned up, will probably accumulate until it impacts performance
quite noticeably.


But that just begs the question: *does* it accumulate indefinitely, or
does it eventually reach a more-or-less steady state?  The traditional
wisdom about btrees, for instance, is that no matter how full you pack
them to start with, the steady state is going to involve something like
1/3rd free space.  You can call that bloat if you want, but it's not
likely that you'll be able to reduce the number significantly without
paying exorbitant costs.

I'm not claiming that we don't have any problems, but I do think it's
important to draw a distinction between bloat and normal operating
overhead.


Agreed, but I don't think we are talking about 30%. Here is where I am
at; it took until 30 minutes ago for the tests to finish:

                name                 |  setting
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 60
 autovacuum_vacuum_cost_delay        | 20
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1


Test 1: 55G  /srv/main
TPS: 955

Test 2: 112G /srv/main
TPS: 531 (Not sure what happened here, long checkpoint?)

Test 3: 109G /srv/main
TPS: 868

Test 4: 143G
TPS: 840

Test 5: 154G
TPS: 722

I am running the query here:

https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index
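If it would help, I can also grab a quick snapshot of leaf density and
fragmentation with the pgstattuple extension; the index name here is
just the pgbench primary key, as an example:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('pgbench_accounts_pkey');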

And will post a followup. Once the query finishes I am going to launch
the tests with autovacuum_vacuum_cost_limit of 5000. Is there anything
else you folks would like me to change?

JD




--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *


Have you measured with an increased vacuum ring buffer?
It requires recompilation, though.
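For reference, the change I mean is a one-liner in GetAccessStrategy()
in src/backend/storage/buffer/freelist.c; the 16MB value below is only
an example, not a recommendation:

case BAS_VACUUM:
    /* default ring is 256kB; enlarge it for the experiment */
    ring_size = 16 * 1024 * 1024 / BLCKSZ;
    break;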

With regards,
--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache lookup errors with functions manipulation object addresses

2017-07-21 Thread Michael Paquier
On Thu, Jul 20, 2017 at 6:26 PM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> On Thu, Jul 20, 2017 at 4:04 PM, Alvaro Herrera
>>  wrote:
>> > I think the addition of checks everywhere for NULL return is worse.
>> > Let's add a missing_ok flag instead, so that most callers can just trust
>> > that they get a non null value if they don't want to deal with that
>> > case.
>>
>> If you want to minimize the diffs or such a patch, we could as well
>> have an extended version of those APIs. I don't think that for the
>> addition of one argument like a missing_ok that's the way to go, but
>> some people may like that to make this patch less intrusive.
>
> I think minimizing API churn is worthwhile in some cases but not all.
> These functions seem fringe enough that an API-compatible
> version is unnecessary.  But that's just my opinion.

I can see your point. The v1 proposed above adds quite a lot of error
code churn to deal with the lack of missing_ok flag in
getObjectDescription, getObjectIdentity and getObjectIdentityParts.
And the cache lookup error messages cannot be object-specific either,
so I fell back to using %u/%u with the class as first identifier.
Let's go with what you suggest here then.

Before producing any v2, though, I would still need some sort of
consensus about a couple of points on how to grab object details. Here
is what I think should be done:
1) For functions, let's remove format_procedure_qualified, and replace
it with format_procedure_extended which replaces
format_procedure_internal.
2) For relation attributes, we now have get_relid_attribute_name(),
which cannot tolerate failures, and get_attname(), which returns NULL on
errors. My suggestion here is to remove get_relid_attribute_name and
add a missing_ok flag to get_attname (see the toy sketch after this
list). Let's do this as a refactoring patch. One thing that may matter
is modules calling the existing APIs; we could provide a compatibility
macro.
3) For types, the existing interface is more of a mess. HEAD has
allow_invalid, which is used by the SQL function format_type. My
suggestion here would be to remove allow_invalid and replace it with
missing_ok, to return NULL if the type has gone missing or issue an
error, depending on what the caller wants. oidvectortypes() needs to be
modified as well. I would suggest changing this interface in a
refactoring patch.
4) GetForeignServer and GetForeignDataWrapper need to have a
missing_ok. I suggest to refactor them as well with a separate patch.
5) For operators, there is format_operator_internal which has its own
idea of invalid objects. I think that the existing API should be
reworked.
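
To make the missing_ok convention of point 2) concrete, here is a toy
model of the calling convention (standalone C for illustration, not
actual PostgreSQL code):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * With missing_ok = true the caller gets NULL for a missing object and
 * handles it itself; with missing_ok = false a failed lookup is an
 * error.
 */
static const char *
get_object_name(unsigned oid, bool missing_ok)
{
    if (oid != 42)              /* pretend only OID 42 exists */
    {
        if (missing_ok)
            return NULL;
        fprintf(stderr, "cache lookup failed for object %u\n", oid);
        exit(1);
    }
    return "the_answer";
}

int
main(void)
{
    const char *name = get_object_name(7, true);

    printf("%s\n", name ? name : "(missing object)");
    return 0;
}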

So, the work of this thread is largely possible and can be done
incrementally, but the devil is in the details of how to handle the
existing APIs. Reaching an agreement on all the points above is key
to getting a clean implementation we are happy with.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-21 Thread Ashutosh Bapat
On Fri, Jul 21, 2017 at 11:54 AM, Rafia Sabih
 wrote:
> So, does this
> also mean that a partitioned table will not join with an unpartitioned
> table without an Append of its partitions?
>

Yes. When you join an unpartitioned table with a partitioned table,
the planner will choose to append all the partitions of the
partitioned table and then join with the unpartitioned table.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-21 Thread Rafia Sabih
On Thu, Jul 20, 2017 at 2:44 PM, Ashutosh Bapat
 wrote:
>
> On Thu, Jul 20, 2017 at 11:46 AM, Amit Langote
>  wrote:
> > On 2017/07/20 15:05, Ashutosh Bapat wrote:
> >> On Wed, Jul 19, 2017 at 9:54 AM, Rafia Sabih
> >>  wrote:
> >>>
> >>> Partition information:
> >>> Type of partitioning - single column range partition
> >>> Tables partitioned - Lineitem and orders
> >>>
> >>> Lineitem -
> >>> Partition key = l_orderkey
> >>> No of partitions = 18
> >>>
> >>> Orders -
> >>> Partition key = o_orderkey
> >>> No of partitions = 11
> >>>
> >>
> >> The patch set upto 0015 would refuse to join two partitioned relations
> >> using a partition-wise join if they have different number of
> >> partitions. Next patches implement a more advanced partition matching
> >> algorithm only for list partitions. Those next patches would refuse to
> >> apply partition-wise join for range partitioned tables. So, I am
> >> confused as to how come partition-wise join is being chosen even when
> >> the number of partitions differ.
> >
> > In 21_part_patched.out, I see that lineitem is partitionwise-joined with
> > itself.
> >
> >  >  Append
> >
> >->  Hash Semi Join
> >Hash Cond: (l1.l_orderkey = l2.l_orderkey)
> >Join Filter: (l2.l_suppkey <> l1.l_suppkey)
> >Rows Removed by Join Filter: 395116
> >
> >->  Parallel Seq Scan on lineitem_001 l1
> >Filter: (l_receiptdate > l_commitdate)
> >Rows Removed by Filter: 919654
> >
> >->  Hash
> >Buckets: 8388608  Batches: 1  Memory Usage: 358464kB
> >->  Seq Scan on lineitem_001 l2
> >
> Ah, I see now.
>
> We need the same number of partitions in all partitioned tables, for
> joins to pick up partition-wise join.
>
Oh, I missed this limitation; I will modify my setup so the partitioned
tables have the same number of partitions with the same ranges. So, does
this also mean that a partitioned table will not join with an
unpartitioned table without an Append of its partitions?

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-21 Thread Ashutosh Bapat
On Fri, Jul 21, 2017 at 11:42 AM, Rafia Sabih
 wrote:
>
> Following the discussion at [1], with the patch Thomas posted there, now Q21
> completes in some 160 seconds.

Your earlier reports mentioned the unpartitioned case taking 300
seconds, the partitioned case without partition-wise join taking 600
seconds, and the case with partition-wise join taking 3200 seconds. My
experiments showed those changing to 70s, 160s, and 160s respectively.
This is with Thomas's patch. Can you please confirm?

> The plan is changed for the good but does not
> use partition-wise join.

As explained earlier, this is because the tables are not partitioned
similarly. Please try with lineitem and orders partitioned similarly
i.e. same number of partitions and exactly same ranges.
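
That is, something like this sketch (column lists trimmed and bounds
made up, but identical on both tables):

CREATE TABLE orders_p (o_orderkey bigint, o_totalprice numeric)
    PARTITION BY RANGE (o_orderkey);
CREATE TABLE orders_p1 PARTITION OF orders_p
    FOR VALUES FROM (1) TO (1500001);
CREATE TABLE orders_p2 PARTITION OF orders_p
    FOR VALUES FROM (1500001) TO (3000001);

CREATE TABLE lineitem_p (l_orderkey bigint, l_suppkey bigint)
    PARTITION BY RANGE (l_orderkey);
CREATE TABLE lineitem_p1 PARTITION OF lineitem_p
    FOR VALUES FROM (1) TO (1500001);
CREATE TABLE lineitem_p2 PARTITION OF lineitem_p
    FOR VALUES FROM (1500001) TO (3000001);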


> Not just the join order but the join strategy itself changed: with the
> patch, no hash semi join (which was consuming most of the time there)
> is picked; a nested loop semi join is in the picture now. The estimates
> are still way off, but the change in join order took them from horrible
> to merely terrible. This query appears to perform efficiently now
> mainly because the under-estimated hash join was worse than the
> under-estimated nested loop join.

Earlier it was using partition-wise join between lineitems (l1, l2,
l3) since it's the same table. Now for some reason the planner doesn't
find joining them to each other a better strategy, instead they are
joined indirectly so we don't see partition-wise join being picked. We
should experiment with orders and lineitems being partitioned
similarly. Can you please provide that result?

>
> For the hash-semi-join:
> ->  Hash  (cost=3449457.34..3449457.34 rows=119994934 width=8) (actual
> time=180858.448..180858.448 rows=119994608 loops=3)
>Buckets: 33554432
> Batches: 8  Memory Usage: 847911kB
>
> Overall, this doesn't look like a problem of the partition-wise join
> patch itself.
>

Thanks for confirming it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error while copying a large file in pg_rewind

2017-07-21 Thread Michael Paquier
On Thu, Jul 20, 2017 at 9:31 PM, Robert Haas  wrote:
> I was initially surprised that your testing managed to pass, but then
> I noticed that this sanity test is using && where it should really be
> using ||; it will only fail if ALL of the data types are wrong.  Oops.

Oh, oh. If this had been right from the beginning I would have hit this
issue. Yes, that's a clear oversight in the original code.

> This code plays pretty fast and loose with off_t vs. size_t vs. uint64
> vs. int64, but those definitely don't all agree in signedness and may
> not even agree in width (size_t, at least, might be 4 bytes).  We use
> stat() to get the size of a file as an off_t, and then store it into a
> uint64. Perhaps off_t is always uint64 anyway, but I'm not sure.
> Then, that value gets passed to fetch_file_range(), still as a uint64,
> and it then gets sent to the server to be stored into an int8 column,
> which is signed rather than unsigned.  That will error out if there's
> a file size >= 2^63, but filesystems holding more than 8 exabytes are
> unusual and will probably remain so for some time.  The server sends
> back an int64 which we pass to write_target_range(), whose argument is
> declared as off_t, which is where we started.  I'm not sure there's
> any real problem with all of that, but it's a little nervous-making
> that we keep switching types.  Similarly, libpqProcessFileList gets
> the file size as an int64 and then passes it to process_source_file()
> which is expecting size_t.  So we manage to use 4 different data types
> to represent a file size.  On most systems, all of them except SQL's
> int8 are likely to be 64-bit unsigned integers, but I'm not sure what
> will happen on obscure platforms.

Yes, I felt uneasy as well when hacking the patch with all the type
switches that have been done, but I kept my focus on bringing out a
minimally invasive patch to fix the issue and any other holes I found.
One thing that I think could be added in the patch is an overflow
check in libpqGetFile(), because I think that we still want to use
size_t for this code path. What do you think?
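
Something along these lines, perhaps (a standalone sketch;
checked_file_size is a hypothetical helper, not existing pg_rewind
code):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Reject an int64 file size received from the server that cannot be
 * represented in this platform's size_t (which may be only 4 bytes).
 */
static size_t
checked_file_size(int64_t filesize)
{
    if (filesize < 0 || (uint64_t) filesize > (uint64_t) SIZE_MAX)
    {
        fprintf(stderr, "unexpected file size: %lld\n", (long long) filesize);
        exit(1);
    }
    return (size_t) filesize;
}

int
main(void)
{
    printf("%zu\n", checked_file_size(INT64_C(1) << 20));
    return 0;
}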

Note I am not worrying much about the signedness of the values yet
(Postgres still lacks a proper in-core unsigned type, which gives an
argument for one), as long as the values are correctly stored on 8
bytes, and we still have some margin until we reach that. We could add a
comment in the code to underline that assumption, though I am not sure
that this is really necessary...

> Still, it can't be worse than the status quo, where instead of int64
> we're using int and int32, so maybe we ought to back-patch it as-is
> for now and look at any further cleanup that is needed as a
> master-only improvement.

Yes. I don't like playing much with the variable types on
back-branches, as long as the initial amount of bytes is large enough
we will be safe for some time.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-21 Thread Rafia Sabih
On Thu, Jul 20, 2017 at 8:53 AM, Thomas Munro  wrote:

> On Thu, Jul 20, 2017 at 2:02 PM, Robert Haas 
> wrote:
> > On Wed, Jul 19, 2017 at 7:45 PM, Thomas Munro
> >  wrote:
> >> Isn't this the same as the issue reported here?
> >>
> >> https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com
> >
> > Hmm, possibly.  But why would that affect the partition-wise join case only?
>
> It doesn't.  From Rafia's part_reg.zip we see a bunch of rows=1 that
> turn out to be wrong by several orders of magnitude:
>
> 21_nopart_head.out:  Hash Semi Join  (cost=5720107.25..9442574.55 rows=1 width=50)
> 21_part_head.out:    Hash Semi Join  (cost=5423094.06..8847638.36 rows=1 width=38)
> 21_part_patched.out: Hash Semi Join  (cost=309300.53..491665.60 rows=1 width=12)
>
> My guess is that the consequences of that bad estimate are sensitive
> to arbitrary other parameters moving around, as you can see from the
> big jump in execution time I showed in that message, measured on
> unpatched master of the day:
>
>   4 workers = 9.5s
>   3 workers = 39.7s
>
> That's why both parallel hash join and partition-wise join are
> showing regressions on Q21: it's just flip-flopping between various
> badly costed plans.  Note that even without parallelism, the fix that
> Tom Lane suggested gives a much better plan:
>
> https://www.postgresql.org/message-id/CAEepm%3D11BiYUkgXZNzMtYhXh4S3a9DwUP8O%2BF2_ZPeGzzJFPbw%40mail.gmail.com
>
>
Following the discussion at [1], with the patch Thomas posted there, Q21
now completes in some 160 seconds. The plan has changed for the better
but does not use partition-wise join. The output of EXPLAIN ANALYZE is
attached.

Not just the join order but the join strategy itself changed: with the
patch, no hash semi join (which was consuming most of the time there) is
picked; a nested loop semi join is in the picture now. The estimates are
still way off, but the change in join order took them from horrible to
merely terrible. This query appears to perform efficiently now mainly
because the under-estimated hash join was worse than the under-estimated
nested loop join.

For the hash-semi-join:
->  Hash  (cost=3449457.34..3449457.34 rows=119994934 width=8) (actual
time=180858.448..180858.448 rows=119994608 loops=3)
   Buckets: 33554432  Batches: 8  Memory Usage: 847911kB

Overall, this doesn't look like a problem of the partition-wise join
patch itself.

[1]
https://www.postgresql.org/message-id/CAEepm%3D3%3DNHHko3oOzpik%2BggLy17AO%2Bpx3rGYrg3x_x05%2BBr9-A%40mail.gmail.com


-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Q21_SE_patch.out
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash

2017-07-21 Thread Neha Sharma
Here is the back trace from the core dump attached.

(gdb) bt
#0  0x7f4a71424495 in raise () from /lib64/libc.so.6
#1  0x7f4a71425c75 in abort () from /lib64/libc.so.6
#2  0x009dc18a in ExceptionalCondition (conditionName=0xa905d0
"!(TransactionIdPrecedesOrEquals(oldestXact,
ShmemVariableCache->oldestXid))",
errorType=0xa9044f "FailedAssertion", fileName=0xa90448 "clog.c",
lineNumber=683) at assert.c:54
#3  0x00524215 in TruncateCLOG (oldestXact=150036635,
oldestxid_datoid=13164) at clog.c:682
#4  0x006a6be8 in vac_truncate_clog (frozenXID=150036635,
minMulti=1, lastSaneFrozenXid=200562449, lastSaneMinMulti=1) at
vacuum.c:1197
#5  0x006a6948 in vac_update_datfrozenxid () at vacuum.c:1063
#6  0x007ce0a2 in do_autovacuum () at autovacuum.c:2625
#7  0x007cc987 in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1715
#8  0x007cc562 in StartAutoVacWorker () at autovacuum.c:1512
#9  0x007e2acd in StartAutovacuumWorker () at postmaster.c:5414
#10 0x007e257e in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:5111
#11 <signal handler called>
#12 0x7f4a714d3603 in __select_nocancel () from /lib64/libc.so.6
#13 0x007dde88 in ServerLoop () at postmaster.c:1717
#14 0x007dd67d in PostmasterMain (argc=3, argv=0x2eb8b00) at
postmaster.c:1361
#15 0x0071a218 in main (argc=3, argv=0x2eb8b00) at main.c:228
(gdb) print ShmemVariableCache->oldestXid
$3 = 548


Regards,
Neha Sharma

On Fri, Jul 21, 2017 at 11:01 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> On Fri, Jul 21, 2017 at 4:16 PM, Neha Sharma
>  wrote:
> >
> > Attached is the core dump file received on PG 10beta2 version.
>
> Thanks Neha.  It'd be best to post the back trace and if possible
> print oldestXact and ShmemVariableCache->oldestXid from the stack
> frame for TruncateCLOG.
>
> The failing assertion in TruncateCLOG() has a comment that says
> "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
> calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
> *after* it calls TruncateCLOG().  What am I missing here?
>
> What actually prevents ShmemVariableCache->oldestXid from going
> backwards anyway?  Suppose there are two or more autovacuum processes
> that reach vac_truncate_clog() concurrently.  They do a scan of
> pg_database whose tuples they access without locking through a
> pointer-to-volatile because they expect concurrent in-place writers,
> come up with a value for frozenXID, and then arrive at
> SetTransactionIdLimit() in whatever order and clobber
> ShmemVariableCache->oldestXid.  What am I missing here?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>