Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Craig Ringer
On 1 June 2016 at 13:09, Tsunakawa, Takayuki  wrote:

> From: pgsql-hackers-ow...@postgresql.org [mailto:
> pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
>
> While that's probably OK, it's not especially desirable. The typical
> Windows deployment model involves the application bundling all its direct
> dependencies except when those are provided by a runtime redistributable
> designed for that purpose.
>
>
> I think so, too, but I'm not confident that's typical.  Some people may
> think of PostgreSQL binaries as a shared component for their applications
> and place it in one place, just like the PostgreSQL library package is in
> /usr/lib/pgsql.
>

(Your reply formatting seems mangled, mixing my text with yours)

/usr/lib/pgsql works because *nix systems don't typically do binary
distribution; applications depend on system-packaged libraries instead.

Windows apps that rely on binary distribution should bundle the libraries
they require.

Maybe a note on Windows distribution in the libpq manual would be
warranted. I thought it was so widely accepted as the way it's done that
nobody would really do anything else.

(Then again, EDB's installers don't bundle their Python, Perl, etc.
runtimes, but I think that's partly a legal thing.)



> There's a client-only installation method as follows, although I haven't
> checked whether EnterpriseDB, OpenSCG, or any other PostgreSQL-based
> products provide client-only installation.
>
>
> https://www.postgresql.org/docs/devel/static/install-windows-full.html#AEN30192
>

Right, and EDB's installers also let you install just the client AFAIK, but
there's no simple client-library redist package, as you'd expect if it
were intended for use that way.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
While that's probably OK, it's not especially desirable. The typical Windows 
deployment model involves the application bundling all its direct dependencies 
except when those are provided by a runtime redistributable designed for that 
purpose.


I think so, too, but I'm not confident that's typical.  Some people may think 
of PostgreSQL binaries as a shared component for their applications and place 
it in one place, just like the PostgreSQL library package is in /usr/lib/pgsql.


- Use the app with newer PostgreSQL major versions without rebuilding the app.  
That is, the users just replace the PostgreSQL client.

... especially since there isn't a client-only PostgreSQL distribution for Windows.


There's a client-only installation method as follows, although I haven't 
checked whether EnterpriseDB, OpenSCG, or any other PostgreSQL-based products 
provide client-only installation.
https://www.postgresql.org/docs/devel/static/install-windows-full.html#AEN30192

[Excerpt]
--
If you want to install only the client applications and interface libraries, 
then you can use these commands:

install c:\destination\directory client
--


How about adding an article about application binary compatibility in the 
following section, as well as chapters for libpq, ECPG, etc?

It would be sensible to better define the binary compatibility expectations 
that clients may rely upon and, when they are broken, a managed way in which 
they're broken (soname bump, etc).

If you have an interest in the area it might be worth drafting a proposal after 
taking a look at past binary compatibility discussions on -hackers.

Sure, I'll submit a patch to pgsql-docs.  Thanks to Michael's confirmation, I 
could answer the customer's question, so this is not an immediate task now.  
But I'll do it.


- On-disk format
- Wire protocol
- SQL-level (data types, syntax, encoding handling, settings, ...)

Yes, I recognize these items.  I omitted them because:

- On-disk format: this is handled in the PostgreSQL manual article about 
upgrading
- Wire protocol: I believe the compatibility will be retained
- SQL-level: ditto

But if you feel the need for a compatibility description of these items for 
completeness, I'll add it.  ... Yes, the explicit explanation may be necessary 
so that users are assured that the PostgreSQL compatibility policy matches 
their expectations.


The simplest solution would be to incorporate the soname, so it becomes 
libpq0509.dll for example. Like VS does with the VS runtime. The main downside 
is that because it's not a true soname and the OS has no such concept, the 
linker for everything compiled against that DLL must specify the versioned DLL 
name; it can't just link to 'libpq'.

Although I haven't examined it yet, some directive in a .def file might enable 
applications to link against libpq.lib at build time and bind to libpq5.dll at 
run time.


Regards
Takayuki Tsunakawa



Re: [HACKERS] Statement timeout

2016-05-31 Thread Craig Ringer
On 1 June 2016 at 08:33, Tatsuo Ishii  wrote:

> > FWIW, I think the existing behavior is just fine.  It corresponds to what
> > PQexec has always done with multi-statement query strings; that is,
> > statement_timeout governs the total time to execute the transaction (the
> > whole query string, unless you put transaction control commands in there).
> > In extended query mode, since you can only put one textual query per Parse
> > message, that maps to statement_timeout governing the total time from
> > initial Parse to Sync.  Which is what it does.
>
> I've never thought about that. And I cannot imagine anyone using the
> extended protocol that way to simulate multi-statement queries. Is
> that documented somewhere?
>

Well, multiple parse/bind/execute messages before a sync are definitely
used by PgJDBC and Npgsql for batching, and I just posted a patch for it
for libpq. I wouldn't have considered it to simulate multi-statement
simple-protocol queries, but I guess there are some parallels.

I am very surprised to find out that statement_timeout tracks the total
time and isn't reset by a new statement, but I guess it makes sense - what,
exactly, delimits a "query" in extended query mode? statement_timeout in
simple-query mode starts at parse time and runs until the end of execute.
In e.q.p. there might be only one parse, then a series of Bind and Execute
messages, or there may be repeated Parse messages.

Personally I'd be fairly happy with statement_timeout applying per-message
in the extended query protocol. That would mean that it behaves slightly
differently, and a query with a long slow parse and bind phase followed by
quick execution might fail to time out in the extended query protocol where
it would time out as a simple query. It'd behave as if the query was
PREPAREd then separately EXECUTEd in simple-query protocol. I'm not hugely
bothered by that, but if it's really a concern I'd ideally like to add a
new protocol message that resets the statement timeout counter, so the
client can define what delimits a statement. Not practical in the near term.

For now I'd be OK with documenting this as a quirk/limitation of
statement_timeout, that it applies to a whole extended-query-protocol
batch.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Craig Ringer
On 30 May 2016 at 11:04, Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:

> Hello,
>
> I'd like to ask you about the policy of application binary compatibility.
> And have a suggestion as well.
>
> QUESTION
> ========
>
> My customer asked me if the following usage is correct.
>
> - Build an embedded SQL C application with PostgreSQL 9.2.
> - Distribute the app without ecpg nor libpq libraries.  Require users to
> install PostgreSQL client which includes ecpg and libpq libraries.
>

Why?

While that's probably OK, it's not especially desirable. The typical
Windows deployment model involves the application bundling all its direct
dependencies except when those are provided by a runtime redistributable
designed for that purpose.


> - Use the app with newer PostgreSQL major versions without rebuilding the
> app.  That is, the users just replaces the PostgreSQL client.
>

... especially since there isn't a client-only PostgreSQL distribution
for Windows.



> How about adding an article about application binary compatibility in the
> following section, as well as chapters for libpq, ECPG, etc?
>

It would be sensible to better define the binary compatibility expectations
that clients may rely upon and, when they are broken, a managed way in
which they're broken (soname bump, etc).

If you have an interest in the area it might be worth drafting a proposal
after taking a look at past binary compatibility discussions on -hackers.



> There are three kinds of application assets that users are concerned about
> when upgrading.  Are there anything else to mention?
>
> * libpq app
> * ECPG app
> * C-language user defined function (and other stuff dependent on it, such
> as extensions, UDTs, etc.)
>

- On-disk format
- Wire protocol
- SQL-level (data types, syntax, encoding handling, settings, ...)
- ...


> On the other hand, the application will start on Windows and probably
> cause difficult trouble due to the incompatibility.

Yeah, we just write 'libpq.dll' on Windows.

The simplest solution would be to incorporate the soname, so it becomes
libpq0509.dll for example. Like VS does with the VS runtime. The main
downside is that because it's not a true soname and the OS has no such
concept, the linker for everything compiled against that DLL must specify
the versioned DLL name; it can't just link to 'libpq'.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Change in order of criteria - reg

2016-05-31 Thread sri harsha
Hi,

In PostgreSQL, does the order in which the criteria are given matter?
For example

Query 1 : Select * from TABLE where a > 5 and b < 10;

Query 2 : Select * from TABLE where b <10 and a > 5;

Are Query 1 and Query 2 the same in PostgreSQL, or different? If they're
different, why?



Thanks,

Harsha


Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-05-31 Thread Craig Ringer
On 1 June 2016 at 11:48, Michael Paquier  wrote:


> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
> and back-branches?
>


Sounds sensible to me.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] PostmasterPid not marked with PGDLLIMPORT

2016-05-31 Thread Michael Paquier
Hi all,

While hacking a background worker for Windows/Linux that sends signals
to the postmaster depending on the state of the server where Postgres is
running (in particular, after a certain size threshold is reached on the
partition holding PGDATA, SIGINT is sent to PostmasterPid to have it stop
cleanly), I have noticed that PostmasterPid is not marked as PGDLLIMPORT
in miscadmin.h, so I cannot use the variable directly in this background
worker on Windows. Working around the problem by parsing postmaster.pid
is possible, but adding that extra logic is really annoying when the fix
upstream is a one-line change.
Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
and back-branches?

Regards,
-- 
Michael
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 356fcc6..78545da 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -143,7 +143,7 @@ do { \
 /*
  * from utils/init/globals.c
  */
-extern pid_t PostmasterPid;
+extern PGDLLIMPORT pid_t PostmasterPid;
 extern bool IsPostmasterEnvironment;
 extern PGDLLIMPORT bool IsUnderPostmaster;
 extern bool IsBackgroundWorker;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Tsunakawa, Takayuki
> From: Michael Meskes [mailto:mes...@postgresql.org]
> e.g. a random hit from google:
> https://www.bottomupcs.com/libraries_and_the_linker.xhtml
>
> There even is a Wikipedia page about it:
> https://en.wikipedia.org/wiki/Soname

Thank you for the good pointers.  The former is particularly nice.

> > BTW, although this may be a separate topic, it may be better that we
> > add the major version in the library names like libpq5.dll and
> > libecpg6.dll, so that the application can fail to run with the
> > incompatible versions of libpq and libecpg.  FYI:
> 
> Does this mean you cannot have two versions of libpq installed on the same
> Windows system at the same time?

No, you can have different versions in separate folders, as in:

C:\Program Files\PostgreSQL\9.2
C:\Program Files\PostgreSQL\9.5

and change the PATH environment variable to point to a newer version when you 
want to use the existing application without rebuilding it.

However, the problem I pointed out is that when the new library is incompatible 
with the older one, say the major version of libpq.dll changes from 5 to 6, the 
application user and/or developer cannot notice the incompatibility immediately 
and easily.  On Unix/Linux, the application fails to start because the older 
library is not found.  On the other hand, the application will start on Windows 
and probably cause difficult trouble due to the incompatibility.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Performance degradation in commit ac1d794

2016-05-31 Thread Robert Haas
On Tue, May 31, 2016 at 10:23 PM, Tom Lane  wrote:
> Noah Misch  writes:
>> On Tue, May 31, 2016 at 10:09:05PM -0400, Robert Haas wrote:
>>> What I *think* is going on here is:
>>> - ac1d794 lowered performance
>>> - backend_flush_after with a non-zero default lowered performance with
>>> a vengeance
>>> - 98a64d0 repaired the damage done by ac1d794, or much of it, but
>>> Mithun couldn't see it in his benchmarks because backend_flush_after>0
>>> is so bad
>
>> Ashutosh Sharma's measurements do bolster that conclusion.
>
>>> That could be wrong, but I haven't seen any evidence that it's wrong.
>>> So I'm inclined to say we should just move this open item back to the
>>> CLOSE_WAIT list (adding a link to this email to explain why we did
>>> so).  Does that work for you?
>
>> That works for me.
>
> Can we make a note to re-examine this after the backend_flush_after
> business is resolved?  Or at least get Mithun to redo his benchmarks
> with backend_flush_after set to zero?

Ashutosh Sharma already did pretty much that test in the email to
which I linked.

(Ashutosh Sharma and Mithun CY work in the same office and have access
to the same hardware.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] parallel.c is not marked as test covered

2016-05-31 Thread Noah Misch
On Sun, May 29, 2016 at 01:31:24AM -0400, Noah Misch wrote:
> On Sun, May 15, 2016 at 12:53:13PM +, Clément Prévost wrote:
> > On Mon, May 9, 2016 at 4:50 PM Andres Freund  wrote:
> > > I think it's a good idea to run a force-parallel run on some buildfarm
> > > members. But I'm rather convinced that the core tests run by all animals
> > > need some minimal coverage of parallel queries. Both because otherwise
> > > it'll be hard to get some coverage of unusual platforms, and because
> > > it's imo something rather relevant to test during development.
> > >
> > Good point.
> > 
> > After some experiments, I found out that, for my setup (9b7bfc3a88ef7b), a
> > parallel seq scan is used given both parallel_setup_cost
> > and parallel_tuple_cost are set to 0 and given that the table is at least 3
> > times as large as the biggest test table tenk1.
> > 
> > The attached patch is a regression test using this method that is
> > reasonably small and fast to run. I also hid the workers count from the
> > explain output when costs are disabled as suggested by Tom Lane and Robert
> > Haas on this same thread (
> > http://www.postgresql.org/message-id/CA+TgmobBQS4ss3+CwoZOKgbsBqKfRndwc=hlialaep5axqc...@mail.gmail.com
> > ).
> > 
> > Testing under these conditions does not test the planner job at all but at
> > least some parallel code can be run on the build farm and the test suite
> > gets 2643 more lines and 188 more function covered.
> > 
> > I don't know however if this test will be reliable on other platforms, some
> > more feedback is needed here.
> 
> [This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.
> 
> [1] 
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

I enjoy reviewing automated test patches, so I persuaded Robert to transfer
ownership of this open item to me.  I will update this thread no later than
2016-06-07 09:00 UTC.  There is an 85% chance I will have reviewed the
proposed patch by then.




Re: [HACKERS] Reviewing freeze map code

2016-05-31 Thread Robert Haas
On Sun, May 29, 2016 at 1:44 AM, Noah Misch  wrote:
> On Fri, May 06, 2016 at 04:42:48PM -0400, Robert Haas wrote:
>> On Thu, May 5, 2016 at 2:20 PM, Andres Freund  wrote:
>> > On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
>> > +   charnew_vmbuf[BLCKSZ];
>> > +   char   *new_cur = new_vmbuf;
>> > +   boolempty = true;
>> > +   boolold_lastpart;
>> > +
>> > +   /* Copy page header in advance */
>> > +   memcpy(new_vmbuf, , 
>> > SizeOfPageHeaderData);
>> >
>> > Shouldn't we zero out new_vmbuf? Afaics we're not necessarily zeroing it
>> > with old_lastpart && !empty, right?
>>
>> Oh, dear.  That seems like a possible data corruption bug.  Maybe we'd
>> better fix that right away (although I don't actually have time before
>> the wrap).
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.

I am going to try to find time to look at this later this week, but
realistically it's going to be a little bit difficult to find that
time.  I was away over Memorial Day weekend and was in meetings most
of today.  I have a huge pile of email to catch up on.  I will send
another status update no later than Friday.  If Andres or anyone else
wants to jump in and fix this up meanwhile, that would be great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance degradation in commit ac1d794

2016-05-31 Thread Tom Lane
Noah Misch  writes:
> On Tue, May 31, 2016 at 10:09:05PM -0400, Robert Haas wrote:
>> What I *think* is going on here is:
>> - ac1d794 lowered performance
>> - backend_flush_after with a non-zero default lowered performance with
>> a vengeance
>> - 98a64d0 repaired the damage done by ac1d794, or much of it, but
>> Mithun couldn't see it in his benchmarks because backend_flush_after>0
>> is so bad

> Ashutosh Sharma's measurements do bolster that conclusion.

>> That could be wrong, but I haven't seen any evidence that it's wrong.
>> So I'm inclined to say we should just move this open item back to the
>> CLOSE_WAIT list (adding a link to this email to explain why we did
>> so).  Does that work for you?

> That works for me.  

Can we make a note to re-examine this after the backend_flush_after
business is resolved?  Or at least get Mithun to redo his benchmarks
with backend_flush_after set to zero?

regards, tom lane




Re: [HACKERS] Performance degradation in commit ac1d794

2016-05-31 Thread Noah Misch
On Tue, May 31, 2016 at 10:09:05PM -0400, Robert Haas wrote:
> On Sun, May 29, 2016 at 1:40 AM, Noah Misch  wrote:
> > On Fri, Dec 25, 2015 at 08:08:15PM +0300, Васильев Дмитрий wrote:
> >> I suddenly found commit ac1d794 gives up to 3 times performance 
> >> degradation.
> >>
> >> I tried to run pgbench -s 1000 -j 48 -c 48 -S -M prepared on 70 CPU-core
> >> machine:
> >> commit ac1d794 gives me 363,474 tps
> >> and previous commit a05dc4d gives me 956,146
> >> and master( 3d0c50f ) with revert ac1d794 gives me 969,265
> >
> > [This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> > since you committed the patch believed to have created it, you own this open
> > item.  If some other commit is more relevant or if this does not belong as a
> > 9.6 open item, please let us know.  Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within 72 hours of this
> > message.  Include a date for your subsequent status update.  Testers may
> > discover new open items at any time, and I want to plan to get them all 
> > fixed
> > well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> > efforts toward speedy resolution.  Thanks.
> 
> So, the reason this is back on the open items list is that Mithun Cy
> re-reported this problem in:
> 
> https://www.postgresql.org/message-id/CAD__OuhPmc6XH=wYRm_+Q657yQE88DakN4=ybh2ovefashk...@mail.gmail.com
> 
> When I saw that, I moved this from CLOSE_WAIT back to open.  However,
> subsequently, Ashutosh Sharma posted this, which suggests (not
> conclusively) that in fact the problem has been fixed:
> 
> https://www.postgresql.org/message-id/cae9k0pkfehvq-zg4mh0bz-zt_oe5pas6dauxrcxwx9kevwc...@mail.gmail.com
> 
> What I *think* is going on here is:
> 
> - ac1d794 lowered performance
> - backend_flush_after with a non-zero default lowered performance with
> a vengeance
> - 98a64d0 repaired the damage done by ac1d794, or much of it, but
> Mithun couldn't see it in his benchmarks because backend_flush_after>0
> is so bad

Ashutosh Sharma's measurements do bolster that conclusion.

> That could be wrong, but I haven't seen any evidence that it's wrong.
> So I'm inclined to say we should just move this open item back to the
> CLOSE_WAIT list (adding a link to this email to explain why we did
> so).  Does that work for you?

That works for me.  




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Robert Haas
On Tue, May 31, 2016 at 10:13 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, May 31, 2016 at 10:02 PM, Tom Lane  wrote:
>>> The reloption does not set an exact value, according to the code:
>
>> True, max_parallel_degree is an overriding limit.  But the point is
>> that, without the reloption, you can't get lots of workers on a small
>> table.  The reloption lets you do that.
>
> Color me skeptical that that's actually a good idea ...

I'll color you as not having read the threads where multiple users asked for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 31, 2016 at 10:02 PM, Tom Lane  wrote:
>> The reloption does not set an exact value, according to the code:

> True, max_parallel_degree is an overriding limit.  But the point is
> that, without the reloption, you can't get lots of workers on a small
> table.  The reloption lets you do that.

Color me skeptical that that's actually a good idea ...

regards, tom lane




Re: [HACKERS] Hard to maintain duplication in contain_volatile_functions_not_nextval_walker

2016-05-31 Thread Robert Haas
On Fri, May 27, 2016 at 10:36 PM, Amit Kapila  wrote:
> On Sat, May 28, 2016 at 12:28 AM, Andres Freund  wrote:
>> contain_volatile_functions_walker is duplicated, near entirely, in
>> contain_volatile_functions_not_nextval_walker.
>
> Previously, I also had same observation.
>
>> Wouldn't it have been better not to duplicate, and keep a flag about
>> ignoring nextval in the context variable?
>
> makes sense.  +1 for doing it in the way as you are suggesting.

+1 from me, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Robert Haas
On Tue, May 31, 2016 at 10:02 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Now, this case is a little trickier.  If we called it simply
>> parallel_degree rather than max_parallel_degree, then it would have
>> the same name as the reloption.  But the reloption sets an exact
>> value, and the GUC sets a cap, which is a significant difference.
>
> The reloption does not set an exact value, according to the code:
>
> /*
>  * Use the table parallel_degree, but don't go further than
>  * max_parallel_degree.
>  */
> parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);
>
> although the existing documentation for it is so vague that you
> couldn't tell from that.

True, max_parallel_degree is an overriding limit.  But the point is
that, without the reloption, you can't get lots of workers on a small
table.  The reloption lets you do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance degradation in commit ac1d794

2016-05-31 Thread Robert Haas
On Sun, May 29, 2016 at 1:40 AM, Noah Misch  wrote:
> On Fri, Dec 25, 2015 at 08:08:15PM +0300, Васильев Дмитрий wrote:
>> I suddenly found commit ac1d794 gives up to 3 times performance degradation.
>>
>> I tried to run pgbench -s 1000 -j 48 -c 48 -S -M prepared on 70 CPU-core
>> machine:
>> commit ac1d794 gives me 363,474 tps
>> and previous commit a05dc4d gives me 956,146
>> and master( 3d0c50f ) with revert ac1d794 gives me 969,265
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.

So, the reason this is back on the open items list is that Mithun Cy
re-reported this problem in:

https://www.postgresql.org/message-id/CAD__OuhPmc6XH=wYRm_+Q657yQE88DakN4=ybh2ovefashk...@mail.gmail.com

When I saw that, I moved this from CLOSE_WAIT back to open.  However,
subsequently, Ashutosh Sharma posted this, which suggests (not
conclusively) that in fact the problem has been fixed:

https://www.postgresql.org/message-id/cae9k0pkfehvq-zg4mh0bz-zt_oe5pas6dauxrcxwx9kevwc...@mail.gmail.com

What I *think* is going on here is:

- ac1d794 lowered performance
- backend_flush_after with a non-zero default lowered performance with
a vengeance
- 98a64d0 repaired the damage done by ac1d794, or much of it, but
Mithun couldn't see it in his benchmarks because backend_flush_after>0
is so bad

That could be wrong, but I haven't seen any evidence that it's wrong.
So I'm inclined to say we should just move this open item back to the
CLOSE_WAIT list (adding a link to this email to explain why we did
so).  Does that work for you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Robert Haas  writes:
> Now, this case is a little trickier.  If we called it simply
> parallel_degree rather than max_parallel_degree, then it would have
> the same name as the reloption.  But the reloption sets an exact
> value, and the GUC sets a cap, which is a significant difference.

The reloption does not set an exact value, according to the code:

/*
 * Use the table parallel_degree, but don't go further than
 * max_parallel_degree.
 */
parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);

although the existing documentation for it is so vague that you
couldn't tell from that.

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Robert Haas
On Tue, May 31, 2016 at 8:51 PM, Peter Eisentraut
 wrote:
> On 5/31/16 4:04 PM, Tom Lane wrote:
>> The name should be closely related to what we use for #3.  I could go for
>> max_total_parallel_workers for #2 and max_parallel_workers for #3.
>> Or maybe max_parallel_workers_total?
>
> Most cluster-wide settings like this are named max_something
> (max_connections, max_wal_senders, max_replication_slots), whereas things
> that apply on a lower level are named max_something_per_something
> (max_files_per_process, max_locks_per_transaction).
>
> So let's leave max_worker_processes mostly alone and not add any _total_,
> _absolute_, _altogether_. ;-)

That's interesting, because it suggests that max_parallel_degree might
end up being called something that doesn't begin with "max".  Which is
an interesting line of thought.  Now, it does function as a maximum of
sorts, but that doesn't necessarily imply that it has to have max in
the name.  By way of analogy, work_mem is not called max_work_mem, yet
everybody still understands that the actual memory used might be less
than the configured value.

Now, this case is a little trickier.  If we called it simply
parallel_degree rather than max_parallel_degree, then it would have
the same name as the reloption.  But the reloption sets an exact
value, and the GUC sets a cap, which is a significant difference.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Robert Haas
On Tue, May 31, 2016 at 5:58 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Robert Haas wrote:
>>> I just want to point out that if we change #1, we're breaking
>>> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
>>> I'd just leave it alone.
>
>> We can add the old name as a synonym in guc.c to maintain compatibility.
>
> I doubt this is much of an issue at this point; max_worker_processes has
> only been there a release or so, and surely there are very few people
> explicitly setting it, given its limited use-case up to now.  It will be
> really hard to change it after 9.6, but I think we could still get away
> with that today.

max_worker_processes was added in 9.4, so it's been there for two
releases, but it probably is true that few people have set it.
Nevertheless, I don't think there's much evidence that it is a bad
enough name that we really must change it.

...Robert




Re: [HACKERS] Statement timeout

2016-05-31 Thread Amit Kapila
On Wed, Jun 1, 2016 at 6:03 AM, Tatsuo Ishii  wrote:
>
> From the document about statement_timeout (config.sgml):
>
> Abort any statement that takes more than the specified number of
> milliseconds, starting from the time the command arrives at the server
> from the client.  If log_min_error_statement is set to
> ERROR or lower, the statement that timed out will also be
> logged.  A value of zero (the default) turns this off.
>
> Apparently "starting from the time the command arrives at the server
> from the client" refers to the time the Execute message arrives at the
> server in my example. And I think the current behavior of our code
> differs from what the document advertises, or is at least
> counterintuitive to users.
>

It seems to me the above documentation is more relevant to the simple-query
protocol.  However, I agree that it would be better if statement_timeout were
the timeout for each execution of the parsed statement.
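For comparison, in simple-query mode the timeout does span the whole query
string, assuming the behavior Tom described upthread (timings here are
illustrative):

```sql
SET statement_timeout = '3s';
-- Sent as one simple-query string, both statements share the single 3s
-- budget, so the second pg_sleep is cancelled even though each call is
-- under 3s on its own:
SELECT pg_sleep(2); SELECT pg_sleep(2);
-- In extended-query mode the analogous question is whether each Execute
-- of a prepared statement gets its own budget, or the Parse..Sync span does.
```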


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-05-31 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Marco Atzeri
> on cygwin the postgresql binary package already include the library
> versions:
> 
>/usr/bin/cygecpg-6.dll
>/usr/bin/cygecpg_compat-3.dll
>/usr/bin/cygpgtypes-3.dll
>/usr/bin/cygpq-5.dll
> 
> attached the patch used for the build.

Thanks for the information.   I didn't know there's a PostgreSQL binary package 
for Cygwin.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Eisentraut

On 5/31/16 4:04 PM, Tom Lane wrote:

The name should be closely related to what we use for #3.  I could go for
max_total_parallel_workers for #2 and max_parallel_workers for #3.
Or maybe max_parallel_workers_total?


Most cluster-wide settings like this are named max_something 
(max_connections, max_wal_senders, max_replication_slots), whereas 
things that apply on a lower level are named max_something_per_something 
(max_files_per_process, max_locks_per_transaction).


So let's leave max_worker_processes mostly alone and not add any 
_total_, _absolute_, _altogether_. ;-)


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rename synchronous_standby_names?

2016-05-31 Thread Michael Paquier
On Wed, Jun 1, 2016 at 3:56 AM, David G. Johnston
 wrote:
> On Tue, May 31, 2016 at 2:19 PM, Peter Eisentraut
>  wrote:
>>
>> On 5/31/16 1:47 PM, Jaime Casanova wrote:
>>>
>>> Are we going to change synchronous_standby_names? Certainly the GUC is
>>> not *only* a list of names anymore.
>>>
>>> synchronous_standby_config?
>>> synchronous_standbys (adjust to correct english if necesary)?
>>
>>
>> If the existing values are still going to be accepted, then I would leave
>> it as is.
>
>
> +1

+1. We went to quite a lot of trouble to take an approach for N-sync
standbys that is 100% backward-compatible; it would be good not to break
that effort.
-- 
Michael




[HACKERS] User demand, and idea, for C-code conversions from JSON arrays to PostgreSQL arrays

2016-05-31 Thread David G. Johnston
All,

Over in the "JSON[B] arrays are second-class citizens" thread [1] I made
the observation that the only way to get a PostgreSQL array from a JSON
array is via the "elements->cast->array_agg" chain.  For JSON arrays that
are homogeneous in nature the capability to go "directly" from JSON to
json[], text[], bigint[], etc... seems like it would have some value.

A couple, so far, of community members have chimed in.  Maybe this ends up
on a ToDo somewhere but for the moment I figured I'd float the idea on its
own thread since the other one is really about a different (though somewhat
related) topic.

In the spirit of the json_populate_record like functions something with the
following signature seems usable:

as_array(anyarray, jsonb) : anyarray  [2]

so that actual calls look like:

SELECT as_array(null::text[], '["a", "b", "c"]'::jsonb)

For better or worse, while every table gets a corresponding composite type,
other explicitly created types cannot be treated as row types in this
situation:
SELECT jsonb_populate_record(null::text[], '["a", "b", "c"]'::jsonb)
> ERROR: first argument of jsonb_populate_record must be a row type

Loosening the restriction and allowing jsonb_populate_record to fulfill the
role of the above-described "as_array" function would be something to
consider, but is likely not possible or not worth the effort to make
happen.
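For reference, the "elements->cast->array_agg" chain that the proposed
function would shortcut looks like this (the as_array call is, of course,
hypothetical):

```sql
-- Today: unnest to text, cast each element, re-aggregate.
SELECT array_agg(e::bigint)
FROM jsonb_array_elements_text('[1, 2, 3]'::jsonb) AS t(e);

-- Proposed: one direct call, with the target type carried by the first arg.
-- SELECT as_array(null::bigint[], '[1, 2, 3]'::jsonb);
```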

David J.

[1]
https://www.postgresql.org/message-id/CADkLM=fSC+otuBmzoJT6Riyksue3HpHgu2=Mofcv=fd0der...@mail.gmail.com

[2] ROTQ[3] - whose idea was it to put the type first and the data second?

[3] Rhetorical Off-Topic Question


Re: [HACKERS] Statement timeout

2016-05-31 Thread Tatsuo Ishii
> FWIW, I think the existing behavior is just fine.  It corresponds to what
> PQexec has always done with multi-statement query strings; that is,
> statement_timeout governs the total time to execute the transaction (the
> whole query string, unless you put transaction control commands in there).
> In extended query mode, since you can only put one textual query per Parse
> message, that maps to statement_timeout governing the total time from
> initial Parse to Sync.  Which is what it does.

I'd never thought about that. And I cannot imagine anyone using the
extended protocol that way to simulate multi-statement queries. Is
that documented somewhere?

> What you propose here is not a bug fix but a redefinition of what
> statement_timeout does; and you've made it inconsistent with simple query
> mode.  I don't really think it's an improvement.

From the document about statement_timeout (config.sgml):

Abort any statement that takes more than the specified number of
milliseconds, starting from the time the command arrives at the server
from the client.  If log_min_error_statement is set to
ERROR or lower, the statement that timed out will also be
logged.  A value of zero (the default) turns this off.

Apparently "starting from the time the command arrives at the server
from the client" refers to the time the Execute message arrives at the
server in my example. And I think the current behavior of our code
differs from what the document advertises, or is at least
counterintuitive to users.

> BTW, aren't you missing a re-enable of the timeout for statements after
> the first one?

Will check.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] COMMENT ON, psql and access methods

2016-05-31 Thread Michael Paquier
On Wed, Jun 1, 2016 at 4:52 AM, Robert Haas  wrote:
> On Fri, May 27, 2016 at 7:53 AM, Michael Paquier
>  wrote:
>> As far as I can see, COMMENT ON has no support for access methods.
>> Wouldn't we want to add it as it is created by a command? On top of
>> that, perhaps we could have a backslash command in psql to list the
>> supported access methods, like \dam[S]? The system methods would be in
>> this case all the in-core ones.
>
> +1.

Are there other opinions? That's not a 9.6 blocker IMO, so I could get
patches out for 10.0 if needed.
-- 
Michael




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-05-31 Thread Peter van Hardenberg
The idea of converting a JSONB array to a PG array is appealing and would
potentially be more general-purpose than adding a new unnest. I'm not sure
how feasible either suggestion is.

I will say that I think the current state of affairs is gratuitously
verbose and expects users to memorize a substantial number of long function
names to perform simple tasks.

-p


On Tue, May 31, 2016 at 2:06 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, May 31, 2016 at 4:34 PM, David Fetter  wrote:
>
>> Folks,
>>
>> While querying some JSONB blobs at work in preparation for a massive
>> rework of the data infrastructure, I ran into things that really
>> puzzled me, to wit:
>>
>> SELECT * FROM unnest('["a","b","c"]'::jsonb);
>> ERROR:  function unnest(jsonb) does not exist
>>
>> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>>  value
>> ───
>>  "a"
>>  "b"
>>  "c"
>> (3 rows)
>>
>>
> I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that
> we endeavor to emulate.
>
> Having an overloaded form:  is unappealing.
> While likely not that common, the introduction of an ambiguity raises
> the bar considerably.
>
> That said we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array.  Just a conversion
> is not always going to succeed though the capability seems worthwhile if as
> yet unasked for.  The each->convert->array_agg pattern works but is likely
> inefficient for homogeneous json array cases.
>
> David J.
>



-- 
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut


Re: [HACKERS] Parallel safety tagging of extension functions

2016-05-31 Thread Andreas Karlsson

On 05/31/2016 06:47 PM, Tom Lane wrote:

Given that, your original approach of manually updating proargtypes in the
existing pg_proc row for the functions may be the best way.  Anything else
is going to be more complicated and ultimately will still require at least
one direct catalog update.


It is the least ugly of all the ugly solutions I could think of. I have 
attached a patch which fixes the signatures using this method. I use 
CREATE OR REPLACE FUNCTION to update the catcache. What do you think? Is 
it too ugly?


Andreas


gin-gist-signatures-v1.patch.gz
Description: application/gzip



Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-05-31 Thread Tom Lane
David Fetter  writes:
> On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
>> While likely not that common, the introduction of an ambiguity
>> raises the bar considerably.

> What ambiguity?

My first thought about it was that

select unnest('{1,2,3}');

would start failing.  But it turns out it already does fail:

ERROR:  function unnest(unknown) is not unique

You get that as a result of the recent introduction of unnest(tsvector),
which we debated a few weeks ago and seem to have decided to leave as-is.
But it failed before 9.6 too, with

ERROR:  could not determine polymorphic type because input has type "unknown"

So at least in this particular case, adding unnest(jsonb) wouldn't be a
problem from the standpoint of not being able to resolve calls that we
could resolve before.

Nonetheless, there *is* an ambiguity here, which is specific to json(b):
what type of array are you expecting to get?  The reason we have both
json[b]_array_elements() and json[b]_array_elements_text() is that there
are plausible use-cases for returning either json or plain text.  It's not
hard to imagine that somebody will want json[b]_array_elements_numeric()
before long, too.  If you want to have an unnest(jsonb) then you will need
to make an arbitrary decision about which type it will return, and that
doesn't seem like an especially great idea to me.
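One way to see the ambiguity: the two existing functions return different
types for the same input, so an unnest(jsonb) would have to pick one:

```sql
SELECT value, pg_typeof(value)
FROM jsonb_array_elements('[1, "a"]'::jsonb);        -- elements come back as jsonb

SELECT value, pg_typeof(value)
FROM jsonb_array_elements_text('[1, "a"]'::jsonb);   -- elements come back as text
```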


> UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
> than the json_array_elements-like functions do.

AFAICT, this is nonsense.  We did not tie WITH ORDINALITY to UNNEST;
it works for any set-returning function.

regression=# select * from unnest(array[1,2,3]) with ordinality;
 unnest | ordinality 
--------+------------
      1 |          1
      2 |          2
      3 |          3
(3 rows)

regression=# select * from jsonb_array_elements('["a","b","c"]'::jsonb) with 
ordinality;
 value | ordinality 
-------+------------
 "a"   |          1
 "b"   |          2
 "c"   |          3
(3 rows)


regards, tom lane




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 5:46 PM, David Fetter  wrote:

> On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> > On Tue, May 31, 2016 at 4:34 PM, David Fetter  wrote:
> >
> > > Folks,
> > >
> > > While querying some JSONB blobs at work in preparation for a massive
> > > rework of the data infrastructure, I ran into things that really
> > > puzzled me, to wit:
> > >
> > > SELECT * FROM unnest('["a","b","c"]'::jsonb);
> > > ERROR:  function unnest(jsonb) does not exist
> > >
> > >
> > > SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
> > >  value
> > > ───────
> > >  "a"
> > >  "b"
> > >  "c"
> > > (3 rows)
> >
> > I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that
> > we endeavor to emulate.
>
> You cut out the part where I introduced the part that's not
> equivalent, so "there is more than one way to do it" isn't on point
> here.  More on that specific issue below.
>
> UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
> than the json_array_elements-like functions do.  Is it really more
> efficient to build and maintain those capabilities separately in the
> JSON[B] set-returning functions, or to force our end users to use
> atrocious hacks like putting
> generate_series(1,jsonb_array_length(foo)) in the target list than
> just to make UNNEST and ROWS FROM do the right thing?

Both of these work in 9.5...

SELECT * FROM ROWS FROM (jsonb_array_elements('["a","b","c"]'::jsonb)) WITH
ORDINALITY;

 value | ordinality 
-------+------------
 "a"   |          1
 "b"   |          2
 "c"   |          3
(3 rows)


> > Having an overloaded form:  is unappealing.
>
> On what grounds?
>

Aesthetic.  YMAV (your mileage apparently varies).


> > While likely not that common, the introduction of an ambiguity
> > raises the bar considerably.
>
> What ambiguity?
>
> SELECT jsonb_array_elements('{"a":2,"b":1}'::jsonb);
> ERROR:  cannot extract elements from an object
>
>
I stand corrected.  I was thinking you could somehow craft unnest('') but there is no way to auto-convert to "anyarray"...

> The json_array_elements family manages to do the right thing.  Why
> would it be harder to make sure UNNEST and ROWS FROM() do so?
>

I have apparently failed to understand your point.  All I saw was that you
wanted "unnest(jsonb)" to work in an identical fashion to
"jsonb_array_elements(jsonb)".  If there is some aspect beyond this being
an aliasing situation then you have failed to communicate it such that I
comprehended that fact.


> > That said we do seem to be lacking any easy way to take a json array and
> > attempt to convert it directly into a PostgreSQL array.  Just a conversion
> > is not always going to succeed though the capability seems worthwhile if as
> > yet unasked for.  The each->convert->array_agg pattern works but is likely
> > inefficient for homogeneous json array cases.
>
> To your earlier point about "there is more than one way to do it," we
> have made no attempt to make some sort of language composed of
> orthonormal vectors, and the SQL standard actually requires that we
> not do so.  For a trivial example, there's
>
> SELECT * FROM foo
> and
> TABLE foo
>
> which are equivalent and both spelled out in the standard.
>

And would we have done so had we not been compelled to by the standard?

Maybe it should be enshrined somewhere more obvious but Alvaro, and
indirectly Tom, recently made the same claim[1] against TIMTOWTDI so I feel
justified making such a claim on behalf of the project - as well as my
general personal feeling.  Adding user-friendly UI to the system, to
correct deficiencies, does have merit, but that is not what I personally
see here.

David J.

[1]
https://www.postgresql.org/message-id/20160405164951.GA286879@alvherre.pgsql


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Alvaro Herrera  writes:
> Robert Haas wrote:
>> I just want to point out that if we change #1, we're breaking
>> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
>> I'd just leave it alone.

> We can add the old name as a synonym in guc.c to maintain compatibility.

I doubt this is much of an issue at this point; max_worker_processes has
only been there a release or so, and surely there are very few people
explicitly setting it, given its limited use-case up to now.  It will be
really hard to change it after 9.6, but I think we could still get away
with that today.

regards, tom lane




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-05-31 Thread David Fetter
On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> On Tue, May 31, 2016 at 4:34 PM, David Fetter  wrote:
> 
> > Folks,
> >
> > While querying some JSONB blobs at work in preparation for a massive
> > rework of the data infrastructure, I ran into things that really
> > puzzled me, to wit:
> >
> > SELECT * FROM unnest('["a","b","c"]'::jsonb);
> > ERROR:  function unnest(jsonb) does not exist
> >
> > SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
> >  value
> > ───
> >  "a"
> >  "b"
> >  "c"
> > (3 rows)
> I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that
> we endeavor to emulate.

You cut out the part where I introduced the part that's not
equivalent, so "there is more than one way to do it" isn't on point
here.  More on that specific issue below.

UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
than the json_array_elements-like functions do.  Is it really more
efficient to build and maintain those capabilities separately in the
JSON[B] set-returning functions, or to force our end users to use
atrocious hacks like putting
generate_series(1,jsonb_array_length(foo)) in the target list than
just to make UNNEST and ROWS FROM do the right thing?

> Having an overloaded form:  is unappealing.

On what grounds?

> While likely not that common, the introduction of an ambiguity
> raises the bar considerably.

What ambiguity?

SELECT jsonb_array_elements('{"a":2,"b":1}'::jsonb);
ERROR:  cannot extract elements from an object

The json_array_elements family manages to do the right thing.  Why
would it be harder to make sure UNNEST and ROWS FROM() do so?

> That said we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array.  Just a conversion
> is not always going to succeed though the capability seems worthwhile if as
> yet unasked for.  The each->convert->array_agg pattern works but is likely
> inefficient for homogeneous json array cases.

To your earlier point about "there is more than one way to do it," we
have made no attempt to make some sort of language composed of
orthonormal vectors, and the SQL standard actually requires that we
not do so.  For a trivial example, there's

SELECT * FROM foo
and
TABLE foo

which are equivalent and both spelled out in the standard.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-05-31 Thread Corey Huinker
On Tue, May 31, 2016 at 5:06 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, May 31, 2016 at 4:34 PM, David Fetter  wrote:
>
>> Folks,
>>
>> While querying some JSONB blobs at work in preparation for a massive
>> rework of the data infrastructure, I ran into things that really
>> puzzled me, to wit:
>>
>> SELECT * FROM unnest('["a","b","c"]'::jsonb);
>> ERROR:  function unnest(jsonb) does not exist
>>
>> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>>  value
>> ───
>>  "a"
>>  "b"
>>  "c"
>> (3 rows)
>>
>>
> I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that
> we endeavor to emulate.
>
> Having an overloaded form:  is unappealing.
> While likely not that common, the introduction of an ambiguity raises
> the bar considerably.
>
> That said we do seem to be lacking any easy way to take a json array and
> attempt to convert it directly into a PostgreSQL array.  Just a conversion
> is not always going to succeed though the capability seems worthwhile if as
> yet unasked for.  The each->convert->array_agg pattern works but is likely
> inefficient for homogeneous json array cases.
>
> David J.
>

If there is no list of people asking for that function, let me be the first.

In the meantime, I've resigned myself to carting this around from db to
db...

create function jsonb_array_to_text_array(jsonb_arr jsonb) returns text[]
language sql as $$
select  array_agg(r) from jsonb_array_elements_text(jsonb_arr) r;
$$;
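For what it's worth, a usage sketch of the helper above. Note that array_agg
without ORDER BY gives no formal ordering guarantee, though in practice it
follows the set-returning function's output order; a variant with WITH
ORDINALITY makes the ordering explicit:

```sql
SELECT jsonb_array_to_text_array('["a","b","c"]'::jsonb);
-- in practice: {a,b,c}

-- Variant with an explicit ordering guarantee:
CREATE FUNCTION jsonb_array_to_text_array_ord(jsonb_arr jsonb) RETURNS text[]
LANGUAGE sql AS $$
  SELECT array_agg(r ORDER BY o)
  FROM jsonb_array_elements_text(jsonb_arr) WITH ORDINALITY AS t(r, o);
$$;
```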


Re: [HACKERS] JSON[B] arrays are second-class citizens

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 4:34 PM, David Fetter  wrote:

> Folks,
>
> While querying some JSONB blobs at work in preparation for a massive
> rework of the data infrastructure, I ran into things that really
> puzzled me, to wit:
>
> SELECT * FROM unnest('["a","b","c"]'::jsonb);
> ERROR:  function unnest(jsonb) does not exist
>
> SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
>  value
> ───
>  "a"
>  "b"
>  "c"
> (3 rows)
>
>
I'd be inclined to -1 such a proposal.  TIMTOWTDI is not a principle that
we endeavor to emulate.

Having an overloaded form:  is unappealing.
While likely not that common, the introduction of an ambiguity raises
the bar considerably.

That said we do seem to be lacking any easy way to take a json array and
attempt to convert it directly into a PostgreSQL array.  Just a conversion
is not always going to succeed though the capability seems worthwhile if as
yet unasked for.  The each->convert->array_agg pattern works but is likely
inefficient for homogeneous json array cases.

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 4:12 PM, Robert Haas  wrote:

> On Tue, May 31, 2016 at 4:04 PM, Tom Lane  wrote:
> > Alvaro Herrera  writes:
> >> Robert Haas wrote:
> >>> So I think in the long run we should have three limits:
> >>>
> >>> 1. Cluster-wide limit on number of worker processes for all purposes
> >>> (currently, max_worker_processes).
> >>>
> >>> 2. Cluster-wide limit on number of worker processes for parallelism
> >>> (don't have this yet).
> >>>
> >>> 3. Per-operation limit on number of worker processes for parallelism
> >>> (currently, max_parallel_degree).
> >>>
> >>> Whatever we rename, there needs to be enough semantic space between #1
> >>> and #3 to allow for the possibility - I think the very likely
> >>> possibility - that we will eventually also want #2.
> >
> >> max_background_workers sounds fine to me for #1, and I propose to add #2
> >> in 9.6 rather than wait.
> >
> > +1 to both points.
> >
> >> max_total_parallel_query_workers ?
> >
> > The name should be closely related to what we use for #3.  I could go for
> > max_total_parallel_workers for #2 and max_parallel_workers for #3.
> > Or maybe max_parallel_workers_total?
>
> I just want to point out that if we change #1, we're breaking
> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
> I'd just leave it alone.
>

+1

We shall assume that the DBA has set the value of max_worker_processes
according to hardware specifications, without regard to what exactly could
be parallelized.


>
> I would propose to call #2 max_parallel_workers and #3
> max_parallel_degree, which I think is clear as daylight, but since I
> invented all of these names, I guess I'm biased.
>

So we have operation, session, and cluster scope, yet no way to know which
is which without memorizing the documentation.

While we cannot enforce this, I'd say any of these that are intended to be
changed by the user should have functions published as their official API.
The name of the function can be made user friendly.  For all the
others, pick a name with something closer to the 64-character limit that is
as expressive as possible, so that seeing the name in postgresql.conf is all
a person really needs to see to know that they are changing the right thing.

I say your expectations are off if you want to encode these differences by
using workers and degree at the same time.  Let's go for a part-time DBA
vocabulary here.  I get the merit of degree, and by itself it might even be
OK, but in our usage degree is theory while workers are actual, and I'd say
people are likely to relate better to the latter.

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 1:27 PM, Alvaro Herrera
 wrote:
> This is a very good point.
>
> I think parallel maintenance commands are going to require different
> tuning than different queries, and I'd rather have separate parameters
> for those things rather than force the same parameter being changed over
> and over depending on what is the session going to execute next.

FWIW, as things stand my parallel CREATE INDEX patch does have a cost
model which is pretty close to the one for parallel sequential scan.
The cost model cares about max_parallel_degree. However, it also
introduces a parallel_degree index storage parameter, which overrides
the cost model to make it indicate the number of parallel workers to
use (presumably, somebody will change master to make parallel_degree
in terms of "participant processes" rather than worker processes soon,
at which point I'll follow their lead).

The storage parameter sticks around, of course, so a REINDEX will
reuse it without asking. (I think CLUSTER should do the same with the
index storage parameter.)

What's novel about this is that for utility statements, you can
override the cost model, and so disregard max_parallel_degree if you
so choose. My guess is that DBAs will frequently want to do so.
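A sketch of how that would look, using the patch's proposed (not committed)
syntax and hypothetical object names:

```sql
-- The cost model decides by default; the storage parameter overrides it.
CREATE INDEX idx_foo ON foo (bar) WITH (parallel_degree = 4);
-- The parameter sticks, so a later REINDEX reuses it without asking:
REINDEX INDEX idx_foo;
```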

I'm not attached to any of this, but that's what I've come up with to
date. Possibly the index storage parameter concept has baggage
elsewhere, which comes up when we later go to parallelize index scans.

-- 
Peter Geoghegan




[HACKERS] JSON[B] arrays are second-class citizens

2016-05-31 Thread David Fetter
Folks,

While querying some JSONB blobs at work in preparation for a massive
rework of the data infrastructure, I ran into things that really
puzzled me, to wit:

SELECT * FROM unnest('["a","b","c"]'::jsonb);
ERROR:  function unnest(jsonb) does not exist

SELECT * FROM jsonb_array_elements('["a","b","c"]'::jsonb);
 value 
───
 "a"
 "b"
 "c"
(3 rows)

Similar things happen with the other functions matching

jsonb?_array_elements(_text)?

These functions correctly identify JSON[B] things which are not, at
their top level, arrays, and error out appropriately.

What this hints to me is that json_array_elements() and friends have
access to things that at least in theory UNNEST could have access to.

Is making those things accessible to UNNEST, etc., a reasonable
direction to go?

Another option I came up with is to make functions that match

jsonb?_array_elements(_text)?(_with_ordinality), but that seems
somewhat tedious and error-prone on the maintenance side.

What say?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

2016-05-31 Thread Jeffrey.Marshall
Hi Folks!

The permissions on the RECOVERYXLOG file at the time of the error are 0400:
-r--------. 1 postgres postgres 16777216 May 31 09:51 RECOVERYXLOG

I sent that info to Tom earlier this afternoon (still learning the posting 
protocols - sorry) - his response is below:

From Tom:

Ah, that confirms the diagnosis --- the new fsync code is assuming that all 
files it might need to fsync are writable by the postgres user.

What would be useful to know is how it got to be that way rather than
0600 (-rw-------).  Seems like it must be a property of your archive restore 
script, but we don't have the details.  Maybe you could show the script?

Also, I don't recall whether you said exactly what platform you're on, but 
that's probably important here as well.

regards, tom lane


OPERATING SYSTEM INFORMATION:

The database server is running on a Redhat Linux host (Red Hat Enterprise Linux 
Server release 6.8 (Santiago)) -

[postgres-pt(at)postXX pg_xlog]$ uname -a
Linux  2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 
x86_64 x86_64 GNU/Linux


The user account that creates the backup files is the same account that is used 
for performing the restores - and at no time are any elevated privileges used.  
The script we use for restoring our databases is not overly complex - a quick 
summary is below:

Save config files
Remove tablespace directories  
Remove cluster directory
untar tablespace backup files
untar cluster backup file
copy back the config files

At this point in the script the pg_xlog directory is cleaned out - there are no 
files in the pg_xlog directory when the restore starts.  The database server is 
then started.
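[Aside, not from the thread: one plausible way a restored segment ends up 0400 is that `cp` gives a newly created destination file the source's mode bits (masked by umask), so a restore_command that copies from an archive whose files are stored read-only reproduces that read-only mode on RECOVERYXLOG. A hedged illustration with throwaway paths:]

```shell
# Hypothetical reproduction: the archive stores segments read-only, and a
# restore_command like `cp /archive/%f %p` copies one into pg_xlog.
workdir=$(mktemp -d)
mkdir "$workdir/archive" "$workdir/pg_xlog"

echo dummy-wal > "$workdir/archive/seg"
chmod 0400 "$workdir/archive/seg"            # archived file is read-only

# cp creates the new file with the source's mode bits (masked by umask),
# so the restored copy comes out 0400 as well:
cp "$workdir/archive/seg" "$workdir/pg_xlog/RECOVERYXLOG"
stat -c %a "$workdir/pg_xlog/RECOVERYXLOG"   # prints: 400

# Hedged fix: make the restored file owner-writable in the restore script,
# so the server can re-open (and fsync) it read-write:
chmod u+w "$workdir/pg_xlog/RECOVERYXLOG"
stat -c %a "$workdir/pg_xlog/RECOVERYXLOG"   # prints: 600

rm -rf "$workdir"
```

If that is what is happening here, a `chmod 0600 %p` (or `u+w`) step after the copy in the restore_command would sidestep the fsync failure.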


Below are the contents of the pg_xlog directory from the unsuccessful restore 
attempts:

[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX
[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0


[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
pg_ctl: another server might be running; trying to start server anyway
server starting
2016-05-31 14:59:18 EDT 574ddf06.7d10 [1-1] user=,db=,remote= LOG:  loaded 
library "pg_stat_statements"


[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0


Even if the permissions on the RECOVERYXLOG file are changed (0777 in this case) 
and the file is left in the pg_xlog directory, the postgres process will 
recreate the file as 0400:



[postgres@postZZ postXX]$ chmod 777 RECOVERYXLOG
[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-rwxrwxrwx 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 14:59:46 EDT 574ddf21.7d25 [1-1] user=,db=,remote= LOG:  loaded 
library "pg_stat_statements"


[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0



Interestingly enough, in the 9.2.15 version (successful restore), the file is 
also created 0400:



[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX


[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$  pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 15:36:27 EDT 574de7bb.7eff [1-1] user=,db=,remote= LOG:  loaded 
library "pg_stat_statements"


[postgres@postZZ postXX]$ ls -lR
.:
total 65544
-r-------- 1 postgres postgres 16777216 May 31 14:46 000100050022
-rw------- 1 postgres postgres   56 May 31 14:46 0002.history
-rw------- 1 postgres postgres 16777216 May 31 14:46 000200050022
-rw------- 1 postgres postgres 16777216 May 31 14:46 000200050023
-rw------- 1 postgres postgres 16777216 May 31 14:46 000200050024
drwx------ 2 postgres postgres 4096 May 31 14:46 archive_status

./archive_status:
total 0
-rw------- 1 postgres postgres 0 May 31 14:46 000100050022.done
-rw------- 1 postgres postgres 0 May 31 14:46 0002.history.done
-rw------- 1 postgres postgres 0 May 31 14:46 000200050023.done





Some additional info.  We are in the initial stages of upgrading from 9.2 to 
9.5.  I upgraded one of our database servers to 9.5.1, then backed up the 
upgraded database with our same backup process.  Using both the 9.5.2 
executables (where the fsync/rename patch was introduced) and the 9.5.3 
executables I was able to successfully restore the 9.5.1 backup (same host/user 
account which produces an error 

Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Alvaro Herrera
Robert Haas wrote:

> I just want to point out that if we change #1, we're breaking
> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
> I'd just leave it alone.

We can add the old name as a synonym in guc.c to maintain compatibility.

> I would propose to call #2 max_parallel_workers and #3
> max_parallel_degree, which I think is clear as daylight, but since I
> invented all of these names, I guess I'm biased.

You are biased :-)

> In terms of forward-compatibility, one thing to note is that we
> currently have only parallel QUERY, but in the future we will no doubt
> also have parallel operations that are not queries, like VACUUM and
> CLUSTER and ALTER TABLE.  If we put the word "query" into the name of
> a GUC, we're committing to the idea that it only applies to queries,
> and that there will be some other limit for non-queries.  If we don't
> put the word query in there, we are not necessarily committing to the
> reverse, but we're at least leaning in that direction.  So far I've
> largely dodged having to make a decision about this, because talking
> about the degree of parallelism for CLUSTER makes just as much sense
> as talking about the degree of parallelism for a query, but we could
> also decide to have e.g. maintenance_parallel_degree instead of
> max_parallel_degree for such operations.  But as we commit to names
> it's something to be aware of.

This is a very good point.

I think parallel maintenance commands are going to require different
tuning than queries do, and I'd rather have separate parameters
for those things than force the same parameter to be changed over
and over depending on what the session is going to execute next.

Also, I think your proposed change from "max" to "maintenance" does not
make much sense; I'd rather have max_maintenance_parallel_degree.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 3:55 PM, Nikolay Shaplov 
wrote:

> In a message of 31 May 2016 15:38:38, Robert Haas wrote:
>
> > >>> 99% of the time, you'd be right.  But this is an unusual case, for the
> > >>> reasons I mentioned before.
> > >>
> > >> I tend to agree with Nikolay.  I can't see much upside in making this
> > >> change.  At best, nothing will break.  At worst, something will break.
> > >> But how do we actually come out ahead?
> > >
> > > We come out ahead by not having to make the documentation more confusing.
> > >
> > > Basically, we have the opportunity to fix an ancient mistake here at
> > > very low cost.  I do not think that doubling down on the mistake is
> > > a better answer.
> >
> > I'm not convinced, but we don't have to agree on everything...
> I am not convinced either. But I will not argue hard for the patch, as my
> main goal was to report the inconsistency. Though I consider Tom's
> proposal quite strange...
>
>
We've recently chosen to not document the "ANALYZE -> ANALYSE" syntax, and
I'm sure there are other examples, so I don't see why the status quo
(pre-Tom's patch) is unacceptable... if adding USING to the synopsis is
prone to cause confusion then don't; but let's not break existing uses that
in no way harm the project.

Otherwise I presume Tom is correct that the true fix is more than a single
word in one file of our documentation.  If you want to see it stay and be
documented there needs to be a complete proposal that at least gets, even
if grudging, approval from a couple of people and a committer.

David J.
​


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Robert Haas
On Tue, May 31, 2016 at 4:04 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Robert Haas wrote:
>>> So I think in the long run we should have three limits:
>>>
>>> 1. Cluster-wide limit on number of worker processes for all purposes
>>> (currently, max_worker_processes).
>>>
>>> 2. Cluster-wide limit on number of worker processes for parallelism
>>> (don't have this yet).
>>>
>>> 3. Per-operation limit on number of worker processes for parallelism
>>> (currently, max_parallel_degree).
>>>
>>> Whatever we rename, there needs to be enough semantic space between #1
>>> and #3 to allow for the possibility - I think the very likely
>>> possibility - that we will eventually also want #2.
>
>> max_background_workers sounds fine to me for #1, and I propose to add #2
>> in 9.6 rather than wait.
>
> +1 to both points.
>
>> max_total_parallel_query_workers ?
>
> The name should be closely related to what we use for #3.  I could go for
> max_total_parallel_workers for #2 and max_parallel_workers for #3.
> Or maybe max_parallel_workers_total?

I just want to point out that if we change #1, we're breaking
postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
I'd just leave it alone.

I would propose to call #2 max_parallel_workers and #3
max_parallel_degree, which I think is clear as daylight, but since I
invented all of these names, I guess I'm biased.

In terms of forward-compatibility, one thing to note is that we
currently have only parallel QUERY, but in the future we will no doubt
also have parallel operations that are not queries, like VACUUM and
CLUSTER and ALTER TABLE.  If we put the word "query" into the name of
a GUC, we're committing to the idea that it only applies to queries,
and that there will be some other limit for non-queries.  If we don't
put the word query in there, we are not necessarily committing to the
reverse, but we're at least leaning in that direction.  So far I've
largely dodged having to make a decision about this, because talking
about the degree of parallelism for CLUSTER makes just as much sense
as talking about the degree of parallelism for a query, but we could
also decide to have e.g. maintenance_parallel_degree instead of
max_parallel_degree for such operations.  But as we commit to names
it's something to be aware of.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 01:04 PM, Tom Lane wrote:
> The name should be closely related to what we use for #3.  I could go for
> max_total_parallel_workers for #2 and max_parallel_workers for #3.
> Or maybe max_parallel_workers_total?

How about parallel_worker_pool?

-- 
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Alvaro Herrera  writes:
> Robert Haas wrote:
>> So I think in the long run we should have three limits:
>> 
>> 1. Cluster-wide limit on number of worker processes for all purposes
>> (currently, max_worker_processes).
>> 
>> 2. Cluster-wide limit on number of worker processes for parallelism
>> (don't have this yet).
>> 
>> 3. Per-operation limit on number of worker processes for parallelism
>> (currently, max_parallel_degree).
>> 
>> Whatever we rename, there needs to be enough semantic space between #1
>> and #3 to allow for the possibility - I think the very likely
>> possibility - that we will eventually also want #2.

> max_background_workers sounds fine to me for #1, and I propose to add #2
> in 9.6 rather than wait.

+1 to both points.

> max_total_parallel_query_workers ?

The name should be closely related to what we use for #3.  I could go for
max_total_parallel_workers for #2 and max_parallel_workers for #3.
Or maybe max_parallel_workers_total?

regards, tom lane




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-31 Thread Robert Haas
On Fri, May 27, 2016 at 12:37 AM, Andres Freund  wrote:
> I don't think the situation is quite that simple. By *disabling* backend 
> flushing it's also easy to see massive performance regressions.  In 
> situations where shared buffers was configured appropriately for the workload 
> (not the case here IIRC).

On what kind of workload does setting backend_flush_after=0 represent
a large regression vs. the default settings?

I think we have to consider that pgbench and parallel copy are pretty
common things to want to do, and a non-zero default setting hurts
those workloads a LOT.  I have a really hard time believing that the
benefits on other workloads are large enough to compensate for the
slowdowns we're seeing here.  We have nobody writing in to say that
backend_flush_after>0 is making things way better for them, and
Ashutosh and I have independently hit massive slowdowns on unrelated
workloads.  We weren't looking for slowdowns in this patch.  We were
trying to measure other stuff, and ended up tracing the behavior back
to this patch.  That really, really suggests that other people will
have similar experiences.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 12:55 PM, Alvaro Herrera
 wrote:
> Agreed -- things like pglogical and BDR rely on background workers to do
> their jobs.  Many other users of bgworkers have popped up, so I think
> it'd be a bad idea if parallel queries are able to monopolize all the
> available slots.

That never occurred to me. I'd say it could be a serious problem, that
should be fixed promptly.

-- 
Peter Geoghegan




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Alvaro Herrera
Robert Haas wrote:

> Also, I think that we might actually want to add an
> additional GUC to prevent the parallel query system from consuming the
> entire pool of processes established by max_worker_processes.  If
> you're doing anything else with worker processes on your system, you
> might well want to say, well, it's OK to have up to 20 worker
> processes, but at most 10 of those can be used for parallel queries,
> so that the other 10 are guaranteed to be available for whatever other
> stuff I'm running that uses the background process facility.  It's
> worth remembering that the background worker stuff was originally
> invented by Alvaro to allow users to run daemons, not for parallel
> query.

Agreed -- things like pglogical and BDR rely on background workers to do
their jobs.  Many other users of bgworkers have popped up, so I think
it'd be a bad idea if parallel queries are able to monopolize all the
available slots.

> So I think in the long run we should have three limits:
> 
> 1. Cluster-wide limit on number of worker processes for all purposes
> (currently, max_worker_processes).
> 
> 2. Cluster-wide limit on number of worker processes for parallelism
> (don't have this yet).
> 
> 3. Per-operation limit on number of worker processes for parallelism
> (currently, max_parallel_degree).
> 
> Whatever we rename, there needs to be enough semantic space between #1
> and #3 to allow for the possibility - I think the very likely
> possibility - that we will eventually also want #2.

max_background_workers sounds fine to me for #1, and I propose to add #2
in 9.6 rather than wait.  max_total_parallel_query_workers ?  I already
presented my proposal for #3 which, as you noted, nobody endorsed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet

2016-05-31 Thread Nikolay Shaplov
In a message of 31 May 2016 15:38:38, Robert Haas wrote:

> >>> 99% of the time, you'd be right.  But this is an unusual case, for the
> >>> reasons I mentioned before.
> >> 
> >> I tend to agree with Nikolay.  I can't see much upside in making this
> >> change.  At best, nothing will break.  At worst, something will break.
> >> But how do we actually come out ahead?
> > 
> > We come out ahead by not having to make the documentation more confusing.
> > 
> > Basically, we have the opportunity to fix an ancient mistake here at
> > very low cost.  I do not think that doubling down on the mistake is
> > a better answer.
> 
> I'm not convinced, but we don't have to agree on everything...
I am not convinced either. But I will not argue hard for the patch, as my 
main goal was to report the inconsistency. Though I consider Tom's proposal 
quite strange...




Re: [HACKERS] COMMENT ON, psql and access methods

2016-05-31 Thread Robert Haas
On Fri, May 27, 2016 at 7:53 AM, Michael Paquier
 wrote:
> As far as I can see, COMMENT ON has no support for access methods.
> Wouldn't we want to add it as it is created by a command? On top of
> that, perhaps we could have a backslash command in psql to list the
> supported access methods, like \dam[S]? The system methods would be in
> this case all the in-core ones.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 3:35 PM, Robert Haas  wrote:

> On Tue, May 31, 2016 at 3:19 PM, Tom Lane  wrote:
> > I wrote:
> >> At the risk of opening another can of worms, what about renaming
> >> max_worker_processes as well?  It would be a good thing if that
> >> had "cluster" in it somewhere, or something that indicates it's a
> >> system-wide value not a per-session value.  "max_workers_per_cluster"
> >> would answer, though I'm not in love with it particularly.
> >
> > Actually, after a bit more thought, maybe "max_background_workers" would
> > be a good name?  That matches its existing documentation more closely:
> >
> >  Sets the maximum number of background processes that the system
> >  can support.  This parameter can only be set at server start.
> The
> >  default is 8.
> >
> > However, that would still leave us with max_background_workers as the
> > cluster-wide limit and max_parallel_workers as the per-query-node limit.
> > That naming isn't doing all that much to clarify the distinction.
>
> It sure isn't.  Also, I think that we might actually want to add an
> additional GUC to prevent the parallel query system from consuming the
> entire pool of processes established by max_worker_processes.


​My mind started going here too...though the question is whether you need a
reserved pool or whether we apply some algorithm if (max_parallel +
max_other > max_cluster).  That algorithm could be configurable (i.e.,
target ratio  20:10 where 20+10 == max_cluster).

David J.


Re: [HACKERS] [PATCH][Documination] Add optional USING keyword before opclass name in INSERT statemet

2016-05-31 Thread Robert Haas
On Tue, May 31, 2016 at 12:34 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, May 26, 2016 at 3:28 PM, Tom Lane  wrote:
>>> 99% of the time, you'd be right.  But this is an unusual case, for the
>>> reasons I mentioned before.
>
>> I tend to agree with Nikolay.  I can't see much upside in making this
>> change.  At best, nothing will break.  At worst, something will break.
>> But how do we actually come out ahead?
>
> We come out ahead by not having to make the documentation more confusing.
>
> Basically, we have the opportunity to fix an ancient mistake here at
> very low cost.  I do not think that doubling down on the mistake is
> a better answer.

I'm not convinced, but we don't have to agree on everything...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 3:19 PM, Tom Lane  wrote:

> I wrote:
> > At the risk of opening another can of worms, what about renaming
> > max_worker_processes as well?  It would be a good thing if that
> > had "cluster" in it somewhere, or something that indicates it's a
> > system-wide value not a per-session value.  "max_workers_per_cluster"
> > would answer, though I'm not in love with it particularly.
>
> Actually, after a bit more thought, maybe "max_background_workers" would
> be a good name?  That matches its existing documentation more closely:
>
>  Sets the maximum number of background processes that the system
>  can support.  This parameter can only be set at server start.  The
>  default is 8.
>
> However, that would still leave us with max_background_workers as the
> cluster-wide limit and max_parallel_workers as the per-query-node limit.
> That naming isn't doing all that much to clarify the distinction.
>
>
​Node execution is serial, right?​

​I know work_mem has a provision about possibly using multiples of the
maximum in a single query...

​The per-node dynamic does seem important to at least well-document so that
when users see 3 workers on one explain node and 2 on another they aren't
left scratching their heads.  We don't reserve a set number of workers
per-statement but rather they are retrieved as needed during query
execution.

I think I see the same amount of added value in tacking on "per_cluster" as
you do in adding "additional" to get "max_additional_parallel_workers" -
though further refinement of trying to actively NOT consider a leader a
worker would support the omission of addition[al]...

If we left max_worker_processes as defining only the maximum allowed
non-parallel workers and added a new "max_cluster_workers" that would cap
the combination of both "max_worker_processes" and "max_parallel_workers"...

David J.​


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Robert Haas
On Tue, May 31, 2016 at 3:19 PM, Tom Lane  wrote:
> I wrote:
>> At the risk of opening another can of worms, what about renaming
>> max_worker_processes as well?  It would be a good thing if that
>> had "cluster" in it somewhere, or something that indicates it's a
>> system-wide value not a per-session value.  "max_workers_per_cluster"
>> would answer, though I'm not in love with it particularly.
>
> Actually, after a bit more thought, maybe "max_background_workers" would
> be a good name?  That matches its existing documentation more closely:
>
>  Sets the maximum number of background processes that the system
>  can support.  This parameter can only be set at server start.  The
>  default is 8.
>
> However, that would still leave us with max_background_workers as the
> cluster-wide limit and max_parallel_workers as the per-query-node limit.
> That naming isn't doing all that much to clarify the distinction.

It sure isn't.  Also, I think that we might actually want to add an
additional GUC to prevent the parallel query system from consuming the
entire pool of processes established by max_worker_processes.  If
you're doing anything else with worker processes on your system, you
might well want to say, well, it's OK to have up to 20 worker
processes, but at most 10 of those can be used for parallel queries,
so that the other 10 are guaranteed to be available for whatever other
stuff I'm running that uses the background process facility.  It's
worth remembering that the background worker stuff was originally
invented by Alvaro to allow users to run daemons, not for parallel
query.  So I think in the long run we should have three limits:

1. Cluster-wide limit on number of worker processes for all purposes
(currently, max_worker_processes).

2. Cluster-wide limit on number of worker processes for parallelism
(don't have this yet).

3. Per-operation limit on number of worker processes for parallelism
(currently, max_parallel_degree).

Whatever we rename, there needs to be enough semantic space between #1
and #3 to allow for the possibility - I think the very likely
possibility - that we will eventually also want #2.
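[Put concretely, a postgresql.conf under this scheme might read as below; the middle setting is the not-yet-implemented #2, and the names follow the proposals in this thread rather than anything settled:]

```
max_worker_processes = 20       # 1. cluster-wide pool, all bgworker uses
#max_parallel_workers = 10      # 2. cluster-wide cap for parallel query (proposed)
max_parallel_degree = 4         # 3. per-operation cap (current 9.6 name)
```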

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
I wrote:
> At the risk of opening another can of worms, what about renaming
> max_worker_processes as well?  It would be a good thing if that
> had "cluster" in it somewhere, or something that indicates it's a
> system-wide value not a per-session value.  "max_workers_per_cluster"
> would answer, though I'm not in love with it particularly.

Actually, after a bit more thought, maybe "max_background_workers" would
be a good name?  That matches its existing documentation more closely:

 Sets the maximum number of background processes that the system
 can support.  This parameter can only be set at server start.  The
 default is 8.

However, that would still leave us with max_background_workers as the
cluster-wide limit and max_parallel_workers as the per-query-node limit.
That naming isn't doing all that much to clarify the distinction.

regards, tom lane




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tom Lane
Tomas Vondra  writes:
> I've checked how this worked in 9.2 (before the 9.3 patch that split the 
> file per db), and back then last_statsrequest (transformed to 
> request_time) was used to decide whether we need to write something. But 
> now we do that by simply checking whether the list is empty.

Right.  In effect, 9.3 moved the decision about "do we need to write stats
because of this request" from the main loop to pgstat_recv_inquiry()
... but we forgot to incorporate any check for whether the request was
already satisfied into pgstat_recv_inquiry().  We can do that, though,
as per either of the patches under discussion --- and once we do so,
maintaining DBWriteRequest.request_time seems a bit pointless.

It's conceivable that we'd try to implement merging of close-together
requests in a way that would take account of how far back the oldest
unsatisfied request is.  But I think that a global oldest-request time
would be sufficient for that; we don't need to track it per-database.
In any case, it's hard to see exactly how to make that work without
putting a gettimeofday() call into the inner message handling loop,
which I believe we won't want to do on performance grounds.  The previous
speculation about doing writes every N messages or when we have no input
to process seems more likely to lead to a useful answer.

regards, tom lane




Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Josh berkus  writes:
> On 05/31/2016 11:10 AM, Tom Lane wrote:
>> The 9.6 open-items list cites
>> https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de

> Looks like we didn't decide for the release, just the beta.

Indeed.  I think it's premature to have this discussion.  The plan
was to evaluate near the end of beta, when we (hopefully) have a
better feeling for how buggy parallel query is likely to be.

> Also, defaulting to off lets users make more use of the parallel_degree
> table attribute to just enable parallelism on select tables.

Well, that's an interesting point.  The current coding is that
parallel_degree is an upper limit on per-table workers, and
max_parallel_degree also limits it.  So if you want parallel scans only on
a small set of tables, parallel_degree is not an especially convenient way
to get to that.  Whether we measure it in workers or cores doesn't change
this conclusion.

It might be worth reconsidering what per-table knobs we should provide
exactly, but that's orthogonal to the main point under discussion.

regards, tom lane




Re: [HACKERS] Rename synchronous_standby_names?

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 2:19 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 5/31/16 1:47 PM, Jaime Casanova wrote:
>
>> Are we going to change synchronous_standby_names? Certainly the GUC is
>> not *only* a list of names anymore.
>>
>> synchronous_standby_config?
>> synchronous_standbys (adjust to correct english if necesary)?
>>
>
> If the existing values are still going to be accepted, then I would leave
> it as is.


​+1
​
David J.


Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tomas Vondra

On 05/31/2016 07:24 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 05/31/2016 06:59 PM, Tom Lane wrote:
>>> I'm confused here --- are you speaking of having removed
>>>         if (msg->cutoff_time > req->request_time)
>>>             req->request_time = msg->cutoff_time;
>>> ? That is not a check for clock skew, it's intending to be sure that
>>> req->request_time reflects the latest request for this DB when we've
>>> seen more than one request. But since req->request_time isn't
>>> actually being used anywhere, it's useless code.
>>
>> Ah, you're right. I've made the mistake of writing the e-mail before
>> drinking any coffee today, and I got distracted by the comment change.
>>
>>> I reformatted the actual check for clock skew, but I do not think I
>>> changed its behavior.
>>
>> I'm not sure it does not change the behavior, though. request_time only
>> became unused after you removed the two places that set the value (one
>> of them in the clock skew check).
>
> Well, it's unused in the sense that the if-test quoted above is the only
> place in HEAD that examines the value of request_time.  And since that
> if-test only controls whether we change the value, and not whether we
> proceed to make the clock skew check, I don't see how it's related
> to clock skew or indeed anything else at all.


I see, in that case it indeed is useless.

I've checked how this worked in 9.2 (before the 9.3 patch that split the 
file per db), and back then last_statsrequest (transformed to 
request_time) was used to decide whether we need to write something. But 
now we do that by simply checking whether the list is empty.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 2:37 PM, Josh berkus  wrote:

> On 05/31/2016 11:27 AM, Peter Geoghegan wrote:
> > On Tue, May 31, 2016 at 11:22 AM, Josh berkus  wrote:
> >>> I think we can hope that developers are going to be less confused about
> >>> that than users.
> >>
> >> Makes sense.
> >
> > Maybe EXPLAIN doesn't have to use the term parallel worker at all. It
> > can instead use a slightly broader terminology, possibly including the
> > term "core".
>
2. max_parallel_X and EXPLAIN output
>

This is helped greatly if we can say:

Leader: xxx
Worker 0: yyy
Worker 1: zzz

David Rowley's response on April 30th provides an example EXPLAIN output,
should anyone want an easy source to look at:

https://www.postgresql.org/message-id/cakjs1f8_3y6xhp2xp5yovpetxmucmiwp6zwr_eterqy2fcn...@mail.gmail.com

Getting across the point that session processes are not worker processes
seems doable.  The fact that they do indeed perform work can be made
evident in the EXPLAIN output but otherwise left as an implementation
detail.

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Josh berkus  writes:
> On 05/31/2016 11:29 AM, Tom Lane wrote:
>> Josh berkus  writes:
>>> One more consistency question: what's the effect of running out of
>>> max_parallel_workers?

>> ITYM max_worker_processes (ie, the cluster-wide pool size)?

> Yes.  Sorry for contributing to the confusion.  Too many
> similar-sounding parameter names.

Yeah, I was thinking the same thing while preparing my docs patch.
At the risk of opening another can of worms, what about renaming
max_worker_processes as well?  It would be a good thing if that
had "cluster" in it somewhere, or something that indicates it's a
system-wide value not a per-session value.  "max_workers_per_cluster"
would answer, though I'm not in love with it particularly.

regards, tom lane




Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 11:10 AM, Tom Lane wrote:
> Josh berkus  writes:
>> Is there a thread on how we determined this default of 2?  I can't find
>> one under likely search terms.
> 
> The 9.6 open-items list cites
> 
> https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de

Looks like we didn't decide for the release, just the beta.

I can see two ways to go for the final release:

1. Ship with max_parallel_X = 2 (or similar) and a very low
max_worker_processes (e.g. 4).

2. Ship with max_parallel_X = 1 (or 0, depending), and with a generous
max_worker_processes (e.g. 16).

Argument in favor of (1): we want parallelism to work out of the gate
for users running on low-concurrency systems.  These settings would let
some parallelism happen immediately, without overwhelming a 4-to-8-core
system/vm.  Tuning for the user would then be fairly easy, as we could
just tell them "set max_worker_processes to half the number of cores you
have".

Argument in favor of (2): parallelism is potentially risky for .0, and
as a result we want it disabled unless users choose to enable it.
Also, defaulting to off lets users make more use of the parallel_degree
table attribute to just enable parallelism on select tables.
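For concreteness, the two option sets might look like this in postgresql.conf (a sketch only: `max_parallel_X` stands in for the still-unnamed GUC, and the numbers are the ones proposed above):

```
# Option 1: parallelism on out of the box, small cluster-wide pool
max_parallel_X = 2
max_worker_processes = 4

# Option 2: parallelism off unless enabled, generous pool
max_parallel_X = 0
max_worker_processes = 16
```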

Thoughts?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 11:29 AM, Tom Lane wrote:
> Josh berkus  writes:
>> One more consistency question: what's the effect of running out of
>> max_parallel_workers?
> 
> ITYM max_worker_processes (ie, the cluster-wide pool size)?

Yes.  Sorry for contributing to the confusion.  Too many
similar-sounding parameter names.


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 11:27 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 11:22 AM, Josh berkus  wrote:
>>> I think we can hope that developers are going to be less confused about
>>> that than users.
>>
>> Makes sense.
> 
> Maybe EXPLAIN doesn't have to use the term parallel worker at all. It
> can instead use a slightly broader terminology, possibly including the
> term "core".
> 
>> One more consistency question: what's the effect of running out of
>> max_parallel_workers?
>>
>> That is, say max_parallel_workers is set to 10, and 8 are already
> >> allocated.  If I ask for max_parallel_X = 4, how many cores do I use?
> 
> Well, it depends on the planner, of course. But when constrained only
> by the availability of worker processes, then your example could use 3
> cores -- the 2 remaining parallel workers, plus the leader itself.
> 
>> Presumably the leader isn't counted towards max_parallel_workers?
 (oops, I meant max_worker_processes above)


So, there's six things we'd like to make consistent to limit user confusion:

1. max_parallel_X and number of cores used

2. max_parallel_X and EXPLAIN output

3. max_parallel_X and gatekeeping via max_worker_processes

4. max_parallel_X and parallelism of operations (i.e. 2 == 2 parallel
scanners)

5. settings in other similar databases (does someone have specific
citations for this)?

6. consistency with other GUCs (0 or -1 to disable settings)

We can't actually make all six things consistent, as some of them are
contradictory; for example, (1) and (3) cannot be reconciled.  So we
need to evaluate which things are more likely to cause confusion.

My general evaluation would be to make the GUC be the number of
additional workers used, not the total number (including the leader).
From my perspective, that makes (2), (3) and (6) consistent, and (4)
cannot be made consistent because different types of parallel nodes
behave in different ways (i.e., some are parallel with only 1 additional
worker and some are not).

However, I'm resigned to the fact that user confusion is inevitable
whichever way we choose, and could be persuaded the other way.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Josh berkus  writes:
> One more consistency question: what's the effect of running out of
> max_parallel_workers?

ITYM max_worker_processes (ie, the cluster-wide pool size)?

> That is, say max_parallel_workers is set to 10, and 8 are already
> allocated.  If I ask for max_parallel_X = 4, how many cores do I use?

One of my reasons for liking max_parallel_workers is that you can sensibly
compare it to max_worker_processes to figure out how many workers you're
likely to get.  If you think in terms of "degree" it takes some additional
mental arithmetic to understand what will happen.

> Presumably the leader isn't counted towards max_parallel_workers?

Not under what I'm proposing.

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread David G. Johnston
On Tue, May 31, 2016 at 2:22 PM, Josh berkus  wrote:

> On 05/31/2016 11:17 AM, Peter Eisentraut wrote:
> > On 5/31/16 2:02 PM, Josh berkus wrote:
> >> I get where you're coming from, but I think Haas's query plan output is
> >> going to show us the confusion we're going to get.  So we need to either
> >> change the parameter, the explain output, or brace ourselves for endless
> >> repeated questions.
> >
> > Changing the explain output doesn't sound so bad to me.
> >
> > The users' problem is that the parameter setting ought to match the
> > EXPLAIN output.
> >
> > The developers' problem is that the EXPLAIN output actually corresponds
> > to leader + (N-1) workers internally.
> >
> > I think we can hope that developers are going to be less confused about
> > that than users.
>
> Makes sense.
>
> One more consistency question: what's the effect of running out of
> max_parallel_workers?
>
> That is, say max_parallel_workers is set to 10, and 8 are already
> allocated.  If I ask for max_parallel_X = 4, how many cores do I use?
>
> Presumably the leader isn't counted towards max_parallel_workers?
>

You'd have three O/S processes - the one dedicated to your session and
you'd pick up two additional processes from the worker pool to assist.

How the O/S assigns those to cores is outside PostgreSQL's jurisdiction.

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 11:22 AM, Josh berkus  wrote:
>> I think we can hope that developers are going to be less confused about
>> that than users.
>
> Makes sense.

Maybe EXPLAIN doesn't have to use the term parallel worker at all. It
can instead use a slightly broader terminology, possibly including the
term "core".

> One more consistency question: what's the effect of running out of
> max_parallel_workers?
>
> That is, say max_parallel_workers is set to 10, and 8 are already
> allocated.  If I ask for max_parallel_X = 4, how many cores do I use?

Well, it depends on the planner, of course. But when constrained only
by the availability of worker processes, then your example could use 3
cores -- the 2 remaining parallel workers, plus the leader itself.

> Presumably the leader isn't counted towards max_parallel_workers?

Exactly.

-- 
Peter Geoghegan




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 11:17 AM, Peter Eisentraut wrote:
> On 5/31/16 2:02 PM, Josh berkus wrote:
>> I get where you're coming from, but I think Haas's query plan output is
>> going to show us the confusion we're going to get.  So we need to either
>> change the parameter, the explain output, or brace ourselves for endless
>> repeated questions.
> 
> Changing the explain output doesn't sound so bad to me.
> 
> The users' problem is that the parameter setting ought to match the
> EXPLAIN output.
> 
> The developers' problem is that the EXPLAIN output actually corresponds
> to leader + (N-1) workers internally.
> 
> I think we can hope that developers are going to be less confused about
> that than users.

Makes sense.

One more consistency question: what's the effect of running out of
max_parallel_workers?

That is, say max_parallel_workers is set to 10, and 8 are already
allocated.  If I ask for max_parallel_X = 4, how many cores do I use?

Presumably the leader isn't counted towards max_parallel_workers?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
"David G. Johnston"  writes:
> On Tuesday, May 31, 2016, Tom Lane  wrote:
>> I really think that a GUC named "max_parallel_workers", which in fact
>> limits the number of workers and not something else, is the way to go.

> What is your opinion on the internal name for this?  Leave it as "degree"?

If that proposal is accepted, I am willing to go through the code and
rename everything internally to match (and, in fact, would insist on
that happening).  But I think the current debate is primarily around
how we present this to users, so talking about the documentation seems
sufficient for the moment.

> The change from two to three for the default sneaked in there...

Well, that would be the same effective value under Robert's patch.

regards, tom lane




Re: [HACKERS] Rename synchronous_standby_names?

2016-05-31 Thread Peter Eisentraut

On 5/31/16 1:47 PM, Jaime Casanova wrote:

Are we going to change synchronous_standby_names? Certainly the GUC is
not *only* a list of names anymore.

synchronous_standby_config?
synchronous_standbys (adjust to correct English if necessary)?


If the existing values are still going to be accepted, then I would 
leave it as is.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 11:17 AM, Peter Eisentraut
 wrote:
> Changing the explain output doesn't sound so bad to me.
>
> The users' problem is that the parameter setting ought to match the EXPLAIN
> output.
>
> The developers' problem is that the EXPLAIN output actually corresponds to
> leader + (N-1) workers internally.
>
> I think we can hope that developers are going to be less confused about that
> than users.

+1.


-- 
Peter Geoghegan




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
"David G. Johnston"  writes:
> On Tuesday, May 31, 2016, Tom Lane  wrote:
>>> I really think that a GUC named "max_parallel_workers", which in fact
>>> limits the number of workers and not something else, is the way to go.

> If going this route I'd still rather add the word "assisting"
> or "additional" directly into the guc name so the need to read the docs to
> determine inclusive or exclusive of the leader is alleviated.

Dunno, "max_assisting_parallel_workers" seems awfully wordy and not
remarkably clearer.

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread David G. Johnston
On Tuesday, May 31, 2016, Peter Geoghegan  wrote:

> On Tue, May 31, 2016 at 11:02 AM, Josh berkus  wrote:
> > I get where you're coming from, but I think Haas's query plan output is
> > going to show us the confusion we're going to get.  So we need to either
> > change the parameter, the explain output, or brace ourselves for endless
> > repeated questions.
>
> I get where you're coming from, too -- I think our positions are very
> close.
>
> The only reason I favor defining parallel_degree = 1, rather than
> doing what Tom proposes to do with that patch, is that we might as
> well use the prevailing terminology used by SQL Server and Oracle (as
> long as we match those semantics). Also, I think that number of cores
> used is a more important consideration for users than the number of
> workers used.
>
> Users will definitely be confused about workers used vs. cores used,
> but I don't think that any proposal fixes that.
>
>
Ideally each worker gets its own core - though obviously physical
limitations apply.  If that's not the case then, yes, at least one user is
confused...

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Eisentraut

On 5/31/16 2:02 PM, Josh berkus wrote:

I get where you're coming from, but I think Haas's query plan output is
going to show us the confusion we're going to get.  So we need to either
change the parameter, the explain output, or brace ourselves for endless
repeated questions.


Changing the explain output doesn't sound so bad to me.

The users' problem is that the parameter setting ought to match the 
EXPLAIN output.


The developers' problem is that the EXPLAIN output actually corresponds 
to leader + (N-1) workers internally.


I think we can hope that developers are going to be less confused about 
that than users.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread David G. Johnston
On Tuesday, May 31, 2016, Tom Lane  wrote:

> I wrote:
> > I really think that a GUC named "max_parallel_workers", which in fact
> > limits the number of workers and not something else, is the way to go.
>
> To be concrete, I suggest comparing the attached documentation patch
> with Robert's.  Which one is more understandable?
>
> (I have not bothered preparing code changes to go with this, but am
> willing to do so.)
>
>
If going this route I'd still rather add the word "assisting"
or "additional" directly into the guc name so the need to read the docs to
determine inclusive or exclusive of the leader is alleviated.

Robert said that it would be confusing, but it cannot be worse than this...

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 11:09 AM, Peter Geoghegan  wrote:
> The only reason I favor defining parallel_degree = 1

I meant "redefining max_parallel_degree =1 to effectively disable
parallel query", of course.


-- 
Peter Geoghegan




Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?

2016-05-31 Thread Joshua D. Drake

On 05/31/2016 11:05 AM, Josh berkus wrote:

On 05/31/2016 11:00 AM, Tom Lane wrote:

!  If this occurs, the plan will run with fewer workers than expected,
!  which may be inefficient.  The default value is 2.  Setting this
!  value to 0 disables parallel query execution.


Is there a thread on how we determined this default of 2?  I can't find
one under likely search terms.

I'm concerned about the effect of overallocating parallel workers on
systems which are already running out of cores (e.g. AWS instances), and
run with default settings.  Possibly max_parallel_workers takes care of
this, which is why I want to understand the logic here.



I don't remember the thread, but I seem to recall that the default of 2 
is explicitly for Beta, RC, etc., so that we can see what happens. 
Robert could speak better to that.


JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Josh berkus  writes:
> Is there a thread on how we determined this default of 2?  I can't find
> one under likely search terms.

The 9.6 open-items list cites

https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5b...@alap3.anarazel.de

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 11:02 AM, Josh berkus  wrote:
> I get where you're coming from, but I think Haas's query plan output is
> going to show us the confusion we're going to get.  So we need to either
> change the parameter, the explain output, or brace ourselves for endless
> repeated questions.

I get where you're coming from, too -- I think our positions are very close.

The only reason I favor defining parallel_degree = 1, rather than
doing what Tom proposes to do with that patch, is that we might as
well use the prevailing terminology used by SQL Server and Oracle (as
long as we match those semantics). Also, I think that number of cores
used is a more important consideration for users than the number of
workers used.

Users will definitely be confused about workers used vs. cores used,
but I don't think that any proposal fixes that.

-- 
Peter Geoghegan




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Peter Geoghegan  writes:
> Even when the leader is consuming input from workers, that's still perhaps
> pegging one CPU core. So, it doesn't really invalidate what I said about
> the number of cores being the primary consideration.

Agreed, but if we think that people need to be thinking in those terms,
maybe the parameter should be "max_parallel_cores".

The alternate docs patch I just posted tries to deal with this by
describing max_parallel_workers as being the max number of worker
processes used to "assist" a parallel query.  That was terminology
already being used in one place, but not consistently.  If we use it
consistently, I think it would be sufficient to remind people that
they need to figure on one more core for the leader.

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread David G. Johnston
On Tuesday, May 31, 2016, Tom Lane  wrote:

> Josh berkus  writes:
> > On 05/31/2016 10:16 AM, Peter Geoghegan wrote:
> >> But the distinction between parallel workers and backends that can
> >> participate in parallel query does need to be user-visible. Worker
> >> processes are a commodity (i.e. the user must consider
> >> max_worker_processes).
>
> > It's still WAY simpler to understand "max_parallel is the number of
> > parallel workers I requested".
>
> > Any system where you set it to 2 and get only 1 worker on an idle system
> > is going to cause endless queries on the mailing lists.
>
> I really think that a GUC named "max_parallel_workers", which in fact
> limits the number of workers and not something else, is the way to go.
>
>
What is your opinion on the internal name for this?  Leave it as "degree"?
Skimming the patch the number of times that "parallel_degree - 1" is used
would concern me.

I would also probably reword "Parallel workers are taken from the pool...",
replacing "parallel" with the word "additional": "Additional workers are
taken from the pool...".

I'm tending to think we are using the word "parallel" too often and that
selective removal would be better: "Sets the maximum number of workers
(i.e., processes) that can be used to perform an operation."  An
introductory section on parallelism can lay out the framework and the
concept of parallelism.  Sentences like the one above can then assume that
the reader understands that workers run in parallel if the operation
allows and there are workers available.

The same observation applies to "leader": when more than one process is
involved, a leader is chosen.  This is general background to cover
upfront.  Elsewhere, just talk of workers and let the leader be implied.
There is no need to keep repeating "including the leader".  In many ways
the leader is an implementation detail the user doesn't care about.  The
only impact is that 2 is not twice as good as 1 due to overhead, whatever
form that overhead takes.

The change from two to three for the default sneaked in there...

David J.


Re: [HACKERS] Logic behind parallel default? WAS: Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 11:00 AM, Tom Lane wrote:
> !  If this occurs, the plan will run with fewer workers than expected,
> !  which may be inefficient.  The default value is 2.  Setting this
> !  value to 0 disables parallel query execution.

Is there a thread on how we determined this default of 2?  I can't find
one under likely search terms.

I'm concerned about the effect of overallocating parallel workers on
systems which are already running out of cores (e.g. AWS instances), and
run with default settings.  Possibly max_parallel_workers takes care of
this, which is why I want to understand the logic here.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename synchronous_standby_names?

2016-05-31 Thread Tom Lane
Jaime Casanova  writes:
> Are we going to change synchronous_standby_names? Certainly the GUC is
> not *only* a list of names anymore.

> synchronous_standby_config?
> synchronous_standbys (adjust to correct English if necessary)?

I could get behind renaming it to synchronous_standby_config ...

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 10:51 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 10:46 AM, Josh berkus  wrote:
>> In parallel seq scan and join, do the "masters" behave as workers as well?
> 
> It depends. They will if they can. If the parallel seq scan leader
> isn't getting enough work to do from workers (enough tuples to process
> from the shared memory queue), it will start acting as a worker fairly 
> quickly.
> With parallel aggregate, and some other cases, that will always happen.
> 
> Even when the leader is consuming input from workers, that's still perhaps
> pegging one CPU core. So, it doesn't really invalidate what I said about
> the number of cores being the primary consideration.
> 

I get where you're coming from, but I think Haas's query plan output is
going to show us the confusion we're going to get.  So we need to either
change the parameter, the explain output, or brace ourselves for endless
repeated questions.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
I wrote:
> I really think that a GUC named "max_parallel_workers", which in fact
> limits the number of workers and not something else, is the way to go.

To be concrete, I suggest comparing the attached documentation patch
with Robert's.  Which one is more understandable?

(I have not bothered preparing code changes to go with this, but am
willing to do so.)

regards, tom lane

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 72b5920..fccde28 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include_dir 'conf.d'
*** 1998,2019 
 

  
!   
!max_parallel_degree (integer)
 
! max_parallel_degree configuration parameter
 
 
 
  
!  Sets the maximum number of workers that can be started for an
!  individual parallel operation.  Parallel workers are taken from the
!  pool of processes established by
!  .  Note that the requested
!  number of workers may not actually be available at runtime.  If this
!  occurs, the plan will run with fewer workers than expected, which may
!  be inefficient.  The default value is 2.  Setting this value to 0
!  disables parallel query execution.
  
 

--- 1998,2020 
 

  
!   
!max_parallel_workers (integer)
 
! max_parallel_workers configuration parameter
 
 
 
  
!  Sets the maximum number of worker processes that can be used to
!  assist an individual parallelizable operation.  Parallel workers are
!  taken from the cluster-wide pool of processes established by
!  .  Depending on the size of
!  the pool and the number of other parallel queries active, the
!  requested number of workers may not actually be available at runtime.
!  If this occurs, the plan will run with fewer workers than expected,
!  which may be inefficient.  The default value is 2.  Setting this
!  value to 0 disables parallel query execution.
  
 

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index d1807ed..d73cf39 100644
*** a/doc/src/sgml/ref/create_table.sgml
--- b/doc/src/sgml/ref/create_table.sgml
*** CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY 
*** 909,922 
 
  
 
! parallel_degree (integer)
  
   
!   The parallel degree for a table is the number of workers that should
!   be used to assist a parallel scan of that table.  If not set, the
!   system will determine a value based on the relation size.  The actual
!   number of workers chosen by the planner may be less, for example due to
!   the setting of .
   
  
 
--- 909,923 
 
  
 
! parallel_workers (integer)
  
   
!   This setting limits the number of parallel worker processes that should
!   be used to assist a parallel scan of this table.  If not set, the
!   planner will select a value based on the relation size.  The actual
!   number of workers chosen by the planner may be less, for example
!   because it is also constrained by
!   ; it will not be more.
   
  
 
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index 38e1967..3d78b00 100644
*** a/doc/src/sgml/release-9.6.sgml
--- b/doc/src/sgml/release-9.6.sgml
***
*** 153,159 
 
  Use of parallel query execution can be controlled through the new
  configuration parameters
! ,
  ,
  , and
  .
--- 153,159 
 
  Use of parallel query execution can be controlled through the new
  configuration parameters
! ,
  ,
  , and
  .



Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 10:46 AM, Josh berkus  wrote:
> In parallel seq scan and join, do the "masters" behave as workers as well?

It depends. They will if they can. If the parallel seq scan leader
isn't getting enough work to do from workers (enough tuples to process
from the shared memory queue), it will start acting as a worker fairly quickly.
With parallel aggregate, and some other cases, that will always happen.

Even when the leader is consuming input from workers, that's still perhaps
pegging one CPU core. So, it doesn't really invalidate what I said about
the number of cores being the primary consideration.

-- 
Peter Geoghegan




[HACKERS] Rename synchronous_standby_names?

2016-05-31 Thread Jaime Casanova
Hi,

Are we going to change synchronous_standby_names? Certainly the GUC is
not *only* a list of names anymore.

synchronous_standby_config?
synchronous_standbys (adjust to correct English if necessary)?

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 10:38 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 10:23 AM, Josh berkus  wrote:
>> It's still WAY simpler to understand "max_parallel is the number of
>> parallel workers I requested".
> 
> (Sorry Josh, somehow hit reply, not reply-all)
> 
> Yes, it is. But as long as parallel workers are not really that
> distinct to the leader-as-worker when executing a parallel query, then
> you have another consideration. Which is that you need to care about
> how many cores your query uses first and foremost, and not the number
> of parallel workers used. I don't think that having only one worker
> will cause too much confusion, because users will trust that we won't
> allow something that simply makes no sense to happen.
> 
> In my parallel create index patch, the leader participates as a worker
> to scan and sort runs. It's identical to a worker, practically
> speaking, at least until time comes to merge those runs. Similarly,
> parallel aggregate does not really have much for the leader process to
> do other than act as a worker.

In parallel seq scan and join, do the "masters" behave as workers as well?


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Joshua D. Drake

On 05/31/2016 10:30 AM, Tom Lane wrote:
> Josh berkus  writes:
>> On 05/31/2016 10:16 AM, Peter Geoghegan wrote:
>>> But the distinction between parallel workers and backends that can
>>> participate in parallel query does need to be user-visible. Worker
>>> processes are a commodity (i.e. the user must consider
>>> max_worker_processes).
>
>> It's still WAY simpler to understand "max_parallel is the number of
>> parallel workers I requested".
>
>> Any system where you set it to 2 and get only 1 worker on an idle system
>> is going to cause endless queries on the mailing lists.
>
> I really think that a GUC named "max_parallel_workers", which in fact
> limits the number of workers and not something else, is the way to go.

I agree with Tom here. If we are being pedantic for the sake of being
pedantic we are just causing frustration. The term max_parallel_workers
is simple, easy to understand and accurate enough.


Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 10:23 AM, Josh berkus  wrote:
> It's still WAY simpler to understand "max_parallel is the number of
> parallel workers I requested".

(Sorry Josh, somehow hit reply, not reply-all)

Yes, it is. But as long as parallel workers are not really that
distinct to the leader-as-worker when executing a parallel query, then
you have another consideration. Which is that you need to care about
how many cores your query uses first and foremost, and not the number
of parallel workers used. I don't think that having only one worker
will cause too much confusion, because users will trust that we won't
allow something that simply makes no sense to happen.

In my parallel create index patch, the leader participates as a worker
to scan and sort runs. It's identical to a worker, practically
speaking, at least until time comes to merge those runs. Similarly,
parallel aggregate does not really have much for the leader process to
do other than act as a worker.


-- 
Peter Geoghegan




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Josh berkus  writes:
> On 05/31/2016 10:16 AM, Peter Geoghegan wrote:
>> But the distinction between parallel workers and backends that can
>> participate in parallel query does need to be user-visible. Worker
>> processes are a commodity (i.e. the user must consider
>> max_worker_processes).

> It's still WAY simpler to understand "max_parallel is the number of
> parallel workers I requested".

> Any system where you set it to 2 and get only 1 worker on an idle system
> is going to cause endless queries on the mailing lists.

I really think that a GUC named "max_parallel_workers", which in fact
limits the number of workers and not something else, is the way to go.

regards, tom lane




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tom Lane
Tomas Vondra  writes:
> On 05/31/2016 06:59 PM, Tom Lane wrote:
>> I'm confused here --- are you speaking of having removed
>>  if (msg->cutoff_time > req->request_time)
>>  req->request_time = msg->cutoff_time;
>> ? That is not a check for clock skew, it's intending to be sure that
>> req->request_time reflects the latest request for this DB when we've
>> seen more than one request. But since req->request_time isn't
>> actually being used anywhere, it's useless code.

> Ah, you're right. I've made the mistake of writing the e-mail before 
> drinking any coffee today, and I got distracted by the comment change.

>> I reformatted the actual check for clock skew, but I do not think I
>> changed its behavior.

> I'm not sure it does not change the behavior, though. request_time only 
> became unused after you removed the two places that set the value (one 
> of them in the clock skew check).

Well, it's unused in the sense that the if-test quoted above is the only
place in HEAD that examines the value of request_time.  And since that
if-test only controls whether we change the value, and not whether we
proceed to make the clock skew check, I don't see how it's related
to clock skew or indeed anything else at all.

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 10:16 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 10:10 AM, Josh berkus  wrote:
>> "max_parallel_degree is the amount of parallelism in the query, with the
>> understanding that the original parent process counts as 1, which means
>> that if you set it to 1 you get no parallelism, and if you want 4
>> parallel workers you need to set it to 5."
>>
>> Which one of those is going to require more explanations on -general and
>> -novice?  Bets?
>>
>> Let's not be complicated for the sake of being complicated.
> 
> But the distinction between parallel workers and backends that can
> participate in parallel query does need to be user-visible. Worker
> processes are a commodity (i.e. the user must consider
> max_worker_processes).

It's still WAY simpler to understand "max_parallel is the number of
parallel workers I requested".

Any system where you set it to 2 and get only 1 worker on an idle system
is going to cause endless queries on the mailing lists.
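A sketch of the behavior being described, under the "degree" interpretation current in the 9.6 betas (table name hypothetical, output abridged):

```sql
-- Under the "degree" semantics, the leader counts toward the total,
-- so a setting of 2 means leader + 1 background worker:
SET max_parallel_degree = 2;
EXPLAIN SELECT * FROM big_table;   -- hypothetical table
--  Gather
--    Workers Planned: 1     <-- "I asked for 2, why did I only get 1?"
--    ->  Parallel Seq Scan on big_table
```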

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tomas Vondra

On 05/31/2016 06:59 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 05/26/2016 10:10 PM, Tom Lane wrote:
>>> I posted a patch at
>>> https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
>>> which I think is functionally equivalent to what you have here, but
>>> it goes to some lengths to make the code more readable, whereas this
>>> is just adding another layer of complication to something that's
>>> already a mess (eg, the request_time field is quite useless as-is).
>>> So I'd like to propose pushing that in place of this patch ... do you
>>> care to review it first?
>
>> I've reviewed the patch today, and it seems fine to me - correct and
>> achieving the same goal as the patch posted to this thread (plus fixing
>> the issue with shared catalogs and fixing many comments).
>
> Thanks for reviewing!
>
>> FWIW do you still plan to back-patch this? Minimizing the amount of
>> changes was one of the things I had in mind when writing "my" patch,
>> which is why I ended up with parts that are less readable.
>
> Yeah, I think it's a bug fix and should be back-patched.  I'm not in
> favor of making things more complicated just to reduce the number of
> lines a patch touches.
>
>> The one change I'm not quite sure about is the removal of clock skew
>> detection in pgstat_recv_inquiry(). You've removed the first check on
>> the inquiry, replacing it with this comment:
>>  It seems sufficient to check for clock skew once per write round.
>> But the first check was comparing msg/req, while the second check looks
>> at dbentry/cur_ts. I don't see how those two clock skew check are
>> redundant - if they are, the comment should explain that I guess.
>
> I'm confused here --- are you speaking of having removed
>
> if (msg->cutoff_time > req->request_time)
> req->request_time = msg->cutoff_time;
>
> ? That is not a check for clock skew, it's intending to be sure that
> req->request_time reflects the latest request for this DB when we've
> seen more than one request. But since req->request_time isn't
> actually being used anywhere, it's useless code.

Ah, you're right. I've made the mistake of writing the e-mail before
drinking any coffee today, and I got distracted by the comment change.

> I reformatted the actual check for clock skew, but I do not think I
> changed its behavior.

I'm not sure it does not change the behavior, though. request_time only
became unused after you removed the two places that set the value (one
of them in the clock skew check).

I'm not sure this is a bad change, though. But there was a dependency
between the new request and the preceding one.

>> Another thing is that if you believe merging requests across databases
>> is a silly idea, maybe we should bite the bullet and replace the list of
>> requests with a single item. I'm not convinced about this, though.
>
> No, I don't want to do that either.  We're not spending much code by
> having pending_write_requests be a list rather than a single entry,
> and we might eventually figure out a reasonable way to time the flushes
> so that we can merge requests.

+1

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Peter Geoghegan
On Tue, May 31, 2016 at 10:10 AM, Josh berkus  wrote:
> "max_parallel_degree is the amount of parallelism in the query, with the
> understanding that the original parent process counts as 1, which means
> that if you set it to 1 you get no parallelism, and if you want 4
> parallel workers you need to set it to 5."
>
> Which one of those is going to require more explanations on -general and
> -novice?  Bets?
>
> Let's not be complicated for the sake of being complicated.

But the distinction between parallel workers and backends that can
participate in parallel query does need to be user-visible. Worker
processes are a commodity (i.e. the user must consider
max_worker_processes).

-- 
Peter Geoghegan




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Joshua D. Drake

On 05/31/2016 10:10 AM, Josh berkus wrote:
> Compare this:
>
> "max_parallel is the maximum number of parallel workers which will work
> on each stage of the query which is parallelizable.  If you set it to 4,
> you get up to 4 workers."
>
> with this:
>
> "max_parallel_degree is the amount of parallelism in the query, with the
> understanding that the original parent process counts as 1, which means
> that if you set it to 1 you get no parallelism, and if you want 4
> parallel workers you need to set it to 5."
>
> Which one of those is going to require more explanations on -general and
> -novice?  Bets?
>
> Let's not be complicated for the sake of being complicated.

+1

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Josh berkus
On 05/31/2016 10:03 AM, Tom Lane wrote:
> Josh berkus  writes:
>> I realize there's a lot of water under the bridge here, but I think
>> we're going to get 1000 questions on -general of the type:  "I asked for
>> 8 parallel workers, why did I only get 7?".  I believe we will regret
>> this change.
>> So, one vote from me to revert.
> 
> Well, that gets back to the question of whether average users will
> understand the "degree" terminology.  For the record, while I do not
> like the current behavior either, this was not the solution I favored.
> I thought we should rename the GUC and keep it as meaning the number
> of additional worker processes.

I will happily bet anyone a nice dinner in Ottawa that most users will
not understand it.

Compare this:

"max_parallel is the maximum number of parallel workers which will work
on each stage of the query which is parallelizable.  If you set it to 4,
you get up to 4 workers."

with this:

"max_parallel_degree is the amount of parallelism in the query, with the
understanding that the original parent process counts as 1, which means
that if you set it to 1 you get no parallelism, and if you want 4
parallel workers you need to set it to 5."

Which one of those is going to require more explanations on -general and
-novice?  Bets?

Let's not be complicated for the sake of being complicated.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-05-31 Thread Tom Lane
Josh berkus  writes:
> I realize there's a lot of water under the bridge here, but I think
> we're going to get 1000 questions on -general of the type:  "I asked for
> 8 parallel workers, why did I only get 7?".  I believe we will regret
> this change.
> So, one vote from me to revert.

Well, that gets back to the question of whether average users will
understand the "degree" terminology.  For the record, while I do not
like the current behavior either, this was not the solution I favored.
I thought we should rename the GUC and keep it as meaning the number
of additional worker processes.

regards, tom lane




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tom Lane
Tomas Vondra  writes:
> On 05/26/2016 10:10 PM, Tom Lane wrote:
>> I posted a patch at
>> https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
>> which I think is functionally equivalent to what you have here, but
>> it goes to some lengths to make the code more readable, whereas this
>> is just adding another layer of complication to something that's
>> already a mess (eg, the request_time field is quite useless as-is).
>> So I'd like to propose pushing that in place of this patch ... do you
>> care to review it first?

> I've reviewed the patch today, and it seems fine to me - correct and 
> achieving the same goal as the patch posted to this thread (plus fixing 
> the issue with shared catalogs and fixing many comments).

Thanks for reviewing!

> FWIW do you still plan to back-patch this? Minimizing the amount of 
> changes was one of the things I had in mind when writing "my" patch, 
> which is why I ended up with parts that are less readable.

Yeah, I think it's a bug fix and should be back-patched.  I'm not in
favor of making things more complicated just to reduce the number of
lines a patch touches.

> The one change I'm not quite sure about is the removal of clock skew 
> detection in pgstat_recv_inquiry(). You've removed the first check on 
> the inquiry, replacing it with this comment:
>  It seems sufficient to check for clock skew once per write round.
> But the first check was comparing msg/req, while the second check looks 
> at dbentry/cur_ts. I don't see how those two clock skew check are 
> redundant - if they are, the comment should explain that I guess.

I'm confused here --- are you speaking of having removed

if (msg->cutoff_time > req->request_time)
req->request_time = msg->cutoff_time;

?  That is not a check for clock skew, it's intending to be sure that
req->request_time reflects the latest request for this DB when we've seen
more than one request.  But since req->request_time isn't actually being
used anywhere, it's useless code.

I reformatted the actual check for clock skew, but I do not think I
changed its behavior.
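For readers without the source at hand, the distinction being drawn can be sketched like this (simplified; the struct and function names are invented for illustration, not the actual pgstat.c code):

```c
#include <assert.h>
#include <stdbool.h>

typedef long pg_time_t;         /* stand-in for the real typedef */

typedef struct WriteRequest
{
    pg_time_t request_time;     /* latest cutoff seen; written but never read */
} WriteRequest;

/* The removed code only tracked the latest request time for a DB ... */
static void
track_latest_request(WriteRequest *req, pg_time_t cutoff_time)
{
    if (cutoff_time > req->request_time)
        req->request_time = cutoff_time;    /* value is never consulted */
}

/*
 * ... whereas clock-skew detection is a different test: it asks whether
 * the stats were apparently written "in the future" relative to the
 * collector's own clock. That check remains.
 */
static bool
clock_is_skewed(pg_time_t stats_timestamp, pg_time_t cur_ts)
{
    return stats_timestamp > cur_ts;
}
```

The first function is dead weight once nothing reads `request_time`; the second is the actual skew check.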

> Another thing is that if you believe merging requests across databases 
> is a silly idea, maybe we should bite the bullet and replace the list of 
> requests with a single item. I'm not convinced about this, though.

No, I don't want to do that either.  We're not spending much code by
having pending_write_requests be a list rather than a single entry,
and we might eventually figure out a reasonable way to time the flushes
so that we can merge requests.

regards, tom lane




Re: [HACKERS] Parallel safety tagging of extension functions

2016-05-31 Thread Tom Lane
Andreas Karlsson  writes:
> So how to best change the function signatures? I do not think it is 
> possible without locking indexes by just using the SQL commands. You 
> cannot drop a function from the operator family without dropping the 
> operator class first.

Ugh.  I had been thinking that you could use ALTER OPERATOR FAMILY DROP
FUNCTION, but these functions are not "loose" in the opfamily --- they're
assumed to be bound to the specific opclass.  So there's an opclass
dependency preventing them from getting dropped; and we have no SQL
commands for changing the contents of an opclass.  After some fooling
around, I couldn't find a way to do it without some manual catalog update
or other.

Given that, your original approach of manually updating proargtypes in the
existing pg_proc row for the functions may be the best way.  Anything else
is going to be more complicated and ultimately will still require at least
one direct catalog update.

Sometime we might want to think about making this sort of thing cleaner
with more ALTER commands, but post-beta is probably not the time for that;
it wouldn't be a small patch.
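For illustration, such a direct catalog update in an extension upgrade script might look like this (a hedged sketch; the function and argument types are placeholders, not taken from the actual patch):

```sql
-- Rewrite a support function's recorded signature in place. oidvector
-- input is a space-separated list of type OIDs, so one can build it
-- from regtype casts. All names below are hypothetical.
UPDATE pg_catalog.pg_proc
SET proargtypes = pg_catalog.array_to_string(
        ARRAY['internal'::pg_catalog.regtype,
              'internal'::pg_catalog.regtype]::pg_catalog.oid[], ' '
    )::pg_catalog.oidvector
WHERE oid = 'myext_support_func(internal)'::pg_catalog.regprocedure;
```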

regards, tom lane



