Re: Let's remove DSM_INPL_NONE.

2018-03-08 Thread Kyotaro HORIGUCHI
At Wed, 28 Feb 2018 10:08:59 +0900, Michael Paquier  wrote 
in <20180228010859.gb1...@paquier.xyz>
> On Tue, Feb 27, 2018 at 02:53:55PM -0500, Tom Lane wrote:
> > Andres Freund  writes:
> >> On 2018-02-27 14:41:47 -0500, Tom Lane wrote:
> >>> What I didn't understand about it was what kind of testing this'd make
> >>> harder.  If we desupport dynamic_shared_memory_type=none, there aren't
> >>> any code paths that need to cope with the case, and we should just
> >>> remove any code that thereby becomes unreachable.
> > 
> >> What I'm concerned about isn't so much testing paths specific to
> >> dynamic_shared_memory_type=none, but paths where we currently need
> >> fallbacks for the case we couldn't actually allocate dynamic shared
> >> memory. Which I think we at least somewhat gracefully need to deal with.
> > 
> > Ah.  That's a fair point, but I do not think
> > dynamic_shared_memory_type=none is a good substitute for having a way to
> > provoke allocation failures.  That doesn't let you test recovery from
> > situations where your first allocation works and second one fails, for
> > example; and cleanup from that sort of case is likely to be more
> > complicated than the trivial case.

On the other hand, dynamic_shared_memory_type=none is not a
proper means to silence DSM failures. If this patch causes some
problems, exactly the same things would have happened for the
setting dynamic_shared_memory_type *!=* none.  If we need more
"graceful" behavior for some specific failure modes, it should be
amended separately.

> dynamic_shared_memory_type is used in six code paths where it is aimed
> at doing sanity checks:
...
> The point is that there are other control mechanisms for each one of
> them.  Mainly, for the executor portion, the number of workers is
> controlled by other GUCs on planner-side.

If this means just that there should be any means other than the
variable to turn off user-visible parallel features:

> - Mute DSM initialization at postmaster startup.
> - Mute cleanup of previous DSM segments from a past postmaster.
These should be allowed unconditionally and should succeed.

> - Block backend startup if there is no DSM.
This is the result of any parallel activity, so this also should
be allowed unconditionally and should succeed.

> - Mute parallel query in planner.
max_parallel_workers_per_gather=0 kills it.
(min_parallel_*_scan_size or parallel_*_cost effectively can do
the similar.)

> - Mute parallel query for parallel btree builds.
max_parallel_maintenance_workers = 0 does.

> - Mute creation of parallel contexts in executor.
No gahter or gather merge results in this. So
max_parallel_workers_per_gather=0 also forces this.


The attached is the rebased version, on which the symbols are
renumbered 0-based as per comments.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 0672b1d80a08b0fb0a33a26aad8b1cc77bd83083 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 15 Feb 2018 11:22:50 +0900
Subject: [PATCH] Remove dynamic_shared_memroy_type=none

Nowadays PostgreSQL on all supported platform offers any kind of
dynamic shared memory feature. Assuming none hinder us from using it
for core features.  Just removing the option leads to intermittent
failure of regression test on some platforms due to insufficient
max_connection on initdb. This patch requires the patch that increases
the minimum fallback value of max_connection of initdb.
---
 doc/src/sgml/config.sgml  |  6 +++---
 src/backend/access/transam/parallel.c |  7 ---
 src/backend/optimizer/plan/planner.c  |  4 +---
 src/backend/storage/ipc/dsm.c | 15 ---
 src/backend/storage/ipc/dsm_impl.c|  3 ---
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/bin/initdb/initdb.c   | 21 ++---
 src/include/storage/dsm_impl.h|  9 -
 8 files changed, 22 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3a8fc7d803..932ce11a58 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1602,9 +1602,9 @@ include_dir 'conf.d'
 should use.  Possible values are posix (for POSIX shared
 memory allocated using shm_open), sysv
 (for System V shared memory allocated via shmget),
-windows (for Windows shared memory), mmap
-(to simulate shared memory using memory-mapped files stored in the
-data directory), and none (to disable this feature).
+windows (for Windows shared memory),
+and mmap (to simulate shared memory using
+memory-mapped files stored in the data directory).
 Not all values are supported on all platforms; the first supported
 option is the default for that platform.  The use of the
 mmap option, which is not the default on any platform,
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/

Re: Let's remove DSM_INPL_NONE.

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 11:30 PM, Tom Lane  wrote:
> I'd be in favor of having some normally-ifdef'd-out facility for causing
> failures of this kind.  (As I mentioned upthread, I do not think "always
> fail" is sufficient.)  That's very different from having a user-visible
> option, though.  We don't expose a GUC that enables CLOBBER_FREED_MEMORY,
> to take a relevant example.

Sure.  Feel free to propose something you like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Tom Lane
Robert Haas  writes:
> On Tue, Feb 27, 2018 at 2:50 PM, Andres Freund  wrote:
>> What I'm concerned about isn't so much testing paths specific to
>> dynamic_shared_memory_type=none, but paths where we currently need
>> fallbacks for the case we couldn't actually allocate dynamic shared
>> memory. Which I think we at least somewhat gracefully need to deal with.

> Well, it's not very hard to just hack the code to make dsm_create()
> always fail, or fail X% of the time, if you're so inclined.  I agree
> that -c dynamic_shared_memory_type=none is a little more convenient
> than sticking something like that into the code, but I don't think
> it's sufficiently more convenient to justify keeping an option we
> don't otherwise want.

I'd be in favor of having some normally-ifdef'd-out facility for causing
failures of this kind.  (As I mentioned upthread, I do not think "always
fail" is sufficient.)  That's very different from having a user-visible
option, though.  We don't expose a GUC that enables CLOBBER_FREED_MEMORY,
to take a relevant example.

regards, tom lane



Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Robert Haas
On Tue, Feb 27, 2018 at 2:50 PM, Andres Freund  wrote:
> What I'm concerned about isn't so much testing paths specific to
> dynamic_shared_memory_type=none, but paths where we currently need
> fallbacks for the case we couldn't actually allocate dynamic shared
> memory. Which I think we at least somewhat gracefully need to deal with.

Well, it's not very hard to just hack the code to make dsm_create()
always fail, or fail X% of the time, if you're so inclined.  I agree
that -c dynamic_shared_memory_type=none is a little more convenient
than sticking something like that into the code, but I don't think
it's sufficiently more convenient to justify keeping an option we
don't otherwise want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Michael Paquier
On Tue, Feb 27, 2018 at 02:53:55PM -0500, Tom Lane wrote:
> Andres Freund  writes:
>> On 2018-02-27 14:41:47 -0500, Tom Lane wrote:
>>> What I didn't understand about it was what kind of testing this'd make
>>> harder.  If we desupport dynamic_shared_memory_type=none, there aren't
>>> any code paths that need to cope with the case, and we should just
>>> remove any code that thereby becomes unreachable.
> 
>> What I'm concerned about isn't so much testing paths specific to
>> dynamic_shared_memory_type=none, but paths where we currently need
>> fallbacks for the case we couldn't actually allocate dynamic shared
>> memory. Which I think we at least somewhat gracefully need to deal with.
> 
> Ah.  That's a fair point, but I do not think
> dynamic_shared_memory_type=none is a good substitute for having a way to
> provoke allocation failures.  That doesn't let you test recovery from
> situations where your first allocation works and second one fails, for
> example; and cleanup from that sort of case is likely to be more
> complicated than the trivial case.

dynamic_shared_memory_type is used in six code paths where it is aimed
at doing sanity checks:
- Mute DSM initialization at postmaster startup.
- Mute cleanup of previous DSM segments from a past postmaster.
- Block backend startup if there is no DSM.
- Mute parallel query in planner.
- Mute parallel query for parallel btree builds.
- Mute creation of parallel contexts in executor.
The point is that there are other control mechanisms for each one of
them.  Mainly, for the executor portion, the number of workers is
controlled by other GUCs on planner-side.
--
Michael


signature.asc
Description: PGP signature


Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Michael Paquier
On Tue, Feb 27, 2018 at 02:00:36PM -0500, Robert Haas wrote:
> I think the two issues you are pointing out are the same issue --
> namely, if initdb probes for a max_connections setting with an invalid
> DSM implementation configured, it will fail the test for every value
> of max_connections and thus select the lowest possible value.  The
> solution to this is presumably just to make sure we choose the DSM
> implementation before we do the max_connections probing so that we can
> pass through the correct value there, which I think is what your patch
> does.

Yes, that's what the thing does.  It moves the routine to find the DSM
implementation before computing max_connections.
--
Michael


signature.asc
Description: PGP signature


Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Tom Lane
Andres Freund  writes:
> On 2018-02-27 14:41:47 -0500, Tom Lane wrote:
>> What I didn't understand about it was what kind of testing this'd make
>> harder.  If we desupport dynamic_shared_memory_type=none, there aren't
>> any code paths that need to cope with the case, and we should just
>> remove any code that thereby becomes unreachable.

> What I'm concerned about isn't so much testing paths specific to
> dynamic_shared_memory_type=none, but paths where we currently need
> fallbacks for the case we couldn't actually allocate dynamic shared
> memory. Which I think we at least somewhat gracefully need to deal with.

Ah.  That's a fair point, but I do not think
dynamic_shared_memory_type=none is a good substitute for having a way to
provoke allocation failures.  That doesn't let you test recovery from
situations where your first allocation works and second one fails, for
example; and cleanup from that sort of case is likely to be more
complicated than the trivial case.

regards, tom lane



Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Andres Freund
On 2018-02-27 14:41:47 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > On Thu, Feb 15, 2018 at 1:00 PM, Andres Freund  wrote:
> >> Hm, I'm not quite convinced by this. Seems to make testing a bunch of
> >> codepaths harder.  I think it's fine to say that pg doesn't work
> >> correctly with them disabled though.
> 
> > I'm not sure I understand this.  Do you mean we should just add a
> > disclaimer to the documentation?
> 
> What I didn't understand about it was what kind of testing this'd make
> harder.  If we desupport dynamic_shared_memory_type=none, there aren't
> any code paths that need to cope with the case, and we should just
> remove any code that thereby becomes unreachable.

What I'm concerned about isn't so much testing paths specific to
dynamic_shared_memory_type=none, but paths where we currently need
fallbacks for the case we couldn't actually allocate dynamic shared
memory. Which I think we at least somewhat gracefully need to deal with.

Greetings,

Andres Freund



Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 15, 2018 at 1:00 PM, Andres Freund  wrote:
>> Hm, I'm not quite convinced by this. Seems to make testing a bunch of
>> codepaths harder.  I think it's fine to say that pg doesn't work
>> correctly with them disabled though.

> I'm not sure I understand this.  Do you mean we should just add a
> disclaimer to the documentation?

What I didn't understand about it was what kind of testing this'd make
harder.  If we desupport dynamic_shared_memory_type=none, there aren't
any code paths that need to cope with the case, and we should just
remove any code that thereby becomes unreachable.

regards, tom lane



Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Robert Haas
On Thu, Feb 15, 2018 at 1:00 PM, Andres Freund  wrote:
> Hm, I'm not quite convinced by this. Seems to make testing a bunch of
> codepaths harder.  I think it's fine to say that pg doesn't work
> correctly with them disabled though.

I'm not sure I understand this.  Do you mean we should just add a
disclaimer to the documentation?

As long as dynamic_shared_memory_type=none is a supported
configuration, we can't use DSM for anything critical, but there's a
desire to use it in things like AV and the stats collector, which are
pretty critical.  Instead of removing it, we could do something like
this:


If you configure dynamic_shared_memory_type=none, parallel query will
be disabled.  Also, the stats collector will fail to run, so there
will be no statistics to trigger autovacuum, and your database will
become horribly bloated.  Actually, that would be true anyway even if
the stats collector somehow managed to work, because autovacuum itself
will also fail.  Moreover, because we've decide that you really have
to have dynamic shared memory, there may probably be other things that
also fail, too, either now or in the future.  Basically, if you
configure dynamic_shared_memory_type=none, expect a swift and painful
death.


...but I'm not sure that's really a great option.  It seems like
leaving user-visible knobs lying around that break the entire world is
a bad plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Let's remove DSM_INPL_NONE.

2018-02-27 Thread Robert Haas
On Thu, Feb 15, 2018 at 5:58 AM, Kyotaro HORIGUCHI
 wrote:
> It is amost a just-to-delete work but I see two issues raised so
> far.

I think the two issues you are pointing out are the same issue --
namely, if initdb probes for a max_connections setting with an invalid
DSM implementation configured, it will fail the test for every value
of max_connections and thus select the lowest possible value.  The
solution to this is presumably just to make sure we choose the DSM
implementation before we do the max_connections probing so that we can
pass through the correct value there, which I think is what your patch
does.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Let's remove DSM_INPL_NONE.

2018-02-15 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment.

At Fri, 16 Feb 2018 13:07:08 +0900, Michael Paquier  wrote 
in <20180216040708.ga1...@paquier.xyz>
> On Thu, Feb 15, 2018 at 07:58:57PM +0900, Kyotaro HORIGUCHI wrote:
> > It is amost a just-to-delete work but I see two issues raised so
> > far.
> 
> The patch looks good as-is.  This simplifies a couple of code paths
> deciding if parallel queries can be used or not, so future features in
> need of doing the same decision-making won't fall in the same trap as
> the recent parellel btree builds.  So I am +1 for having the buildfarm
> express its opinion about it.

Agreed. I'd like to see how buildfarm machines respond to this.

> > 1. That change can raise regression test failure on some
> >buildfarm machines[3].
> > 
> > The reason is that initdb creates a database with
> > max_connection=10 from DSM failure caused by semaphore exhaustion
> > , even though regression test requires at least 20
> > connections. At that time it was "fixed" by setting
> > dynamic_shared_memory_type=none while detecting the number of
> > usable max_connections and shared buffers[4].  Regardless of
> > whether the probing was succeeded or not, initdb finally creats a
> > database on that any DSM implementation is activated, since
> > initdb doesn't choose "none". Finally the test items for parallel
> > query actually suceeded.
> > 
> > For the reason above, I believe just increasing the minimum
> > fallback value of max_connections to 20 will work[5]. Anyway it
> > is a problem to be fixed that initdb creates an inexecutable
> > database if it uses the fallback values of max_connections=10.
> > 
> > [3]
> > https://www.postgresql.org/message-id/ca+tgmoyhiigrcvssjhmbsebmof2zx_9_9rwd75cwvu99yrd...@mail.gmail.com
> 
> Do actually buildfarm machines select max_connections = 10 now?  We
> would have surely seen failures since max_wal_senders has its default
> value set to 10 as well.  Hence this is a separate problem.

Even not being pretty confident, that's because the current
initdb runs probing with dynamic_s_m_type=none. So the same BF
animal can fail if initdb probes with dynamic_s_m_type=sysv and
semaphore is exchausted at the time.

> > [4] Commit: d41ab71712a4457ed39d5471b23949872ac91def
> > 
> > [5] 
> > https://www.postgresql.org/message-id/20180209.170823.42008365.horiguchi.kyot...@lab.ntt.co.jp
> > 
> > 
> > > Feel free to.  Be just careful that the connection attempts in
> > > test_config_settings() should try to use the DSM implementation defined
> > > by choose_dsm_implementation().
> > 
> > Thank you for the advice. The probing fails with the default
> > setting if posix shared memory somehow fails on a platform like
> > Linux that is usually expected to have it.  It's enough to choose
> > the implementation before probing. Some messages from initdb are
> > transposed as the result.
> > 
> > |   creating directory /home/horiguti/data/data_ndsm ... ok
> > |   creating subdirectories ... ok
> > | + selecting dynamic shared memory implementation ... posix
> > |   selecting default max_connections ... 100
> > |   selecting default shared_buffers ... 128MB
> > | - selecting dynamic shared memory implementation ... posix
> > 
> > Even though probing can end with failure in the case of [3], it
> > will not be a problem with the patch [4].
> 
> That's fine by me, as this is actually the purpose of your patch.
> 
> +++ b/src/include/storage/dsm_impl.h
> @@ -14,7 +14,6 @@
>  #define DSM_IMPL_H
> 
>  /* Dynamic shared memory implementations. */
>  -#define DSM_IMPL_NONE  0
>   #define DSM_IMPL_POSIX 1
>   #define DSM_IMPL_SYSV  2
>   #define DSM_IMPL_WINDOWS   3
> You may as well assign numbers from 0 here and reorder the whole set.

The reason of that is I prefered to leave 0 as unused default
value. But it doesn't have significance after a clean build. I'm
fine with reordering (or renumbering) them.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Let's remove DSM_INPL_NONE.

2018-02-15 Thread Michael Paquier
On Thu, Feb 15, 2018 at 10:00:39AM -0800, Andres Freund wrote:
> Hm, I'm not quite convinced by this. Seems to make testing a bunch of
> codepaths harder.  I think it's fine to say that pg doesn't work
> correctly with them disabled though.

Well, for what it's worth that's one thing less to be forgotten when
implementing features related to parallel query.  That's the lesson
88ef48c is telling us here, much unlikely anybody would disable it
though after an initdb.
--
Michael


signature.asc
Description: PGP signature


Re: Let's remove DSM_INPL_NONE.

2018-02-15 Thread Michael Paquier
On Thu, Feb 15, 2018 at 07:58:57PM +0900, Kyotaro HORIGUCHI wrote:
> It is amost a just-to-delete work but I see two issues raised so
> far.

The patch looks good as-is.  This simplifies a couple of code paths
deciding if parallel queries can be used or not, so future features in
need of doing the same decision-making won't fall in the same trap as
the recent parellel btree builds.  So I am +1 for having the buildfarm
express its opinion about it.

> 1. That change can raise regression test failure on some
>buildfarm machines[3].
> 
> The reason is that initdb creates a database with
> max_connection=10 from DSM failure caused by semaphore exhaustion
> , even though regression test requires at least 20
> connections. At that time it was "fixed" by setting
> dynamic_shared_memory_type=none while detecting the number of
> usable max_connections and shared buffers[4].  Regardless of
> whether the probing was succeeded or not, initdb finally creats a
> database on that any DSM implementation is activated, since
> initdb doesn't choose "none". Finally the test items for parallel
> query actually suceeded.
> 
> For the reason above, I believe just increasing the minimum
> fallback value of max_connections to 20 will work[5]. Anyway it
> is a problem to be fixed that initdb creates an inexecutable
> database if it uses the fallback values of max_connections=10.
> 
> [3]
> https://www.postgresql.org/message-id/ca+tgmoyhiigrcvssjhmbsebmof2zx_9_9rwd75cwvu99yrd...@mail.gmail.com

Do actually buildfarm machines select max_connections = 10 now?  We
would have surely seen failures since max_wal_senders has its default
value set to 10 as well.  Hence this is a separate problem.

> [4] Commit: d41ab71712a4457ed39d5471b23949872ac91def
> 
> [5] 
> https://www.postgresql.org/message-id/20180209.170823.42008365.horiguchi.kyot...@lab.ntt.co.jp
> 
> 
> > Feel free to.  Be just careful that the connection attempts in
> > test_config_settings() should try to use the DSM implementation defined
> > by choose_dsm_implementation().
> 
> Thank you for the advice. The probing fails with the default
> setting if posix shared memory somehow fails on a platform like
> Linux that is usually expected to have it.  It's enough to choose
> the implementation before probing. Some messages from initdb are
> transposed as the result.
> 
> |   creating directory /home/horiguti/data/data_ndsm ... ok
> |   creating subdirectories ... ok
> | + selecting dynamic shared memory implementation ... posix
> |   selecting default max_connections ... 100
> |   selecting default shared_buffers ... 128MB
> | - selecting dynamic shared memory implementation ... posix
> 
> Even though probing can end with failure in the case of [3], it
> will not be a problem with the patch [4].

That's fine by me, as this is actually the purpose of your patch.

+++ b/src/include/storage/dsm_impl.h
@@ -14,7 +14,6 @@
 #define DSM_IMPL_H

 /* Dynamic shared memory implementations. */
 -#define DSM_IMPL_NONE  0
  #define DSM_IMPL_POSIX 1
  #define DSM_IMPL_SYSV  2
  #define DSM_IMPL_WINDOWS   3
You may as well assign numbers from 0 here and reorder the whole set.
--
Michael


signature.asc
Description: PGP signature


Re: Let's remove DSM_INPL_NONE.

2018-02-15 Thread Andres Freund
Hi,

On 2018-02-15 19:58:57 +0900, Kyotaro HORIGUCHI wrote:
> As in the other threads[1][2], we have had several good reasons
> to remove DSM_IMPL_NONE in PG11. The attached patch doesn that.
> 
> [1] 
> https://www.postgresql.org/message-id/CA+Tgmoa0e23YC3SCwB85Yf5oUTRyirV=sq_rxyxaxabldpp...@mail.gmail.com
> 
> [2] 
> https://www.postgresql.org/message-id/20180214.103858.02713585.horiguchi.kyot...@lab.ntt.co.jp

Hm, I'm not quite convinced by this. Seems to make testing a bunch of
codepaths harder.  I think it's fine to say that pg doesn't work
correctly with them disabled though.

- Andres