Re: Let's remove DSM_INPL_NONE.
At Wed, 28 Feb 2018 10:08:59 +0900, Michael Paquier wrote in <20180228010859.gb1...@paquier.xyz> > On Tue, Feb 27, 2018 at 02:53:55PM -0500, Tom Lane wrote: > > Andres Freund writes: > >> On 2018-02-27 14:41:47 -0500, Tom Lane wrote: > >>> What I didn't understand about it was what kind of testing this'd make > >>> harder. If we desupport dynamic_shared_memory_type=none, there aren't > >>> any code paths that need to cope with the case, and we should just > >>> remove any code that thereby becomes unreachable. > > > >> What I'm concerned about isn't so much testing paths specific to > >> dynamic_shared_memory_type=none, but paths where we currently need > >> fallbacks for the case we couldn't actually allocate dynamic shared > >> memory. Which I think we at least somewhat gracefully need to deal with. > > > > Ah. That's a fair point, but I do not think > > dynamic_shared_memory_type=none is a good substitute for having a way to > > provoke allocation failures. That doesn't let you test recovery from > > situations where your first allocation works and second one fails, for > > example; and cleanup from that sort of case is likely to be more > > complicated than the trivial case. On the other hand, dynamic_shared_memory_type=none is not a proper means to silence DSM failures. If this patch causes some problems, exactly the same things would have happened for the setting dynamic_shared_memory_type *!=* none. If we need more "graceful" behavior for some specific failure modes, it should be amended separately. > dynamic_shared_memory_type is used in six code paths where it is aimed > at doing sanity checks: ... > The point is that there are other control mechanisms for each one of > them. Mainly, for the executor portion, the number of workers is > controlled by other GUCs on planner-side. If this means just that there should be any means other than the variable to turn off user-visible parallel features: > - Mute DSM initialization at postmaster startup. > - Mute cleanup of previous DSM segments from a past postmaster. These should be allowed unconditionally and should succeed. > - Block backend startup if there is no DSM. This is the result of any parallel activity, so this also should be allowed unconditionally and should succeed. > - Mute parallel query in planner. max_parallel_workers_per_gather=0 kills it. (min_parallel_*_scan_size or parallel_*_cost effectively can do the similar.) > - Mute parallel query for parallel btree builds. max_parallel_maintenance_workers = 0 does. > - Mute creation of parallel contexts in executor. No gahter or gather merge results in this. So max_parallel_workers_per_gather=0 also forces this. The attached is the rebased version, on which the symbols are renumbered 0-based as per comments. regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From 0672b1d80a08b0fb0a33a26aad8b1cc77bd83083 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 15 Feb 2018 11:22:50 +0900 Subject: [PATCH] Remove dynamic_shared_memroy_type=none Nowadays PostgreSQL on all supported platform offers any kind of dynamic shared memory feature. Assuming none hinder us from using it for core features. Just removing the option leads to intermittent failure of regression test on some platforms due to insufficient max_connection on initdb. This patch requires the patch that increases the minimum fallback value of max_connection of initdb. --- doc/src/sgml/config.sgml | 6 +++--- src/backend/access/transam/parallel.c | 7 --- src/backend/optimizer/plan/planner.c | 4 +--- src/backend/storage/ipc/dsm.c | 15 --- src/backend/storage/ipc/dsm_impl.c| 3 --- src/backend/utils/misc/postgresql.conf.sample | 1 - src/bin/initdb/initdb.c | 21 ++--- src/include/storage/dsm_impl.h| 9 - 8 files changed, 22 insertions(+), 44 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 3a8fc7d803..932ce11a58 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1602,9 +1602,9 @@ include_dir 'conf.d' should use. Possible values are posix (for POSIX shared memory allocated using shm_open), sysv (for System V shared memory allocated via shmget), -windows (for Windows shared memory), mmap -(to simulate shared memory using memory-mapped files stored in the -data directory), and none (to disable this feature). +windows (for Windows shared memory), +and mmap (to simulate shared memory using +memory-mapped files stored in the data directory). Not all values are supported on all platforms; the first supported option is the default for that platform. The use of the mmap option, which is not the default on any platform, diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/
Re: Let's remove DSM_INPL_NONE.
On Tue, Feb 27, 2018 at 11:30 PM, Tom Lane wrote: > I'd be in favor of having some normally-ifdef'd-out facility for causing > failures of this kind. (As I mentioned upthread, I do not think "always > fail" is sufficient.) That's very different from having a user-visible > option, though. We don't expose a GUC that enables CLOBBER_FREED_MEMORY, > to take a relevant example. Sure. Feel free to propose something you like. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Let's remove DSM_INPL_NONE.
Robert Haas writes: > On Tue, Feb 27, 2018 at 2:50 PM, Andres Freund wrote: >> What I'm concerned about isn't so much testing paths specific to >> dynamic_shared_memory_type=none, but paths where we currently need >> fallbacks for the case we couldn't actually allocate dynamic shared >> memory. Which I think we at least somewhat gracefully need to deal with. > Well, it's not very hard to just hack the code to make dsm_create() > always fail, or fail X% of the time, if you're so inclined. I agree > that -c dynamic_shared_memory_type=none is a little more convenient > than sticking something like that into the code, but I don't think > it's sufficiently more convenient to justify keeping an option we > don't otherwise want. I'd be in favor of having some normally-ifdef'd-out facility for causing failures of this kind. (As I mentioned upthread, I do not think "always fail" is sufficient.) That's very different from having a user-visible option, though. We don't expose a GUC that enables CLOBBER_FREED_MEMORY, to take a relevant example. regards, tom lane
Re: Let's remove DSM_INPL_NONE.
On Tue, Feb 27, 2018 at 2:50 PM, Andres Freund wrote: > What I'm concerned about isn't so much testing paths specific to > dynamic_shared_memory_type=none, but paths where we currently need > fallbacks for the case we couldn't actually allocate dynamic shared > memory. Which I think we at least somewhat gracefully need to deal with. Well, it's not very hard to just hack the code to make dsm_create() always fail, or fail X% of the time, if you're so inclined. I agree that -c dynamic_shared_memory_type=none is a little more convenient than sticking something like that into the code, but I don't think it's sufficiently more convenient to justify keeping an option we don't otherwise want. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Let's remove DSM_INPL_NONE.
On Tue, Feb 27, 2018 at 02:53:55PM -0500, Tom Lane wrote: > Andres Freund writes: >> On 2018-02-27 14:41:47 -0500, Tom Lane wrote: >>> What I didn't understand about it was what kind of testing this'd make >>> harder. If we desupport dynamic_shared_memory_type=none, there aren't >>> any code paths that need to cope with the case, and we should just >>> remove any code that thereby becomes unreachable. > >> What I'm concerned about isn't so much testing paths specific to >> dynamic_shared_memory_type=none, but paths where we currently need >> fallbacks for the case we couldn't actually allocate dynamic shared >> memory. Which I think we at least somewhat gracefully need to deal with. > > Ah. That's a fair point, but I do not think > dynamic_shared_memory_type=none is a good substitute for having a way to > provoke allocation failures. That doesn't let you test recovery from > situations where your first allocation works and second one fails, for > example; and cleanup from that sort of case is likely to be more > complicated than the trivial case. dynamic_shared_memory_type is used in six code paths where it is aimed at doing sanity checks: - Mute DSM initialization at postmaster startup. - Mute cleanup of previous DSM segments from a past postmaster. - Block backend startup if there is no DSM. - Mute parallel query in planner. - Mute parallel query for parallel btree builds. - Mute creation of parallel contexts in executor. The point is that there are other control mechanisms for each one of them. Mainly, for the executor portion, the number of workers is controlled by other GUCs on planner-side. -- Michael signature.asc Description: PGP signature
Re: Let's remove DSM_INPL_NONE.
On Tue, Feb 27, 2018 at 02:00:36PM -0500, Robert Haas wrote: > I think the two issues you are pointing out are the same issue -- > namely, if initdb probes for a max_connections setting with an invalid > DSM implementation configured, it will fail the test for every value > of max_connections and thus select the lowest possible value. The > solution to this is presumably just to make sure we choose the DSM > implementation before we do the max_connections probing so that we can > pass through the correct value there, which I think is what your patch > does. Yes, that's what the thing does. It moves the routine to find the DSM implementation before computing max_connections. -- Michael signature.asc Description: PGP signature
Re: Let's remove DSM_INPL_NONE.
Andres Freund writes: > On 2018-02-27 14:41:47 -0500, Tom Lane wrote: >> What I didn't understand about it was what kind of testing this'd make >> harder. If we desupport dynamic_shared_memory_type=none, there aren't >> any code paths that need to cope with the case, and we should just >> remove any code that thereby becomes unreachable. > What I'm concerned about isn't so much testing paths specific to > dynamic_shared_memory_type=none, but paths where we currently need > fallbacks for the case we couldn't actually allocate dynamic shared > memory. Which I think we at least somewhat gracefully need to deal with. Ah. That's a fair point, but I do not think dynamic_shared_memory_type=none is a good substitute for having a way to provoke allocation failures. That doesn't let you test recovery from situations where your first allocation works and second one fails, for example; and cleanup from that sort of case is likely to be more complicated than the trivial case. regards, tom lane
Re: Let's remove DSM_INPL_NONE.
On 2018-02-27 14:41:47 -0500, Tom Lane wrote: > Robert Haas writes: > > On Thu, Feb 15, 2018 at 1:00 PM, Andres Freund wrote: > >> Hm, I'm not quite convinced by this. Seems to make testing a bunch of > >> codepaths harder. I think it's fine to say that pg doesn't work > >> correctly with them disabled though. > > > I'm not sure I understand this. Do you mean we should just add a > > disclaimer to the documentation? > > What I didn't understand about it was what kind of testing this'd make > harder. If we desupport dynamic_shared_memory_type=none, there aren't > any code paths that need to cope with the case, and we should just > remove any code that thereby becomes unreachable. What I'm concerned about isn't so much testing paths specific to dynamic_shared_memory_type=none, but paths where we currently need fallbacks for the case we couldn't actually allocate dynamic shared memory. Which I think we at least somewhat gracefully need to deal with. Greetings, Andres Freund
Re: Let's remove DSM_INPL_NONE.
Robert Haas writes: > On Thu, Feb 15, 2018 at 1:00 PM, Andres Freund wrote: >> Hm, I'm not quite convinced by this. Seems to make testing a bunch of >> codepaths harder. I think it's fine to say that pg doesn't work >> correctly with them disabled though. > I'm not sure I understand this. Do you mean we should just add a > disclaimer to the documentation? What I didn't understand about it was what kind of testing this'd make harder. If we desupport dynamic_shared_memory_type=none, there aren't any code paths that need to cope with the case, and we should just remove any code that thereby becomes unreachable. regards, tom lane
Re: Let's remove DSM_INPL_NONE.
On Thu, Feb 15, 2018 at 1:00 PM, Andres Freund wrote: > Hm, I'm not quite convinced by this. Seems to make testing a bunch of > codepaths harder. I think it's fine to say that pg doesn't work > correctly with them disabled though. I'm not sure I understand this. Do you mean we should just add a disclaimer to the documentation? As long as dynamic_shared_memory_type=none is a supported configuration, we can't use DSM for anything critical, but there's a desire to use it in things like AV and the stats collector, which are pretty critical. Instead of removing it, we could do something like this: If you configure dynamic_shared_memory_type=none, parallel query will be disabled. Also, the stats collector will fail to run, so there will be no statistics to trigger autovacuum, and your database will become horribly bloated. Actually, that would be true anyway even if the stats collector somehow managed to work, because autovacuum itself will also fail. Moreover, because we've decide that you really have to have dynamic shared memory, there may probably be other things that also fail, too, either now or in the future. Basically, if you configure dynamic_shared_memory_type=none, expect a swift and painful death. ...but I'm not sure that's really a great option. It seems like leaving user-visible knobs lying around that break the entire world is a bad plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Let's remove DSM_INPL_NONE.
On Thu, Feb 15, 2018 at 5:58 AM, Kyotaro HORIGUCHI wrote: > It is amost a just-to-delete work but I see two issues raised so > far. I think the two issues you are pointing out are the same issue -- namely, if initdb probes for a max_connections setting with an invalid DSM implementation configured, it will fail the test for every value of max_connections and thus select the lowest possible value. The solution to this is presumably just to make sure we choose the DSM implementation before we do the max_connections probing so that we can pass through the correct value there, which I think is what your patch does. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Let's remove DSM_INPL_NONE.
Hello, thank you for the comment. At Fri, 16 Feb 2018 13:07:08 +0900, Michael Paquier wrote in <20180216040708.ga1...@paquier.xyz> > On Thu, Feb 15, 2018 at 07:58:57PM +0900, Kyotaro HORIGUCHI wrote: > > It is amost a just-to-delete work but I see two issues raised so > > far. > > The patch looks good as-is. This simplifies a couple of code paths > deciding if parallel queries can be used or not, so future features in > need of doing the same decision-making won't fall in the same trap as > the recent parellel btree builds. So I am +1 for having the buildfarm > express its opinion about it. Agreed. I'd like to see how buildfarm machines respond to this. > > 1. That change can raise regression test failure on some > >buildfarm machines[3]. > > > > The reason is that initdb creates a database with > > max_connection=10 from DSM failure caused by semaphore exhaustion > > , even though regression test requires at least 20 > > connections. At that time it was "fixed" by setting > > dynamic_shared_memory_type=none while detecting the number of > > usable max_connections and shared buffers[4]. Regardless of > > whether the probing was succeeded or not, initdb finally creats a > > database on that any DSM implementation is activated, since > > initdb doesn't choose "none". Finally the test items for parallel > > query actually suceeded. > > > > For the reason above, I believe just increasing the minimum > > fallback value of max_connections to 20 will work[5]. Anyway it > > is a problem to be fixed that initdb creates an inexecutable > > database if it uses the fallback values of max_connections=10. > > > > [3] > > https://www.postgresql.org/message-id/ca+tgmoyhiigrcvssjhmbsebmof2zx_9_9rwd75cwvu99yrd...@mail.gmail.com > > Do actually buildfarm machines select max_connections = 10 now? We > would have surely seen failures since max_wal_senders has its default > value set to 10 as well. Hence this is a separate problem. Even not being pretty confident, that's because the current initdb runs probing with dynamic_s_m_type=none. So the same BF animal can fail if initdb probes with dynamic_s_m_type=sysv and semaphore is exchausted at the time. > > [4] Commit: d41ab71712a4457ed39d5471b23949872ac91def > > > > [5] > > https://www.postgresql.org/message-id/20180209.170823.42008365.horiguchi.kyot...@lab.ntt.co.jp > > > > > > > Feel free to. Be just careful that the connection attempts in > > > test_config_settings() should try to use the DSM implementation defined > > > by choose_dsm_implementation(). > > > > Thank you for the advice. The probing fails with the default > > setting if posix shared memory somehow fails on a platform like > > Linux that is usually expected to have it. It's enough to choose > > the implementation before probing. Some messages from initdb are > > transposed as the result. > > > > | creating directory /home/horiguti/data/data_ndsm ... ok > > | creating subdirectories ... ok > > | + selecting dynamic shared memory implementation ... posix > > | selecting default max_connections ... 100 > > | selecting default shared_buffers ... 128MB > > | - selecting dynamic shared memory implementation ... posix > > > > Even though probing can end with failure in the case of [3], it > > will not be a problem with the patch [4]. > > That's fine by me, as this is actually the purpose of your patch. > > +++ b/src/include/storage/dsm_impl.h > @@ -14,7 +14,6 @@ > #define DSM_IMPL_H > > /* Dynamic shared memory implementations. */ > -#define DSM_IMPL_NONE 0 > #define DSM_IMPL_POSIX 1 > #define DSM_IMPL_SYSV 2 > #define DSM_IMPL_WINDOWS 3 > You may as well assign numbers from 0 here and reorder the whole set. The reason of that is I prefered to leave 0 as unused default value. But it doesn't have significance after a clean build. I'm fine with reordering (or renumbering) them. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Let's remove DSM_INPL_NONE.
On Thu, Feb 15, 2018 at 10:00:39AM -0800, Andres Freund wrote: > Hm, I'm not quite convinced by this. Seems to make testing a bunch of > codepaths harder. I think it's fine to say that pg doesn't work > correctly with them disabled though. Well, for what it's worth that's one thing less to be forgotten when implementing features related to parallel query. That's the lesson 88ef48c is telling us here, much unlikely anybody would disable it though after an initdb. -- Michael signature.asc Description: PGP signature
Re: Let's remove DSM_INPL_NONE.
On Thu, Feb 15, 2018 at 07:58:57PM +0900, Kyotaro HORIGUCHI wrote: > It is amost a just-to-delete work but I see two issues raised so > far. The patch looks good as-is. This simplifies a couple of code paths deciding if parallel queries can be used or not, so future features in need of doing the same decision-making won't fall in the same trap as the recent parellel btree builds. So I am +1 for having the buildfarm express its opinion about it. > 1. That change can raise regression test failure on some >buildfarm machines[3]. > > The reason is that initdb creates a database with > max_connection=10 from DSM failure caused by semaphore exhaustion > , even though regression test requires at least 20 > connections. At that time it was "fixed" by setting > dynamic_shared_memory_type=none while detecting the number of > usable max_connections and shared buffers[4]. Regardless of > whether the probing was succeeded or not, initdb finally creats a > database on that any DSM implementation is activated, since > initdb doesn't choose "none". Finally the test items for parallel > query actually suceeded. > > For the reason above, I believe just increasing the minimum > fallback value of max_connections to 20 will work[5]. Anyway it > is a problem to be fixed that initdb creates an inexecutable > database if it uses the fallback values of max_connections=10. > > [3] > https://www.postgresql.org/message-id/ca+tgmoyhiigrcvssjhmbsebmof2zx_9_9rwd75cwvu99yrd...@mail.gmail.com Do actually buildfarm machines select max_connections = 10 now? We would have surely seen failures since max_wal_senders has its default value set to 10 as well. Hence this is a separate problem. > [4] Commit: d41ab71712a4457ed39d5471b23949872ac91def > > [5] > https://www.postgresql.org/message-id/20180209.170823.42008365.horiguchi.kyot...@lab.ntt.co.jp > > > > Feel free to. Be just careful that the connection attempts in > > test_config_settings() should try to use the DSM implementation defined > > by choose_dsm_implementation(). > > Thank you for the advice. The probing fails with the default > setting if posix shared memory somehow fails on a platform like > Linux that is usually expected to have it. It's enough to choose > the implementation before probing. Some messages from initdb are > transposed as the result. > > | creating directory /home/horiguti/data/data_ndsm ... ok > | creating subdirectories ... ok > | + selecting dynamic shared memory implementation ... posix > | selecting default max_connections ... 100 > | selecting default shared_buffers ... 128MB > | - selecting dynamic shared memory implementation ... posix > > Even though probing can end with failure in the case of [3], it > will not be a problem with the patch [4]. That's fine by me, as this is actually the purpose of your patch. +++ b/src/include/storage/dsm_impl.h @@ -14,7 +14,6 @@ #define DSM_IMPL_H /* Dynamic shared memory implementations. */ -#define DSM_IMPL_NONE 0 #define DSM_IMPL_POSIX 1 #define DSM_IMPL_SYSV 2 #define DSM_IMPL_WINDOWS 3 You may as well assign numbers from 0 here and reorder the whole set. -- Michael signature.asc Description: PGP signature
Re: Let's remove DSM_INPL_NONE.
Hi, On 2018-02-15 19:58:57 +0900, Kyotaro HORIGUCHI wrote: > As in the other threads[1][2], we have had several good reasons > to remove DSM_IMPL_NONE in PG11. The attached patch doesn that. > > [1] > https://www.postgresql.org/message-id/CA+Tgmoa0e23YC3SCwB85Yf5oUTRyirV=sq_rxyxaxabldpp...@mail.gmail.com > > [2] > https://www.postgresql.org/message-id/20180214.103858.02713585.horiguchi.kyot...@lab.ntt.co.jp Hm, I'm not quite convinced by this. Seems to make testing a bunch of codepaths harder. I think it's fine to say that pg doesn't work correctly with them disabled though. - Andres