Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Apr 15, 2014 at 10:46 PM, Amit Kapila wrote: > On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas wrote: >> On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila >> wrote: >>> On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas wrote: For the create case, I'm wondering if we should put the block that tests for !hmap *before* the _dosmaperr() and check for EEXIST. What is your opinion? >>> >>> Either way is okay, but I think the way you are suggesting is better as it >>> will make code consistent with other place (PGSharedMemoryCreate()). >> >> OK, can you prepare a patch? > > Please find attached patch to address this issue. > One minor point to note is that now we have to call GetLastError() twice, > once inside error path and once to check EEXIST, but I think that is okay > as existing code in PGSharedMemoryCreate() does it that way. OK. I committed this blindly, but I don't have a Windows dev environment, so please keep an eye on the Windows buildfarm members and provide follow-on patches if any of them get unhappy about this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas wrote: > On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila wrote: >> On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas wrote: >>> For the create case, I'm wondering if we should put the block that >>> tests for !hmap *before* the _dosmaperr() and check for EEXIST. What >>> is your opinion? >> >> Either way is okay, but I think the way you are suggesting is better as it >> will make code consistent with other place (PGSharedMemoryCreate()). > > OK, can you prepare a patch? Please find attached patch to address this issue. One minor point to note is that now we have to call GetLastError() twice, once inside error path and once to check EEXIST, but I think that is okay as existing code in PGSharedMemoryCreate() does it that way. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_dsm_invalid_errcode_issue-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila wrote: > On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas wrote: >> On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila wrote: >>> I have checked that other place in code also check handle to >>> decide if API has failed. Refer function PGSharedMemoryIsInUse(). >>> So I think fix to call GetLastError() after checking handle is okay. >>> Attached patch fixes this issue. After patch, the server shows below >>> log which is exactly what is expected from test_shm_mq >> >> In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred >> way to test for a NULL pointer. I notice that the !hmap style is used >> throughout this code, so I guess cleaning that up is a matter for a >> separate commit. > > I think in that case we might want to cleanup some other similar usage > (PGSharedMemoryCreate) of !hmap. Ah. Well, in that case maybe we should just leave it alone, since it's been like that forever and nobody's cared until now. >> For the create case, I'm wondering if we should put the block that >> tests for !hmap *before* the _dosmaperr() and check for EEXIST. What >> is your opinion? > > Either way is okay, but I think the way you are suggesting is better as it > will make code consistent with other place (PGSharedMemoryCreate()). OK, can you prepare a patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas wrote: > On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila wrote: >> I have checked that other place in code also check handle to >> decide if API has failed. Refer function PGSharedMemoryIsInUse(). >> So I think fix to call GetLastError() after checking handle is okay. >> Attached patch fixes this issue. After patch, the server shows below >> log which is exactly what is expected from test_shm_mq > > In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred > way to test for a NULL pointer. I notice that the !hmap style is used > throughout this code, so I guess cleaning that up is a matter for a > separate commit. I think in that case we might want to cleanup some other similar usage (PGSharedMemoryCreate) of !hmap. > For the create case, I'm wondering if we should put the block that > tests for !hmap *before* the _dosmaperr() and check for EEXIST. What > is your opinion? Either way is okay, but I think the way you are suggesting is better as it will make code consistent with other place (PGSharedMemoryCreate()). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila wrote: > On Wed, Apr 9, 2014 at 9:20 PM, Robert Haas wrote: >> On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila wrote: >>> I am just not sure whether it is okay to rearrange the code and call >>> GetLastError() only if returned handle is Invalid (NULL) or try to look >>> for more info. >> >> Well, I don't know either. You wrote the Windows portion of this >> code, so you'll have to decide what's best. If the best practice in >> this area is to only call GetLastError() if the handle isn't valid, >> then that's probably what we should do, too. But I can't speak to >> what the best practice is. > > I have checked that other place in code also check handle to > decide if API has failed. Refer function PGSharedMemoryIsInUse(). > So I think fix to call GetLastError() after checking handle is okay. > Attached patch fixes this issue. After patch, the server shows below > log which is exactly what is expected from test_shm_mq In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred way to test for a NULL pointer. I notice that the !hmap style is used throughout this code, so I guess cleaning that up is a matter for a separate commit. For the create case, I'm wondering if we should put the block that tests for !hmap *before* the _dosmaperr() and check for EEXIST. What is your opinion? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Apr 9, 2014 at 9:20 PM, Robert Haas wrote: > On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila wrote: >> I am just not sure whether it is okay to rearrange the code and call >> GetLastError() only if returned handle is Invalid (NULL) or try to look >> for more info. > > Well, I don't know either. You wrote the Windows portion of this > code, so you'll have to decide what's best. If the best practice in > this area is to only call GetLastError() if the handle isn't valid, > then that's probably what we should do, too. But I can't speak to > what the best practice is. I have checked that other place in code also check handle to decide if API has failed. Refer function PGSharedMemoryIsInUse(). So I think fix to call GetLastError() after checking handle is okay. Attached patch fixes this issue. After patch, the server shows below log which is exactly what is expected from test_shm_mq LOG: registering background worker "test_shm_mq" LOG: starting background worker process "test_shm_mq" LOG: worker process: test_shm_mq (PID 4888) exited with exit code 1 LOG: unregistering background worker "test_shm_mq" LOG: registering background worker "test_shm_mq" LOG: starting background worker process "test_shm_mq" LOG: worker process: test_shm_mq (PID 3128) exited with exit code 1 LOG: unregistering background worker "test_shm_mq" With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_dsm_invalid_errcode_issue.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On 2014-04-09 11:50:33 -0400, Robert Haas wrote: > > One question: > > 1. I have seen that initdb still creates pg_dynshmem, is it required > > after your latest changes? > > It's only used now if dynamic_shared_memory_type = mmap. I know > Andres was never a huge fan of the mmap implementation, so we could > rip that out and get rid of the directory, too, but I kind of liked > having it, and broke the tie in favor of myself. It's purely a toy thing. I think pretty much every dynshm user that actually transfers data through it will be better off falling back to single process style work, rather than parellizing it. Anyway, it's there, it doesn't presently cause problems, and I don't have to maintain it, so ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila wrote: > Few Observations: > > 1. One new warning has been introduced in code. > 1>src\backend\port\win32_shmem.c(295): warning C4013: > 'dsm_set_control_handle' undefined; assuming extern returning int > Attached patch fixes this warning. OK, committed, after moving the header to the correct position in alphabetical order. > 2. On running test_shm_mq manually, the client side didn't get > any problem (the results are as expected). However I am seeing > below log on server: > LOG: registering background worker "test_shm_mq" > LOG: starting background worker process "test_shm_mq" > LOG: unrecognized win32 error code: 127 > LOG: worker process: test_shm_mq (PID 2528) exited with exit code 1 > LOG: unregistering background worker "test_shm_mq" > > I think below message in log is not expected: > "LOG: unrecognized win32 error code: 127" > > This has been started appearing after your commit related to DSM. > I have debugged this issue and found that it comes from below part > of code: > dsm_impl_windows > { > .. > else > { > hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ, > FALSE, /* do not inherit the name */ > name); /* name of mapping object */ > _dosmaperr(GetLastError()); > } > } > > Here even though handle returned by OpenFileMapping() is a valid handle, > but still GetLastError() return 127 (The specified procedure could not be > found.). Now the specs[1] of API says that if handle is non-NULL, then > consider it success, so I am not sure whether we should bother about this > error or not. I have tried many ways (trying different parameters, search on > net) to change this and related API's, but the same problem exists. One > strange thing is if I just call the API twice (I know this is not > right, but just > to experiment for finding some info), second time this error doesn't > occur. > The only difference in latest changes is that now the OpenFileMapping() > is getting called by Child process rather than peer process (w.r.t process > that has called CreateFileMapping), don't know why this should make > such a difference. > > On net whatever examples I have seen for such usage, they call > GetLastError() only if handle is invalid, we have called in above fashion > just to keep code little bit simple. > > I am just not sure whether it is okay to rearrange the code and call > GetLastError() only if returned handle is Invalid (NULL) or try to look > for more info. Well, I don't know either. You wrote the Windows portion of this code, so you'll have to decide what's best. If the best practice in this area is to only call GetLastError() if the handle isn't valid, then that's probably what we should do, too. But I can't speak to what the best practice is. > One question: > 1. I have seen that initdb still creates pg_dynshmem, is it required > after your latest changes? It's only used now if dynamic_shared_memory_type = mmap. I know Andres was never a huge fan of the mmap implementation, so we could rip that out and get rid of the directory, too, but I kind of liked having it, and broke the tie in favor of myself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Apr 8, 2014 at 9:15 PM, Robert Haas wrote: > Apparently not. However, I'm fairly sure this is a step toward > addressing the complaints previously raised, even if there may be some > details people still want changed, so I've gone ahead and committed > it. Few Observations: 1. One new warning has been introduced in code. 1>src\backend\port\win32_shmem.c(295): warning C4013: 'dsm_set_control_handle' undefined; assuming extern returning int Attached patch fixes this warning. 2. On running test_shm_mq manually, the client side didn't get any problem (the results are as expected). However I am seeing below log on server: LOG: registering background worker "test_shm_mq" LOG: starting background worker process "test_shm_mq" LOG: unrecognized win32 error code: 127 LOG: worker process: test_shm_mq (PID 2528) exited with exit code 1 LOG: unregistering background worker "test_shm_mq" I think below message in log is not expected: "LOG: unrecognized win32 error code: 127" This has been started appearing after your commit related to DSM. I have debugged this issue and found that it comes from below part of code: dsm_impl_windows { .. else { hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ, FALSE, /* do not inherit the name */ name); /* name of mapping object */ _dosmaperr(GetLastError()); } } Here even though handle returned by OpenFileMapping() is a valid handle, but still GetLastError() return 127 (The specified procedure could not be found.). Now the specs[1] of API says that if handle is non-NULL, then consider it success, so I am not sure whether we should bother about this error or not. I have tried many ways (trying different parameters, search on net) to change this and related API's, but the same problem exists. One strange thing is if I just call the API twice (I know this is not right, but just to experiment for finding some info), second time this error doesn't occur. The only difference in latest changes is that now the OpenFileMapping() is getting called by Child process rather than peer process (w.r.t process that has called CreateFileMapping), don't know why this should make such a difference. On net whatever examples I have seen for such usage, they call GetLastError() only if handle is invalid, we have called in above fashion just to keep code little bit simple. I am just not sure whether it is okay to rearrange the code and call GetLastError() only if returned handle is Invalid (NULL) or try to look for more info. One question: 1. I have seen that initdb still creates pg_dynshmem, is it required after your latest changes? Let me know your opinion? [1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa366791(v=vs.85).aspx With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_dsm_win_warning.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Fri, Apr 4, 2014 at 10:01 AM, Robert Haas wrote: > On Wed, Jan 22, 2014 at 10:17 AM, Noah Misch wrote: >> Yeah, abandoning the state file is looking attractive. > > Here's a draft patch getting rid of the state file. This should > address concerns raised by Heikki and Fujii Masao and echoed by Tom > that dynamic shared memory behaves differently than the main shared > memory segment. The control segment ID is stored in the System V > shared memory block, so we're guaranteed that when using either System > V or POSIX shared memory we'll always latch onto the control segment > that matches up with the main shared memory segment we latched on to. > Cleanup for the file-mmap and Windows methods doesn't need to change, > because the former always just means clearing out $PGDATA/pg_dynshmem, > and the latter is done automatically by the operating system. > > Comments? Apparently not. However, I'm fairly sure this is a step toward addressing the complaints previously raised, even if there may be some details people still want changed, so I've gone ahead and committed it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Jan 22, 2014 at 10:17 AM, Noah Misch wrote: > Yeah, abandoning the state file is looking attractive. Here's a draft patch getting rid of the state file. This should address concerns raised by Heikki and Fujii Masao and echoed by Tom that dynamic shared memory behaves differently than the main shared memory segment. The control segment ID is stored in the System V shared memory block, so we're guaranteed that when using either System V or POSIX shared memory we'll always latch onto the control segment that matches up with the main shared memory segment we latched on to. Cleanup for the file-mmap and Windows methods doesn't need to change, because the former always just means clearing out $PGDATA/pg_dynshmem, and the latter is done automatically by the operating system. Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company commit 6095f077a0d9a45e422087be1c7e8791c247a4e9 Author: Robert Haas Date: Fri Apr 4 09:06:23 2014 -0400 Get rid of the dynamic shared memory state file. Instead, store the control segment ID in the header for the main shared memory segment, and recover it from there when we restart after a crash. This ensures that we never latch onto the main shared memory segment from one previous run and the dynamic shared memory control segment from a different one. It also avoids problems if you take a base backup and start it up as hot standby on the same machine while the previous postmaster is still running. diff --git a/src/backend/port/ipc_test.c b/src/backend/port/ipc_test.c index 6259919..936c1a4 100644 --- a/src/backend/port/ipc_test.c +++ b/src/backend/port/ipc_test.c @@ -30,6 +30,7 @@ #include #include "miscadmin.h" +#include "storage/dsm.h" #include "storage/ipc.h" #include "storage/pg_sema.h" #include "storage/pg_shmem.h" @@ -214,12 +215,13 @@ int main(int argc, char **argv) { MyStorage *storage; + PGShmemHeader *shim; int cpid; printf("Creating shared memory ... "); fflush(stdout); - storage = (MyStorage *) PGSharedMemoryCreate(8192, false, 5433); + storage = (MyStorage *) PGSharedMemoryCreate(8192, false, 5433, &shim); storage->flag = 1234; diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index 51c1a2b..5e3850b 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -30,6 +30,7 @@ #include "miscadmin.h" #include "portability/mem.h" +#include "storage/dsm.h" #include "storage/ipc.h" #include "storage/pg_shmem.h" #include "utils/guc.h" @@ -421,7 +422,8 @@ CreateAnonymousSegment(Size *size) * zero will be passed. */ PGShmemHeader * -PGSharedMemoryCreate(Size size, bool makePrivate, int port) +PGSharedMemoryCreate(Size size, bool makePrivate, int port, + PGShmemHeader **shim) { IpcMemoryKey NextShmemSegID; void *memAddress; @@ -509,10 +511,13 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port) /* * The segment appears to be from a dead Postgres process, or from a - * previous cycle of life in this same process. Zap it, if possible. + * previous cycle of life in this same process. Zap it, if possible, + * and any associated dynamic shared memory segments, as well. * This probably shouldn't fail, but if it does, assume the segment * belongs to someone else after all, and continue quietly. */ + if (hdr->dsm_control != 0) + dsm_cleanup_using_control_segment(hdr->dsm_control); shmdt(memAddress); if (shmctl(shmid, IPC_RMID, NULL) < 0) continue; @@ -539,6 +544,7 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port) hdr = (PGShmemHeader *) memAddress; hdr->creatorPID = getpid(); hdr->magic = PGShmemMagic; + hdr->dsm_control = 0; /* Fill in the data directory ID info, too */ if (stat(DataDir, &statbuf) < 0) @@ -554,6 +560,7 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port) */ hdr->totalsize = size; hdr->freeoffset = MAXALIGN(sizeof(PGShmemHeader)); + *shim = hdr; /* Save info for possible future use */ UsedShmemSegAddr = memAddress; @@ -608,6 +615,7 @@ PGSharedMemoryReAttach(void) if (hdr != origUsedShmemSegAddr) elog(FATAL, "reattaching to shared memory returned unexpected address (got %p, expected %p)", hdr, origUsedShmemSegAddr); + dsm_set_control_handle(((PGShmemHeader *) hdr)->dsm_control); UsedShmemSegAddr = hdr; /* probably redundant */ } diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c index dca371c..3a0ded4 100644 --- a/src/backend/port/win32_shmem.c +++ b/src/backend/port/win32_shmem.c @@ -117,7 +117,8 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2) * */ PGShmemHeader * -PGSharedMemoryCreate(Size size, bool makePrivate, int port) +PGSharedMemoryCreate(Size size, bool makePrivate, int port, + PGShmemHeader **shim) { void *memAddress; PGShmemHeader *hdr; @@ -245,12 +246,14 @@ PGSharedMemoryCreate(Size size, bo
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Jan 22, 2014 at 09:32:09AM -0500, Robert Haas wrote: > On Tue, Jan 21, 2014 at 2:58 PM, Noah Misch wrote: > >> What do people prefer? > > > > I recommend performing cleanup on the control segment named in PGShmemHeader > > just before shmdt() in PGSharedMemoryCreate(). No new ERROR or WARNING > > sites > > are necessary. Have dsm_postmaster_startup() continue to perform a cleanup > > on > > the control segment named in the state file. > > I think I'm on board with the first two sentences of this, but after > Fujii Masao's email yesterday, I can't help thinking that what you > propose the third sentence is a bad idea. He cloned a master to > create a standby server on the same machine, and the standby startup > ate the master's dynamic shared memory. We could teach pg_basebackup > not to copy the state file, but that wouldn't help people who take > base backups using the file system copy method, which is a lot of > people. I agree we should not rely on folks learning to omit the state file from base backups. Abandoning the state file is one way to resolve that, and the reasons I outlined for preferring to keep it were not overriding concerns. We could instead store a postmaster PID in dsm_control_header and only clean if that PID is dead. We could make DSM startup aware of whether we're using a backup label, but that would be awkward thanks to StartupXLOG() happening a good bit later. Yeah, abandoning the state file is looking attractive. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Jan 21, 2014 at 2:58 PM, Noah Misch wrote: >> What do people prefer? > > I recommend performing cleanup on the control segment named in PGShmemHeader > just before shmdt() in PGSharedMemoryCreate(). No new ERROR or WARNING sites > are necessary. Have dsm_postmaster_startup() continue to perform a cleanup on > the control segment named in the state file. I think I'm on board with the first two sentences of this, but after Fujii Masao's email yesterday, I can't help thinking that what you propose the third sentence is a bad idea. He cloned a master to create a standby server on the same machine, and the standby startup ate the master's dynamic shared memory. We could teach pg_basebackup not to copy the state file, but that wouldn't help people who take base backups using the file system copy method, which is a lot of people. >> 5. Give up on this approach. We could keep what we have now, or make >> the DSM control segment land at a well-known address as we do for the >> main segment. > > How would having the DSM control segment at a well-known address affect the > problem at hand? Did you mean a well-known dsm_handle? Yeah. So the idea is that we'd always use a dsm_handle of 100 + (100 * port) or something like that, and then search forward until we find a dsm_handle that works. This is basically the same algorithm we're using today for the main shared memory segment, but with a large additive constant so that they don't collide with each other. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Dec 18, 2013 at 12:21:08PM -0500, Robert Haas wrote: > On Tue, Dec 10, 2013 at 6:26 PM, Tom Lane wrote: > > The larger point is that such a shutdown process has never in the history > > of Postgres been successful at removing shared-memory (or semaphore) > > resources. I do not feel a need to put a higher recoverability standard > > onto the DSM code than what we've had for fifteen years with SysV shmem. > > > > But, by the same token, it shouldn't be any *less* recoverable. In this > > context that means that starting a new postmaster should recover the > > resources, at least 90% of the time (100% if we still have the old > > postmaster lockfile). I think the idea of keeping enough info in the > > SysV segment to permit recovery of DSM resources is a good one. Then, > > any case where the existing logic is able to free the old SysV segment > > is guaranteed to also work for DSM. > > So I'm taking a look at this. There doesn't appear to be anything > intrinsically intractable here, but it seems important to time the > order of operations so as to minimize the chances that things fail in > awkward places. The point where we remove the old shared memory > segment from the previous postmaster invocation is here: > > /* > * The segment appears to be from a dead Postgres process, or from a > * previous cycle of life in this same process. Zap it, if possible. > * This probably shouldn't fail, but if it does, assume the segment > * belongs to someone else after all, and continue quietly. > */ > shmdt(memAddress); > if (shmctl(shmid, IPC_RMID, NULL) < 0) > continue; > > My first thought was to remember the control segment ID from the > header just before the shmdt() there, and then later when the DSM > module is starting, do cleanup. But that doesn't seem like a good > idea, because then if startup fails after we remove the old shared > memory segment and before we get to the DSM initialization code, we'll > have lost the information on what control segment needs to be cleaned > up. A subsequent startup attempt won't see the old shm again, because > it's already gone. I'm fairly sure that this would be a net reduction > in reliability vs. the status quo. > > So now what I'm thinking is that we ought to actually perform the DSM > cleanup before detaching the old segment and trying to remove it. > That shouldn't fail, but if it does, or if we get killed before > completing it, the next run will hopefully find the same old shm and > finish the cleanup. But that kind of flies in the face of the comment > above: if we perform DSM cleanup and then discover that the segment > wasn't ours after all, that means we just stomped all over somebody > else's stuff. That's not too good. But trying to remove the segment > first and then perform the cleanup creates a window where, if we get > killed, the next restart will have lost information about how to > finish cleaning up. So it seems that some kind of logic tweak is > needed here, but I'm not sure exactly what. As I see it, the options > are: > > 1. Make failure to remove the shared memory segment we thought was > ours an error. This will definitely show up any problems, but only > after we've burned down some other processes's dynamic shared memory > segments. The most likely outcome is creating failure-to-start > problems that don't exist today. > > 2. Make it a warning. We'll still burn down somebody else's DSMs, but > at least we'll still start ourselves. Sadly, "WARNING: You have just > done a bad thing. It's too late to fix it. Sorry!" is not very > appealing. It has long been the responsibility of PGSharedMemoryCreate() to determine that a segment is unimportant before calling IPC_RMID. The success or failure of IPC_RMID is an unreliable guide to the correctness of that determination. IPC_RMID will succeed on an important segment owned by the same UID, and it will fail if some other process removed the segment after our shmat(). Such a failure does not impugn our having requested DSM cleanup on the basis of the PGShmemHeader we did read, so an apologetic WARNING would be wrong. > 3. Remove the old shared memory segment first, then perform the > cleanup immediately afterwards. If we get killed before completing > the cleanup, we'll leak the un-cleaned-up stuff. Decide that's OK and > move on. The period in which process exit would leak segments is narrow enough that this seems fine. I see no net advantage over performing the cleanup before shmdt(), though. > 4. Adopt some sort of belt-and-suspenders approach, keeping the state > file we have now and backstopping it with this mechanism, so that we > really only need this to work when $PGDATA has been blown away and > recreated. This seems pretty inelegant, and I'm not sure who it'll > benefit other than those (few?) people who kill -9 the postmaster (or > it segfaults or otherwise dies without
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Dec 10, 2013 at 6:26 PM, Tom Lane wrote: > Noah Misch writes: >> On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote: >>> Let's not add more cases like that, if we can avoid it. > >> Only if we can avoid it for a modicum of effort and feature compromise. >> You're asking for PostgreSQL to reshape its use of persistent resources so >> you >> can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a >> memory leak. That use case, not PostgreSQL, has the defect here. > > The larger point is that such a shutdown process has never in the history > of Postgres been successful at removing shared-memory (or semaphore) > resources. I do not feel a need to put a higher recoverability standard > onto the DSM code than what we've had for fifteen years with SysV shmem. > > But, by the same token, it shouldn't be any *less* recoverable. In this > context that means that starting a new postmaster should recover the > resources, at least 90% of the time (100% if we still have the old > postmaster lockfile). I think the idea of keeping enough info in the > SysV segment to permit recovery of DSM resources is a good one. Then, > any case where the existing logic is able to free the old SysV segment > is guaranteed to also work for DSM. So I'm taking a look at this. There doesn't appear to be anything intrinsically intractable here, but it seems important to time the order of operations so as to minimize the chances that things fail in awkward places. The point where we remove the old shared memory segment from the previous postmaster invocation is here: /* * The segment appears to be from a dead Postgres process, or from a * previous cycle of life in this same process. Zap it, if possible. * This probably shouldn't fail, but if it does, assume the segment * belongs to someone else after all, and continue quietly. */ shmdt(memAddress); if (shmctl(shmid, IPC_RMID, NULL) < 0) continue; My first thought was to remember the control segment ID from the header just before the shmdt() there, and then later when the DSM module is starting, do cleanup. But that doesn't seem like a good idea, because then if startup fails after we remove the old shared memory segment and before we get to the DSM initialization code, we'll have lost the information on what control segment needs to be cleaned up. A subsequent startup attempt won't see the old shm again, because it's already gone. I'm fairly sure that this would be a net reduction in reliability vs. the status quo. So now what I'm thinking is that we ought to actually perform the DSM cleanup before detaching the old segment and trying to remove it. That shouldn't fail, but if it does, or if we get killed before completing it, the next run will hopefully find the same old shm and finish the cleanup. But that kind of flies in the face of the comment above: if we perform DSM cleanup and then discover that the segment wasn't ours after all, that means we just stomped all over somebody else's stuff. That's not too good. But trying to remove the segment first and then perform the cleanup creates a window where, if we get killed, the next restart will have lost information about how to finish cleaning up. So it seems that some kind of logic tweak is needed here, but I'm not sure exactly what. As I see it, the options are: 1. Make failure to remove the shared memory segment we thought was ours an error. This will definitely show up any problems, but only after we've burned down some other processes's dynamic shared memory segments. The most likely outcome is creating failure-to-start problems that don't exist today. 2. Make it a warning. We'll still burn down somebody else's DSMs, but at least we'll still start ourselves. Sadly, "WARNING: You have just done a bad thing. It's too late to fix it. Sorry!" is not very appealing. 3. Remove the old shared memory segment first, then perform the cleanup immediately afterwards. If we get killed before completing the cleanup, we'll leak the un-cleaned-up stuff. Decide that's OK and move on. 4. Adopt some sort of belt-and-suspenders approach, keeping the state file we have now and backstopping it with this mechanism, so that we really only need this to work when $PGDATA has been blown away and recreated. This seems pretty inelegant, and I'm not sure who it'll benefit other than those (few?) people who kill -9 the postmaster (or it segfaults or otherwise dies without running the code to remove shared memory segments) and then remove $PGDATA and then re-initdb and then expect cleanup to find the old DSMs. 5. Give up on this approach. We could keep what we have now, or make the DSM control segment land at a well-known address as we do for the main segment. What do people prefer? The other thing I'm a bit squidgy about is injecting more code that runs before we get the new shared memory segment up and r
Re: [HACKERS] Dynamic Shared Memory stuff
Noah Misch writes: > On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote: >> Let's not add more cases like that, if we can avoid it. > Only if we can avoid it for a modicum of effort and feature compromise. > You're asking for PostgreSQL to reshape its use of persistent resources so you > can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a > memory leak. That use case, not PostgreSQL, has the defect here. The larger point is that such a shutdown process has never in the history of Postgres been successful at removing shared-memory (or semaphore) resources. I do not feel a need to put a higher recoverability standard onto the DSM code than what we've had for fifteen years with SysV shmem. But, by the same token, it shouldn't be any *less* recoverable. In this context that means that starting a new postmaster should recover the resources, at least 90% of the time (100% if we still have the old postmaster lockfile). I think the idea of keeping enough info in the SysV segment to permit recovery of DSM resources is a good one. Then, any case where the existing logic is able to free the old SysV segment is guaranteed to also work for DSM. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On 2013-12-10 18:12:53 -0500, Noah Misch wrote: > On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote: > > On 12/10/2013 07:27 PM, Noah Misch wrote: > > >On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote: > > Let's not add more cases like that, if we can avoid it. > > Only if we can avoid it for a modicum of effort and feature compromise. > You're asking for PostgreSQL to reshape its use of persistent resources so you > can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a > memory leak. That use case, not PostgreSQL, has the defect here. Empathically seconded. > > > Another refinement is to wait for all the processes to attach before > > > setting > > > the segment's size with ftruncate(). That way, when the window is open for > > > leaking the segment, it's still 0-sized so leaking it is not a big deal. > > > >and it is less > > >general: not every use of DSM is conducive to having all processes attach > > >in a > > >short span of time. > > Let's cross that bridge when we get there. AFAICS it fits all the > > use cases discussed this far. The primary use case I have for dsm, namely writing extensions that can use shared memory without having to be listed in shared_preload_libraries, certainly wouldn't work in any sensible way with such a restriction. And I don't think that's an insignificant usecase. So I really fail to see what this would buy us. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote: > On 12/10/2013 07:27 PM, Noah Misch wrote: > >On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote: > >>>On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas > >>> wrote: > * As discussed in the "Something fishy happening on frogmouth" thread, I > don't like the fact that the dynamic shared memory segments will be > permanently leaked if you kill -9 postmaster and destroy the data > directory. > >>I really think we need to do something about it. To use your earlier > >>example of parallel sort, it's not acceptable to permanently leak a 512 > >>GB segment on a system with 1 TB of RAM. > > > >I don't. Erasing your data directory after an unclean shutdown voids any > >expectations for a thorough, automatic release of system resources. Don't do > >that. The next time some new use of a persistent resource violates your hope > >for this scenario, there may be no remedy. > > Well, the point of erasing the data directory is to release system > resources. I would normally expect "killall -9 ; rm -rf > " to thorougly get rid of the running program and all the > resources. It's surprising enough that the regular shared memory > segment is left behind Your expectation is misplaced. Processes and files are simply not the only persistent system resources of interest. > but at least that one gets cleaned up when > you start a new server (on same port). In the most-typical case, yes. In rare cases involving multiple postmasters starting and stopping, the successor to the erased data directory will not clean up the sysv segment. > Let's not add more cases like that, if we can avoid it. Only if we can avoid it for a modicum of effort and feature compromise. You're asking for PostgreSQL to reshape its use of persistent resources so you can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a memory leak. That use case, not PostgreSQL, has the defect here. > BTW, what if the data directory is seriously borked, and the server > won't start? Sure, don't do that, but it would be nice to have a way > to recover if you do anyway. (docs?) If something is corrupting your data directory in an open-ended manner, you have bigger problems than a memory leak until reboot. Recovering DSM happens before we read the control file, so the damage would need to fall among a short list of files for this to happen (bugs excluded). Nonetheless, I don't object to documenting the varieties of system resources that PostgreSQL may reserve and referencing the OS facilities for inspecting them. Are you actually using PostgreSQL this way: frequent "killall -9 postgres; rm -rf $PGDATA" after arbitrarily-bad $PGDATA corruption? Some automated fault injection test rig, perhaps? > >>One idea is to create the shared memory object with shm_open, and wait > >>until all the worker processes that need it have attached to it. Then, > >>shm_unlink() it, before using it for anything. That way the segment will > >>be automatically released once all the processes close() it, or die. In > >>particular, kill -9 will release it. (This is a variant of my earlier > >>idea to create a small number of anonymous shared memory file > >>descriptors in postmaster startup with shm_open(), and pass them down to > >>child processes with fork()). I think you could use that approach with > >>SysV shared memory as well, by destroying the segment with > >>sgmget(IPC_RMID) immediately after all processes have attached to it. > > > >That leaves a window in which we still leak the segment, > > A small window is better than a large one. Yes. > Another refinement is to wait for all the processes to attach before > setting the segment's size with ftruncate(). That way, when the > window is open for leaking the segment, it's still 0-sized so > leaking it is not a big deal. > > >and it is less > >general: not every use of DSM is conducive to having all processes attach in > >a > >short span of time. > > Let's cross that bridge when we get there. AFAICS it fits all the > use cases discussed this far. It does fit the use cases discussed thus far. nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On 12/10/2013 07:27 PM, Noah Misch wrote: On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote: On 11/20/2013 09:58 PM, Robert Haas wrote: On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas wrote: * As discussed in the "Something fishy happening on frogmouth" thread, I don't like the fact that the dynamic shared memory segments will be permanently leaked if you kill -9 postmaster and destroy the data directory. Your test elicited different behavior for the dsm code vs. the main shared memory segment because it involved running a new postmaster with a different data directory but the same port number on the same machine, and expecting that that new - and completely unrelated - postmaster would clean up the resources left behind by the old, now-destroyed cluster. I tend to view that as a defect in your test case more than anything else, but as I suggested previously, we could potentially change the code to use something like 100 + (port * 100) with a forward search for the control segment identifier, instead of using a state file, mimicking the behavior of the main shared memory segment. I'm not sure we ever reached consensus on whether that was overall better than what we have now. I really think we need to do something about it. To use your earlier example of parallel sort, it's not acceptable to permanently leak a 512 GB segment on a system with 1 TB of RAM. I don't. Erasing your data directory after an unclean shutdown voids any expectations for a thorough, automatic release of system resources. Don't do that. The next time some new use of a persistent resource violates your hope for this scenario, there may be no remedy. Well, the point of erasing the data directory is to release system resources. I would normally expect "killall -9 ; rm -rf dir>" to thorougly get rid of the running program and all the resources. It's surprising enough that the regular shared memory segment is left behind, but at least that one gets cleaned up when you start a new server (on same port). Let's not add more cases like that, if we can avoid it. BTW, what if the data directory is seriously borked, and the server won't start? Sure, don't do that, but it would be nice to have a way to recover if you do anyway. (docs?) One idea is to create the shared memory object with shm_open, and wait until all the worker processes that need it have attached to it. Then, shm_unlink() it, before using it for anything. That way the segment will be automatically released once all the processes close() it, or die. In particular, kill -9 will release it. (This is a variant of my earlier idea to create a small number of anonymous shared memory file descriptors in postmaster startup with shm_open(), and pass them down to child processes with fork()). I think you could use that approach with SysV shared memory as well, by destroying the segment with sgmget(IPC_RMID) immediately after all processes have attached to it. That leaves a window in which we still leak the segment, A small window is better than a large one. Another refinement is to wait for all the processes to attach before setting the segment's size with ftruncate(). That way, when the window is open for leaking the segment, it's still 0-sized so leaking it is not a big deal. and it is less general: not every use of DSM is conducive to having all processes attach in a short span of time. Let's cross that bridge when we get there. AFAICS it fits all the use cases discussed this far. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote: > On 11/20/2013 09:58 PM, Robert Haas wrote: >> On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas >> wrote: >>> * As discussed in the "Something fishy happening on frogmouth" thread, I >>> don't like the fact that the dynamic shared memory segments will be >>> permanently leaked if you kill -9 postmaster and destroy the data directory. >> >> Your test elicited different behavior for the dsm code vs. the main >> shared memory segment because it involved running a new postmaster >> with a different data directory but the same port number on the same >> machine, and expecting that that new - and completely unrelated - >> postmaster would clean up the resources left behind by the old, >> now-destroyed cluster. I tend to view that as a defect in your test >> case more than anything else, but as I suggested previously, we could >> potentially change the code to use something like 100 + (port * >> 100) with a forward search for the control segment identifier, instead >> of using a state file, mimicking the behavior of the main shared >> memory segment. I'm not sure we ever reached consensus on whether >> that was overall better than what we have now. > > I really think we need to do something about it. To use your earlier > example of parallel sort, it's not acceptable to permanently leak a 512 > GB segment on a system with 1 TB of RAM. I don't. Erasing your data directory after an unclean shutdown voids any expectations for a thorough, automatic release of system resources. Don't do that. The next time some new use of a persistent resource violates your hope for this scenario, there may be no remedy. Having said that, I'm not opposed to storing the DSM control segment handle in PGShmemHeader, which would enable DSM cleanup whenever we find cause to reclaim a main sysv segment. > One idea is to create the shared memory object with shm_open, and wait > until all the worker processes that need it have attached to it. Then, > shm_unlink() it, before using it for anything. That way the segment will > be automatically released once all the processes close() it, or die. In > particular, kill -9 will release it. (This is a variant of my earlier > idea to create a small number of anonymous shared memory file > descriptors in postmaster startup with shm_open(), and pass them down to > child processes with fork()). I think you could use that approach with > SysV shared memory as well, by destroying the segment with > sgmget(IPC_RMID) immediately after all processes have attached to it. That leaves a window in which we still leak the segment, and it is less general: not every use of DSM is conducive to having all processes attach in a short span of time. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Thu, Dec 5, 2013 at 4:06 PM, Heikki Linnakangas wrote: >> That's a very interesting idea. I've been thinking that we needed to >> preserve the property that new workers could attach to the shared >> memory segment at any time, but that might not be necessary in all >> case. We could introduce a new dsm operation that means "i promise no >> one else needs to attach to this segment". Further attachments would >> be disallowed by dsm.c regardless of the implementation in use, and >> dsm_impl.c would also be given a chance to perform >> implementation-specific operations, like shm_unlink and >> shmctl(IPC_RMID). This new operation, when used, would help to reduce >> the chance of leaks and perhaps catch other programming errors as >> well. >> >> What should we call it? dsm_finalize() is the first thing that comes >> to mind, but I'm not sure I like that. > > dsm_unlink() would mirror the underlying POSIX shm_unlink() call, and would > be familiar to anyone who understands how unlink() on a file works on Unix. OK, let me work on that once this CommitFest is done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On 12/05/2013 09:34 PM, Robert Haas wrote: On Thu, Dec 5, 2013 at 11:12 AM, Heikki Linnakangas wrote: One idea is to create the shared memory object with shm_open, and wait until all the worker processes that need it have attached to it. Then, shm_unlink() it, before using it for anything. That way the segment will be automatically released once all the processes close() it, or die. In particular, kill -9 will release it. (This is a variant of my earlier idea to create a small number of anonymous shared memory file descriptors in postmaster startup with shm_open(), and pass them down to child processes with fork()). I think you could use that approach with SysV shared memory as well, by destroying the segment with sgmget(IPC_RMID) immediately after all processes have attached to it. That's a very interesting idea. I've been thinking that we needed to preserve the property that new workers could attach to the shared memory segment at any time, but that might not be necessary in all case. We could introduce a new dsm operation that means "i promise no one else needs to attach to this segment". Further attachments would be disallowed by dsm.c regardless of the implementation in use, and dsm_impl.c would also be given a chance to perform implementation-specific operations, like shm_unlink and shmctl(IPC_RMID). This new operation, when used, would help to reduce the chance of leaks and perhaps catch other programming errors as well. What should we call it? dsm_finalize() is the first thing that comes to mind, but I'm not sure I like that. dsm_unlink() would mirror the underlying POSIX shm_unlink() call, and would be familiar to anyone who understands how unlink() on a file works on Unix. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Thu, Dec 5, 2013 at 11:12 AM, Heikki Linnakangas wrote: > Hmm. Those two use cases are quite different. For message-passing, you want > a lot of small queues, but for parallel sort, you want one huge allocation. > I wonder if we shouldn't even try a one-size-fits-all solution. > > For message-passing, there isn't much need to even use dynamic shared > memory. You could just assign one fixed-sized, single-reader multiple-writer > queue for each backend. True, although if the queue needs to 1MB, or even 128kB, that would bloat the static shared-memory footprint over the server pretty significantly. And I don't know that we know that a small queue will be adequate in all cases. If you've got a worker backend feeding data back to the user backend, the size of the queue limits how far ahead of the user backend that worker can get. Big is good, because then the user backend won't stall on read, but small is also good, in case the query is cancelled or hits an error. It is far from obvious to me that one-size-fits-all is the right solution. > For parallel sort, you'll want to utilize all the available memory and all > CPUs for one huge sort. So all you really need is a single huge shared > memory segment. If one process is already using that 512GB segment to do a > sort, you do *not* want to allocate a second 512GB segment. You'll want to > wait for the first operation to finish first. Or maybe you'll want to have > 3-4 somewhat smaller segments in use at the same time, but not more than > that. This is all true, but it has basically nothing to do with parallelism. work_mem is a poor model, but I didn't invent it. Hopefully some day someone will fix it, maybe even me, but that's a separate project. > I really think we need to do something about it. To use your earlier example > of parallel sort, it's not acceptable to permanently leak a 512 GB segment > on a system with 1 TB of RAM. > > One idea is to create the shared memory object with shm_open, and wait until > all the worker processes that need it have attached to it. Then, > shm_unlink() it, before using it for anything. That way the segment will be > automatically released once all the processes close() it, or die. In > particular, kill -9 will release it. (This is a variant of my earlier idea > to create a small number of anonymous shared memory file descriptors in > postmaster startup with shm_open(), and pass them down to child processes > with fork()). I think you could use that approach with SysV shared memory as > well, by destroying the segment with sgmget(IPC_RMID) immediately after all > processes have attached to it. That's a very interesting idea. I've been thinking that we needed to preserve the property that new workers could attach to the shared memory segment at any time, but that might not be necessary in all case. We could introduce a new dsm operation that means "i promise no one else needs to attach to this segment". Further attachments would be disallowed by dsm.c regardless of the implementation in use, and dsm_impl.c would also be given a chance to perform implementation-specific operations, like shm_unlink and shmctl(IPC_RMID). This new operation, when used, would help to reduce the chance of leaks and perhaps catch other programming errors as well. What should we call it? dsm_finalize() is the first thing that comes to mind, but I'm not sure I like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On 11/20/2013 09:58 PM, Robert Haas wrote: On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas wrote: How many allocations? What size will they have have typically, minimum and maximum? The facility is intended to be general, so the answer could vary widely by application. The testing that I have done so far suggests that for message-passing, relatively small queue sizes (a few kB, perhaps 1 MB at the outside) should be sufficient. However, applications such as parallel sort could require vast amounts of shared memory. Consider a machine with 1TB of memory performing a 512GB internal sort. You're going to need 512GB of shared memory for that. Hmm. Those two use cases are quite different. For message-passing, you want a lot of small queues, but for parallel sort, you want one huge allocation. I wonder if we shouldn't even try a one-size-fits-all solution. For message-passing, there isn't much need to even use dynamic shared memory. You could just assign one fixed-sized, single-reader multiple-writer queue for each backend. For parallel sort, you'll want to utilize all the available memory and all CPUs for one huge sort. So all you really need is a single huge shared memory segment. If one process is already using that 512GB segment to do a sort, you do *not* want to allocate a second 512GB segment. You'll want to wait for the first operation to finish first. Or maybe you'll want to have 3-4 somewhat smaller segments in use at the same time, but not more than that. * As discussed in the "Something fishy happening on frogmouth" thread, I don't like the fact that the dynamic shared memory segments will be permanently leaked if you kill -9 postmaster and destroy the data directory. Your test elicited different behavior for the dsm code vs. the main shared memory segment because it involved running a new postmaster with a different data directory but the same port number on the same machine, and expecting that that new - and completely unrelated - postmaster would clean up the resources left behind by the old, now-destroyed cluster. I tend to view that as a defect in your test case more than anything else, but as I suggested previously, we could potentially change the code to use something like 100 + (port * 100) with a forward search for the control segment identifier, instead of using a state file, mimicking the behavior of the main shared memory segment. I'm not sure we ever reached consensus on whether that was overall better than what we have now. I really think we need to do something about it. To use your earlier example of parallel sort, it's not acceptable to permanently leak a 512 GB segment on a system with 1 TB of RAM. One idea is to create the shared memory object with shm_open, and wait until all the worker processes that need it have attached to it. Then, shm_unlink() it, before using it for anything. That way the segment will be automatically released once all the processes close() it, or die. In particular, kill -9 will release it. (This is a variant of my earlier idea to create a small number of anonymous shared memory file descriptors in postmaster startup with shm_open(), and pass them down to child processes with fork()). I think you could use that approach with SysV shared memory as well, by destroying the segment with sgmget(IPC_RMID) immediately after all processes have attached to it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Sat, Nov 23, 2013 at 4:21 PM, Jeremy Harris wrote: > Its performance shines on partially- or reverse-sorted input. Search the archives for the work I did on timsort support a while back. A patch was posted, that had some impressive results provided you just considered the number of comparisons (and not TPS when sorting text), but at the time my sense was that it didn't have broad enough applicability for me to pursue further. That doesn't mean the idea wasn't useful, and it certainly doesn't mean that my rough patch couldn't be improved upon. For one thing, if there was a type that had a comparator that was, say, an order of magnitude more expensive than bttextcmp, it would definitely be a big win. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On 20/11/13 19:58, Robert Haas wrote: Parallel sort, and then parallel other stuff. Eventually general parallel query. I have recently updated https://wiki.postgresql.org/wiki/Parallel_Sort and you may find that interesting/helpful as a statement of intent. I've been playing with an internal merge sort which may be of interest in this area of work. While I've not worked on parallelizing the merge stages it does not feel like an impossible task. More to the immediate point, the input stage and readout stage can do useful work overlapped with the data source and sink (respectively). The current implementation uses the same amount of memory as the quicksort one, and does approximately the same number of compares (on random input). The overheads are higher than the quicksort implementation (up to 50% higher cpu time, on a single key of random integers). Its performance shines on partially- or reverse-sorted input. Transition to (the existing) external sort is implemented, as is the limited random-access facility, and bounded output. It supports unique-output. The planner interface is extended to return an indication of dedup guarantee, and the Unique node uses this to elide itself at planning time. Dedup operations also cover external sorts. -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas wrote: > I'm trying to catch up on all of this dynamic shared memory stuff. A bunch > of random questions and complaints: > > What kind of usage are we trying to cater with the dynamic shared memory? Parallel sort, and then parallel other stuff. Eventually general parallel query. I have recently updated https://wiki.postgresql.org/wiki/Parallel_Sort and you may find that interesting/helpful as a statement of intent. > How many allocations? What size will they have have typically, minimum and > maximum? The facility is intended to be general, so the answer could vary widely by application. The testing that I have done so far suggests that for message-passing, relatively small queue sizes (a few kB, perhaps 1 MB at the outside) should be sufficient. However, applications such as parallel sort could require vast amounts of shared memory. Consider a machine with 1TB of memory performing a 512GB internal sort. You're going to need 512GB of shared memory for that. > I looked at the message queue implementation you posted, but I > wonder if that's the use case you're envisioning for this, or if you have > more things in mind. I consider that to be the first application of dynamic shared memory and expect it to be used for (1) returning errors from background workers to the user backend and (2) funneling tuples from one portion of a query tree that has been split off to run in a background worker back to the user backend. However, I expect that many clients of the dynamic shared memory system will want to roll their own data structures. Parallel internal sort (or external sort) is obviously one, and in addition to that we might have parallel construction of in-memory hash tables for a hash join or hash agg, or, well, anything else you'd like to parallelize. I expect that many of those case will result in much larger allocations than what we need just for message passing. > * dsm_handle is defined in dsm_impl.h, but it's exposed in the function > signatures in dsm.h. ISTM it should be moved to dsm.h Well, dsm_impl.h is the low-level stuff, and dsm.h is intended as the user API. Unfortunately, whichever file declares that will have to be included by the other one, so the separation is not as clean as I would like, but I thought it made more sense for the high-level stuff to depend on the low-level stuff rather than the other way around. > * The DSM API contains functions for resizing the segment. That's not > exercised by the MQ or TOC facilities. Is that going to stay dead code, or > do you envision a user for it? I dunno. It certainly seems like a useful thing to be able to do - if we run out of memory, go get some more. It'd obviously be more useful if we had a full-fledged allocator for dynamic shared memory, which is something that I'd like to build but haven't built yet. However, after discovering that it doesn't work either on Windows or with System V shared memory, I'm less sanguine about the chances of finding good uses for it. I haven't completely given up hope, but I don't have anything concrete in mind at the moment. It'd be a little more plausible if we adjusted things so that the mmap() implementation works on Windows. > * dsm_impl_can_resize() incorrectly returns false for DSM_IMPL_MMAP. The > mmap() implementation can resize. Oops, that's a bug. > * This is an issue I've seen for some time with git master, while working on > various things. Sometimes, when I kill the server with CTRL-C, I get this in > the log: > > ^CLOG: received fast shutdown request > LOG: aborting any active transactions > FATAL: terminating connection due to administrator command > LOG: autovacuum launcher shutting down > LOG: shutting down > LOG: database system is shut down > LOG: could not remove shared memory segment "/PostgreSQL.1804289383": > Tiedostoa tai hakemistoa ei ole > > (that means ENOENT) > > And I just figured out why that happens: If you take a base backup of a > running system, the pg_dynshmem/state file is included in the backup. If you > now start up a standby from the backup on the same system, it will "clean > up" and reuse the dynshmem segment still used by the master system. Now, > when you shut down the master, you get that message in the log. If the > segment was actually used for something, the master would naturally crash. Ooh. Well, pg_basebackup can be fixed not to copy that, but there's still going to be a problem with old-style base backups. We could try to figure out some additional sanity check for the dsm code to use, to determine whether or not it belongs to the same cluster, like storing the port number or the system identifier or some other value in the shared memory segment and then comparing it to verify whether we've got the same one. Or perhaps we could store the PID of the creating postmaster in there and check whether that PID is still alive, although we might get confused if the PID has been rec
[HACKERS] Dynamic Shared Memory stuff
I'm trying to catch up on all of this dynamic shared memory stuff. A bunch of random questions and complaints: What kind of usage are we trying to cater with the dynamic shared memory? How many allocations? What size will they have have typically, minimum and maximum? I looked at the message queue implementation you posted, but I wonder if that's the use case you're envisioning for this, or if you have more things in mind. * dsm_handle is defined in dsm_impl.h, but it's exposed in the function signatures in dsm.h. ISTM it should be moved to dsm.h * The DSM API contains functions for resizing the segment. That's not exercised by the MQ or TOC facilities. Is that going to stay dead code, or do you envision a user for it? * dsm_impl_can_resize() incorrectly returns false for DSM_IMPL_MMAP. The mmap() implementation can resize. * This is an issue I've seen for some time with git master, while working on various things. Sometimes, when I kill the server with CTRL-C, I get this in the log: ^CLOG: received fast shutdown request LOG: aborting any active transactions FATAL: terminating connection due to administrator command LOG: autovacuum launcher shutting down LOG: shutting down LOG: database system is shut down LOG: could not remove shared memory segment "/PostgreSQL.1804289383": Tiedostoa tai hakemistoa ei ole (that means ENOENT) And I just figured out why that happens: If you take a base backup of a running system, the pg_dynshmem/state file is included in the backup. If you now start up a standby from the backup on the same system, it will "clean up" and reuse the dynshmem segment still used by the master system. Now, when you shut down the master, you get that message in the log. If the segment was actually used for something, the master would naturally crash. * As discussed in the "Something fishy happening on frogmouth" thread, I don't like the fact that the dynamic shared memory segments will be permanently leaked if you kill -9 postmaster and destroy the data directory. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers