Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)
On 26.11.2012 14:53, Amit Kapila wrote: On Friday, November 23, 2012 7:03 PM Heikki Linnakangas This is what I came up with. It adds a new function, OpenFile, that returns a raw file descriptor like BasicOpenFile, but the file descriptor is associated with the current subtransaction and automatically closed at end-of-xact, in the same way that AllocateFile and AllocateDir work. In other words, OpenFile is to open() what AllocateFile is to fopen(). BasicOpenFile is unchanged, it returns a raw fd and it's solely the caller's responsibility to close it, but many of the places that used to call BasicOpenFile now use the safer OpenFile function instead. This patch plugs three existing fd (or virtual fd) leaks: 1. copy_file() - fixed by by using OpenFile instead of BasicOpenFile 2. XLogFileLinit() - fixed by adding close() calls to the error cases. Can't use OpenFile here because the fd is supposed to persist over transaction boundaries. 3. lo_import/lo_export - fixed by using OpenFile instead of PathNameOpenFile. I have gone through the patch and find it okay except for one minor suggestion 1. Can we put below log in OpenFile as well +DO_DB(elog(LOG, "CloseFile: Allocated %d", numAllocatedDescs)); Thanks. Added that and committed. I didn't dare to backpatch this, even though it could be fairly easily backpatched. The leaks exist in older versions too, but since they're extremely rare (zero complaints from the field and it's been like that forever), I didn't want to take the risk. Maybe later, after this has had more testing in master. One thing I'm not too fond of is the naming. I'm calling the new functions OpenFile and CloseFile. There's some danger of confusion there, as the function to close a virtual file opened with PathNameOpenFile is called FileClose. OpenFile is really the same kind of operation as AllocateFile and AllocateDir, but returns an unbuffered fd. So it would be nice if it was called AllocateSomething, too. But AllocateFile is already taken. And I don't much like the Allocate* naming for these anyway, you really would expect the name to contain "open". OpenFileInTrans OpenTransactionAwareFile In anycase OpenFile is also okay. I ended up calling the functions OpenTransientFile and CloseTransientFile. Windows has a library function called "OpenFile", so that was a pretty bad choice after all. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)
Amit Kapila writes: > On Monday, November 26, 2012 7:01 PM Heikki Linnakangas wrote: >> Hmm, if it's just for locking purposes, how about using a lwlock or a >> heavy-weight lock instead? > Its not only for lock, the main idea is that we create temp file and write > modified configuration in that temp file. > In end if it's success, then we rename temp file to .conf file but if it > error out then at abort we need to delete temp file. > So in short, main point is to close/rename the file in case of success (at > end of command) and remove in case of abort. I'd go with the TRY/CATCH solution. It would be worth extending the fd.c infrastructure if there were multiple users of the feature, but there are not, nor do I see likely new candidates on the horizon. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)
On Monday, November 26, 2012 7:01 PM Heikki Linnakangas wrote: > On 26.11.2012 14:53, Amit Kapila wrote: > > I have one usecase in feature (SQL Command to edit postgresql.conf) > very > > similar to OpenFile/CloseFile, but I want that when CloseFile is > called from > > abort, it should remove(unlink) the file as well and during open it > has to > > retry few times if open is not success. > > I have following options: > > 1. Extend OpenFile/CloseFile or PathNameOpenFile > > 2. Write new functions similar to OpenFile/CloseFile, something like > > OpenConfLockFile/CloseConfLockFile > > 3. Use OpenFile/CloseFile and handle my specific case with PG_TRY .. > > PG_CATCH > > > > Any suggestions? > > Hmm, if it's just for locking purposes, how about using a lwlock or a > heavy-weight lock instead? Its not only for lock, the main idea is that we create temp file and write modified configuration in that temp file. In end if it's success, then we rename temp file to .conf file but if it error out then at abort we need to delete temp file. So in short, main point is to close/rename the file in case of success (at end of command) and remove in case of abort. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)
On 26.11.2012 14:53, Amit Kapila wrote: I have one usecase in feature (SQL Command to edit postgresql.conf) very similar to OpenFile/CloseFile, but I want that when CloseFile is called from abort, it should remove(unlink) the file as well and during open it has to retry few times if open is not success. I have following options: 1. Extend OpenFile/CloseFile or PathNameOpenFile 2. Write new functions similar to OpenFile/CloseFile, something like OpenConfLockFile/CloseConfLockFile 3. Use OpenFile/CloseFile and handle my specific case with PG_TRY .. PG_CATCH Any suggestions? Hmm, if it's just for locking purposes, how about using a lwlock or a heavy-weight lock instead? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)
On Friday, November 23, 2012 7:03 PM Heikki Linnakangas > On 15.11.2012 17:16, Heikki Linnakangas wrote: > > On 15.11.2012 16:55, Tom Lane wrote: > >> Heikki Linnakangas writes: > >>> This is a fairly general issue, actually. Looking around, I can see > >>> at least two similar cases in existing code, with BasicOpenFile, > >>> where we will leak file descriptors on error: > >> > >> Um, don't we automatically clean those up during transaction abort? > > > > Not the ones allocated with PathNameOpenFile or BasicOpenFile. Files > > allocated with AllocateFile() and OpenTemporaryFile() are cleaned up > > at abort. > > > >> If we don't, we ought to think about that, not about cluttering > >> calling code with certain-to-be-inadequate cleanup in error cases. > > > > Agreed. Cleaning up at end-of-xact won't help walsender or other > > non-backend processes, though, because they don't do transactions. But > > a top-level ResourceOwner that's reset in the sigsetjmp() cleanup > > routine would work. > > This is what I came up with. It adds a new function, OpenFile, that > returns a raw file descriptor like BasicOpenFile, but the file > descriptor is associated with the current subtransaction and > automatically closed at end-of-xact, in the same way that AllocateFile > and AllocateDir work. In other words, OpenFile is to open() what > AllocateFile is to fopen(). BasicOpenFile is unchanged, it returns a raw > fd and it's solely the caller's responsibility to close it, but many of > the places that used to call BasicOpenFile now use the safer OpenFile > function instead. > > This patch plugs three existing fd (or virtual fd) leaks: > > 1. copy_file() - fixed by by using OpenFile instead of BasicOpenFile 2. > XLogFileLinit() - fixed by adding close() calls to the error cases. > Can't use OpenFile here because the fd is supposed to persist over > transaction boundaries. > 3. lo_import/lo_export - fixed by using OpenFile instead of > PathNameOpenFile. I have gone through the patch and find it okay except for one minor suggestion 1. Can we put below log in OpenFile as well +DO_DB(elog(LOG, "CloseFile: Allocated %d", numAllocatedDescs)); > One thing I'm not too fond of is the naming. I'm calling the new > functions OpenFile and CloseFile. There's some danger of confusion > there, as the function to close a virtual file opened with > PathNameOpenFile is called FileClose. OpenFile is really the same kind > of operation as AllocateFile and AllocateDir, but returns an unbuffered > fd. So it would be nice if it was called AllocateSomething, too. But > AllocateFile is already taken. And I don't much like the Allocate* > naming for these anyway, you really would expect the name to contain > "open". OpenFileInTrans OpenTransactionAwareFile In anycase OpenFile is also okay. I have one usecase in feature (SQL Command to edit postgresql.conf) very similar to OpenFile/CloseFile, but I want that when CloseFile is called from abort, it should remove(unlink) the file as well and during open it has to retry few times if open is not success. I have following options: 1. Extend OpenFile/CloseFile or PathNameOpenFile 2. Write new functions similar to OpenFile/CloseFile, something like OpenConfLockFile/CloseConfLockFile 3. Use OpenFile/CloseFile and handle my specific case with PG_TRY .. PG_CATCH Any suggestions? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)
On 15.11.2012 17:16, Heikki Linnakangas wrote: On 15.11.2012 16:55, Tom Lane wrote: Heikki Linnakangas writes: This is a fairly general issue, actually. Looking around, I can see at least two similar cases in existing code, with BasicOpenFile, where we will leak file descriptors on error: Um, don't we automatically clean those up during transaction abort? Not the ones allocated with PathNameOpenFile or BasicOpenFile. Files allocated with AllocateFile() and OpenTemporaryFile() are cleaned up at abort. If we don't, we ought to think about that, not about cluttering calling code with certain-to-be-inadequate cleanup in error cases. Agreed. Cleaning up at end-of-xact won't help walsender or other non-backend processes, though, because they don't do transactions. But a top-level ResourceOwner that's reset in the sigsetjmp() cleanup routine would work. This is what I came up with. It adds a new function, OpenFile, that returns a raw file descriptor like BasicOpenFile, but the file descriptor is associated with the current subtransaction and automatically closed at end-of-xact, in the same way that AllocateFile and AllocateDir work. In other words, OpenFile is to open() what AllocateFile is to fopen(). BasicOpenFile is unchanged, it returns a raw fd and it's solely the caller's responsibility to close it, but many of the places that used to call BasicOpenFile now use the safer OpenFile function instead. This patch plugs three existing fd (or virtual fd) leaks: 1. copy_file() - fixed by by using OpenFile instead of BasicOpenFile 2. XLogFileLinit() - fixed by adding close() calls to the error cases. Can't use OpenFile here because the fd is supposed to persist over transaction boundaries. 3. lo_import/lo_export - fixed by using OpenFile instead of PathNameOpenFile. In addition, this replaces many BasicOpenFile() calls with OpenFile() that were not leaking, because the code meticulously closed the file on error. That wasn't strictly necessary, but IMHO it's good for robustness. One thing I'm not too fond of is the naming. I'm calling the new functions OpenFile and CloseFile. There's some danger of confusion there, as the function to close a virtual file opened with PathNameOpenFile is called FileClose. OpenFile is really the same kind of operation as AllocateFile and AllocateDir, but returns an unbuffered fd. So it would be nice if it was called AllocateSomething, too. But AllocateFile is already taken. And I don't much like the Allocate* naming for these anyway, you really would expect the name to contain "open". Do we want to backpatch this? We've had zero complaints, but this seems fairly safe to backpatch, and at least the leak in copy_file() can be quite annoying. If you run out of disk space in CREATE DATABASE, the target file is kept open even though it's deleted, so the space isn't reclaimed until you disconnect. - Heikki diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c index dd69c23..cd60dd8 100644 --- a/src/backend/access/transam/slru.c +++ b/src/backend/access/transam/slru.c @@ -531,7 +531,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruFlush fdata) int i; for (i = 0; i < fdata->num_files; i++) - close(fdata->fd[i]); + CloseFile(fdata->fd[i]); } /* Re-acquire control lock and update page state */ @@ -593,7 +593,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) * SlruPhysicalWritePage). Hence, if we are InRecovery, allow the case * where the file doesn't exist, and return zeroes instead. */ - fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR); + fd = OpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR); if (fd < 0) { if (errno != ENOENT || !InRecovery) @@ -614,7 +614,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) { slru_errcause = SLRU_SEEK_FAILED; slru_errno = errno; - close(fd); + CloseFile(fd); return false; } @@ -623,11 +623,11 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) { slru_errcause = SLRU_READ_FAILED; slru_errno = errno; - close(fd); + CloseFile(fd); return false; } - if (close(fd)) + if (CloseFile(fd)) { slru_errcause = SLRU_CLOSE_FAILED; slru_errno = errno; @@ -740,7 +740,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata) * don't use O_EXCL or O_TRUNC or anything like that. */ SlruFileName(ctl, path, segno); - fd = BasicOpenFile(path, O_RDWR | O_CREAT | PG_BINARY, + fd = OpenFile(path, O_RDWR | O_CREAT | PG_BINARY, S_IRUSR | S_IWUSR); if (fd < 0) { @@ -773,7 +773,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata) slru_errcause = SLRU_SEEK_FAILED; slru_errno = errno; if (!fdata) - close(fd); + CloseFile(fd); return false; } @@ -786,7 +786,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata) slru_errcause = SLRU_WRITE_FAILED; slru_e