On Tue, 19 Nov 2019 at 14:07, Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, Nov 18, 2019 at 5:50 PM Amit Khandekar <amitdkhan...@gmail.com> wrote: > > > > On Mon, 18 Nov 2019 at 17:20, Amit Kapila <amit.kapil...@gmail.com> wrote: > > > I see that you have made changes in ReorderBufferRestoreChanges to use > > > PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a > > > reason for the same? In my test environment, with the test provided > > > by you, I got the error (reported in this thread) via > > > ReorderBufferSerializeTXN. > > > > You didn't get this error with the patch applied, did you ? > > > > No, I got this before applying the patch. However, after applying the > patch, I got below error in the same test: > > postgres=# SELECT 1 from > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1; > ERROR: could not read from reorderbuffer spill file: Invalid argument > > It seems to me that FileRead API used in the patch can return value < > 0 on EOF. See the API usage in BufFileLoadBuffer. I got this error > on a windows machine and in the server log the message was "LOG: > unrecognized win32 error code: 38" which indicates "Reached the end of > the file."
On Windows, it is documented that ReadFile() (which is called by pg_pread) will return false on EOF but only when the file is open for asynchronous reads/writes. But here we are just dealing with usual synchronous reads. So pg_pread() code should indeed return 0 on EOF on Windows. Not yet able to figure out how FileRead() managed to return this error on Windows. But from your symptoms, it does look like pg_pread()=>ReadFile() returned false (despite doing asynchronous reads), and so _dosmaperr() gets called, and then it does not find the eof error in doserrors[], so the "unrecognized win32 error code" message is printed. May have to dig up more on this. > > > If you were debugging this without the patch applied, I suspect that > > the reason why ReorderBufferSerializeTXN() => OpenTransientFile() is > > generating this error is because the max limit must be already crossed > > because of earlier calls to ReorderBufferRestoreChanges(). > > > > Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is > > sufficient because the code in that function has made sure the fd gets > > closed there itself. > > > > Okay, then we might not need it there, but we should at least add a > comment in ReorderBufferRestoreChanges to explain why we have used a > different function to operate on the file at that place. Yeah, that might make sense. > > > > > For the API's that use VFDs (like PathNameOpenFile), the files opened > > are always recorded in the VfdCache array. So it is not required to do > > the cleanup at (sub)transaction end, because the kernel fds get closed > > dynamically in ReleaseLruFiles() whenever they reach max_safe_fds > > limit. So if a transaction aborts, the fds might remain open, but > > those will get cleaned up whenever we require more fds, through > > ReleaseLruFiles(). Whereas, for files opened through > > OpenTransientFile(), VfdCache is not involved, so this needs > > transaction end cleanup. > > > > Have you tried by injecting some error? After getting the error > mentioned above in email, when I retried the same query, I got the > below message. > > postgres=# SELECT 1 from > pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1; > ERROR: could not remove file > "pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during > removal of pg_replslot/regression_slot/xid*: Permission denied > > And, then I tried to drop the replication slot and I got below error. > postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot'); > ERROR: could not rename file "pg_replslot/regression_slot" to > "pg_replslot/regression_slot.tmp": Permission denied > > It might be something related to Windows Oh ok, I missed the fact that on Windows we can't delete the files that are already open, unlike Linux/Unix. I guess, I may have to use FD_CLOSE_AT_EOXACT flags; or simply use OpenTemporaryFile(). I wonder though if this same issue might come up for the other use-case of PathNameOpenFile() : logical_rewrite_log_mapping(). > but you can once try by > injecting some error after reading a few files in the code path and > see the behavior. Yeah, will check the behaviour, although on Linux, I think I won't get this error. But yes, like I mentioned above, I think we might have to arrange for something. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company