Thanks Michael! On Sun, Jan 2, 2022 at 11:56 PM Michael Paquier <mich...@paquier.xyz> wrote:
> On Sun, Jan 02, 2022 at 09:27:43PM -0800, SATYANARAYANA NARLAPURAM wrote: > > I noticed that pg_receivewal fails to stream when the partial file to > write > > is not fully initialized and fails with the error message something like > > below. This requires an extra step of deleting the partial file that is > not > > fully initialized before starting the pg_receivewal. Attaching a simple > > patch that creates a temp file, fully initialize it and rename the file > to > > the desired wal segment name. > > Are you referring to the pre-padding when creating a new partial > segment, aka when we write chunks of XLOG_BLCKSZ full of zeros until > the file is fully created? What kind of error did you see? I guess > that a write() with ENOSPC would be more likely, but you got a > different problem? I see two cases, 1/ when no space is left on the device and 2/ when the process is taken down forcibly (a VM/container crash) > I don't disagree with improving such cases, but we > should not do things so as there is a risk of leaving behind an > infinite set of segments in case of repeated errors Do you see a problem with the proposed patch that leaves the files behind, at least in my testing I don't see any files left behind? > , and partial > segments are already a kind of temporary file. > if the .partial file exists with not zero-padded up to the wal segment size (WalSegSz), then open_walfile fails with the below error. I have two options here, 1/ to continue padding the existing partial file and let it zero up to WalSegSz , 2/create a temp file as I did in the patch. I thought the latter is safe because it can handle corrupt cases as described below. Thoughts? * When streaming to files, if an existing file exists we verify that it's * either empty (just created), or a complete WalSegSz segment (in which * case it has been created and padded). Anything else indicates a corrupt * file. Compressed files have no need for padding, so just ignore this * case. > > - if (dir_data->sync) > + if (shouldcreatetempfile) > + { > + if (durable_rename(tmpsuffixpath, targetpath) != 0) > + { > + close(fd); > + unlink(tmpsuffixpath); > + return NULL; > + } > + } > > durable_rename() does a set of fsync()'s, but --no-sync should not > flush any data. > I need to look into this further, without this I am seeing random file close and rename failures and disconnecting the stream. Also it appears we are calling durable_rename when we are closing the file (dir_close) even without --no-sync. Should we worry about the padding case? > -- > Michael >