RE: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform
On January 23, 2018 1:13 PM, Junio C Hamano wrote: > > "Randall S. Becker"writes: > > >> IOW, I do not see it explained clearly why this change is needed on > >> any single platform---so "that issue may be shared by others, too" > >> is a bit premature thing for me to listen to and understand, as "that > >> issue" is quite unclear to me. > > > > v4 might be a little better. The issue seems to be specific to NonStop > > that it's PIPE mechanism needs to have setbuf(pipe,NULL) called for > > git to be happy. The default behaviour appears to be different on > > NonStop from other platforms from our testing. We get hung up waiting > > on pipes unless this is done. > > I am afraid that that is not a "diagnosis" enough to allow us moving forward. > We get hung up because...? When the process that has the other end of > pipe open exits, NonStop does not close the pipe properly? Or NonStop > does not flush the data buffered in the pipe? > Would it help if a compat wrapper that does setbuf(fd, NULL) immediately > before closing the fd, or some other more targetted mechanism, is used only > on NonStop, for example? Potentially megabytes of data can pass thru a > pipe, and if the platform bug affects only at the tail end of the transfer, > marking the pipe not to buffer at all at the beginning is too big a hammer to > work it around. With the explanation given so far, this still smells more > like > "we have futzed around without understanding why, and this happens to > work." It may be good enough for your purpose of making progress (after > all, I'd imagine that you'd need to work this around one way or another to > hunt for and fix more issues on the platform), but it does not sound like "we > know what the problem is, and this is the best workaround for that", which is > what we want if it wants to become part of the official codebase. As I feared, the test suite was unable to reproduce the issue without setbuf(NULL) - primary because the test structure ends up with both ends of the git dialogs on clone and fetch in the same CPU (even if different IPUs), which does not experience the issue and we can't loop-back through the platform's proprietary SSH. I am not comfortable releasing without it at this stage, but if you don't want to go forward with this fix, my team can run it for a few months internally in the hope that this works out for the better. The situation is timing related and is fine 99.98-ish% of the time. I really do want the setbuf present in any compiled versions that our community might get, primarily because I don't like sleepless nights chasing this down (again). Cheers, Randall
RE: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform
On January 23, 2018 1:13 PM, Junio C Hamano wrote: > "Randall S. Becker"writes: > > >> IOW, I do not see it explained clearly why this change is needed on > >> any single platform---so "that issue may be shared by others, too" > >> is a bit premature thing for me to listen to and understand, as "that > >> issue" is quite unclear to me. > > > > v4 might be a little better. The issue seems to be specific to NonStop > > that it's PIPE mechanism needs to have setbuf(pipe,NULL) called for > > git to be happy. The default behaviour appears to be different on > > NonStop from other platforms from our testing. We get hung up waiting > > on pipes unless this is done. > > I am afraid that that is not a "diagnosis" enough to allow us moving forward. > We get hung up because...? When the process that has the other end of > pipe open exits, NonStop does not close the pipe properly? Or NonStop > does not flush the data buffered in the pipe? > Would it help if a compat wrapper that does setbuf(fd, NULL) immediately > before closing the fd, or some other more targetted mechanism, is used only > on NonStop, for example? Potentially megabytes of data can pass thru a > pipe, and if the platform bug affects only at the tail end of the transfer, > marking the pipe not to buffer at all at the beginning is too big a hammer to > work it around. With the explanation given so far, this still smells more > like > "we have futzed around without understanding why, and this happens to > work." It may be good enough for your purpose of making progress (after > all, I'd imagine that you'd need to work this around one way or another to > hunt for and fix more issues on the platform), but it does not sound like "we > know what the problem is, and this is the best workaround for that", which is > what we want if it wants to become part of the official codebase. I am retesting without setbuf(NULL) but this is unlikely to be enlightening. The test cases do not adequately simulate the configuration in which my team originally encountered the problem. This requires a guarantee of the source and destination coming through different logical CPUs. We never encountered the issue in the test suite, only when end users got hold of it. We had two distinct problems, one which was the revent=0 related hang (already solved) and other was a stream flush problem. The two are related but distinct. The platform support group insisted that we should have the setbuf(NULL) in place for interprocess communications in the form used here - I'm worried about losing this, but completely aware that this is far too heavy for other platforms (hence the __TANDEM guard in wrapper.c). If the form of the wrapper should be different, I would be happy to comply. I have a much longer explanation about the platform message stack structure, but that doesn't belong here. Happy to respond privately if requested. Cheers, Randall -- Brief whoami: NonStop developer since approximately 2112884442 UNIX developer since approximately 421664400 -- In my real life, I talk too much.
Re: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform
"Randall S. Becker"writes: >> IOW, I do not see it explained clearly why this change is needed on any >> single >> platform---so "that issue may be shared by others, too" >> is a bit premature thing for me to listen to and understand, as "that issue" >> is >> quite unclear to me. > > v4 might be a little better. The issue seems to be specific to > NonStop that it's PIPE mechanism needs to have setbuf(pipe,NULL) > called for git to be happy. The default behaviour appears to be > different on NonStop from other platforms from our testing. We get > hung up waiting on pipes unless this is done. I am afraid that that is not a "diagnosis" enough to allow us moving forward. We get hung up because...? When the process that has the other end of pipe open exits, NonStop does not close the pipe properly? Or NonStop does not flush the data buffered in the pipe? Would it help if a compat wrapper that does setbuf(fd, NULL) immediately before closing the fd, or some other more targetted mechanism, is used only on NonStop, for example? Potentially megabytes of data can pass thru a pipe, and if the platform bug affects only at the tail end of the transfer, marking the pipe not to buffer at all at the beginning is too big a hammer to work it around. With the explanation given so far, this still smells more like "we have futzed around without understanding why, and this happens to work." It may be good enough for your purpose of making progress (after all, I'd imagine that you'd need to work this around one way or another to hunt for and fix more issues on the platform), but it does not sound like "we know what the problem is, and this is the best workaround for that", which is what we want if it wants to become part of the official codebase.
RE: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform
On January 22, 2018 5:36 PM, Junio C Hamano wrote: > Torsten Bögershausenwrites: > > > On Fri, Jan 19, 2018 at 12:33:59PM -0500, randall.s.bec...@rogers.com > wrote: > >> From: "Randall S. Becker" > >> > >> * wrapper.c: called setbuf(stream,0) to force pipe flushes not enabled by > >> default on the NonStop platform. > >> > >> Signed-off-by: Randall S. Becker > >> --- > >> wrapper.c | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/wrapper.c b/wrapper.c > >> index d20356a77..671cbb4b4 100644 > >> --- a/wrapper.c > >> +++ b/wrapper.c > >> @@ -403,6 +403,9 @@ FILE *xfdopen(int fd, const char *mode) > >>FILE *stream = fdopen(fd, mode); > >>if (stream == NULL) > >>die_errno("Out of memory? fdopen failed"); > >> +#ifdef __TANDEM > >> + setbuf(stream,0); > >> +#endif > > > > Reading the commit message, I would have expected someting similar to > > > > #ifdef FORCE_PIPE_FLUSHES > > setbuf(stream,0); > > #endif > > > > (Because other systems may need the tweak as well, some day) Of course > > you need to change that in the Makefile and config.mak.uname > > I actually wouldn't have expected anything like that after reading the commit > message. > > First I thought it was describing only what it does (i.e. "let's use > setbuf() to set the stream unbuffered on TANDEM"), which is a useless > description that only says what it does which we can read from the diff, but > "NonStop by default creates pipe that does not flush" is a potentially useful > information the log message adds. > But it is just "potentially"---we cannot read what exact problem the change is > trying to address. Making a pipe totally unbuffered is a heavy hammer that > may not be an appropriate solution---it could be that we are missing calls to > fflush() where we need and have been lucky because most of the systems > we deal with do line-buffered by default, or something silly/implausible like > that, and if that is the case, a more proper fix may be to add these missing > fflush() to right places. > > IOW, I do not see it explained clearly why this change is needed on any single > platform---so "that issue may be shared by others, too" > is a bit premature thing for me to listen to and understand, as "that issue" > is > quite unclear to me. v4 might be a little better. The issue seems to be specific to NonStop that it's PIPE mechanism needs to have setbuf(pipe,NULL) called for git to be happy. The default behaviour appears to be different on NonStop from other platforms from our testing. We get hung up waiting on pipes unless this is done. At the moment, this is platform-specific. Other parts of the discussion led to the conclusion that we should make this available to any platform using a new configuration option, but my objective is to get the NonStop port integrated with the main git code base and when my $DAYJOB permits it, spend the time adding the option. Note: __TANDEM is #define automatically emitted by the NonStop compilers. Does that help? Sincerely, Randall
Re: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform
Torsten Bögershausenwrites: > On Fri, Jan 19, 2018 at 12:33:59PM -0500, randall.s.bec...@rogers.com wrote: >> From: "Randall S. Becker" >> >> * wrapper.c: called setbuf(stream,0) to force pipe flushes not enabled by >> default on the NonStop platform. >> >> Signed-off-by: Randall S. Becker >> --- >> wrapper.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/wrapper.c b/wrapper.c >> index d20356a77..671cbb4b4 100644 >> --- a/wrapper.c >> +++ b/wrapper.c >> @@ -403,6 +403,9 @@ FILE *xfdopen(int fd, const char *mode) >> FILE *stream = fdopen(fd, mode); >> if (stream == NULL) >> die_errno("Out of memory? fdopen failed"); >> +#ifdef __TANDEM >> +setbuf(stream,0); >> +#endif > > Reading the commit message, I would have expected someting similar to > > #ifdef FORCE_PIPE_FLUSHES > setbuf(stream,0); > #endif > > (Because other systems may need the tweak as well, some day) > Of course you need to change that in the Makefile and config.mak.uname I actually wouldn't have expected anything like that after reading the commit message. First I thought it was describing only what it does (i.e. "let's use setbuf() to set the stream unbuffered on TANDEM"), which is a useless description that only says what it does which we can read from the diff, but "NonStop by default creates pipe that does not flush" is a potentially useful information the log message adds. But it is just "potentially"---we cannot read what exact problem the change is trying to address. Making a pipe totally unbuffered is a heavy hammer that may not be an appropriate solution---it could be that we are missing calls to fflush() where we need and have been lucky because most of the systems we deal with do line-buffered by default, or something silly/implausible like that, and if that is the case, a more proper fix may be to add these missing fflush() to right places. IOW, I do not see it explained clearly why this change is needed on any single platform---so "that issue may be shared by others, too" is a bit premature thing for me to listen to and understand, as "that issue" is quite unclear to me.
RE: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform
On January 20, 2018 6:10 AM, Torsten Bögershausen wrote: > On Fri, Jan 19, 2018 at 12:33:59PM -0500, randall.s.bec...@rogers.com > wrote: > > From: "Randall S. Becker"> > > > * wrapper.c: called setbuf(stream,0) to force pipe flushes not enabled by > > default on the NonStop platform. > > > > Signed-off-by: Randall S. Becker > > --- > > wrapper.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/wrapper.c b/wrapper.c > > index d20356a77..671cbb4b4 100644 > > --- a/wrapper.c > > +++ b/wrapper.c > > @@ -403,6 +403,9 @@ FILE *xfdopen(int fd, const char *mode) > > FILE *stream = fdopen(fd, mode); > > if (stream == NULL) > > die_errno("Out of memory? fdopen failed"); > > +#ifdef __TANDEM > > + setbuf(stream,0); > > +#endif > > Reading the commit message, I would have expected someting similar to > > #ifdef FORCE_PIPE_FLUSHES > setbuf(stream,0); > #endif > > (Because other systems may need the tweak as well, some day) Of course > you need to change that in the Makefile and config.mak.uname > > > return stream; > > } I can definitely see the point. Would you be agreeable to expanding the scope of this as a separate patch after this one is applied? I would want to bring at least one more Not NonStop machine into the mix for testing, which is more than I can do this weekend . Cheers, Randall
Re: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform
On Fri, Jan 19, 2018 at 12:33:59PM -0500, randall.s.bec...@rogers.com wrote: > From: "Randall S. Becker"> > * wrapper.c: called setbuf(stream,0) to force pipe flushes not enabled by > default on the NonStop platform. > > Signed-off-by: Randall S. Becker > --- > wrapper.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/wrapper.c b/wrapper.c > index d20356a77..671cbb4b4 100644 > --- a/wrapper.c > +++ b/wrapper.c > @@ -403,6 +403,9 @@ FILE *xfdopen(int fd, const char *mode) > FILE *stream = fdopen(fd, mode); > if (stream == NULL) > die_errno("Out of memory? fdopen failed"); > +#ifdef __TANDEM > + setbuf(stream,0); > +#endif Reading the commit message, I would have expected someting similar to #ifdef FORCE_PIPE_FLUSHES setbuf(stream,0); #endif (Because other systems may need the tweak as well, some day) Of course you need to change that in the Makefile and config.mak.uname > return stream; > } > > -- > 2.16.0.31.gf1a482c >
Re: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform
On Fri, Jan 19, 2018 at 9:33 AM,wrote: > From: "Randall S. Becker" > > * wrapper.c: called setbuf(stream,0) to force pipe flushes not enabled by > default on the NonStop platform. > Due to my review of a previous patch, I now know about the __TANDEM directive and why we use this to guard this line. :) The setbuf(0) is easier than sprinkling (guarded) flushes all over the code, so that seems like a clean solution, which we currently don't use. (it occurred twice in history, see eac14f8909 (Win32: Thread-safe windows console output, 2012-01-14) for its introduction and fcd428f4a9 (Win32: fix broken pipe detection, 2012-03-01) for its deletion). > Signed-off-by: Randall S. Becker > --- > wrapper.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/wrapper.c b/wrapper.c > index d20356a77..671cbb4b4 100644 > --- a/wrapper.c > +++ b/wrapper.c > @@ -403,6 +403,9 @@ FILE *xfdopen(int fd, const char *mode) > FILE *stream = fdopen(fd, mode); > if (stream == NULL) > die_errno("Out of memory? fdopen failed"); > +#ifdef __TANDEM > + setbuf(stream,0); My man page tells me the second arg is a pointer, so we'd prefer NULL instead? Thanks, Stefan > +#endif > return stream; > } > > -- > 2.16.0.31.gf1a482c >