[kudu-CR] Kudu-2208 Avoid system call interruption in Subprocess
Jeffrey F. Lukman has posted comments on this change. ( http://gerrit.cloudera.org:8080/9011 ) Change subject: Kudu-2208 Avoid system call interruption in Subprocess .. Patch Set 1: (12 comments) http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc File src/kudu/util/subprocess-test.cc: http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@301 PS1, Line 301: thd > nit: either abbreviate to just 't' or 'thr' would be more in line with what Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@302 PS1, Line 302: 0.5 > nit: why not just '1'? Using non-integer numbers for /bin/sleep is conside Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@303 PS1, Line 303: subprocessThread > nit: we use snake_case style in C++ Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@305 PS1, Line 305: ASSERT_OK(p.Start()); > I think adding a short sleep here (eg 50ms) is a good idea to ensure that t Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@310 PS1, Line 310: // create signal > you aren't creating a signal here, but rather setting up a signal handler Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@317 PS1, Line 317: // send at most 10 kill signals to subprocess > why at most 10? Why not loop until we see that the subprocess has exited? I Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@319 PS1, Line 319: int signalCounter = 0; : bool hasError = false; : > nit: snake_case Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@322 PS1, Line 322: if (signalCounter > 10 || hasError) break; > why not just put this into the while condition above? Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@325 PS1, Line 325: hasError = pthread_kill(thd, SIGUSR2) == 0; > do we expect any errors from this? if not, could we just ASSERT this? Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@330 PS1, Line 330: SleepFor(MonoDelta::FromMilliseconds(10)); > why sleep in between? if our goal is to provoke races we want to send as ma I added sleep time here to avoid undefined signal 2 exception. http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc File src/kudu/util/subprocess.cc: http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc@67 PS1, Line 67: // Retry on EINTR for functions like read() that return -1 on error. > can we move this function to a utility header since it's used in a few plac Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc@69 PS1, Line 69: == true > nit: can we drop this extra? Done -- To view, visit http://gerrit.cloudera.org:8080/9011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 Gerrit-Change-Number: 9011 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 12 Jan 2018 19:30:59 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-2208 Avoid system call interruption in Subprocess
Jeffrey F. Lukman has abandoned this change. ( http://gerrit.cloudera.org:8080/9011 ) Change subject: Kudu-2208 Avoid system call interruption in Subprocess .. Abandoned replaced with a new patch. -- To view, visit http://gerrit.cloudera.org:8080/9011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 Gerrit-Change-Number: 9011 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Kudu-2208 Avoid system call interruption in Subprocess
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9011 ) Change subject: Kudu-2208 Avoid system call interruption in Subprocess .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc File src/kudu/util/subprocess-test.cc: http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@26 PS1, Line 26: #include nit: move this header into the C-headers-group along with the header. http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@302 PS1, Line 302: 0.5 nit: why not just '1'? Using non-integer numbers for /bin/sleep is considered as a non-portable extension. http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@314 PS1, Line 314: sigaction It would be nice to assert that sigaction() returns success. http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc File src/kudu/util/subprocess.cc: http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc@69 PS1, Line 69: == true nit: can we drop this extra? -- To view, visit http://gerrit.cloudera.org:8080/9011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 Gerrit-Change-Number: 9011 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 12 Jan 2018 07:49:12 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-2208 Avoid system call interruption in Subprocess
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9011 ) Change subject: Kudu-2208 Avoid system call interruption in Subprocess .. Patch Set 1: (10 comments) I think it's also worth testing Subprocess::Call which seems to have some bugs as well http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc File src/kudu/util/subprocess-test.cc: http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@301 PS1, Line 301: thd nit: either abbreviate to just 't' or 'thr' would be more in line with what we do elsewhere. http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@303 PS1, Line 303: subprocessThread nit: we use snake_case style in C++ http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@305 PS1, Line 305: ASSERT_OK(p.Start()); I think adding a short sleep here (eg 50ms) is a good idea to ensure that the signals being sent from the other thread have a chance to race against Start() and not just Wait() http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@310 PS1, Line 310: // create signal you aren't creating a signal here, but rather setting up a signal handler http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@317 PS1, Line 317: // send at most 10 kill signals to subprocess why at most 10? Why not loop until we see that the subprocess has exited? I think there is a getter like 'joinable()' that you could use in a while loop. Sending only 10 leaves open a possibility that we dont' cover all the potential races http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@319 PS1, Line 319: int signalCounter = 0; : bool hasError = false; : nit: snake_case http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@322 PS1, Line 322: if (signalCounter > 10 || hasError) break; why not just put this into the while condition above? http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@325 PS1, Line 325: hasError = pthread_kill(thd, SIGUSR2) == 0; do we expect any errors from this? if not, could we just ASSERT this? http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@330 PS1, Line 330: SleepFor(MonoDelta::FromMilliseconds(10)); why sleep in between? if our goal is to provoke races we want to send as many signals as we can as fast as possible. http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc File src/kudu/util/subprocess.cc: http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc@67 PS1, Line 67: // Retry on EINTR for functions like read() that return -1 on error. can we move this function to a utility header since it's used in a few places now? maybe os-util.h -- To view, visit http://gerrit.cloudera.org:8080/9011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 Gerrit-Change-Number: 9011 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 12 Jan 2018 00:03:45 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-2208 Avoid system call interruption in Subprocess
Jeffrey F. Lukman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9011 Change subject: Kudu-2208 Avoid system call interruption in Subprocess .. Kudu-2208 Avoid system call interruption in Subprocess This patch includes additional unit test to detect this bug. To fix the issue, this patch added RETRY_ON_EINTR in Subprocess::DoWait() if the waitpid() return -1. Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 --- M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 2 files changed, 52 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/9011/1 -- To view, visit http://gerrit.cloudera.org:8080/9011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 Gerrit-Change-Number: 9011 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman