Hi Florian, our mails crossed... I think I am fine now with posix_spawn(), provided we do enough testing.
But I'll answer your questions inline.

On Mon, Oct 22, 2018 at 9:00 PM Florian Weimer <fwei...@redhat.com> wrote:
>
> * Thomas Stüfe:
>
> > So far I have not read a single technical reason in this thread why
> > vfork needs to be abandoned now - apart from it being obsolete. If you
> > read my initial thread from September, you know that I think we have
> > understood vfork's shortcomings very well, and that our (SAPs)
> > proposed patch shows that they can be dealt with. In our port, our
> > vfork+exec*2 is solid since many years, without any issues.
>
> The main problem for vfork in application code is that you need to *all*
> disable signals, even signals used by the implementation. If a signal
> handler runs by accident while the vfork is active, memory corruption is
> practically guaranteed. The only way to disable the signals is with a
> direct system call; sigprocmask/pthread_sigmask do not work.
>
> Does your implementation do this?

I understand. No, admittedly not. But we squeeze the vulnerable time
window down to the minimum possible:

    if (vfork() == 0) exec(..);

which was a large step forward from the stock OpenJDK solution. While it
is not completely bulletproof, I have not seen a single instance of an
error in all these years (I understand that such errors would be very
intermittent and difficult to attribute to vfork+signalling, so we may
have missed some).

>
> > The current posix_spawn() implementation was added to glibc with glibc
> > 2.24. So, what was the state of posix_spawn() before that version? Is
> > it safe to use, does it do the right thing, or will we encounter
> > regressions?
>
> It uses fork by default. It can be told to use vfork, via
> POSIX_SPAWN_USEVFORK, but then it is buggy. For generic JDK code, this
> seems hardly appropriate.

Are you sure about this? The coding I saw in glibc < 2.24 was that it
would use vfork if both attributes and file actions were NULL, which
should be the case with OpenJDK and the jspawnhelper. fork() would be
bad and a reason not to use posix_spawn().

>
> > My Ubuntu 16.04 box runs glibc 2.23. Arguably, Ubuntu 16.04 is quite a
> > common distro. I have to check our machines at work, but I am very
> > sure that our zoo of SLES and RHEL servers do not all run glibc>=2.24,
> > especially on the more exotic architectures.
>
> In glibc, the vfork-based performance does not bring in any new ABIs, so
> it is in theory backportable. The main risk is that the vfork
> optimization landed in glibc 2.24, and the PID cache was removed in
> glibc 2.25. vfork with the PID cache was really iffy, but I would not
> recommend to backport the PID cache removal. But Debian 9/stretch uses
> glibc 2.24, and I think that shows that the vfork optimization with the
> PID cache should be safe enough. (Of course you need to remove the
> assert that fires if the vfork does not actually stop the parent process
> and is implemented as a fork; the glibc implementation still works, but
> with somewhat degraded error checking.)
>
> How far back would you want to see this changed? Debian jessie and Red
> Hat Enterprise Linux 6 would be rather unlikely. If you want to target
> those, your only chance is to essentially duplicate the glibc
> implementation in OpenJDK.

As I wrote before, if I understand the coding in glibc between 2.4 and
2.24 correctly, I think it uses vfork(), and that is fine by me:
posix_spawn() using vfork(), with no attributes/file actions and in
conjunction with the jspawnhelper, is almost exactly the same as the
proposed vfork() + exec*2 patch: posix_spawn() will exec() immediately
after the vfork(); then, in jspawnhelper, we set up the new process and
exec() again.

So I am fine with that, provided I have understood all of this correctly
and have not made a thinking error somewhere.

Cheers, Thomas

>
> Thanks,
> Florian
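
P.S. To make the call shape concrete, here is a minimal sketch of the case discussed above: posix_spawn() with both file actions and attributes set to NULL, followed by a wait for the child. This is not the OpenJDK or SAP code; spawn_and_wait is a made-up helper name, and the sketch makes no attempt to check whether the glibc in use implements the call with vfork() or fork().

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (illustration only): spawn `path` with `argv`,
 * wait for it, and return its exit status, or -1 on any failure. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    assert(path != NULL && argv != NULL);

    /* NULL file actions and NULL attributes: the configuration
     * discussed in this mail. */
    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (rc != 0)
        return -1;              /* rc is an error number, not -1/errno */

    /* Retry waitpid() if it is interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    /* Report only normal termination; signals count as failure here. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the scenario above, the spawned program would be the jspawnhelper, which then sets up the new process and calls exec() a second time; the sketch only covers the first step.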