On 20 maj 2014, at 10:34, Volker Simonis <volker.simo...@gmail.com> wrote:
> Hi everybody, > > I've analyzed this issue on AIX (and HPUX) and came to the conclusion > that it is not a problem on these platforms. > > Just to make sure I got everything right, I'll summarize my > understanding of the problem here: > > On MacOS, setjmp() saves the signal mask of the current thread BUT > restores the signal mask of the whole process after the corresponding > longjmp(). This obviously changes the signal mask of all threads which > had a different signal mask than the initial thread who called > setjmp(). I could easily reproduce this by loading a class which > provokes a verification exception in the un-fixed libverify.so. > Afterwards, the VM-thread won't listen to SIGQUIT anymore because it > now masks this signal in the same way like the Java thread which had > the verification exception. > > On AIX/HPUX setjmp()/longjmp() is defined in the same way like on > MacOS in that it saves and restores the signal mask (and both > platforms provide _setjmp()/_longjmp() versions as well which do not > manipulate the signal mask). But while it is not clear from the > documentation, all my tests showed that longjmp() only restores the > signal mask of the current thread, which should be OK (i.e. I couldn't > see a blocked VM thread which doesn't react on SIQUIT anymore). > > So to cut a long story short - no action seems to be required to fix > this issue on AIX: > > As a side note I wonder why we didn't use sigsetjmp()/siglongjmp() > with a zero 'savemask' argument to fix this problem? It is > standardized by POSIX and would avoid the usage of "#ifdef __APPLE”. It’s not clear to me if sigsetjmp()/siglongjmp() sets the signal mask on the process or on the pthread. The problem on OS X was that the VM sets up the sigprocmask on the pthreads (using pthread_sigmask()), but longjmp() restores the mask for the process (using sigprocmask()). /Staffan > > Thank you and best regards, > Volker > > > On Fri, May 16, 2014 at 6:04 PM, David DeHaven <david.deha...@oracle.com> > wrote: >> >> I'd thought about that, since Apple borrowed most of it's underpinnings from >> BSD, but had no evidence to suggest it was necessary. >> >> Anyways, at least you've identified and can rectify the issue :) >> >> -DrD- >> >>> Wow, sometimes it really makes sense to read apparently unrelated >>> email-threads on Friday afternoon:) >>> >>> It seems that AIX/HPUX have exactly the same problem like MacOS X. >>> From the AIX setjmp man-page: >>> >>> "The setjmp and longjmp subroutines save and restore the signal mask >>> sigmask (2), while _setjmp and _longjmp manipulate only the stack >>> context." >>> >>> From the HPUXM setjmp man-page: >>> >>> "setjmp(), longjmp() These always save and restore the signal mask. >>> _setjmp(), _longjmp() These never manipulate the signal mask." >>> >>> I think it doesn't make sense for you to wait pushing this until I >>> provide the corresponding AIX patches because anyway I'll have to >>> provide a fix not only for this issue but also for the other bugs you >>> mentioned (i.e. 8023786 and 8023720). So I'll better create a new bug >>> for AIX. >>> >>> Thank you and best regards, >>> Volker >>> >>> On Fri, May 16, 2014 at 5:38 PM, David DeHaven <david.deha...@oracle.com> >>> wrote: >>>> >>>> Thanks! >>>> >>>> -DrD- >>>> >>>>> The splashscreen changes look fine to me. Approved. >>>>> >>>>> -- >>>>> best regards, >>>>> Anthony >>>>> >>>>> On 5/16/2014 7:18 PM, David DeHaven wrote: >>>>>> >>>>>> Could someone on AWT team approve the splashscreen changes? >>>>>> >>>>>> -DrD- >>>>>> >>>>>>> Approved. >>>>>>> >>>>>>> -phil. >>>>>>> >>>>>>> On 5/15/2014 9:31 AM, David DeHaven wrote: >>>>>>>> Ping! >>>>>>>> >>>>>>>> Does this look OK? >>>>>>>> >>>>>>>> I've also filed an issue against JavaFX: >>>>>>>> https://javafx-jira.kenai.com/browse/RT-37125 >>>>>>>> >>>>>>>> -DrD- >>>>>>>> >>>>>>>>>>>> I tried not modifying libpng but still ended up with lingering >>>>>>>>>>>> references to longjmp in pngread.o, despite libpng having >>>>>>>>>>>> png_ptr->longjmp_fn (bug in libpng?). pngread.c calls setjmp to >>>>>>>>>>>> set a default location to jump to in case the caller doesn't call >>>>>>>>>>>> setjmp, so if we continue down this path something in libpng must >>>>>>>>>>>> be modified. The only other option is to create our own setjmp.h >>>>>>>>>>>> and order it before /usr/include/setjmp.h, which seems dubious at >>>>>>>>>>>> best. >>>>>>>>>>>> >>>>>>>>>>>> I'm curious if the libpng changes are even needed since it's only >>>>>>>>>>>> used for splashscreen, which happens very early in the launch >>>>>>>>>>>> process. Also note that we didn't originally even call >>>>>>>>>>>> png_set_longjmp_fn, so any error should have resulted in an >>>>>>>>>>>> abort() instead of a call to longjmp... it appears we could retain >>>>>>>>>>>> the functionality we have today and #undef PNG_SETJMP_SUPPORTED >>>>>>>>>>>> (pngconf.h?). That would put the onus on developers to make sure >>>>>>>>>>>> their pngs don't have errors in them, or libsplashscreen will >>>>>>>>>>>> abort()... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> That's an interesting question and the answer might extend to the >>>>>>>>>>> splashscreen changes too. >>>>>>>>>>> Its platform specific code and on MAC, the thread is created using >>>>>>>>>>> pthreads directly and that >>>>>>>>>>> thread goes away once splashscreen is done. But its running at the >>>>>>>>>>> same time as the VM >>>>>>>>>>> is booting up and creating threads and setting their signal masks. >>>>>>>>>>> So I don't think you can >>>>>>>>>>> guarantee that it won't mess up the masks on the JRE threads if the >>>>>>>>>>> PNG is bad. And I'm >>>>>>>>>>> also not sure you want to remove error handling from the library >>>>>>>>>>> either. >>>>>>>>>>> So a HIGHLY VISIBLE DO NOT REMOVE comment might be the best you can >>>>>>>>>>> do here. >>>>>>>>>> I have a better idea: >>>>>>>>>> >>>>>>>>>> png_default_error is the only place where png_longjmp is called. We >>>>>>>>>> could call png_set_error_fn to set up our own error handler (for Mac >>>>>>>>>> only), compile with PNG_SETJMP_SUPPORTED unset so it doesn't pull in >>>>>>>>>> setjmp/longjmp and our own implementation of the error handler would >>>>>>>>>> call _longjmp, which would jump back to where we call setjmp >>>>>>>>>> currently. >>>>>>>>> Ok, I figured out what's going on. It's not quite intuitive... >>>>>>>>> >>>>>>>>> png_jmpbuf is a macro defined in png.h, this calls png_set_longjmp_fn >>>>>>>>> with longjmp, which is why I was seeing references to longjmp in the >>>>>>>>> object file. That's what was throwing me off as it seems like it >>>>>>>>> should only be getting the jmp_buf ptr stored in the png_ptr. I guess >>>>>>>>> the intention was that setjmp/longjmp was optional, if you don't call >>>>>>>>> setjmp then it just abort()s. >>>>>>>>> >>>>>>>>> >>>>>>>>> I changed splashscreen_png.c to: >>>>>>>>> #ifdef __APPLE__ >>>>>>>>> if (_setjmp(png_set_longjmp_fn(png_ptr, _longjmp, sizeof(jmp_buf)))) >>>>>>>>> { >>>>>>>>> #else >>>>>>>>> if (setjmp(png_jmpbuf(png_ptr))) { >>>>>>>>> #endif >>>>>>>>> >>>>>>>>> and it calls _longjmp instead. I verified this works by changing the >>>>>>>>> macro to set png_longjmp to exit() and without the above change it >>>>>>>>> does indeed exit prematurely with a bad png, with the change it >>>>>>>>> reports the error but continues to load the application as would be >>>>>>>>> expected. >>>>>>>>> >>>>>>>>> pngread.o still has a symbol table entry for _longjmp instead of >>>>>>>>> __longjmp, but it's benign since we're ultimately forcing it to use >>>>>>>>> the correct function. So I've left libpng completely unchanged. >>>>>>>>> >>>>>>>>> >>>>>>>>> With the change and using a bad png for splashscreen, I was able to >>>>>>>>> get a stack trace once the application was running. Without the >>>>>>>>> change to splashscreen_png.c, jstack was unable to connect to the >>>>>>>>> process. So splashscreen absolutely can interfere with the signal >>>>>>>>> handling. >>>>>>>>> >>>>>>>>> >>>>>>>>> Updated webrev: >>>>>>>>> http://cr.openjdk.java.net/~ddehaven/8026385/jdk.1/ >>>>>>>>> >>>>>>>>> I can look into writing a regression test for this. It might not be >>>>>>>>> trivial though since we're dealing with signal handlers, and if >>>>>>>>> timing is a factor the test may not be reliable. >>>>>>>>> >>>>>>>>> -DrD- >>>>>>>>> >>>>>>> >>>>>> >>>> >>