Wow, sometimes it really makes sense to read apparently unrelated email-threads on Friday afternoon:)
It seems that AIX/HPUX have exactly the same problem like MacOS X. >From the AIX setjmp man-page: "The setjmp and longjmp subroutines save and restore the signal mask sigmask (2), while _setjmp and _longjmp manipulate only the stack context." >From the HPUXM setjmp man-page: "setjmp(), longjmp() These always save and restore the signal mask. _setjmp(), _longjmp() These never manipulate the signal mask." I think it doesn't make sense for you to wait pushing this until I provide the corresponding AIX patches because anyway I'll have to provide a fix not only for this issue but also for the other bugs you mentioned (i.e. 8023786 and 8023720). So I'll better create a new bug for AIX. Thank you and best regards, Volker On Fri, May 16, 2014 at 5:38 PM, David DeHaven <david.deha...@oracle.com> wrote: > > Thanks! > > -DrD- > >> The splashscreen changes look fine to me. Approved. >> >> -- >> best regards, >> Anthony >> >> On 5/16/2014 7:18 PM, David DeHaven wrote: >>> >>> Could someone on AWT team approve the splashscreen changes? >>> >>> -DrD- >>> >>>> Approved. >>>> >>>> -phil. >>>> >>>> On 5/15/2014 9:31 AM, David DeHaven wrote: >>>>> Ping! >>>>> >>>>> Does this look OK? >>>>> >>>>> I've also filed an issue against JavaFX: >>>>> https://javafx-jira.kenai.com/browse/RT-37125 >>>>> >>>>> -DrD- >>>>> >>>>>>>>> I tried not modifying libpng but still ended up with lingering >>>>>>>>> references to longjmp in pngread.o, despite libpng having >>>>>>>>> png_ptr->longjmp_fn (bug in libpng?). pngread.c calls setjmp to set a >>>>>>>>> default location to jump to in case the caller doesn't call setjmp, >>>>>>>>> so if we continue down this path something in libpng must be >>>>>>>>> modified. The only other option is to create our own setjmp.h and >>>>>>>>> order it before /usr/include/setjmp.h, which seems dubious at best. >>>>>>>>> >>>>>>>>> I'm curious if the libpng changes are even needed since it's only >>>>>>>>> used for splashscreen, which happens very early in the launch >>>>>>>>> process. Also note that we didn't originally even call >>>>>>>>> png_set_longjmp_fn, so any error should have resulted in an abort() >>>>>>>>> instead of a call to longjmp... it appears we could retain the >>>>>>>>> functionality we have today and #undef PNG_SETJMP_SUPPORTED >>>>>>>>> (pngconf.h?). That would put the onus on developers to make sure >>>>>>>>> their pngs don't have errors in them, or libsplashscreen will >>>>>>>>> abort()... >>>>>>>>> >>>>>>>>> >>>>>>>> That's an interesting question and the answer might extend to the >>>>>>>> splashscreen changes too. >>>>>>>> Its platform specific code and on MAC, the thread is created using >>>>>>>> pthreads directly and that >>>>>>>> thread goes away once splashscreen is done. But its running at the >>>>>>>> same time as the VM >>>>>>>> is booting up and creating threads and setting their signal masks. So >>>>>>>> I don't think you can >>>>>>>> guarantee that it won't mess up the masks on the JRE threads if the >>>>>>>> PNG is bad. And I'm >>>>>>>> also not sure you want to remove error handling from the library >>>>>>>> either. >>>>>>>> So a HIGHLY VISIBLE DO NOT REMOVE comment might be the best you can do >>>>>>>> here. >>>>>>> I have a better idea: >>>>>>> >>>>>>> png_default_error is the only place where png_longjmp is called. We >>>>>>> could call png_set_error_fn to set up our own error handler (for Mac >>>>>>> only), compile with PNG_SETJMP_SUPPORTED unset so it doesn't pull in >>>>>>> setjmp/longjmp and our own implementation of the error handler would >>>>>>> call _longjmp, which would jump back to where we call setjmp currently. >>>>>> Ok, I figured out what's going on. It's not quite intuitive... >>>>>> >>>>>> png_jmpbuf is a macro defined in png.h, this calls png_set_longjmp_fn >>>>>> with longjmp, which is why I was seeing references to longjmp in the >>>>>> object file. That's what was throwing me off as it seems like it should >>>>>> only be getting the jmp_buf ptr stored in the png_ptr. I guess the >>>>>> intention was that setjmp/longjmp was optional, if you don't call setjmp >>>>>> then it just abort()s. >>>>>> >>>>>> >>>>>> I changed splashscreen_png.c to: >>>>>> #ifdef __APPLE__ >>>>>> if (_setjmp(png_set_longjmp_fn(png_ptr, _longjmp, sizeof(jmp_buf)))) { >>>>>> #else >>>>>> if (setjmp(png_jmpbuf(png_ptr))) { >>>>>> #endif >>>>>> >>>>>> and it calls _longjmp instead. I verified this works by changing the >>>>>> macro to set png_longjmp to exit() and without the above change it does >>>>>> indeed exit prematurely with a bad png, with the change it reports the >>>>>> error but continues to load the application as would be expected. >>>>>> >>>>>> pngread.o still has a symbol table entry for _longjmp instead of >>>>>> __longjmp, but it's benign since we're ultimately forcing it to use the >>>>>> correct function. So I've left libpng completely unchanged. >>>>>> >>>>>> >>>>>> With the change and using a bad png for splashscreen, I was able to get >>>>>> a stack trace once the application was running. Without the change to >>>>>> splashscreen_png.c, jstack was unable to connect to the process. So >>>>>> splashscreen absolutely can interfere with the signal handling. >>>>>> >>>>>> >>>>>> Updated webrev: >>>>>> http://cr.openjdk.java.net/~ddehaven/8026385/jdk.1/ >>>>>> >>>>>> I can look into writing a regression test for this. It might not be >>>>>> trivial though since we're dealing with signal handlers, and if timing >>>>>> is a factor the test may not be reliable. >>>>>> >>>>>> -DrD- >>>>>> >>>> >>> >