On 20 maj 2014, at 10:34, Volker Simonis <volker.simo...@gmail.com> wrote:

> Hi everybody,
> 
> I've analyzed this issue on AIX (and HPUX) and came to the conclusion
> that it is not a problem on these platforms.
> 
> Just to make sure I got everything right, I'll summarize my
> understanding of the problem here:
> 
> On MacOS, setjmp() saves the signal mask of the current thread BUT
> restores the signal mask of the whole process after the corresponding
> longjmp(). This obviously changes the signal mask of all threads which
> had a different signal mask than the initial thread who called
> setjmp(). I could easily reproduce this by loading a class which
> provokes a verification exception in the un-fixed libverify.so.
> Afterwards, the VM-thread won't listen to SIGQUIT anymore because it
> now masks this signal in the same way like the Java thread which had
> the verification exception.
> 
> On AIX/HPUX setjmp()/longjmp() is defined in the same way like on
> MacOS in that it saves and restores the signal mask (and both
> platforms provide _setjmp()/_longjmp() versions as well which do not
> manipulate the signal mask). But while it is not clear from the
> documentation, all my tests showed that longjmp() only restores the
> signal mask of the current thread, which should be OK (i.e. I couldn't
> see a blocked VM thread which doesn't react on SIQUIT anymore).
> 
> So to cut a long story short - no action seems to be required to fix
> this issue on AIX:
> 
> As a side note I wonder why we didn't use sigsetjmp()/siglongjmp()
> with a zero 'savemask' argument to fix this problem? It is
> standardized by POSIX and would avoid the usage of "#ifdef __APPLE”.

It’s not clear to me if sigsetjmp()/siglongjmp() sets the signal mask on the 
process or on the pthread. The problem on OS X was that the VM sets up the 
sigprocmask on the pthreads (using pthread_sigmask()), but longjmp() restores 
the mask for the process (using sigprocmask()).


/Staffan


> 
> Thank you and best regards,
> Volker
> 
> 
> On Fri, May 16, 2014 at 6:04 PM, David DeHaven <david.deha...@oracle.com> 
> wrote:
>> 
>> I'd thought about that, since Apple borrowed most of it's underpinnings from 
>> BSD, but had no evidence to suggest it was necessary.
>> 
>> Anyways, at least you've identified and can rectify the issue :)
>> 
>> -DrD-
>> 
>>> Wow, sometimes it really makes sense to read apparently unrelated
>>> email-threads on Friday afternoon:)
>>> 
>>> It seems that AIX/HPUX have exactly the same problem like MacOS X.
>>> From the AIX setjmp man-page:
>>> 
>>> "The setjmp and longjmp subroutines save and restore the signal mask
>>> sigmask (2), while _setjmp and _longjmp manipulate only the stack
>>> context."
>>> 
>>> From the HPUXM setjmp man-page:
>>> 
>>> "setjmp(), longjmp()         These always save and restore the signal mask.
>>> _setjmp(), _longjmp()        These never manipulate the signal mask."
>>> 
>>> I think it doesn't make sense for you to wait pushing this until I
>>> provide the corresponding AIX patches because anyway I'll have to
>>> provide a fix not only for this issue but also for the other bugs you
>>> mentioned (i.e. 8023786 and 8023720). So I'll better create a new bug
>>> for AIX.
>>> 
>>> Thank you and best regards,
>>> Volker
>>> 
>>> On Fri, May 16, 2014 at 5:38 PM, David DeHaven <david.deha...@oracle.com> 
>>> wrote:
>>>> 
>>>> Thanks!
>>>> 
>>>> -DrD-
>>>> 
>>>>> The splashscreen changes look fine to me. Approved.
>>>>> 
>>>>> --
>>>>> best regards,
>>>>> Anthony
>>>>> 
>>>>> On 5/16/2014 7:18 PM, David DeHaven wrote:
>>>>>> 
>>>>>> Could someone on AWT team approve the splashscreen changes?
>>>>>> 
>>>>>> -DrD-
>>>>>> 
>>>>>>> Approved.
>>>>>>> 
>>>>>>> -phil.
>>>>>>> 
>>>>>>> On 5/15/2014 9:31 AM, David DeHaven wrote:
>>>>>>>> Ping!
>>>>>>>> 
>>>>>>>> Does this look OK?
>>>>>>>> 
>>>>>>>> I've also filed an issue against JavaFX:
>>>>>>>> https://javafx-jira.kenai.com/browse/RT-37125
>>>>>>>> 
>>>>>>>> -DrD-
>>>>>>>> 
>>>>>>>>>>>> I tried not modifying libpng but still ended up with lingering 
>>>>>>>>>>>> references to longjmp in pngread.o, despite libpng having 
>>>>>>>>>>>> png_ptr->longjmp_fn (bug in libpng?). pngread.c calls setjmp to 
>>>>>>>>>>>> set a default location to jump to in case the caller doesn't call 
>>>>>>>>>>>> setjmp, so if we continue down this path something in libpng must 
>>>>>>>>>>>> be modified. The only other option is to create our own setjmp.h 
>>>>>>>>>>>> and order it before /usr/include/setjmp.h, which seems dubious at 
>>>>>>>>>>>> best.
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm curious if the libpng changes are even needed since it's only 
>>>>>>>>>>>> used for splashscreen, which happens very early in the launch 
>>>>>>>>>>>> process. Also note that we didn't originally even call 
>>>>>>>>>>>> png_set_longjmp_fn, so any error should have resulted in an 
>>>>>>>>>>>> abort() instead of a call to longjmp... it appears we could retain 
>>>>>>>>>>>> the functionality we have today and #undef PNG_SETJMP_SUPPORTED 
>>>>>>>>>>>> (pngconf.h?). That would put the onus on developers to make sure 
>>>>>>>>>>>> their pngs don't have errors in them, or libsplashscreen will 
>>>>>>>>>>>> abort()...
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> That's an interesting question and the answer might extend to the 
>>>>>>>>>>> splashscreen changes too.
>>>>>>>>>>> Its platform specific code and on MAC, the thread is created using 
>>>>>>>>>>> pthreads directly and that
>>>>>>>>>>> thread goes away once splashscreen is done. But its running at the 
>>>>>>>>>>> same time as the VM
>>>>>>>>>>> is booting up and creating threads and setting their signal masks. 
>>>>>>>>>>> So I don't think you can
>>>>>>>>>>> guarantee that it won't mess up the masks on the JRE threads if the 
>>>>>>>>>>> PNG is bad. And I'm
>>>>>>>>>>> also not sure you want to remove error handling from the library 
>>>>>>>>>>> either.
>>>>>>>>>>> So a HIGHLY VISIBLE DO NOT REMOVE comment might be the best you can 
>>>>>>>>>>> do here.
>>>>>>>>>> I have a better idea:
>>>>>>>>>> 
>>>>>>>>>> png_default_error is the only place where png_longjmp is called. We 
>>>>>>>>>> could call png_set_error_fn to set up our own error handler (for Mac 
>>>>>>>>>> only), compile with PNG_SETJMP_SUPPORTED unset so it doesn't pull in 
>>>>>>>>>> setjmp/longjmp and our own implementation of the error handler would 
>>>>>>>>>> call _longjmp, which would jump back to where we call setjmp 
>>>>>>>>>> currently.
>>>>>>>>> Ok, I figured out what's going on. It's not quite intuitive...
>>>>>>>>> 
>>>>>>>>> png_jmpbuf is a macro defined in png.h, this calls png_set_longjmp_fn 
>>>>>>>>> with longjmp, which is why I was seeing references to longjmp in the 
>>>>>>>>> object file. That's what was throwing me off as it seems like it 
>>>>>>>>> should only be getting the jmp_buf ptr stored in the png_ptr. I guess 
>>>>>>>>> the intention was that setjmp/longjmp was optional, if you don't call 
>>>>>>>>> setjmp then it just abort()s.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I changed splashscreen_png.c to:
>>>>>>>>> #ifdef __APPLE__
>>>>>>>>>  if (_setjmp(png_set_longjmp_fn(png_ptr, _longjmp, sizeof(jmp_buf)))) 
>>>>>>>>> {
>>>>>>>>> #else
>>>>>>>>>  if (setjmp(png_jmpbuf(png_ptr))) {
>>>>>>>>> #endif
>>>>>>>>> 
>>>>>>>>> and it calls _longjmp instead. I verified this works by changing the 
>>>>>>>>> macro to set png_longjmp to exit() and without the above change it 
>>>>>>>>> does indeed exit prematurely with a bad png, with the change it 
>>>>>>>>> reports the error but continues to load the application as would be 
>>>>>>>>> expected.
>>>>>>>>> 
>>>>>>>>> pngread.o still has a symbol table entry for _longjmp instead of 
>>>>>>>>> __longjmp, but it's benign since we're ultimately forcing it to use 
>>>>>>>>> the correct function. So I've left libpng completely unchanged.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> With the change and using a bad png for splashscreen, I was able to 
>>>>>>>>> get a stack trace once the application was running. Without the 
>>>>>>>>> change to splashscreen_png.c, jstack was unable to connect to the 
>>>>>>>>> process. So splashscreen absolutely can interfere with the signal 
>>>>>>>>> handling.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Updated webrev:
>>>>>>>>> http://cr.openjdk.java.net/~ddehaven/8026385/jdk.1/
>>>>>>>>> 
>>>>>>>>> I can look into writing a regression test for this. It might not be 
>>>>>>>>> trivial though since we're dealing with signal handlers, and if 
>>>>>>>>> timing is a factor the test may not be reliable.
>>>>>>>>> 
>>>>>>>>> -DrD-
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> 

Reply via email to