The latest issue on the "Problems with NIO" thread (message dated Wed, 25 Feb 2009 23:09:49 +0100) seems to be another case of a syscall being interrupted by an internal SIGUSR2 and reporting an error (EINTR) that needs to be caught and the call retried. There are already several places where the code "works around" this issue, and *every* occurrence of the affected syscalls[0] needs to be wrapped in this logic.
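
To make the cost concrete, here is a minimal sketch of the kind of wrapper in question, assuming poll(2) as the affected call. The names poll_retry and now_ms are hypothetical and are not taken from the classlib or DRLVM sources. Note that a correct retry also has to recompute the remaining timeout, which is part of why this is tedious to repeat at every call site.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical wrapper: call poll(2) and retry if a signal handler
 * interrupted it (EINTR). A negative timeout means "wait forever",
 * as with poll(2) itself.
 */
int poll_retry(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    long long deadline = (timeout_ms >= 0) ? now_ms() + timeout_ms : 0;
    int remaining = timeout_ms;

    for (;;) {
        int rc = poll(fds, nfds, remaining);
        if (rc >= 0 || errno != EINTR)
            return rc;              /* success, timeout, or a real error */

        if (timeout_ms >= 0) {      /* shrink the timeout before retrying */
            long long left = deadline - now_ms();
            remaining = (left > 0) ? (int)left : 0;
        }
    }
}
```

Every VM that links against classlib pays for the extra clock reads and the loop, whether or not it ever delivers signals to its own threads.
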
The IBM VM doesn't do this kind of wrapping, and I assume neither does Sun's, since all signals they receive are external and thus raising the exception is the required behaviour.

I can't help thinking that replacing the use of SIGUSR2 would be an easier option long term, and better for performance, than wrapping every syscall.

From a selfish classlib perspective, it is wrong to fix these issues in classlib, since wrapping syscalls imposes a performance penalty on all VMs (such as Jikes and IBM's) even though they do not need this extra code. It certainly makes the code harder to read, write and maintain. So it would be nice to "fix" it in DRLVM.

Can someone explain why DRLVM uses SIGUSR2? And what alternatives are there? Could we avoid using signals altogether? Other VMs seem to manage without them. How? Could we mask the signal when entering native code?

Regards,
Mark.

[0] The DRLVM code does use the SA_RESTART flag, so some syscalls are automatically restarted, but many calls are never restarted even with that flag set. On Linux[1], the most important of these for us are epoll_wait(2), epoll_pwait(2), poll(2), ppoll(2), select(2), and pselect(2). For precise details, see "Interruption of System Calls and Library Functions by Signal Handlers" in signal(7).

[1] The signal(7) man page also says "The details vary across Unix systems", which implies that the set of calls needing wrapping will differ from one Unix system to another, making porting classlib for DRLVM to non-Linux systems even more difficult. (Of course, this will only be an issue if DRLVM ever gets ported to anything else.)
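
For the "mask the signal" question, a rough sketch of what the idea could look like on the VM side, assuming the mask is changed per thread with pthread_sigmask(3) around a blocking call. The function name poll_sigusr2_blocked is hypothetical, and nothing here is taken from DRLVM. A blocked signal is not lost; it stays pending and is delivered once the mask is restored, so whether that delay is acceptable depends on what DRLVM uses SIGUSR2 for.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>

/*
 * Hypothetical helper: block SIGUSR2 in the calling thread for the
 * duration of one poll(2) call, then restore the previous mask.
 * Returns whatever poll(2) returned, or -1 if the mask could not be set.
 */
int poll_sigusr2_blocked(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    sigset_t block, saved;
    int err, rc, saved_errno;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);

    /* pthread_sigmask returns an error number rather than setting errno. */
    err = pthread_sigmask(SIG_BLOCK, &block, &saved);
    if (err != 0) {
        errno = err;
        return -1;
    }

    rc = poll(fds, nfds, timeout_ms);
    saved_errno = errno;            /* keep poll's errno across the restore */

    /* Any SIGUSR2 that arrived meanwhile is delivered after this call. */
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    errno = saved_errno;
    return rc;
}
```

ppoll(2), pselect(2) and epoll_pwait(2) accept a signal mask argument that applies only for the duration of the call, which would avoid the two extra pthread_sigmask calls where those variants are available.
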
