%% "CARTER-HITCHIN, David, GBM" <[EMAIL PROTECTED]> writes:
  cdg> Hi Paul,

  >> http://make.paulandlesley.org/jobserver.html
  >>
  >> Look at the section "SA-non-RESTARTer?"  I've had conversations with
  >> knowledgeable people who claim that the POSIX guarantee for SA_RESTART
  >> is not as ironclad as one would assume, and that technically
  >> Solaris is not violating the letter of the spec, so...

  cdg> This is bad news.

  cdg> Would it be possible for you (or your knowledgeable friends) to
  cdg> knock up a small test case illustrating this problem?

The thing is, it's pretty hard to reproduce in a reliable way.  You
basically have to get the timing exactly right so that the signal comes
in right when the system call is running.  Not easy to do.

However, you really don't need to reproduce this: the situation is well
known and understood by Sun; there's no question about what the behavior
is.  Sun maintains, though, that it works as designed and expected and
that the behavior is allowed by the relevant standards, and as far as I
know, they aren't interested in changing it.

You can find more info here, including posts by Casper Dik, who works for
Sun and knows a ton about these issues.  Just ignore Rev. Don Kool when
he starts spouting his usual inane drivel:

http://groups.google.com/group/comp.unix.solaris/browse_thread/thread/d6e3339bd36504c8/a162a5cd7ff45340?lnk=st&q=SA_RESTART+solaris+make&rnum=2&hl=en#a162a5cd7ff45340

http://groups.google.com/group/comp.unix.solaris/browse_thread/thread/698f23c99f7532e0/a20dfa1b940b5e63?lnk=st&q=SA_RESTART+solaris+make&rnum=3&hl=en#a20dfa1b940b5e63

I think there's even a case mentioned by Paul Eggert that he filed with
Sun that you can reference (although he says it was closed).

  cdg> Having said all that, if there are other systems that do not
  cdg> implement SA_RESTART properly, then I guess it is safer sticking
  cdg> with 'defensive coding' techniques.  Nevertheless it would be
  cdg> still worth getting Sun to fix their O/S, as this might be
  cdg> causing problems for other apps.

There are others, indeed.
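
For anyone unfamiliar with the term, the "defensive coding" in question
usually amounts to retrying a system call that fails with EINTR instead
of trusting SA_RESTART to restart it.  A minimal sketch in C follows;
read_retry is a hypothetical helper name used only for illustration, and
this is not code taken from GNU make:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (an assumption for illustration, not from GNU make):
   read up to LEN bytes from FD, retrying whenever the call is interrupted
   by a signal.  Returns whatever read() returns once it either succeeds
   or fails for a reason other than EINTR.  */
static ssize_t
read_retry (int fd, void *buf, size_t len)
{
  ssize_t n;

  do
    n = read (fd, buf, len);
  while (n < 0 && errno == EINTR);

  return n;
}
```

A blind retry like this is only appropriate where the caller does not
rely on the interruption itself to notice that a signal arrived.
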

I've implemented a good bit of "defensive coding", especially in the
obvious areas.  However, it's simply not possible to defensively code
around every possible system call that might fail: many are hidden
behind normal C runtime functions (printf() etc. for example).

I would be VERY interested in hearing from people using GNU make 3.81 in
massively parallel situations (however, the parallelism has to be
limited; using "-j" with no limit won't use the jobserver at all so it
won't show this problem) about how often they still see these sorts of
failures.

-- 
-------------------------------------------------------------------------------
 Paul D. Smith <[EMAIL PROTECTED]>          Find some GNU make tips at:
 http://www.gnu.org                      http://make.paulandlesley.org
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist
