On Sep 6, 9:00 am, Simon King <simon.k...@uni-jena.de> wrote:
> sage: oddprime_factors.precompute(range(1,100), 4)
> [Errno 4] Interrupted system call
[...]
> Since precompute() launches a parallel computation, I could imagine that
> the interrupted system call is related with that. But I am no expert.

OK, you're ending up in sage.parallel.use_fork.p_iter_fork.__call__:

        try:
            while len(v) > 0 or len(workers) > 0:
                ...
        except Exception, msg:
            print msg
            sys.stdout.flush()
        finally:
            ...
            if len(workers) > 0:
                print "Killing any remaining workers..."

As you can see, the only way to end up with a "Killing ..." message is
if there are workers left, which means you're exiting the try via the
except. So, the message "[Errno 4] ..." should be originating from the
print statement there (you might want to verify that). Given that your
cache is empty afterwards, it looks like no yield has completed
successfully.

The code does mess with SIGALRM, which could lead to interrupted system
calls. Especially the line "X = load(sobj, compress=False)" would be
vulnerable to EINTR. However, the code does seem to try to turn off
SIGALRM. Perhaps something slips through? The involvement of gdb could
really just be that things get slowed down enough that a SIGALRM arrives
at an inopportune moment.

> And perhaps a decisive question: Do the parallel computations have
> anything to do with weak references? Are instances of
> UniqueRepresentation involved? Or UniqueFactory? These are changed by my
> patches.

Not that I could find. However, the first thing the subprocess does is
invalidate all expect interfaces:

sage/parallel/use_fork.py:213

        try:
            ...
            sys.stdout = open(out, 'w')
            ...
            if self.reset_interfaces:
                sage.interfaces.quit.invalidate_all()
            ...
        except Exception, msg:
            # Important to print this, so it is seen by the caller.
            print msg
        finally:
            sys.stdout.flush()
            os._exit(0)

We've seen before that invalidated expect interfaces can interact
unexpectedly with weakreffed stuff in entirely unrelated places.
However, the more important thing here is that stdout in the subprocess
gets redirected and that error messages are sent there (and not to
stderr). The code in the parent does read that output and, under certain
conditions, prints it, but an exception in the right place might prevent
that from happening.

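
To make the SIGALRM point concrete, here is a minimal standalone sketch (not Sage code) of a pending alarm cutting a blocking system call short. The names AlarmFired, _on_alarm and blocking_read_interrupted are made up for illustration. The handler raises on purpose: on Python 2, which the "except Exception, msg" syntax above implies, a handler that returns normally makes the interrupted call fail with errno 4, whereas Python 3.5 and later retry such calls automatically (PEP 475) unless the handler raises.

```python
import os
import signal


class AlarmFired(Exception):
    """Hypothetical marker exception; not part of Sage."""


def _on_alarm(signum, frame):
    # Raising here stops Python 3.5+ from silently retrying the
    # interrupted call, so the interruption becomes visible.
    raise AlarmFired()


def blocking_read_interrupted(seconds):
    """Return True if a pending alarm interrupts a blocking os.read()."""
    read_fd, write_fd = os.pipe()  # nothing is ever written, so the read blocks
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        os.read(read_fd, 1)
        return False
    except AlarmFired:
        return True
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
        os.close(read_fd)
        os.close(write_fd)


interrupted = blocking_read_interrupted(0.1)
print(interrupted)
```

Any slow call that happens to be in progress when the alarm goes off is exposed in the same way, which is why timing changes (such as running under gdb) can make the symptom appear or disappear.
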
If you want to get to the bottom of this one, I think you might want to
instrument "use_fork.py" a little to see what's happening. A very
straightforward thing to try is to see if disabling the line

        signal.alarm(max(int(timeout - (walltime()-oldest)), 1))

solves your problem (not that you'll get any absolutes on a heisenbug ...)
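
A related experiment, short of removing the alarm altogether, is to make sure no alarm is pending while a vulnerable call runs. The following is only a sketch under that assumption: alarm_suspended is a hypothetical helper, not something that exists in Sage, and it relies solely on the fact that signal.alarm(0) cancels a pending alarm and returns the seconds that were left.

```python
import signal
from contextlib import contextmanager


@contextmanager
def alarm_suspended():
    """Cancel a pending SIGALRM for the duration of the block, then re-arm it.

    Hypothetical helper for experimentation; not part of Sage.
    """
    remaining = signal.alarm(0)  # cancels the alarm, returns whole seconds left
    try:
        yield remaining
    finally:
        if remaining:
            # Re-arm with what was left. Time spent inside the block is
            # not subtracted, so the overall timeout gets slightly longer.
            signal.alarm(remaining)


signal.alarm(5)                        # stand-in for the timeout alarm
with alarm_suspended() as remaining:
    pending_inside = signal.alarm(0)   # 0: nothing is pending in here
left_after = signal.alarm(0)           # cancel the re-armed alarm again
```

Wrapping the load of the worker's result in something like this would show whether the EINTR really comes from the alarm. If the error still appears, the alarm is probably not the culprit.
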