On Sep 6, 9:00 am, Simon King <simon.k...@uni-jena.de> wrote:
> sage: oddprime_factors.precompute(range(1,100), 4)
> [Errno 4] Interrupted system call
[...]
> Since precompute() launches a parallel computation, I could imagine that
> the interrupted system call is related with that. But I am no expert.

OK, you're ending up in sage.parallel.use_fork.p_iter_fork.__call__:

        try:
            while len(v) > 0 or len(workers) > 0:
...

        except Exception, msg:
            print msg
            sys.stdout.flush()

        finally:
...
            if len(workers) > 0:
                print "Killing any remaining workers..."

As you can see, the "Killing ..." message is only printed if there are
workers left, which means you left the try block via the except clause.
So the message "[Errno 4] ..." should originate from the "print msg"
there (you might want to verify that).
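
To make that control flow concrete, here is a minimal sketch of a loop with the same try/except/finally shape. It is not the Sage code: run_workers and its fail_with parameter are made up for illustration, and the messages are collected in a list instead of being printed.

```python
def run_workers(workers, fail_with=None):
    """Return the messages a try/except/finally shaped like the excerpt would print."""
    messages = []
    try:
        while len(workers) > 0:
            if fail_with is not None:
                raise fail_with        # simulate a failure inside the loop
            workers.pop()              # a worker finished normally
    except Exception as msg:
        messages.append(str(msg))      # stands in for "print msg"
    finally:
        # Only reached with workers left if the loop was abandoned early.
        if len(workers) > 0:
            messages.append("Killing any remaining workers...")
    return messages
```

A clean run returns no messages at all. Passing fail_with=OSError(4, "Interrupted system call") returns the errno message followed by the "Killing ..." line, which is the same pair of lines as in the report.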

Given that your cache is empty afterwards, it looks like no yield has
completed successfully. The code does mess with SIGALRM, which
could lead to interrupted system calls. In particular, the line "X =
load(sobj, compress=False)" would be vulnerable to EINTR. However,
the code does seem to try to turn off SIGALRM. Perhaps something
slips through? The involvement of gdb could really just be that
things get slowed down enough for a SIGALRM to arrive at an
inopportune moment.
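
One generic way to make a call robust against EINTR is to retry it, sketched below. This is only an illustration, not what use_fork.py does: retry_on_eintr and FlakyLoad are hypothetical names, and FlakyLoad merely imitates a load that gets interrupted a few times. Note that Python 3.5 and later already retry most interrupted system calls automatically, so a wrapper like this mainly matters on older interpreters.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except (IOError, OSError) as exc:
            if exc.errno != errno.EINTR:
                raise                  # a real error, not an interruption


class FlakyLoad(object):
    """Hypothetical stand-in for a load() that is interrupted a few times."""

    def __init__(self, interruptions):
        self.remaining = interruptions
        self.calls = 0

    def __call__(self, name):
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise OSError(errno.EINTR, "Interrupted system call")
        return "contents of " + name
```

With loader = FlakyLoad(2), the call retry_on_eintr(loader, "x.sobj") succeeds on the third attempt, while any error other than EINTR is re-raised unchanged.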

> And perhaps a decisive question: Do the parallel computations have
> anything to do with weak references? Are instances of
> UniqueRepresentation involved? Or UniqueFactory? These are changed by my
> patches.

Not that I could find. However, the first thing the subprocess does is
invalidate all expect interfaces:

sage/parallel/use_fork.py:213

        try:
...
            sys.stdout = open(out, 'w')
...
            if self.reset_interfaces:
                sage.interfaces.quit.invalidate_all()
...
        except Exception, msg:
            # Important to print this, so it is seen by the caller.
            print msg
        finally:
            sys.stdout.flush()
            os._exit(0)

We've seen before that invalidated expect interfaces can interact
unexpectedly with weakreffed stuff in entirely unrelated places.
However, the more important thing here is that stdout in the
subprocess gets redirected and that error messages are sent there (and
not stderr). The code in the parent does read that output and under
certain conditions print it, but an exception in the right place
might prevent that from happening.
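
The effect of that redirection can be reproduced in isolation. The sketch below assumes a POSIX system (it uses os.fork) and uses invented names, run_child_and_collect and failing_task; it only mirrors the shape of the excerpt above and is not the Sage implementation.

```python
import os
import sys
import tempfile


def run_child_and_collect(task):
    """Fork, run task() in the child with stdout sent to a file, return the file's text."""
    fd, out = tempfile.mkstemp()
    os.close(fd)
    pid = os.fork()
    if pid == 0:
        # Child: same try/except/finally shape as the excerpt.
        try:
            sys.stdout = open(out, "w")
            task()
        except Exception as msg:
            print(msg)                 # lands in the file, not on stderr
        finally:
            sys.stdout.flush()
            os._exit(0)                # exit status is 0 even after an error
    # Parent: the message is only seen if this read actually happens.
    os.waitpid(pid, 0)
    try:
        with open(out) as f:
            return f.read()
    finally:
        os.unlink(out)


def failing_task():
    """Hypothetical task that fails inside the child."""
    raise RuntimeError("child failed")
```

Calling run_child_and_collect(failing_task) returns the text "child failed" followed by a newline, and nothing appears on the terminal. If the parent were interrupted before reading the file, the error would be lost entirely, which is the failure mode described above.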

If you want to get to the bottom of this one, I think you might want
to instrument "use_fork.py" a little to see what's happening. A
very straightforward one to try is to see if disabling the line

     signal.alarm(max(int(timeout - (walltime()-oldest)), 1))

solves your problem (not that you'll get any absolutes on a
heisenbug ...)
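
For that experiment it may be handy to make the alarm switchable rather than commenting the line out. A minimal sketch, assuming a made-up environment variable DEBUG_NO_ALARM and helper names alarm_delay and arm_alarm (none of these exist in Sage); the clamp is the same max(int(...), 1) as in the line quoted above.

```python
import os
import signal

# Hypothetical debugging switch, not part of Sage.
DISABLE_ALARM = os.environ.get("DEBUG_NO_ALARM") == "1"


def alarm_delay(timeout, elapsed):
    """Seconds until the alarm fires, clamped to at least 1."""
    return max(int(timeout - elapsed), 1)


def arm_alarm(timeout, elapsed, disabled=DISABLE_ALARM):
    """Arm SIGALRM unless disabled; return the delay used, or None."""
    if disabled:
        return None
    delay = alarm_delay(timeout, elapsed)
    signal.alarm(delay)                # Unix only
    return delay
```

Because of the clamp, an alarm is scheduled even when the timeout has already passed: alarm_delay(10, 25) is 1, not a negative number. So with the alarm enabled, a SIGALRM is always on its way while the parent is waiting.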
