The proposal to merge lp:~jtv/launchpad/bug-717969 into lp:launchpad has been 
updated.

Description changed to:

= Bug 717969 =

This is an oversized branch.  Don't just pick it up out of a sense of duty; if 
it's too big or you're not certain about the domain-specific stuff, contact one 
of the Soyuz experts — Julian, Steve, or William.

The buildmaster was showing very strange symptoms.  In a nutshell, cases where 
observed where it went through steps A, B, and C; we could see that steps A and 
C were completed successfully; but the expected results from step B were just 
not there.

The cause turns out to be a dangerous mix of global state: our database 
transaction management is global to a thread (all of the thread's changes are 
committed or all are aborted) but our use of twisted makes a single python 
thread/process flit back and forth between servicing multiple logical threads 
of execution.

Each of those threads of execution might, from its point of view: make some 
changes to ORM-backed objects, then initiate an asynchronous request (talking 
to the librarian or a build slave), block for the result to arrive, make some 
more ORM changes, block again, and later commit or abort the database 
transaction.

That is purely from the point of view of a logical thread of execution.  
Twisted experts are quick to point out that “nothing blocks” in Twisted, but 
that is exactly the problem: while one logical thread of execution “blocks,” 
another gets to execute.  And who's to say whether that other one might not 
commit or abort the first one's changes prematurely?  With the numbers of 
builders we drive, it's bound to happen regularly.

We discussed a few architectural solutions: farming off the ORM changes to a 
worker thread, doing all the work in threads, and so on.  The real problem is 
that the ORM changes are inlined in the buildmaster, but largely hidden away in 
the various build-related class hierarchies.  Twisted is meant for glue between 
systems, not for interleaving complex and dispersed stateful machinery with the 
glue's callback structure.

Ideally, we should separate the “glue” from the “machinery.”  But it's complex 
and fragile, so the next step from here is to get the code under control to the 
point where we can reason reliably about it.  Once we know more about what the 
code really does, we'll have more freedom to reorganize it.

This branch represents that next step from here: make the buildmaster run in 
read-only transaction policy.  It will not be able to make any changes to the 
database (or ORM-backed objects), except in sections of code that are 
explicitly wrapped in read-write transaction policies.  Now we know exactly 
where the buildmaster may change the database — doing it anywhere else would be 
an error.  We keep the read-write sections absolutely minimal, and try to avoid 
method calls where Twisted context switches might hide.  Unfortunately neither 
Twisted nor Storm seems to have a facility for forbidding them altogether.

Every read-write section commits immediately.  That means more fine-grained 
state changes.  I can't be 100% certain that early commits will not produce any 
unwanted object states, and I can't be 100% certain that the read-write 
policies cover all code paths that need them.  But we've run several types of 
build through the patched build farm, and we've stress-tested it against about 
5 concurrent builders working all-out.  As expected, we found code that still 
lacked an explicit read-write policy — but only one of them.  There may be 
more, but only production use will find them all.

For Q/A, we once again perform builds on staging and/or dogfood, of as many 
kinds as we can.  Include concurrent builds, and successes as well as failures. 
 Verify that the build products (tarballs, packages, changes files, build logs) 
all go into the librarian, and that the builds end up in a proper terminal 
state.  The contents of the build logs should be consistent with that end state.

As for tests… sorry, I just run them all!

No lint that I could help, copyrights updated, imports formatted as per policy. 
 Thank you for reading this far.

Jeroen

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-717969/+merge/83022
-- 
https://code.launchpad.net/~jtv/launchpad/bug-717969/+merge/83022
Your team Launchpad code reviewers is requested to review the proposed merge of 
lp:~jtv/launchpad/bug-717969 into lp:launchpad.

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-reviewers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-reviewers
More help   : https://help.launchpad.net/ListHelp

Reply via email to