Very interesting.

Good Job!

Wish everything working fine on Round1B

On Friday, May 3, 2013 3:30:53 AM UTC+8, Bartholomew Furrow wrote:
> All right, I think I've tracked down the bug!
> 
> 
> First, my apologies to everyone who got an email saying he or she had been 
> disqualified.  That's a nasty surprise.  I'd also like to apologize to the 
> entirely different set of people who then got an email saying "Don't worry, 
> you haven't been disqualified!", which was probably also quite confusing.  I 
> sent an email to all contestants after that, because without knowing what had 
> gone wrong, there was no other way to make sure I reached everyone who had 
> gotten the erroneous email.
> 
> 
> 
> 
> Since this was a bug that affected our users, it seems only fair that I tell 
> you about it.  If you don't want to read a story about tracking down a 
> difficult-to-reproduce bug, this is a good time to stop reading.
> 
> 
> 
> 
> The backgroundCode Jam runs on App Engine.  The email-sending service is 
> something like a hand-rolled MapReduce, implemented one summer by an 
> excellent intern who should feel free to chime in and take credit.  It's 
> supposed to work this way:
> 
> 
> - An admin sets up a "notification" object, which gets written to the App 
> Engine datastore and to memcache.  It has a time at which it will 
> automatically start sending.
> - Emails will be sent to the set of all users who were eligible to 
> participate in a given contest, but has a list of "filters" that restrict 
> that set, which are stored in the notification's "unprocessed_filters" field. 
>  For example, a filter could require that the field "rule_breaker" appears in 
> a user's scoreboard row, or it could require that user's rank be <= 1000.
> 
> 
> - When that time arrives, the first thing that happens is that filters are 
> "processed".  "Rank <= 1000" becomes "(Points, Time) >= (36, 45:50)". The 
> other filter remains unchanged. Both are stored in the "processed_filters" 
> field of the notification, and the modified "notification" object gets put 
> back into memcache and datastore.
> 
> 
> - Mappers go and fetch the "notification" object from memcache (or, if that 
> fails, datastore) and send email accordingly.
> 
> 
> One more piece of background: our library that stores objects in 
> memcache+datastore uses a local cache, which is supposed to be cleared at the 
> start of every request.
> 
> 
> 
> 
> What happened
> The library that stores objects in memcache+datastore wasn't cleared at the 
> start of the Mapper jobs.  On App Engine, a series of requests automatically 
> gets sent to various machines.  So if the following series of events takes 
> place:
> 
> 
> - Machine A sets up the notification, and caches it locally.
> - Machine B does the filter processing.
> - Machine A acts as a Mapper, without clearing the local cache.
> 
> 
> 
> 
> ...then Machine A will look in its local cache, and find the old version of 
> the notification: the one that has unprocessed filters, but no processed 
> filters.  It will think, "Should I send this email to user X?  Well, I don't 
> have any filters, so I should."
> 
> 
> 
> 
> This bug was really tough to track down.  I disabled mail-sending, added a 
> lot of debug logging, and then wrote a bunch of test emails, then sent them 
> off one by one; it took me ten tries to get one that failed in this 
> particular way.  Once I realized that the field was unset, I started 
> wondering about whether App Engine's datastore or memcache could be giving 
> old versions of the data; but then I remembered the local cache.  I checked 
> the logs to see whether when multiple Mappers failed in the same MapReduce, 
> it was always on the same machine; and it was.
> 
> 
> 
> 
> The immediate fix
> 1. Go through all request handlers and make sure our local cache is cleared 
> properly.  Add a comment to those lines saying that they must appear in new 
> handlers.  I'm fairly confident that will prevent this from happening again, 
> because I expect people to do a certain amount of copy/pasting when they make 
> new handlers.
> 
> 
> 2. Add an assertion, before any filters are evaluated, to make sure 
> len(notification["processed_filters"]) == 
> len(notification["unprocessed_filters"]).  Now even if another bug occurs 
> that causes filters not to be propagated, this should catch it.
> 
> 
> 
> 
> I hope this was interesting!  Expect the final emails from Round 1A today, 
> and please let me know if you got the wrong one. :-)
> - Bartholomew

-- 
You received this message because you are subscribed to the Google Groups 
"Google Code Jam" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-code+unsubscr...@googlegroups.com.
To post to this group, send email to google-code@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msg/google-code/-/qkvprNJTldUJ.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to