Hi folks,

My team has built a service with three main parts: a web application and
two long-running worker processes that consume events from a message
exchange. All of these components talk to the same database and use the
same underlying Django "app" for their ORM models (i.e. the two worker
processes call django.setup() on initialization).

We've had some issues with the worker processes failing to recover from
DB connectivity problems. For example, at one point Amazon restarted our
DB (it's an RDS instance) and the workers started flailing, repeatedly
raising the same exceptions even after the DB came back online. We later
discovered that we could fix this particular issue by calling
django.db.connection.close() when the exception occurred (it happened to
be InterfaceError); on the next attempt to interact with the DB, Django
would establish a new connection and everything would continue to work.
More recently a different error caused a similar problem, which led us to
speculate that we should do the same thing for this new type of exception
(I believe this time it was OperationalError, because the DB went into
"recovery mode" or something like that).
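
Roughly, the close-and-retry workaround looks like the following. This is
a minimal sketch under stated assumptions, not a tested recipe:
handle_with_reconnect and its parameters are hypothetical names, and to
keep it runnable without a Django project, the connection reset and the
retryable exception types are passed in. In a real worker you would pass
django.db.connection.close as reset_connection and
(django.db.InterfaceError, django.db.OperationalError) as retryable.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def handle_with_reconnect(
    handler: Callable[[], T],
    reset_connection: Callable[[], None],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Call handler(); on a retryable DB error, drop the connection and retry.

    Assumption: reset_connection() discards the broken connection so that the
    next ORM query opens a fresh one (django.db.connection.close() does this).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return handler()
        except retryable:
            # Throw away the (probably dead) connection before trying again.
            reset_connection()
            if attempt >= max_attempts:
                raise  # give up and let the caller or supervisor deal with it
            time.sleep(delay_seconds)


# Hypothetical usage inside a Django worker:
# from django.db import InterfaceError, OperationalError, connection
# handle_with_reconnect(lambda: process(event), connection.close,
#                       (InterfaceError, OperationalError))
```

The drawback is exactly the one described above: every new failure mode
means adding another exception type to the retryable tuple.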

We are now planning to refactor this service so that, instead of
attempting to recover from exceptions, the workers simply terminate and an
external agent automatically restarts them after unexpected errors. This
feels like a safer design than trying to figure out every exception type
we should be handling. However, I wanted to reach out to the Django group
as a sanity check to see if we're missing something more basic. From
browsing various tickets in Django's issue tracker, I've gotten the
impression that we may be swimming upstream a bit: Django is designed as a
web framework and relies on DB connections being closed (or returned to a
pool or something) automatically at the end of the request cycle, not held
open by a single loop in a long-running process. Is there something
special we should be doing in these worker processes? A special Django
setting, perhaps? Should we just be calling connection.close() after
processing each event? Should we not be using Django at all in this case?
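
For reference, Django's own request handling connects
django.db.close_old_connections() to the request_started and
request_finished signals; that function closes connections that have
outlived CONN_MAX_AGE or that are no longer usable after an error (with
the default CONN_MAX_AGE = 0, it effectively closes the connection every
time). A worker loop could mimic that around each event. This is only a
sketch: run_worker, handle_event and the event source are hypothetical,
and the cleanup hook is passed in so the loop runs without Django.

```python
from typing import Callable, Iterable, TypeVar

E = TypeVar("E")


def run_worker(
    events: Iterable[E],
    handle_event: Callable[[E], None],
    cleanup_connections: Callable[[], None],
) -> int:
    """Process events one at a time, mimicking Django's per-request cleanup.

    cleanup_connections is called before and after every event, the way
    request_started/request_finished call close_old_connections() around
    each HTTP request. Returns the number of events processed.
    """
    processed = 0
    for event in events:
        cleanup_connections()  # drop stale or broken connections before work
        try:
            handle_event(event)
        finally:
            cleanup_connections()  # ...and again once the event is done
        processed += 1
    return processed


# Hypothetical Django usage:
# from django.db import close_old_connections
# run_worker(consume_events(), handle_event, close_old_connections)
```

Running the cleanup in a finally block means a failed event still leaves
the connection state checked before the exception propagates.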

I think the pessimistic kill-and-restart strategy we've decided upon for 
now will work, but any guidance here to ensure we aren't fighting against 
our own framework would be much appreciated.
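
The kill-and-restart approach can be sketched like this, assuming an
external supervisor (systemd, supervisord, a container orchestrator, etc.)
restarts the process whenever it exits with a non-zero status. main and
consume_one are hypothetical names; the point is that any unexpected
exception is logged and turned into a non-zero exit code instead of being
retried in-process.

```python
import logging
from typing import Callable

log = logging.getLogger("worker")


def main(consume_one: Callable[[], bool]) -> int:
    """Run until consume_one() returns False; bail out on any error.

    consume_one is a hypothetical callable that processes one event and
    returns False when there is nothing left to do (e.g. on shutdown).
    Returns the process exit code.
    """
    try:
        while consume_one():
            pass
    except Exception:
        # Don't try to classify DB errors: log, exit non-zero, and let the
        # supervisor start a fresh process (and thus a fresh DB connection).
        log.exception("unexpected error, exiting so the supervisor restarts us")
        return 1
    return 0


# In the real entry point (hypothetical):
# import sys
# sys.exit(main(consume_one))
```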

Dan Tao

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/3c428718-af67-4beb-af20-36aaede71969%40googlegroups.com.
