Using the Django ORM in a long-running worker process

Daniel Tao Mon, 15 Aug 2016 09:28:26 -0700

Hi folks,

My team has built a service comprising three main parts: a web application
and two long-running worker processes that consume events from a message
exchange. All of these components talk to the same database and share the
same underlying Django "app" for ORM models (i.e. the two worker processes
call django.setup() on initialization).

We've had some issues with the worker processes failing to recover from DB
connectivity problems. For example, at one point Amazon restarted our DB
(it's an RDS instance) and the workers started flailing, repeatedly raising
the same exception even after the DB came back online. We later discovered
that we could fix this particular issue by calling
django.db.connection.close() when the exception occurred (it happened to be
InterfaceError); on the next attempt to interact with the DB, Django would
establish a new connection and everything would continue to work. More
recently a different error caused a similar problem, which leads us to
suspect that we should handle this new exception type the same way (I think
it's OperationalError this time, because the DB went into "recovery mode"
or something).
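
For illustration only, a rough sketch of that close-and-retry workaround
might look like the following. It is not our actual code: the helper name,
its parameters and the retry count are made up, and the connection reset
and exception types are passed in so the sketch runs without Django
installed.

```python
import logging
import time

log = logging.getLogger("worker")


def call_with_reconnect(func, reset_connection, recoverable, attempts=3, delay=0.0):
    """Call func(); on a recoverable DB error, reset the connection and retry.

    reset_connection and recoverable are injected (hypothetical parameters)
    so this sketch has no hard dependency on Django.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except recoverable as exc:
            log.warning("DB error on attempt %d/%d: %r", attempt, attempts, exc)
            # Drop the broken connection so the next DB access opens a new one.
            reset_connection()
            if attempt == attempts:
                # Out of retries: let the caller decide what to do.
                raise
            time.sleep(delay)
```

In a Django worker the arguments would presumably be
django.db.connection.close for reset_connection and
(django.db.InterfaceError, django.db.OperationalError) for recoverable,
with func wrapping whatever handles a single event.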

We are now planning to refactor this service a bit so that, instead of
attempting to recover from exceptions, we just terminate the process and
configure an external agent to restart it automatically after an
unexpected error. This feels like a safer design than trying to figure out
every exception type we should be handling. However, I wanted to reach out
to the Django group as a sanity check to see if we're missing something
more basic. From browsing various tickets in Django's issue tracker, I've
gotten the impression that we may be swimming upstream a little bit as
Django is designed as a web framework and relies on DB connections being
closed or returned to a pool or something automatically at the end of the
request cycle, not held open by a single loop in a long-running process. Is
there something special we should be doing in these worker processes? A
special Django setting perhaps? Should we just be calling
connection.close() after processing each event? Should we not be using
Django at all in this case?
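
To make the "close after each event" question concrete, here is a minimal
sketch of a loop that imitates the request cycle. As far as I can tell,
Django hooks django.db.close_old_connections() to its request_started and
request_finished signals, so the sketch runs a cleanup callable before and
after every event. The function and parameter names are hypothetical, and
the cleanup is passed in so the example runs without Django.

```python
def process_events(events, handle_event, cleanup):
    """Handle each event, running cleanup before and after it.

    cleanup is a hypothetical injected callable; in a Django worker it
    could be django.db.close_old_connections.
    """
    handled = 0
    for event in events:
        cleanup()  # plays the role of request_started
        try:
            handle_event(event)
            handled += 1
        finally:
            cleanup()  # plays the role of request_finished, even on error
    return handled
```

Whether the cleanup should be close_old_connections() (which only closes
connections that are unusable or past CONN_MAX_AGE) or an unconditional
connection.close() is exactly the part I'm unsure about.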

I think the pessimistic kill-and-restart strategy we've decided upon for
now will work, but any guidance here to ensure we aren't fighting against
our own framework would be much appreciated.
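
A rough sketch of the kill-and-restart idea, under the assumption that
some external supervisor restarts the process whenever it exits with a
non-zero status. The function name and its arguments are made up for
illustration.

```python
import logging

log = logging.getLogger("worker")


def run_worker(events, handle_event):
    """Process events until one fails; return a process exit code.

    Returns 1 on the first unhandled exception so that an external agent
    (assumed, not shown) can restart the process, and 0 if the event
    source is exhausted cleanly.
    """
    for event in events:
        try:
            handle_event(event)
        except Exception:
            # No attempt to classify the error: log it and give up.
            log.exception("unhandled error on event %r; exiting", event)
            return 1
    return 0
```

The entry point would then be something like
sys.exit(run_worker(event_source, handler)), so that every unexpected
exception ends the process instead of leaving it looping on a dead
connection.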

Dan Tao

