We saw a presentation last weekend at PyCon AU by Andrew Godwin about Django 
Channels; you might like to check the YouTube video. Channels lets long-running 
processes keep WebSockets open in parallel with HTTP requests.


Daniel Tao <[email protected]> wrote:

>Thank you for the thorough response; it has proven very helpful, mainly by 
>reinforcing what I was growing to suspect. We've updated the code in our 
>worker processes to close the DB connection after every message, and sure 
>enough we've seen a huge improvement: significantly fewer open DB connections 
>at any given time. (Before this change, we were coming dangerously close to 
>hitting the connection limit set by RDS.)
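The close-after-every-message pattern described above can be sketched generically. Note this is only a sketch: `fetch_message` and `handle` are hypothetical stand-ins for the worker's actual plumbing, and in a Django worker `close_connections` would be `django.db.close_old_connections` (or `connection.close`); they are injected as parameters here so the sketch stays framework-agnostic.

```python
def worker_loop(fetch_message, handle, close_connections, max_messages=None):
    """Process messages one at a time, dropping DB connections between them.

    All three callables are injected; in a Django worker,
    `close_connections` would typically be django.db.close_old_connections.
    """
    processed = 0
    while max_messages is None or processed < max_messages:
        message = fetch_message()
        if message is None:  # no more work available
            break
        try:
            handle(message)
        finally:
            # Never carry a (possibly stale) connection into the next message.
            close_connections()
        processed += 1
    return processed
```

The point of the `finally` is that the connection is released even when `handle` raises, which is what keeps the open-connection count flat under load.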
>
>
>To clarify: these aren't really "jobs" as you probably mean. We are very 
>familiar with celery; in fact we use it in our product for background tasks! 
>The worker processes I'm talking about do not receive jobs to execute, they 
>process raw messages in very high volume and are more purpose-built to handle 
>them in a specific way.
>
>
>But that isn't really important or interesting to a general audience. Really I 
>just wanted to say thanks for the tip—it definitely nudged us in the right 
>direction.
>
>On Wednesday, August 17, 2016 at 1:49:22 AM UTC-5, James Schneider wrote:
>
>
>> My team has built a service comprising 3 main parts, a web application and 2 
>> long-running worker processes that process events from a message exchange. 
>> All of these components interact with the same database and use the same 
>> underlying Django "app" for ORM models (i.e. the 2 worker processes call 
>> django.setup() on initialization).
>>
>
>Are the two long-running worker instances Django processes, or some other 
>kind of process? If they are Django workers, you're probably not handling 
>jobs correctly.
>
>> We've had some issues with the worker processes failing to recover in the 
>> face of DB connectivity issues. For example at one point Amazon restarted 
>> our DB (it's an RDS instance) and the workers started flailing, repeatedly 
>> raising the same exceptions despite the DB coming back online. Later on we 
>> discovered that we could fix this particular issue by calling 
>> django.db.connection.close() when this exception occurred (it happened to be 
>> InterfaceError); on the next attempt to interact with the DB, Django would 
>> establish a new connection to the DB and everything would continue to work. 
>> More recently a new error occurred that caused a similar problem, leading us 
>> to speculate that we should do the same thing in this case with this new 
>> type of exception (I think now it's OperationalError because the DB went 
>> into "recovery mode" or something).
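The close-and-retry behavior described in that paragraph can be sketched as a small wrapper. This is a hedged sketch, not the poster's actual code: the exception types and the close function are parameters, so nothing here is Django-specific. In practice you would pass `django.db.connection.close` and `(InterfaceError, OperationalError)` from `django.db.utils`.

```python
def run_with_reconnect(operation, close_connection, retryable, attempts=2):
    """Run `operation`; on a retryable error, close the dead connection
    and try again. Re-raises once `attempts` is exhausted."""
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            # Drop the dead connection so the next attempt opens a fresh one.
            close_connection()
            if attempt == attempts - 1:
                raise
```

This expresses the discovery in the thread as a reusable policy: instead of special-casing each new exception type at each call site, the set of retryable errors lives in one place.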
>>
>
>You're right. Django's DB connections are not really designed to be held 
>open in this manner.
>
>> We are now planning on refactoring this service a bit so that instead of 
>> attempting to recover from exceptions, we'll just terminate the process and 
>> configure an external agent to automatically restart in the face of 
>> unexpected errors. This feels like a safer design than trying to figure out 
>> every exception type we should be handling. However I wanted to reach out to 
>> the Django group as a sanity check to see if we're missing something more 
>> basic. From browsing various tickets in Django's issue tracker I've gotten 
>> the impression that we may be swimming upstream a little bit as Django is 
>> designed as a web framework and relies on DB connections being closed or 
>> returned to a pool or something automatically at the end of the request 
>> cycle, not held open by a single loop in a long-running process. Is there 
>> something special we should be doing in these worker processes? A special 
>> Django setting perhaps? Should we just be calling connection.close() after 
>> processing each event? Should we not be using Django at all in this case?
>>
>
>The answer is yes: you can (and should) use Django, but not to the extent of 
>your current implementation. Your long-running jobs should be collected by 
>Django and passed off immediately to a batch processor designed for 
>long-running work (although your jobs may not actually be long-running; it 
>sounds like you are just waiting for incoming job requests).
>
>Celery is a popular choice for batch processing with Django. It has hooks 
>built specifically for Django and is well documented. It does require a 
>message broker such as Redis or RabbitMQ to keep track of the jobs, but it is 
>designed to work directly with your Django instance, including support for 
>the ORM against your existing database.
>
>http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html
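For reference, the first-steps guide linked above boils down to roughly the following boilerplate. This is a configuration sketch: the `proj` module path and the `process_message` task name are illustrative, not from this thread.

```python
# proj/celery.py -- module path is illustrative
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

app = Celery("proj")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()  # picks up tasks.py modules from installed apps

# someapp/tasks.py
from celery import shared_task

@shared_task
def process_message(payload):
    # ORM calls work here as in any Django code; Celery manages the
    # worker process lifecycle around each task.
    ...
```

A worker is then started separately (e.g. `celery -A proj worker`), and the web process only enqueues tasks rather than doing the work itself.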
>
>> I think the pessimistic kill-and-restart strategy we've decided upon for now 
>> will work, but any guidance here to ensure we aren't fighting against our 
>> own framework would be much appreciated.
>>
>
>My recommendation would be to investigate a batch processor such as Celery. 
>If you run individual jobs rather than one long process, the impact of a DB 
>restart is limited to the few jobs that happen to be running at that moment. 
>You also gain granular control over the failure behavior of individual jobs: 
>some may be one-shot jobs that simply fail and report, others may retry. 
>
>Also, I would recommend at least coding recovery behavior for the known 
>failure cases. This list may obviously grow over time, but that's what keeps 
>developers employed, right? ;-)
>
>If you do keep a long running process going, I would recommend that you keep a 
>tight grip on the connection state in your loop, maybe even close it from time 
>to time as a sanity check to make sure the DB is really alive. 
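That periodic sanity check might look something like the following sketch. `do_work` and `close_all` are stand-ins (with Django, `close_all` would be `django.db.connections.close_all`), and the interval is an arbitrary assumption to tune for your workload.

```python
def supervised_loop(do_work, close_all, check_every=100, max_iterations=None):
    """Run `do_work` repeatedly, forcing fresh DB connections every
    `check_every` iterations so a silently dead connection cannot
    persist indefinitely."""
    done = 0
    while max_iterations is None or done < max_iterations:
        do_work()
        done += 1
        if done % check_every == 0:
            # Closing here means the next query must open (and thereby
            # verify) a live connection.
            close_all()
    return done
```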
>
>You should also have some sort of external network monitoring set up if the 
>application has any sort of value or service expectation. That may include 
>ongoing automated functional testing submitting test jobs, etc.
>
>Preemptively catching DB failures with no production impact is a great way to 
>impress your employer, and it gives you trend data to back up a complaint to 
>Amazon. 
>
>-James
>
>-- 
>You received this message because you are subscribed to the Google Groups 
>"Django users" group.
>To unsubscribe from this group and stop receiving emails from it, send an 
>email to [email protected].
>To post to this group, send email to [email protected].
>Visit this group at https://groups.google.com/group/django-users.
>To view this discussion on the web visit 
>https://groups.google.com/d/msgid/django-users/cea41ff1-9d98-4120-9015-994d200f4f90%40googlegroups.com.
>For more options, visit https://groups.google.com/d/optout.

