We saw a presentation by Andrew Godwin about Django Channels at PyCon AU last weekend, and you might like to check out the video on YouTube. Channels lets long-running processes keep WebSockets open in parallel with HTTP requests.
Daniel Tao <[email protected]> wrote:

>Thank you for the thorough response, which has proven very helpful, mainly by
>reinforcing what I was growing to suspect: we've updated the code in our
>worker processes to close the DB connection after every message, and sure
>enough we've seen a huge improvement in the form of significantly fewer open
>DB connections at any given time (where things stood, we were coming
>dangerously close to hitting the limit set by RDS).
>
>To clarify: these aren't really "jobs" as you probably mean. We are very
>familiar with celery; in fact we use it in our product for background tasks!
>The worker processes I'm talking about do not receive jobs to execute, they
>process raw messages in very high volume and are more purpose-built to handle
>them in a specific way.
>
>But that isn't really important or interesting to a general audience. Really I
>just wanted to say thanks for the tip - it definitely nudged us in the right
>direction.
>
>On Wednesday, August 17, 2016 at 1:49:22 AM UTC-5, James Schneider wrote:
>
>> My team has built a service comprising 3 main parts, a web application and 2
>> long-running worker processes that process events from a message exchange.
>> All of these components interact with the same database and use the same
>> underlying Django "app" for ORM models (i.e. the 2 worker processes call
>> django.setup() on initialization).
>>
>
>Are the two long term worker instances Django processes, or some other
>process? If they are Django workers, you're probably not handling jobs
>correctly.
>
>> We've had some issues with the worker processes failing to recover in the
>> face of DB connectivity issues. For example at one point Amazon restarted
>> our DB (it's an RDS instance) and the workers started flailing, repeatedly
>> raising the same exceptions despite the DB coming back online.
>> Later on we discovered that we could fix this particular issue by calling
>> django.db.connection.close() when this exception occurred (it happened to be
>> InterfaceError); on the next attempt to interact w/ the DB Django would
>> establish a new connection to the DB and everything would continue to work.
>> More recently a new error occurred that caused a similar problem, leading us
>> to speculate that we should do the same thing in this case with this new
>> type of exception (I think now it's OperationalError because the DB went
>> into "recovery mode" or something).
>>
>
>You're right. Django is not really designed to be held open in this manner.
>
>> We are now planning on refactoring this service a bit so that instead of
>> attempting to recover from exceptions, we'll just terminate the process and
>> configure an external agent to automatically restart in the face of
>> unexpected errors. This feels like a safer design than trying to figure out
>> every exception type we should be handling. However I wanted to reach out to
>> the Django group as a sanity check to see if we're missing something more
>> basic. From browsing various tickets in Django's issue tracker I've gotten
>> the impression that we may be swimming upstream a little bit as Django is
>> designed as a web framework and relies on DB connections being closed or
>> returned to a pool or something automatically at the end of the request
>> cycle, not held open by a single loop in a long-running process. Is there
>> something special we should be doing in these worker processes? A special
>> Django setting perhaps? Should we just be calling connection.close() after
>> processing each event? Should we not be using Django at all in this case?
>>
>
>The answer is yes, you can/should use Django, but not to the extent of your
>current implementation.
>Your long running jobs should be collected by Django
>and passed off immediately to a batch processor designed for long-running jobs
>(although your jobs may not be long-running, it sounds like you are just
>waiting for incoming job requests).
>
>Celery is a popular choice for batch processing with Django. It has hooks
>built specifically for Django, and is well documented. It does require a
>message broker such as Redis or RabbitMQ to keep track of the jobs, though.
>However, it is designed to work directly with your Django instance, including
>support for the ORM against your existing database.
>
>http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html
>
>> I think the pessimistic kill-and-restart strategy we've decided upon for now
>> will work, but any guidance here to ensure we aren't fighting against our
>> own framework would be much appreciated.
>>
>
>My recommendation would be to investigate a batch processor such as Celery.
>Depending on the number of jobs you are running, if you have individual jobs
>running rather than a long process, the chances of a DB restart causing panic
>are mitigated to just the few jobs that happen to run at that moment. You also
>have granular control over the failure behavior of individual jobs. Some may
>be one-shot jobs that simply fail and report, others may retry.
>
>Also, I would recommend at least coding recovery behavior for the known
>failure cases. This list may obviously grow over time, but that's what keeps
>developers employed, right? ;-)
>
>If you do keep a long running process going, I would recommend that you keep a
>tight grip on the connection state in your loop, maybe even close it from time
>to time as a sanity check to make sure the DB is really alive.
>
>You should also have some sort of external network monitoring set up if the
>application has any sort of value or service expectation. That may include
>ongoing automated functional testing submitting test jobs, etc.
>
>Preemptively catching DB failures with no production impact is a great way to
>impress your employer, and make a case to complain to Amazon with trend data.
>
>-James
>
>--
>You received this message because you are subscribed to the Google Groups
>"Django users" group.
>To unsubscribe from this group and stop receiving emails from it, send an
>email to [email protected].
>To post to this group, send email to [email protected].
>Visit this group at https://groups.google.com/group/django-users.
>To view this discussion on the web visit
>https://groups.google.com/d/msgid/django-users/cea41ff1-9d98-4120-9015-994d200f4f90%40googlegroups.com.
>For more options, visit https://groups.google.com/d/optout.
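
For anyone finding this thread later, here is a rough sketch of the "close the DB connection after every message" approach discussed in the quoted messages. It is only an illustration under some assumptions: run_worker, handle, close_connection and db_errors are hypothetical names, not part of Django, and the message source is just any iterable. The database pieces are passed in as arguments so the loop can be tried without a real database.

```python
import logging

logger = logging.getLogger(__name__)


def run_worker(messages, handle, close_connection, db_errors):
    """Process messages one at a time, never reusing a DB connection.

    Hypothetical helper, not part of Django. In a Django worker one would
    likely pass close_connection=django.db.connection.close and
    db_errors=(django.db.InterfaceError, django.db.OperationalError).
    Returns a (processed, failed) tuple of counts.
    """
    processed, failed = 0, 0
    for message in messages:
        try:
            handle(message)
            processed += 1
        except db_errors:
            # The connection is probably stale (DB restart, recovery mode).
            # Log and move on; the close below discards the connection.
            logger.exception("database error while handling %r", message)
            failed += 1
        finally:
            # Close after every message so the next ORM call opens a fresh
            # connection instead of reusing a possibly dead one.
            close_connection()
    return processed, failed
```

In a real worker the call might look like run_worker(consumer, process_event, connection.close, (InterfaceError, OperationalError)), with connection, InterfaceError and OperationalError imported from django.db, and consumer and process_event standing in for whatever the application uses. Any exception not listed in db_errors still propagates, so an external supervisor can restart the process as described in the quoted message.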

