We saw a presentation last weekend at PyCon AU by Andrew Godwin about Django 
Channels; you might like to check the YouTube video. Channels lets long-running 
processes keep WebSockets open in parallel with HTTP requests.


Daniel Tao <[email protected]> wrote:

>Thank you for the thorough response; it has proven very helpful, mainly by 
>reinforcing what I was growing to suspect. We've updated the code in our 
>worker processes to close the DB connection after every message, and sure 
>enough we've seen a huge improvement: significantly fewer open DB connections 
>at any given time. (Before this change, we were coming dangerously close to 
>hitting the connection limit set by RDS.)
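The close-after-every-message pattern described above can be sketched generically. Note this is only a sketch: `fetch_message` and `handle` are hypothetical stand-ins for the worker's actual plumbing, and in a Django worker `close_connections` would be `django.db.close_old_connections` (or `connection.close`); they are injected as parameters here so the sketch stays framework-agnostic.

```python
def worker_loop(fetch_message, handle, close_connections, max_messages=None):
    """Process messages one at a time, dropping DB connections between them.

    All three callables are injected; in a Django worker,
    `close_connections` would typically be django.db.close_old_connections.
    """
    processed = 0
    while max_messages is None or processed < max_messages:
        message = fetch_message()
        if message is None:  # no more work available
            break
        try:
            handle(message)
        finally:
            # Never carry a (possibly stale) connection into the next message.
            close_connections()
        processed += 1
    return processed
```

The point of the `finally` is that the connection is released even when `handle` raises, which is what keeps the open-connection count flat under load.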
>
>
>To clarify: these aren't really "jobs" as you probably mean. We are very 
>familiar with celery; in fact we use it in our product for background tasks! 
>The worker processes I'm talking about do not receive jobs to execute, they 
>process raw messages in very high volume and are more purpose-built to handle 
>them in a specific way.
>
>
>But that isn't really important or interesting to a general audience. Really I 
>just wanted to say thanks for the tip—it definitely nudged us in the right 
>direction.
>
>On Wednesday, August 17, 2016 at 1:49:22 AM UTC-5, James Schneider wrote:
>
>
>> My team has built a service comprising 3 main parts, a web application and 2 
>> long-running worker processes that process events from a message exchange. 
>> All of these components interact with the same database and use the same 
>> underlying Django "app" for ORM models (i.e. the 2 worker processes call 
>> django.setup() on initialization).
>>
>
>Are the two long-running worker instances Django processes, or some other 
>kind of process? If they are Django workers, you're probably not handling 
>jobs correctly.
>
>> We've had some issues with the worker processes failing to recover in the 
>> face of DB connectivity issues. For example at one point Amazon restarted 
>> our DB (it's an RDS instance) and the workers started flailing, repeatedly 
>> raising the same exceptions despite the DB coming back online. Later on we 
>> discovered that we could fix this particular issue by calling 
>> django.db.connection.close() when this exception occurred (it happened to be 
>> InterfaceError); on the next attempt to interact with the DB, Django would 
>> establish a new connection to the DB and everything would continue to work. 
>> More recently a new error occurred that caused a similar problem, leading us 
>> to speculate that we should do the same thing in this case with this new 
>> type of exception (I think now it's OperationalError because the DB went 
>> into "recovery mode" or something).
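The close-and-retry behavior described in that paragraph can be sketched as a small wrapper. This is a hedged sketch, not the poster's actual code: the exception types and the close function are parameters, so nothing here is Django-specific. In practice you would pass `django.db.connection.close` and `(InterfaceError, OperationalError)` from `django.db.utils`.

```python
def run_with_reconnect(operation, close_connection, retryable, attempts=2):
    """Run `operation`; on a retryable error, close the dead connection
    and try again. Re-raises once `attempts` is exhausted."""
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            # Drop the dead connection so the next attempt opens a fresh one.
            close_connection()
            if attempt == attempts - 1:
                raise
```

This expresses the discovery in the thread as a reusable policy: instead of special-casing each new exception type at each call site, the set of retryable errors lives in one place.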
>>
>
>You're right. Django's DB connections are not really designed to be held 
>open in this manner.
>
>> We are now planning on refactoring this service a bit so that instead of 
>> attempting to recover from exceptions, we'll just terminate the process and 
>> configure an external agent to automatically restart in the face of 
>> unexpected errors. This feels like a safer design than trying to figure out 
>> every exception type we should be handling. However I wanted to reach out to 
>> the Django group as a sanity check to see if we're missing something more 
>> basic. From browsing various tickets in Django's issue tracker I've gotten 
>> the impression that we may be swimming upstream a little bit as Django is 
>> designed as a web framework and relies on DB connections being closed or 
>> returned to a pool or something automatically at the end of the request 
>> cycle, not held open by a single loop in a long-running process. Is there 
>> something special we should be doing in these worker processes? A special 
>> Django setting perhaps? Should we just be calling connection.close() after 
>> processing each event? Should we not be using Django at all in this case?
>>
>
>The answer is yes: you can (and should) use Django, but not to the extent of 
>your current implementation. Your long-running jobs should be collected by 
>Django and passed off immediately to a batch processor designed for 
>long-running work (although your jobs may not actually be long-running; it 
>sounds like you are just waiting for incoming job requests).
>
>Celery is a popular choice for batch processing with Django. It has hooks 
>built specifically for Django and is well documented. It does require a 
>message broker such as Redis or RabbitMQ to keep track of the jobs, but it is 
>designed to work directly with your Django instance, including support for 
>the ORM against your existing database.
>
>http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html
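For reference, the first-steps guide linked above boils down to roughly the following boilerplate. This is a configuration sketch: the `proj` module path and the `process_message` task name are illustrative, not from this thread.

```python
# proj/celery.py -- module path is illustrative
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

app = Celery("proj")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()  # picks up tasks.py modules from installed apps

# someapp/tasks.py
from celery import shared_task

@shared_task
def process_message(payload):
    # ORM calls work here as in any Django code; Celery manages the
    # worker process lifecycle around each task.
    ...
```

A worker is then started separately (e.g. `celery -A proj worker`), and the web process only enqueues tasks rather than doing the work itself.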
>
>> I think the pessimistic kill-and-restart strategy we've decided upon for now 
>> will work, but any guidance here to ensure we aren't fighting against our 
>> own framework would be much appreciated.
>>
>
>My recommendation would be to investigate a batch processor such as Celery. 
>If you run individual jobs rather than one long process, the impact of a DB 
>restart is limited to the few jobs that happen to be running at that moment. 
>You also gain granular control over the failure behavior of individual jobs: 
>some may be one-shot jobs that simply fail and report, others may retry. 
>
>Also, I would recommend at least coding recovery behavior for the known 
>failure cases. This list may obviously grow over time, but that's what keeps 
>developers employed, right? ;-)
>
>If you do keep a long running process going, I would recommend that you keep a 
>tight grip on the connection state in your loop, maybe even close it from time 
>to time as a sanity check to make sure the DB is really alive. 
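That periodic sanity check might look something like the following sketch. `do_work` and `close_all` are stand-ins (with Django, `close_all` would be `django.db.connections.close_all`), and the interval is an arbitrary assumption to tune for your workload.

```python
def supervised_loop(do_work, close_all, check_every=100, max_iterations=None):
    """Run `do_work` repeatedly, forcing fresh DB connections every
    `check_every` iterations so a silently dead connection cannot
    persist indefinitely."""
    done = 0
    while max_iterations is None or done < max_iterations:
        do_work()
        done += 1
        if done % check_every == 0:
            # Closing here means the next query must open (and thereby
            # verify) a live connection.
            close_all()
    return done
```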
>
>You should also have some sort of external network monitoring set up if the 
>application has any sort of value or service expectation. That may include 
>ongoing automated functional testing submitting test jobs, etc.
>
>Preemptively catching DB failures with no production impact is a great way to 
>impress your employer, and it gives you trend data to back up a complaint to 
>Amazon. 
>
>-James
>
>-- 
>You received this message because you are subscribed to the Google Groups 
>"Django users" group.
>To unsubscribe from this group and stop receiving emails from it, send an 
>email to [email protected].
>To post to this group, send email to [email protected].
>Visit this group at https://groups.google.com/group/django-users.
>To view this discussion on the web visit 
>https://groups.google.com/d/msgid/django-users/cea41ff1-9d98-4120-9015-994d200f4f90%40googlegroups.com.
>For more options, visit https://groups.google.com/d/optout.

