Thank you for the extremely quick turn-around and suggestions, Graham!  Per your advice, I removed `WSGIImportScript` and added `WSGIRestrictEmbedded On`.

After some more research, setting up a dummy hello world Django app, I confirmed the startup delay wasn't with mod_wsgi after all.  It ended up being a large number of static files getting scanned at startup by Whitenoise (https://github.com/evansd/whitenoise/).
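
For anyone hitting the same symptom, here is a rough way to gauge how long a startup-time scan of a large static directory might take. This is only a sketch: it is not Whitenoise's actual code, the function name `time_static_scan` is made up for illustration, and it assumes that walking the tree and stat-ing each file is a fair proxy for the scan cost.

```python
import os
import sys
import time


def time_static_scan(root):
    """Walk `root`, stat every file, and return (file_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                os.stat(os.path.join(dirpath, name))
            except OSError:
                # Skip broken symlinks and files that vanish mid-walk.
                continue
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical usage: pass the directory your static files are collected into.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    files, seconds = time_static_scan(root)
    print("%d files scanned in %.2fs" % (files, seconds))
```

Running it as the Apache user against the collected static directory gives a ballpark figure to compare with the idle period seen at startup.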

Thanks again for the quick response and apologies for the noise.  I had been spinning my wheels for longer than I care to admit and appreciate your help ruling out mod_wsgi as the cause.

Best,
Jamie

On 10/22/20 6:40 PM, Graham Dumpleton wrote:
Remove:

    WSGIImportScript /path/to/django-wsgi.py \
      process-group=eslive application-group=%{GLOBAL}

Setting both process-group and application-group on WSGIScriptAlias has the same effect as WSGIImportScript: it preloads the WSGI script file. I am not sure what will happen if both ways of forcing preloading are set.

Also, a memory corruption bug was recently reported to me along with a fix. This has been an outstanding issue for many years, but it occurred so rarely on full Linux and macOS platforms (Alpine Linux would crash all the time, though) that I had never been able to track it down. This bug relates to the preloading of the WSGI script file, so there is an outside chance it is related.

Disabling the preloading may not be desirable, though, because lazy loading has a greater risk of delaying first requests, as they can queue up on a process which is still loading the application. That said, it may not be noticeable since there is only one thread per process. Thus it is worth trying:

    WSGIProcessGroup eslive
    WSGIScriptAlias / /path/to/django-wsgi.py application-group=%{GLOBAL}

which, because there is no WSGIImportScript and process-group and application-group aren't both set on WSGIScriptAlias, means no preloading.

BTW, if you don't already have it set, ensure you are setting:

    WSGIRestrictEmbedded On

if only using daemon mode. Not related, but good practice and cuts down on memory usage and startup load on Apache child worker processes.

So first up, try that. The bug fix I mentioned hasn't actually been released yet, as there was some other unfinished stuff in the code which I wasn't sure I wanted to change. If you want to be brave, though, you could try the 'develop' branch of mod_wsgi on GitHub. If you can replicate the problem in a testing system, you could perhaps try it there.

The only other thing I can think of is a cross-process conflict in the initialisation done by your app in relation to a database or backend service, when multiple processes are starting up at the same time.

Finally, I am not sure whether it might be adapted, but as the very first thing in the WSGI script file you could start a background thread which watches for an event set at the end of the WSGI script file import. If it takes more than a certain time to see that event, indicating a slow WSGI script file load, it dumps out Python stack traces. Code related to this is found at:

https://modwsgi.readthedocs.io/en/master/user-guides/debugging-techniques.html#extracting-python-stack-traces

It will need to be updated to Python 3, as it is probably still Python 2, and then adapted as mentioned.
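
A minimal Python 3 sketch of that idea follows. It is not the code from the linked guide; the helper name `start_import_watchdog`, the 10 second threshold, and writing to `sys.stderr` (which mod_wsgi normally routes to the Apache error log) are all assumptions made for illustration.

```python
import sys
import threading
import traceback


def start_import_watchdog(timeout=10.0, stream=sys.stderr):
    """Start a daemon thread that dumps all thread stacks if the returned
    event is not set within `timeout` seconds."""
    done = threading.Event()

    def watch():
        # Event.wait() returns True if the event was set before the timeout.
        if done.wait(timeout):
            return
        print("WSGI script import exceeded %.1fs; dumping stacks" % timeout,
              file=stream)
        # sys._current_frames() maps thread id -> that thread's current frame.
        for thread_id, frame in sys._current_frames().items():
            print("Thread %d:" % thread_id, file=stream)
            traceback.print_stack(frame, file=stream)
        stream.flush()

    threading.Thread(target=watch, name="import-watchdog", daemon=True).start()
    return done


# Very first thing in the WSGI script file:
import_done = start_import_watchdog(timeout=10.0)

# ... the usual imports and application setup would go here ...

# Very last thing in the WSGI script file:
import_done.set()
```

If the import finishes in time the watchdog thread exits silently; otherwise the stack of the importing thread should show where the load is stuck.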

Graham

On 23 Oct 2020, at 8:52 am, Jamie Biggar wrote:

Hi all,

I've been a mod_wsgi user for many years (Graham, thank you for your fantastic community support!), but this week ran into a mystery I haven't been able to solve on my own.

We've been running a fairly hefty Django app in production with mod_wsgi for years without much issue.  In August, with no obviously correlated change in code or server architecture, we started having issues where a restart (usually triggered by `touch`ing the WSGI script via `WSGIScriptReloading On`, though sometimes also by `systemctl restart httpd.service`) would occasionally lead to an unending stream of 504 timeouts (and sometimes some 503s as well) lasting indefinitely.  Another restart would sometimes fix it, but not always.  The issue seems to be load related -- the busier the server is, the more likely it is to get stuck in the 504 loop.  Most restarts would work fine and yield a normally-running site after a brief pause as the app was loaded into memory.

While troubleshooting today (not under production load), I noticed something that I think is likely exacerbating load-related restart timeout issues:  it seems that after a flurry of activity on initial server (re)start which clearly includes loading our WSGI script (as I see entries in the Apache error log related to Python packages it imports), there's a period of roughly 45 seconds when the CPU is idle and no requests are served via mod_wsgi before it wakes up and finally emits  `Started thread 0 in daemon process ...` log messages, then a few seconds later it's able to reply to HTTP requests.

*Any idea what could cause that ~45 second idle period during startup?*  I've tried tuning the *-timeout options for WSGIDaemonProcess, with no apparent effect on the idle time.  I also tried disabling our NewRelic APM code to rule out a network API bottleneck.

Software versions:

* Amazon Linux 2
* Python 3.6 (via IUS: https://ius.io/ )
* mod_wsgi/4.6.2 (also via IUS, compiled against Python 3.6)
* Apache/2.4.46
* Django 2.2

Apache config:

    WSGIDaemonProcess eslive display-name='(wsgi:es-site)' \
      processes=6 threads=1 \
      user=apache group=apache \
      python-home=/path/to/virtualenv \
      python-path=/path/to/code/root \
      python-eggs=/var/www/.python-eggs \
      lang='en_US.UTF-8' locale='en_US.UTF-8' \
      queue-timeout=45 \
      socket-timeout=60 \
      connect-timeout=15 \
      request-timeout=120 \
      startup-timeout=30 \
      deadlock-timeout=60 \
      eviction-timeout=0 \
      shutdown-timeout=5 \
      graceful-timeout=15 \
      restart-interval=0 \
      inactivity-timeout=0 \
      maximum-requests=0
    WSGIImportScript /path/to/django-wsgi.py \
      process-group=eslive application-group=%{GLOBAL}
    WSGISocketPrefix run/httpd-wsgi
    <VirtualHost ...>
      WSGIScriptAlias / /path/to/django-wsgi.py \
        process-group=eslive application-group=%{GLOBAL}
      WSGIPassAuthorization On
    </VirtualHost>

Thanks in advance for any recommendations!

-Jamie


--
You received this message because you are subscribed to the Google Groups "modwsgi" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>. To view this discussion on the web visit https://groups.google.com/d/msgid/modwsgi/bcb386ac-7c83-459d-bced-792d535a09d0n%40googlegroups.com <https://groups.google.com/d/msgid/modwsgi/bcb386ac-7c83-459d-bced-792d535a09d0n%40googlegroups.com?utm_medium=email&utm_source=footer>.


--

*Jamie Biggar*
VP Engineering & CTO, EnergySage <https://www.energysage.com/>
617.396.7215 | [email protected] <mailto:[email protected]>


