On Thu, 05 Jan 2012, Alberto Accomazzi wrote:
> I think this would be the cleanest approach. I'm not sure we (maybe I
> should say I) fully understand all the practical implications of
> running a separate python version in production, and how this affects
> other system components (apache, etc). But I like the idea of having
> control over python so we can be insulated by whatever CentOS does.

It is not too difficult to run yet another Python on CentOS. For
example, Python-3.1 packages are available for CentOS5 via
iuscommunity:

  <http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/repoview/python31.html>

to cite but one example. We have actually been using such an approach
on some less-critical Invenio instances here at CERN, but we usually
prefer to stick to the stock Python version coming with the given OS,
for obvious reasons like not having to maintain all the libraries, not
having to watch for security updates, etc.

As for the dangers with other system components that you mention, such
as Apache, most of the necessary components are perfectly fine.
However, some recommended/optional Python libraries may depend rather
tightly on OS library versions and may be hard to compile against a
higher Python version on an older OS. Notably, one problematic thing I
noticed when running Invenio with Python-2.6 on a RHEL-5 box was
libxslt. There may be similar issues with Python-2.7 on RHEL-6; I have
not looked.

Moreover, we usually prefer to stick to stock Python versions because
if Invenio runs well on a three-year-old RHEL system, then chances are
it runs well everywhere else, which is good for platform independence.

But for Invenio instances that we maintain internally, there is no
problem using another, higher Python version, at the price of increased
maintenance and possibly unusable dependencies. We usually don't go
that way, except for testing. Our production servers are running stock
Python versions.

> I think I like this solution best (or something along these lines). I
> would really favor having citation and usage data stored in shared
> memory and accessible through IPC (e.g. via memcache or redis), rather
> than in private memory. But I don't know the intricacies of the
> current implementation, so we would have to make sure that query
> efficiency is preserved.

The communication overhead of accessing citation information
externally rather than internally can be felt especially in case of
user storms. We did some tests with Marko, and under usual operational
conditions the difference was acceptable. But it may become important
in case of user storms, where everyone suddenly starts to access
citation summary pages or something like that.

>> We can naturally pursue all these tracks in parallel. Even if you'd
>> opt out for using locally-maintained Python-2.7, it may be profitable
>> to micro-optimise our data structures, and in parallel to go for
>> standalone WSGI citation process for better scalability of
>> non-citation requests.
>
> Sounds like a sensible plan to me!

Yup. Let's go for it progressively.

Best regards
--
Tibor Simko
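
P.S. The internal vs external access overhead mentioned above can be
roughly illustrated as follows. This is only a minimal sketch under
stated assumptions, not Invenio code: the CITATIONS map, the function
names and the record IDs are all hypothetical, and the "external" path
only models the serialisation cost of crossing a process boundary. A
real memcache or redis round-trip would add socket latency on top.

```python
import pickle
import timeit

# Hypothetical citation map: record ID -> list of citing record IDs.
# The data and names are made up for illustration only.
CITATIONS = {recid: list(range(recid % 50)) for recid in range(10000)}


def lookup_in_process(recid):
    """Read the citation list straight from private process memory."""
    return CITATIONS[recid]


def lookup_via_serialisation(recid):
    """Mimic an external store: request and reply are both pickled and
    unpickled, as they would be when crossing a process boundary.
    No real socket or network latency is involved."""
    request = pickle.loads(pickle.dumps(recid))
    reply = pickle.dumps(CITATIONS[request])
    return pickle.loads(reply)


def compare(n=20000):
    """Return (seconds in-process, seconds with serialisation)."""
    t_local = timeit.timeit(lambda: lookup_in_process(4949), number=n)
    t_serial = timeit.timeit(lambda: lookup_via_serialisation(4949), number=n)
    return t_local, t_serial


if __name__ == "__main__":
    t_local, t_serial = compare()
    print("in-process: %.4fs, serialised: %.4fs" % (t_local, t_serial))
```

Both lookups return the same list; only the time per call differs, and
the absolute numbers depend entirely on the machine and on the size of
the citation lists.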