Re: Slow MySQL queries with large data structures in memory

Tibor Simko Wed, 11 Jan 2012 05:14:13 -0800

On Thu, 05 Jan 2012, Alberto Accomazzi wrote:
> I think this would be the cleanest approach.  I'm not sure we (maybe I
> should say I) fully understand all the practical implications of
> running a separate python version in production, and how this affects
> other system components (apache, etc).  But I like the idea of having
> control over python so we can be insulated by whatever CentOS does.


It is not too difficult to run an additional Python on CentOS.  For
example, Python-3.1 packages are available for CentOS 5 via iuscommunity:

  <http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/repoview/python31.html>

We have actually been using such an approach on some less-critical
Invenio instances here at CERN, but we usually prefer to stick to the
stock Python version shipped with the given OS, for obvious reasons such
as not having to maintain all the libraries ourselves and not having to
watch for security updates.

As for the dangers with other system components that you mention, such
as Apache, most of the necessary components work perfectly fine.
However, some recommended/optional Python libraries may depend rather
tightly on OS library versions and may be hard to compile against newer
Python versions on older operating systems.  Notably, one problematic
case I noticed when running Invenio with Python-2.6 on a RHEL-5 box was
libxslt.  There may be similar issues with Python-2.7 on RHEL-6; I have
not looked.
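
One quick way to spot this kind of breakage on a given box is to try
importing the optional bindings under the interpreter in question.  The
following is only a minimal sketch: the helper name probe_modules is
made up for illustration, and the module names libxml2, libxslt and lxml
are my assumption of what one would want to check, not a definitive list
of Invenio dependencies.  It is written for a current Python 3.

```python
import importlib
import sys


def probe_modules(names):
    """Try to import each module; return {name: True if it imported}."""
    report = {}
    for name in names:
        try:
            importlib.import_module(name)
            report[name] = True
        except ImportError:
            # Covers a missing module as well as a compiled binding that
            # fails to load against the system libraries.
            report[name] = False
    return report


if __name__ == "__main__":
    print("Python %d.%d at %s" % (sys.version_info[0],
                                  sys.version_info[1],
                                  sys.executable))
    # Assumed module names; adjust to the bindings actually needed.
    wanted = ["libxml2", "libxslt", "lxml"]
    for name, ok in sorted(probe_modules(wanted).items()):
        print("%-10s %s" % (name, "ok" if ok else "NOT IMPORTABLE"))
```

Running this once with the stock interpreter and once with the locally
installed one shows which bindings would have to be rebuilt.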

Moreover, we usually prefer to stick to stock Python versions because if
Invenio runs well on a three-year-old RHEL system, then chances are it
runs well everywhere else, which is good for platform independence.  But
for Invenio instances that we maintain internally, there is no problem
using another, newer Python version, at the price of increased
maintenance and possibly unusable dependencies.  We usually don't go
that way, except for testing.  Our production servers are running stock
Python versions.

> I think I like this solution best (or something along these lines).  I
> would really favor having citation and usage data stored in shared
> memory and accessible through IPC (e.g. via memcache or redis), rather
> than in private memory.  But I don't know the intricacies of the
> current implementation, so we would have to make sure that query
> efficiency is preserved.

Accessing citation information externally rather than internally carries
a communication overhead.  We did some tests with Marko, and under usual
operational conditions the difference was acceptable.  But it may become
noticeable during user storms, when everyone suddenly starts to access
citation summary pages or something like that.
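
To give a feeling for the kind of overhead in question, here is a small
self-contained sketch that times the same lookups done in-process and
over a local socket.  It is not the test we ran and not Invenio code:
the CITATIONS dictionary, the JSON-lines protocol and all names below
are invented for illustration, and a real memcache or redis setup would
behave differently in detail.

```python
import json
import socket
import threading
import time

# Hypothetical stand-in for citation data: recid -> list of citing recids.
CITATIONS = {recid: list(range(recid % 50)) for recid in range(10000)}


def lookup_internal(recid):
    """In-process access: a plain dictionary lookup."""
    return CITATIONS.get(recid, [])


def serve(conn):
    """Answer one JSON-encoded recid per line until the peer disconnects."""
    reader = conn.makefile("r", encoding="utf-8")
    writer = conn.makefile("w", encoding="utf-8")
    try:
        while True:
            line = reader.readline()
            if not line:  # peer closed the connection
                break
            recid = json.loads(line)
            writer.write(json.dumps(CITATIONS.get(recid, [])) + "\n")
            writer.flush()
    finally:
        reader.close()
        writer.close()
        conn.close()


class ExternalClient:
    """External access: every lookup is a round trip over a socket."""

    def __init__(self):
        self._sock, server_side = socket.socketpair()
        self._reader = self._sock.makefile("r", encoding="utf-8")
        self._writer = self._sock.makefile("w", encoding="utf-8")
        self._thread = threading.Thread(target=serve, args=(server_side,),
                                        daemon=True)
        self._thread.start()

    def lookup(self, recid):
        self._writer.write(json.dumps(recid) + "\n")
        self._writer.flush()
        return json.loads(self._reader.readline())

    def close(self):
        self._writer.close()
        self._reader.close()
        self._sock.close()
        self._thread.join(timeout=1)


def time_lookups(lookup, recids):
    """Return (elapsed seconds, results) for calling lookup on each recid."""
    start = time.perf_counter()
    results = [lookup(recid) for recid in recids]
    return time.perf_counter() - start, results


if __name__ == "__main__":
    recids = list(range(0, 10000, 5))
    client = ExternalClient()
    try:
        t_int, r_int = time_lookups(lookup_internal, recids)
        t_ext, r_ext = time_lookups(client.lookup, recids)
    finally:
        client.close()
    assert r_int == r_ext  # both paths must return identical data
    print("internal: %.4f s   external: %.4f s" % (t_int, t_ext))
```

The absolute numbers depend on the machine; what matters is how the
per-request cost adds up when many requests arrive at once.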

>> We can naturally pursue all these tracks in parallel.  Even if you'd
>> opt out for using locally-maintained Python-2.7, it may be profitable
>> to micro-optimise our data structures, and in parallel to go for
>> standalone WSGI citation process for better scalability of
>> non-citation requests.
>
> Sounds like a sensible plan to me!

Yup.  Let's go for it progressively.

Best regards
-- 
Tibor Simko
