On Thu, 05 Jan 2012, Benoit Thiell wrote:
> Here is a table that summarizes the results I had while testing
> search_unit_in_bibxxx with different queries, with or without garbage
> collection and with or without citation dictionaries.

Thanks for the summary, very informative.

> It might be interesting to run similar tests for an Inspire-like
> Invenio instance.

I ran two similar tests on an INSPIRE test instance.  (1) For a word
query returning 600K hits, the speed-up was a negligible 1%, which was
expected because this is mostly using intbitsets.  (2) For a MARC value
query returning 550K hits, where native Python lists are being used
internally, the speed-up was around 40%.  However, this type of query
is not often used, since users' phrase queries typically use intbitsets
internally as well.

To conclude, I think we can safely hard-code GC switching on/off inside
run_sql() and friends, perhaps depending on the Python version, like:

    if sys.version_info < (2, 7):
        USE_GC_DISABLING = True
    else:
        USE_GC_DISABLING = False
    [...]
    if USE_GC_DISABLING:
        gc.disable()
        rc = cur.execute(sql, param)
        gc.enable()
    else:
        rc = cur.execute(sql, param)

If you agree, I'll commit such a change in your name.

>> - We can revive the option of using standalone WSGI process for all
>>   citation handling, since this would enable Invenio to use numerous
>>   smaller WSGI processes on the front-end for answering all the
>>   non-citation requests.
>
> This looks like a promising option to me. Is there any literature on
> this that would give me an idea of how this is supposed to work?

Some ideas can be seen at:

   <https://twiki.cern.ch/twiki/bin/view/CDS/InvenioScalability>

There is even a tentative branch that Marko was working on at:

   <http://invenio-software.org/repo/personal/invenio-marko/log/?h=citations-in-separate-server>

I can dust it off, clean it up, and let you know.

> I agree that the data structures need to be optimized because the
> memory usage has already proven several times to be problematic.

We'll definitely do that as part of the micro-optimisation ticket:

   <http://invenio-software.org/ticket/21>

It should be a rather cheap change.

Best regards
-- 
Tibor Simko
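P.S. One possible refinement of the GC toggle above, only as a sketch: if cur.execute() raises, the plain disable/enable pair would leave the collector switched off, and it would also switch the collector on for a caller that had deliberately turned it off. A small context manager could cover both cases. The names gc_paused and execute_with_gc_paused are made up for illustration and are not existing Invenio functions; the version cut-off is simply the one from the snippet above.

```python
import gc
import sys
from contextlib import contextmanager

# Assumption: same Python version cut-off as in the snippet above.
USE_GC_DISABLING = sys.version_info < (2, 7)


@contextmanager
def gc_paused(active=USE_GC_DISABLING):
    """Temporarily disable the garbage collector (hypothetical helper)."""
    # Only touch the collector if asked to and if it is currently enabled,
    # so that we never re-enable a GC that the caller had switched off.
    was_enabled = active and gc.isenabled()
    if was_enabled:
        gc.disable()
    try:
        yield
    finally:
        # Runs on normal exit and when the wrapped code raises.
        if was_enabled:
            gc.enable()


def execute_with_gc_paused(cur, sql, param=None):
    """Run cur.execute() with the GC paused; cur is any DB-API cursor."""
    with gc_paused():
        return cur.execute(sql, param)
```

Inside run_sql() this would amount to replacing the if/else around cur.execute() with a single "with gc_paused():" block, keeping the version check in one place.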