On Thu, 05 Jan 2012, Benoit Thiell wrote:
> Here is a table that summarizes the results I had while testing
> search_unit_in_bibxxx with different queries, with or without garbage
> collection and with or without citation dictionaries. 

Thanks for the summary, very informative.

> It might be interesting to run similar tests for an Inspire-like
> Invenio instance.

I ran two similar tests on an INSPIRE test instance.  (1) For a word
query returning 600K hits, the speedup was a negligible 1%, which was
expected because such queries mostly use intbitsets.  (2) For a MARC
value query returning 550K hits, where native Python lists are used
internally, the speedup was around 40%.  However, this type of query is
not often used, since users' phrase queries typically use intbitsets
internally as well.
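
For anyone who wants to repeat this kind of comparison on their own
instance, here is a minimal timing sketch.  It does not reproduce the
numbers above: build_rows() and time_build() are hypothetical helpers
that only mimic fetching many rows into a native Python list, and the
difference measured will depend heavily on the Python version and on
the shape of the data.

```python
import gc
import time


def build_rows(n):
    """Build n one-element tuples, mimicking rows fetched into a list."""
    return [(i,) for i in range(n)]


def time_build(n, disable_gc):
    """Return (row count, elapsed seconds) for building n rows.

    When disable_gc is true, the cyclic garbage collector is switched
    off while the rows are built and restored afterwards.
    """
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.time()
        rows = build_rows(n)
        elapsed = time.time() - start
    finally:
        # Only re-enable GC if it was on before we touched it.
        if disable_gc and was_enabled:
            gc.enable()
    return len(rows), elapsed


if __name__ == "__main__":
    for flag in (False, True):
        count, secs = time_build(550000, flag)
        print("gc disabled=%s: %d rows in %.3f s" % (flag, count, secs))
```

Running the script prints one timing line per setting, so the two can
be compared side by side.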

To conclude, I think we can safely hard-code GC switching on/off inside
run_sql() and friends, perhaps depending on the Python version, like:

  import gc
  import sys

  if sys.version_info < (2, 7):
      USE_GC_DISABLING = True
  else:
      USE_GC_DISABLING = False

  [...]

        if USE_GC_DISABLING:
            gc.disable()
            try:
                rc = cur.execute(sql, param)
            finally:
                # re-enable GC even if the query raises
                gc.enable()
        else:
            rc = cur.execute(sql, param)

If you agree, I'll commit such a change in your name.
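
One possible way to avoid repeating the if/else in every function is a
small context manager.  This is only a sketch under assumptions:
gc_paused() and execute_with_gc_paused() are hypothetical names, not
existing Invenio functions, and `cur` stands for any DB-API style
cursor with an execute(sql, param) method.

```python
import gc
import sys
from contextlib import contextmanager

# Same switch as proposed above: pause GC only on Pythons older than 2.7.
USE_GC_DISABLING = sys.version_info < (2, 7)


@contextmanager
def gc_paused(active=USE_GC_DISABLING):
    """Temporarily disable the cyclic GC, then restore its previous state."""
    was_enabled = gc.isenabled()
    if active and was_enabled:
        gc.disable()
    try:
        yield
    finally:
        # Re-enable only if we were the ones who disabled it.
        if active and was_enabled:
            gc.enable()


def execute_with_gc_paused(cur, sql, param=None, active=USE_GC_DISABLING):
    """Run cur.execute(sql, param) with GC paused when `active` is true."""
    with gc_paused(active):
        return cur.execute(sql, param)
```

The try/finally means the collector is switched back on even when the
query raises, and remembering the previous state means a caller who had
already disabled GC is left undisturbed.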

>> - We can revive the option of using standalone WSGI process for all
>>  citation handling, since this would enable Invenio to use numerous
>>  smaller WSGI processes on the front-end for answering all the
>>  non-citation requests.
>
> This looks like a promising option to me. Is there any literature on
> this that would give me an idea of how this is supposed to work?

Some ideas can be seen at:

  <https://twiki.cern.ch/twiki/bin/view/CDS/InvenioScalability>

There is even a tentative branch that Marko was working on at:

  <http://invenio-software.org/repo/personal/invenio-marko/log/?h=citations-in-separate-server>

I can dust it off, clean it up, and let you know.

> I agree that the data structures need to be optimized because the
> memory usage has already proven several times to be problematic.

We'll definitely do that as part of the micro-optimisation ticket:

   <http://invenio-software.org/ticket/21>

It should be a rather cheap change.

Best regards
-- 
Tibor Simko
