Re: Lucene indexing questions

Tibor Simko Thu, 7 Oct 2010 17:29:52 +0200

On Thu, 07 Oct 2010, Roman Chyla wrote:
> thank you, so i guess there is something that interprets those
> relations


Yes, there is Python code in the web app server that walks through the
dictionaries as needed.
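
As a rough illustration only (not the actual code), walking such a
dictionary to build a cite summary could look like the sketch below.
The `cited_by` layout, the record IDs and the `cite_summary` helper are
all assumptions made up for this example.

```python
# Hypothetical in-memory citation map: record ID -> set of citing record IDs.
# The IDs and the layout are invented for illustration.
cited_by = {1: {2, 3}, 2: {3}, 3: set(), 4: {1}}


def cite_summary(recids, cited_by):
    """Summarise citation counts for the given records.

    Everything is read from the in-memory dictionary, so no database
    round trip is needed while answering the query.
    """
    counts = [len(cited_by.get(recid, ())) for recid in recids]
    total = sum(counts)
    return {
        "records": len(counts),
        "total_citations": total,
        "average": total / len(counts) if counts else 0.0,
    }


summary = cite_summary([1, 2, 3], cited_by)
```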

>>> Do you have some reasons to believe that the pairs are more storage
>>> effective, than the points in the index?
>>
>> A web app node does not have to contact the DB node in order to walk
>> over the citation map to provide a cite summary, because the full
>> citation map is readily available in its memory.  Good for load
>> distribution, hence speed and scalability.
>
> strictly speaking, we were having discussion about the storage, so it
> still seems to be not soooo much more swelled

The web app nodes fetch and cache citation dictionaries from the DB
storage space upon Apache startup.  After that they no longer go to the
DB server for citation data, except for quick timestamp checks to see
whether the cache is still up to date.  So, if we have N web app nodes,
they can process N*WP cite summary queries in parallel (where WP is the
number of worker processes per node) without ever loading the DB server
for the citation data, apart from that quick timestamp check.
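
A minimal sketch of this fetch-once, check-timestamp pattern, assuming
the DB exposes a cheap "last updated" lookup and an expensive full load.
`CitationCache`, `FakeDB` and their methods are hypothetical names for
this example, not the real API.

```python
class CitationCache:
    """Keep the citation dictionary in memory; reload only when stale."""

    def __init__(self, fetch_timestamp, fetch_dictionary):
        # Both callables are assumptions: one cheap, one expensive.
        self._fetch_timestamp = fetch_timestamp
        self._fetch_dictionary = fetch_dictionary
        self._timestamp = None
        self._data = None

    def get(self):
        # Quick check: has the DB copy changed since we cached it?
        latest = self._fetch_timestamp()
        if self._data is None or latest != self._timestamp:
            # Expensive path, taken at startup and after DB updates only.
            self._data = self._fetch_dictionary()
            self._timestamp = latest
        return self._data


class FakeDB:
    """Stand-in for the DB node, counting the expensive full loads."""

    def __init__(self):
        self.timestamp = 1
        self.full_loads = 0

    def get_timestamp(self):
        return self.timestamp

    def load_citations(self):
        self.full_loads += 1
        return {1: {2, 3}, 2: {3}}


db = FakeDB()
cache = CitationCache(db.get_timestamp, db.load_citations)
cache.get()   # first call loads the full dictionary
cache.get()   # second call only checks the timestamp
```

Each worker process would hold its own cache of this kind, so the DB
node sees only the timestamp lookups during normal query processing.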

If we have data points stored in an `index node' that is separated from
the `web app nodes', then the web app nodes would have to dispatch
queries to the `index node' and gather the responses, which takes some
time.  If there is only one `index node', then it would become a real
bottleneck.  If there are several `index nodes', then from the
scalability point of view this is indeed not so different from having
in-memory citation dictionary nodes.  It would be a bit like doubling
or shadowing the web app nodes, so to speak.  But wouldn't it require
some Solr extension?

Best regards
-- 
Tibor Simko
