On Thu, 07 Oct 2010, Roman Chyla wrote:
> thank you, so i guess there is something that interprets those
> relations

Yes, there is Python code in the web app server that walks through the
dictionaries as needed.

>>> Do you have some reasons to believe that the pairs are more storage
>>> effective, than the points in the index?
>>
>> A web app node does not have to contact the DB node in order to walk
>> over the citation map to provide a cite summary, because the full
>> citation map is readily available in its memory. Good for load
>> distribution, hence speed and scalability.
>
> strictly speaking, we were having discussion about the storage, so it
> still seems to be not soooo much more swelled

The web app nodes fetch and cache citation dictionaries from the DB
storage space upon Apache startup. Then they don't bother going to the
DB server for citation data anymore, except for quick timestamp checks
to see if everything is still up to date.

So, if we have N web app nodes, they can process N*WP cite summary
queries in parallel (where WP is the number of worker processes per
node) without ever charging DB server for the citation data. (Only for
that quick timestamp check.)

If we have data points stored in an `index node' that is separated from
the `web app nodes', then the web app nodes would have to dispatch
queries to the `index node' and gather response back, taking some time.
If there is only one `index node', then this would become a real
bottleneck.

If there are several `index nodes', then it is not so different from
having in-memory citation dictionary nodes indeed, from the scalability
point of view. It would be a bit like doubling or shadowing the web app
nodes, so to speak. But wouldn't it require some Solr extension?

Best regards
--
Tibor Simko
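
The caching scheme described in this message (load the citation
dictionaries once at startup, then only do a cheap timestamp check
before serving a cite summary) could be sketched roughly as below. This
is an illustration only, not the actual web app code: the names
`CitationCache`, `cite_summary`, `load_map`, `load_timestamp` and the
in-memory `store` standing in for the DB server are all hypothetical.

```python
class CitationCache:
    """In-memory copy of a citation map, reloaded only when the
    store's timestamp changes (hypothetical sketch)."""

    def __init__(self, load_map, load_timestamp):
        self._load_map = load_map              # expensive: fetches the whole dictionary
        self._load_timestamp = load_timestamp  # cheap: fetches only the last-update time
        self._timestamp = load_timestamp()
        self._cited_by = load_map()            # done once, e.g. at server startup

    def cited_by(self):
        # Quick timestamp check; the full map is refetched only if it is stale.
        current = self._load_timestamp()
        if current != self._timestamp:
            self._cited_by = self._load_map()
            self._timestamp = current
        return self._cited_by


def cite_summary(cache, recids):
    """Walk the cached map and count citations for the given records."""
    cited_by = cache.cited_by()
    counts = {recid: len(cited_by.get(recid, ())) for recid in recids}
    return {"total": sum(counts.values()), "per_record": counts}


# Stand-in for the DB server: maps a record id to the set of records citing it.
store = {"timestamp": 1, "map": {10: {11, 12}, 11: {12}}}
load_calls = []  # records every full fetch, to show how rarely it happens


def load_map():
    load_calls.append(store["timestamp"])
    return dict(store["map"])


def load_timestamp():
    return store["timestamp"]


cache = CitationCache(load_map, load_timestamp)
summary = cite_summary(cache, [10, 11, 99])
```

In this sketch, repeated calls to `cite_summary` trigger only the
timestamp lookup; the full dictionary is fetched again only after the
store's timestamp changes.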
