(Funnily enough, Travis and I have been replying at the same time. Let me comment on some issues I passed over in silence.)
On Wed, 06 Oct 2010, Brooks, Travis C. wrote:
>> - it allows to combine the full power of the search engine (but
>> inevitably, things are done differently)
>
> This is clearly an advantage, plus the maint. advantage of not
> writing/maintaining code that has already been written/is being
> maintained

You mean NLP and full-text word proximity and (facets)?  Yes.  But I
doubt that cite summary code, co-cited-with code, SPIRES syntax code,
download similarity code, citedby/refersto code, etc. have been ported
to Solr by somebody else, so that we could just take them and use them
without any code writing/maintenance issues.  So I would not call it a
clear advantage in the software writing/maintenance domain, unless the
pros and cons of porting, or otherwise enabling, this feature here or
that feature there are compared in detail.  There are various solutions
to various problems, with various economics.

>> - assumption that it will be slower than python in-memory dictionary
>> is assumption (and should be _recognized_ as such)
>
> Agreed

In my email reply to Jay I mentioned `slow' in the context where Solr
would not have full access to the raw citation map and would store only
citation counts.  I was not comparing the speed of the in-memory
citation dict technique with that of storing citer-citee pairs in Solr.
This is frankly a non-problem in my eyes, for Invenio is already very
fast in this domain.

>> - it is just a different paradigm than rdbms
>
> Yep, interestingly enough, so was SPIRES.

Invenio does not really use the RDBMS for search or citation stuff
either.  (Or only a little.)  It would be slow to use the classical
pure RDBMS table/row/column technique to hold info about these things.
Invenio has its own indexes, and the RDBMS is there mostly to hold
serialized bit vectors, serialized citation dictionaries and the like.
Recapping just for the sake of completeness.

Best regards
--
Tibor Simko
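
To illustrate the remark about in-memory citation dictionaries that are
stored serialized in the RDBMS, here is a minimal sketch.  It is not
Invenio's actual code: the function names, the sample record IDs and the
pickle+zlib encoding are all assumptions made for illustration only.

```python
import pickle
import zlib
from collections import defaultdict


def build_citation_dicts(pairs):
    """Build cited-by and refers-to maps from (citer, citee) record ID pairs.

    Hypothetical helper, not an Invenio function.
    """
    cited_by = defaultdict(set)
    refers_to = defaultdict(set)
    for citer, citee in pairs:
        cited_by[citee].add(citer)
        refers_to[citer].add(citee)
    return dict(cited_by), dict(refers_to)


def serialize(citation_dict):
    """Pack a dictionary into one compressed blob (assumed encoding)."""
    return zlib.compress(pickle.dumps(citation_dict))


def deserialize(blob):
    """Restore the dictionary from a blob produced by serialize()."""
    return pickle.loads(zlib.decompress(blob))


# Made-up citer-citee pairs: record 1 cites 2, record 3 cites 2 and 1.
pairs = [(1, 2), (3, 2), (3, 1)]
cited_by, refers_to = build_citation_dicts(pairs)

# The blob is what a single RDBMS column would hold; lookups happen
# on the restored in-memory dictionary, not via table/row queries.
blob = serialize(cited_by)
restored = deserialize(blob)
citation_counts = {recid: len(citers) for recid, citers in restored.items()}
```

With the full map in memory, citation counts fall out as a by-product,
whereas storing only the counts would lose the citer-citee pairs needed
for citedby/refersto style queries.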
