Re: Lucene indexing questions

Tibor Simko Thu, 7 Oct 2010 00:36:21 +0200

(Funnily enough, Travis and I were replying at the same time.
Let me comment on some issues I passed over in silence.)

On Wed, 06 Oct 2010, Brooks, Travis C. wrote:
>> - it allows to combine the full power of the search engine (but
>> inevitably, things are done differently)
>
> This is clearly an advantage, plus the maint. advantage of not
> writing/maintaining code that has already been written/is being
> maintained

You mean NLP, full-text word proximity, and (facets)?  Yes.  But I
doubt that the cite summary code, co-cited-with code, SPIRES syntax
code, download similarity code, citedby/refersto code, etc. have been
ported to Solr by somebody else, such that we could just take them and
use them without any code writing/maintenance issues.  So I would not
call it a clear advantage in the software writing/maintenance domain
unless the pros and cons of porting or otherwise enabling this feature
here, or that feature there, are compared in detail.  There are
various solutions to various problems, with various economics.

>> - assumption that it will be slower than python in-memory dictionary
>> is assumption (and should be _recognized_ as such)
>
> Agreed

In my email reply to Jay I mentioned `slow' in the context where Solr
would not have full access to the raw citation map and would store only
citation counts.  I was not comparing the speed of the in-memory
citation dict technique with that of storing citer-citee pairs in Solr.
This is frankly a non-problem in my eyes, for Invenio is already very
fast in this domain.
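
To make the distinction concrete, here is a minimal sketch of the
in-memory citation dict idea.  It is not Invenio's actual code: the
record IDs, the pair list, and the function names are all made up for
illustration.  It shows that counts can always be derived from the raw
map, while the reverse is not possible.

```python
from collections import defaultdict

# Hypothetical (citer, citee) pairs; the record IDs are invented.
CITATION_PAIRS = [(1, 3), (2, 3), (2, 4), (4, 3)]


def build_citation_maps(pairs):
    """Build both directions of the citation map in memory."""
    cited_by = defaultdict(set)   # citee -> set of records citing it
    refers_to = defaultdict(set)  # citer -> set of records it cites
    for citer, citee in pairs:
        cited_by[citee].add(citer)
        refers_to[citer].add(citee)
    return cited_by, refers_to


def citation_counts(cited_by):
    """Derive plain counts; this step throws away who cites whom."""
    return {recid: len(citers) for recid, citers in cited_by.items()}


cited_by, refers_to = build_citation_maps(CITATION_PAIRS)
counts = citation_counts(cited_by)
```

With only `counts` available, a citedby-style question ("which records
cite record 3?") cannot be answered; with `cited_by` it is a single
dictionary lookup.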

>> - it is just a different paradigm than rdbms
>
> Yep, interestingly enough, so was SPIRES.

Invenio does not really use the RDBMS for search or citation stuff
either.  (Or only a little.)  It would be slow to use the classical
pure RDBMS table/row/column technique to hold info about these things.
Invenio has its own indexes, and the RDBMS is there mostly to hold
serialized bit vectors, serialized citation dictionaries, and the
like.  Recapping just for the sake of completeness.
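
As a rough sketch of what "the RDBMS holds serialized structures"
means, the snippet below stores a bit vector and a citation dictionary
as opaque blobs and reads one back.  Everything here is an assumption
chosen for illustration: SQLite stands in for the real database, a
plain Python integer stands in for a real bit-vector type, pickle plus
zlib stands in for whatever serialization is actually used, and the
table and key names are hypothetical.

```python
import pickle
import sqlite3
import zlib


def recids_to_bitvector(recids):
    """Pack a set of record IDs into an integer used as a bit vector."""
    bits = 0
    for recid in recids:
        bits |= 1 << recid
    return bits


def bitvector_to_recids(bits):
    """Unpack the integer bit vector back into a set of record IDs."""
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}


def dumps(obj):
    """Serialize and compress an object into a blob."""
    return zlib.compress(pickle.dumps(obj))


def loads(blob):
    """Inverse of dumps(); only use on blobs you wrote yourself."""
    return pickle.loads(zlib.decompress(blob))


# The database only sees opaque blobs, not rows per word or citation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (name TEXT PRIMARY KEY, data BLOB)")
conn.executemany(
    "INSERT INTO blobs VALUES (?, ?)",
    [
        ("word:lucene", dumps(recids_to_bitvector({1, 5, 8}))),
        ("cited_by", dumps({3: {1, 2, 4}})),
    ],
)

row = conn.execute(
    "SELECT data FROM blobs WHERE name = ?", ("word:lucene",)
).fetchone()
restored_hits = bitvector_to_recids(loads(row[0]))

row = conn.execute(
    "SELECT data FROM blobs WHERE name = ?", ("cited_by",)
).fetchone()
restored_cited_by = loads(row[0])
```

The searching itself then happens on the deserialized structures in
memory, not through SQL joins over table rows.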

Best regards
--
Tibor Simko
