Lucites,

I ended up rewriting the PGXN schema into multiple schemas after consulting 
with Graham Barr on how CPAN search works. I'm pretty happy with the results so 
far, but have a few questions about how indexing transactions work.

* Why does `commit()` invalidate an Indexer object?

* Should I be making as many changes to an index as I can before calling 
`commit()`, or can I update bits at a time using separate index objects?

* Is there a way to invalidate an IndexSearcher object when an index changes? 
Or do I just need to create a new searcher for every request? If the latter, 
how efficient is the constructor?

These questions stem mainly from my being a database geek, so I tend to think 
in database-style transactions. To wit:

* If I have to update lots of rows, it's more efficient to use transactions to 
do a few at a time. For example, if I need to update 1,000 rows, I might update 
100 at a time in separate transactions.

* Once I've committed a transaction, all other connections can see the changes.
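(To make that concrete, here's a minimal sketch of the habit I mean, using 
SQLite rather than Lucy -- the `dists` table and batch size of 100 are just 
illustrative:)

```python
import sqlite3

# A shared in-memory database so two connections can see the same data;
# the "file:demo?..." URI name is just for this sketch.
uri = "file:demo?mode=memory&cache=shared"
conn = sqlite3.connect(uri, uri=True)
conn.execute("CREATE TABLE dists (id INTEGER PRIMARY KEY, version INTEGER)")
conn.executemany(
    "INSERT INTO dists (id, version) VALUES (?, 0)",
    [(i,) for i in range(1000)],
)
conn.commit()

# Update 1,000 rows in batches of 100, one transaction per batch.
ids = list(range(1000))
for start in range(0, len(ids), 100):
    batch = ids[start:start + 100]
    conn.executemany(
        "UPDATE dists SET version = 1 WHERE id = ?",
        [(i,) for i in batch],
    )
    conn.commit()  # each commit makes this batch visible to other connections

# A second connection sees all committed changes without reconnecting.
other = sqlite3.connect(uri, uri=True)
updated = other.execute(
    "SELECT count(*) FROM dists WHERE version = 1"
).fetchone()[0]
print(updated)  # 1000
```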

But I'm starting to suspect this isn't the best way to do it with 
Lucy/KinoSearch. Is it better to:

* Update all 1,000 objects in a single transaction (one indexer, calling 
commit() at the end)?

* Always create a new IndexSearcher for new requests in order to see any 
changes? (I found in tests I was writing that if I updated an index, an 
existing IndexSearcher did *not* see the change -- maybe it was caching results 
for performance?)

Thank you for your patience with my newbish questions.

Best,

David
