Thanks Yonik,
Assuming I am not going to index the ID, only option 4 remains so far. I
have no other ideas, and a Log* merge policy would mean all the Lucene 4 indexing
magic goes to nothing :)
Could the following then do the job?
Clone DefaultIndexWriterProvider into my codebase (ugly, has to be kept in sync, but
doable) and make it provide:
class EnhancedSolrIndexWriter extends SolrIndexWriter {
  @Override
  public void commit(...) {
    super.commit(Core.getUserMap());   // the commit(Map<String, String>) variant
  }
  // ... and the same with close(...)
}
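At the Lucene level that override would just delegate to the commit variant that
takes user data, roughly like this (a sketch only; I am assuming trunk still exposes
commit(Map<String, String>), and "max_uid" is just an example key):

  Map<String, String> userData = new HashMap<String, String>();  // == whatever Core.getUserMap() holds
  userData.put("max_uid", Long.toString(maxUid));                // example entry, tracked during indexing
  writer.commit(userData);                                       // the map is persisted with this commit point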
If yes, is this a feature Solr could use? A Map<String, String> userParams
somewhere in the core that gets committed with whatever it holds at commit time.
Could I then wrap up a patch by modifying SolrIndexWriter directly?
The nice thing about it: one could keep a small map of key/value pairs in sync
with commit points, with all the goodness of TwoPhaseCommit... for "no way for
this to get out of sync" things, like my use case below. I imagine DIH could use
it as well.
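Reading the map back would then be trivial, since it travels with the commit point
the reader was opened on (again just a sketch; the exact entry points on trunk may
differ):

  IndexReader reader = IndexReader.open(directory);
  Map<String, String> committed = reader.getIndexCommit().getUserData();
  String maxUid = committed.get("max_uid");   // exactly what was committed, never out of sync
  reader.close();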
---------------------------------------------------------
No longer... the default merge policy can now merge non-contiguous segments.
You can of course still select a Log* merge policy, which never
reorders ids with respect to each other.
-Yonik
http://www.lucidimagination.com
________________________________
From: eks dev <[email protected]>
To: [email protected]
Sent: Sat, 6 August, 2011 20:47:09
Subject: IndexReader.maxDoc() and other
Assuming there are no deletes, would the following work as a way to load the *last
added document*, surviving optimize as well?
The order of document IDs in Lucene survives optimize, as far as I remember?
IndexReader ir = ...;
int maxDoc = ir.maxDoc() - 1;
if (maxDoc >= 0) { // ? What is the return value of maxDoc() on an empty index, 0 or 1?
  Document d = ir.document(maxDoc);
}
Would this correspond to the last committed document (at the commit point where
the index reader was opened), or to the last added document, including
pending/uncommitted ones? (I am not getting the IndexReader from the IndexWriter,
no NRT yet...)
The problem I am trying to solve is incremental updates (there are no deletions).
Having a unique, numerical uid stored in the index that increases with every add,
I just need a way to find max(uid) at the last commit to get my delta from the
database.
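Spelled out end to end, that first approach would be roughly the following
(assuming the field is literally named "uid", and that a plain reader opened on
the Directory only reflects the last commit):

  IndexReader ir = IndexReader.open(directory);
  long maxUid = -1;
  int last = ir.maxDoc() - 1;
  if (last >= 0) {
    Document d = ir.document(last);            // the last added doc, if docid order really survives merges
    maxUid = Long.parseLong(d.get("uid"));     // "uid" = my stored, ever-increasing id field
  }
  ir.close();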
The solution above was the first option.
2. The second would be to iterate a TermsEnum over the uid field until I hit the
end, but this sounds slow (even if I start skipping around like a monkey)?
3. The third option would be to index a reversed uid (HUGE_CONSTANT - uid), so it
ends up at the top of the terms dictionary?
4. And finally, the last option I am thinking of would be to track max(UID) and
write it as a user parameter with IndexWriter.commit(Map...), so I could read it
back easily (piggy-backing on the Lucene commit is as safe as it gets, better than
persisting my own files...).
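Roughly, the indexing side of option 4 would look like this (just a sketch:
MyRecord, toDocument() and the "max_uid" key are placeholders for my own code, and
I am assuming the commit(Map<String, String>) variant is available):

  long maxUid = Long.MIN_VALUE;
  for (MyRecord rec : delta) {                  // placeholder for the rows coming from the DB
    maxUid = Math.max(maxUid, rec.getUid());
    writer.addDocument(toDocument(rec));        // placeholder document builder
  }
  writer.commit(Collections.singletonMap("max_uid", Long.toString(maxUid)));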
I like the last option, but I have no idea how to create a beforeCommitListener in
Solr?
The most robust are 2/3, but maybe slow-ish (there are 100-200 million documents/UIDs).
Any better ideas? (And no, the DIH wall-clock timestamp is not good enough.)
I am talking about Solr/Lucene 4 trunk; we decided to take a risk :)
Thanks,
eks