Thanks Grant, this sounds good: https://issues.apache.org/jira/browse/LUCENE-1812 and https://issues.apache.org/jira/browse/LUCENE-2632
I hadn't noticed them before; with the high-volume, high-quality traffic here in the Lucene world, one cannot keep up :) I will have to look into them in detail.

With pruning, the problem is going to be somehow preserving the "write once" benefit for slave updates (copy the deltas and reload()). Update the full index by adding/deleting a few docs -> commit -> prune -> update the slaves incrementally? Will that work? I will have to check what this pruning codec produces (one merge along the way and I need a full update of the slaves...). From the JIRA descriptions, TeeSinkCodec and FilteringCodec look exactly like a solution! Sounds almost too good. Thanks again!

Eks

----- Original Message ----
> From: Grant Ingersoll <gsing...@apache.org>
> To: dev@lucene.apache.org
> Sent: Fri, 22 October, 2010 15:26:31
> Subject: Re: Polymorphic Index
>
> On Oct 21, 2010, at 3:44 PM, eks dev wrote:
>
> > Hi All,
> > I am trying to figure out a way to implement the following use case with Lucene/Solr.
> >
> > In order to support simple incremental updates (master), I need to index and store a UID field on a 300Mio collection (my UID is a 32-byte sequence). But I do not need it indexed (only stored) during normal searching (slaves).
> >
> > The problem is that my term dictionary gets blown up by the sheer number of unique IDs. The number of unique terms in this collection, excluding UID, is less than 7Mio.
> > I can tolerate a resource hit on the updater (big hardware, on-disk index...).
> >
> > This is a master/slave setup, where searchers run from a RAMDisk, and having 300Mio * 32 bytes (give or take prefix compression) plus pointers to postings, and the postings themselves, is something I would really love to avoid, as this is significant compared to the really small documents I have.
> >
> > Cutting to the chase:
> > How can I have an indexed UID field and, when done with indexing:
> > 1) load the "searchable" index into RAM from such an on-disk index, without one field?
>
> That doesn't seem like it would be all that hard to do in Lucene with a few edits to the appropriate low-level classes to simply not load the term dictionary for a particular set of fields (pass in a set?). This sort of masking even seems like a generally useful performance gain in the typical master/worker replicated environment.
>
> > 2) create 2 indices in sync on docIDs, one containing only the indexed UID
>
> Kind of reminds me of Andrzej's pruning codec stuff. Perhaps the new Flex stuff helps here?
>
> > 3) somehow transform the index with the indexed UID by dropping the UID field, preserving docIDs. A kind of smart index-editing tool.
>
> Again, take a look at Andrzej's pruning codec.
>
> -Grant
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
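The field-masking idea discussed above can be illustrated with a toy in-memory model. The masking means skipping the term dictionary of a chosen set of fields when building the searchable copy, while keeping docIDs and stored values intact (option 3). This is a minimal sketch under stated assumptions. The class FieldMaskSketch and its methods are hypothetical and written only for illustration. It is not Lucene's API, and it says nothing about how the pruning codec, Flex, TeeSinkCodec or FilteringCodec actually work. It assumes Java 9+ for Map.of/Set.of.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical toy model (NOT Lucene's API): per-field term dictionaries
 * mapping term -> ascending docIDs. A docID is the document's position in "stored".
 */
public class FieldMaskSketch {
    // field name -> (term -> postings list of docIDs)
    private final Map<String, SortedMap<String, List<Integer>>> termDicts = new HashMap<>();
    // docID -> stored field values, kept for every field, masked or not
    private final List<Map<String, String>> stored = new ArrayList<>();

    /** Indexes and stores one document; returns its docID. */
    public int addDocument(Map<String, String> doc) {
        int docId = stored.size();
        stored.add(doc);
        for (Map.Entry<String, String> e : doc.entrySet()) {
            termDicts.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                     .computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                     .add(docId);
        }
        return docId;
    }

    /**
     * Builds a "searchable" copy without the term dictionaries of the masked fields.
     * Stored values keep their positions, so docIDs are unchanged; unmasked
     * dictionaries are shared with this instance, not copied.
     */
    public FieldMaskSketch withoutIndexedFields(Set<String> masked) {
        FieldMaskSketch copy = new FieldMaskSketch();
        copy.stored.addAll(stored);
        for (Map.Entry<String, SortedMap<String, List<Integer>>> e : termDicts.entrySet()) {
            if (!masked.contains(e.getKey())) {
                copy.termDicts.put(e.getKey(), e.getValue());
            }
        }
        return copy;
    }

    /** Returns the docIDs containing term in field; empty if the field is not indexed here. */
    public List<Integer> search(String field, String term) {
        SortedMap<String, List<Integer>> dict = termDicts.get(field);
        if (dict == null) {
            return Collections.emptyList();
        }
        return dict.getOrDefault(term, Collections.emptyList());
    }

    /** Total number of unique terms across all indexed fields. */
    public int termCount() {
        int n = 0;
        for (SortedMap<String, List<Integer>> dict : termDicts.values()) {
            n += dict.size();
        }
        return n;
    }

    public Map<String, String> storedDoc(int docId) {
        return stored.get(docId);
    }

    public static void main(String[] args) {
        FieldMaskSketch master = new FieldMaskSketch();
        master.addDocument(Map.of("uid", "a1", "body", "red"));
        master.addDocument(Map.of("uid", "b2", "body", "blue"));
        master.addDocument(Map.of("uid", "c3", "body", "red"));

        FieldMaskSketch slave = master.withoutIndexedFields(Set.of("uid"));
        System.out.println(master.termCount());            // 5: three UIDs plus "red" and "blue"
        System.out.println(slave.termCount());             // 2: only the body terms remain
        System.out.println(slave.search("body", "red"));   // [0, 2]: same docIDs as on the master
        System.out.println(slave.search("uid", "b2"));     // []: UID is no longer searchable
        System.out.println(slave.storedDoc(2).get("uid")); // c3: but it is still stored
    }
}
```

In this toy, the UID terms dominate the term count on the master but vanish from the searchable copy. Because docIDs are positions in the stored list, nothing is renumbered, which is the property the incremental slave update relies on.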