Hi all, I have been trying to figure out the control flow of IndexWriter and IndexSearcher in order to better understand the idea behind the Codec implementation.
However, some of my questions involve specific code, which I find inconvenient to discuss here. Maybe it is better to explain what I understand so far and ask for your comments? Here is what I understand:

*Index time:*
--First of all, IndexWriter gets its Codec configuration from an IndexWriterConfig.
--When IndexWriter.addDocument is called, an instance of DocumentsWriterPerThread is created.
--It then passes the codec information through the indexing chain, and an instance of FreqProxTermsWriterPerField is created, whose flush() is called.
--Then, based on the codec information, an instance of TermsConsumer is created. After this, we iterate over each termID, get the corresponding PostingsConsumer, and save the information for each document.
--Here, by extending "TermsConsumer" and "PostingsConsumer", we can make IndexWriter create an index with new posting formats.

*Query time:*
--Now, let's take phrase search as an example.
--When IndexSearcher.search(phraseQuery, topN) is called, an instance of PhraseWeight is created to wrap the query terms.
--Then IndexSearcher creates tasks that call PhraseWeight.scorer(), inside which two instances, Terms and TermsEnum, are fetched from the corresponding AtomicReader.
--With the help of TermsEnum, for every word in the phrase, the related docs and positions are fetched through a DocsAndPositionsEnum, and the result is generated from them.
--Here, by extending "TermsEnum" and the related "*Enum" classes, we can make IndexSearcher (or IndexReader) understand our posting formats.

And here are my questions:

1. Will multiple AtomicReaders be created if I run a search on an index with several segments? If not, when will there be multiple AtomicReaders? And, going further, what is the idea behind introducing AtomicReader and CompositeReader in Lucene 4?
2. I must have missed something at query time, since no subtype of PostingsReaderBase appears in what I explained. Is it created when an instance of AtomicReader is fetched from the context?
   Where can I find the related code?
3. The wiki page here <http://wiki.apache.org/lucene-java/FlexibleIndexing> says we should provide an arbitrary skipDocs bit set during enumeration. Does that mean the posting list itself remains unchanged even if I call deleteDocuments()? Will deleted documents still remain in the postings file, even after segments get merged?

Thank you.

--
Han Jiang

EECS, Peking University, China
Every Effort Creates Smile

Senior Student
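The write/read symmetry described in the walkthrough above can be pictured with a toy, in-memory model. On the write side, a terms consumer hands out one postings consumer per term. On the read side, an enumerator walks a term's docs while hiding those marked in a skipDocs bit set, as the FlexibleIndexing page puts it. This is a hypothetical sketch only. The names ToyPostingsFormat, ToyPostingsWriter, ToyDocsEnum, startTerm, startDoc, docs and collect are made up for illustration and are not Lucene's API. They merely play the roles of TermsConsumer/PostingsConsumer and TermsEnum/DocsAndPositionsEnum, and the sketch assumes docIDs arrive in increasing order.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model, NOT Lucene's API: it only mirrors the roles of a
// codec's write side (terms/postings consumers) and read side (enumerators).
class ToyPostingsFormat {
    // Stand-in for a postings file: term -> docIDs, kept in term order.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    // Plays the role of a TermsConsumer: one call per term, returning the
    // object that receives that term's documents.
    ToyPostingsWriter startTerm(String term) {
        List<Integer> docs = new ArrayList<>();
        postings.put(term, docs);
        return new ToyPostingsWriter(docs);
    }

    // Plays the role of seeking a TermsEnum and pulling a docs enum:
    // stored postings are read as-is; skipDocs (may be null) hides docs.
    ToyDocsEnum docs(String term, BitSet skipDocs) {
        List<Integer> docs = postings.getOrDefault(term, new ArrayList<Integer>());
        return new ToyDocsEnum(docs, skipDocs);
    }

    // Number of stored postings, counting deleted docs too.
    int storedPostingCount() {
        int n = 0;
        for (List<Integer> docs : postings.values()) {
            n += docs.size();
        }
        return n;
    }
}

// Plays the role of a PostingsConsumer: receives the docs of one term,
// assumed (hypothetically) to arrive in increasing docID order.
class ToyPostingsWriter {
    private final List<Integer> docs;

    ToyPostingsWriter(List<Integer> docs) {
        this.docs = docs;
    }

    void startDoc(int docID) {
        docs.add(docID);
    }
}

// Plays the role of a docs enumerator that filters through a skipDocs bit set.
class ToyDocsEnum {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // end-of-postings sentinel
    private final List<Integer> docs;
    private final BitSet skipDocs;
    private int upto = -1;

    ToyDocsEnum(List<Integer> docs, BitSet skipDocs) {
        this.docs = docs;
        this.skipDocs = skipDocs;
    }

    // Advances to the next doc not marked in skipDocs.
    int nextDoc() {
        while (++upto < docs.size()) {
            int doc = docs.get(upto);
            if (skipDocs == null || !skipDocs.get(doc)) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    // Drains an enumerator into a list (convenience for the demo).
    static List<Integer> collect(ToyDocsEnum e) {
        List<Integer> out = new ArrayList<>();
        for (int d = e.nextDoc(); d != NO_MORE_DOCS; d = e.nextDoc()) {
            out.add(d);
        }
        return out;
    }
}

class ToyCodecDemo {
    public static void main(String[] args) {
        ToyPostingsFormat format = new ToyPostingsFormat();
        ToyPostingsWriter w = format.startTerm("lucene");
        w.startDoc(0);
        w.startDoc(2);
        w.startDoc(5);

        BitSet deleted = new BitSet(); // "deleting" doc 2 only flips a bit
        deleted.set(2);

        System.out.println("live docs: " + ToyDocsEnum.collect(format.docs("lucene", deleted)));
        // prints "live docs: [0, 5]"
        System.out.println("stored postings: " + format.storedPostingCount());
        // prints "stored postings: 3"
    }
}
```

In this toy model a deletion never touches the stored postings. It only changes what the enumerator reports. Whether and when Lucene itself drops such entries, for example during a merge, is exactly question 3 above, and the sketch makes no claim about it.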
