I am a student studying how Lucene works for a project.
How do I add a user-generated document to a Lucene index so that a term has a specific frequency, just as it would if it came from a text file? For example, say I have to add the following documents, which were produced by analyzing images:

    doc1 = { contents field: {"red (x15 times) blue (x10 times)"}, name field: {"doc1"} }
    doc2 = { contents field: {"red (x10 times) blue (x18 times)"}, name field: {"doc2"} }

After indexing, the term frequency for "red" should be 15 in doc1 and 10 in doc2. If I can set the frequencies manually, doc1 and doc2 can be indexed along with the normal text files. I need the frequencies to be indexed as well (FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS).

The description of the frequency file on this page ( http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.html?is-external=true ) says:

    FreqFile (.frq) --> Header, <TermFreqs, SkipData>^TermCount
    Header --> CodecHeader
    TermFreqs --> <TermFreq>^DocFreq
    TermFreq --> DocDelta[, Freq?]
    SkipData --> <<SkipLevelLength, SkipLevel>^(NumSkipLevels-1), SkipLevel> <SkipDatum>
    SkipLevel --> <SkipDatum>^(DocFreq/(SkipInterval^(Level + 1)))
    SkipDatum --> DocSkip,PayloadLength?,OffsetLength?,FreqSkip,ProxSkip,SkipChildLevelPointer?
    DocDelta,Freq,DocSkip,PayloadLength,OffsetLength,FreqSkip,ProxSkip --> VInt
    SkipChildLevelPointer --> VLong

"For example, the TermFreqs for a term which occurs once in document seven and three times in document eleven, with frequencies indexed, would be the following sequence of VInts: 15, 8, 3. If frequencies were omitted (FieldInfo.IndexOptions.DOCS_ONLY) it would be this sequence of VInts instead: 7, 4."

So what should the DocDelta values be for doc1 and doc2, and how are they derived? Links to any other useful material would also be appreciated. Thanks.
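
For reference, the arithmetic behind the quoted example can be sketched in a few lines of Python. This is only an illustration of the rule described on the linked page (DocDelta/2 is the gap to the previous document number, an odd DocDelta means a frequency of one, an even DocDelta is followed by a separate Freq value); it is not Lucene code, and the function names are made up for this sketch. The document numbers 0 and 1 for doc1 and doc2 are an assumption, since the real numbers depend on the order in which documents are added. These values are normally written by the codec during indexing rather than set by hand.

```python
def encode_term_freqs(postings):
    """Encode (doc_number, freq) pairs, sorted by doc_number, as the
    DocDelta[, Freq] integer sequence used when frequencies are indexed."""
    out = []
    prev_doc = 0  # the first delta is taken relative to zero
    for doc_number, freq in postings:
        delta = doc_number - prev_doc
        if freq == 1:
            # Odd DocDelta: frequency of one is folded into the low bit.
            out.append(delta * 2 + 1)
        else:
            # Even DocDelta: the frequency follows as its own value.
            out.append(delta * 2)
            out.append(freq)
        prev_doc = doc_number
    return out


def encode_docs_only(postings):
    """Encode only the document gaps (the DOCS_ONLY case)."""
    out = []
    prev_doc = 0
    for doc_number, _freq in postings:
        out.append(doc_number - prev_doc)
        prev_doc = doc_number
    return out


# The example quoted from the documentation: once in doc 7, three times in doc 11.
quoted = [(7, 1), (11, 3)]
print(encode_term_freqs(quoted))
print(encode_docs_only(quoted))

# Hypothetical doc numbers: doc1 -> 0, doc2 -> 1 (assumed, not from Lucene).
red = [(0, 15), (1, 10)]
blue = [(0, 10), (1, 18)]
print(encode_term_freqs(red))
print(encode_term_freqs(blue))
```

Under those assumed document numbers, "red" would be stored as 0, 15, 2, 10: a gap of 0 with frequency 15 for doc1, then a gap of 1 (doubled to 2) with frequency 10 for doc2.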