Hello,
I am trying to count the total of number of posting entries for terms having
a given prefix in an index. Also count the number of such terms in the
index.
The following is the code I am using for that. The problem is the result is
not as expected.
Can you point out if what am I doing something wrong:
ASSUMPTION:
Index has had no deletions.
INPUT:
prefix: the prefix that terms should match.
VARIABLES:
set: a set of unique terms found in the index having given prefix
wordcount: the number of unique terms in the index having given
prefix
termFreqCount: final result which will be returned
CODE:
public long countTotalPositingEntriesInIndex(String prefix) {
int wordCount = 0;
int documentId = -1;
long termFreqCount = 0;
HashSet<String> set = new HashSet<String>();
for (int i = 0; i < index.length; i++) {
while (documentId < index[i].getIndexReader().maxDoc() - 1) {
documentId++;
try {
TermFreqVector tfv[] = index[i].getIndexReader()
.getTermFreqVectors(documentId);
if (tfv == null)
continue;
for (int fieldCount = 0; fieldCount < tfv.length; fieldCount++) {
String terms[] = tfv[fieldCount].getTerms();
int termFreq[] = tfv[fieldCount].getTermFrequencies();
for (int termCount = 0; termCount < terms.length; termCount++) {
if (terms[termCount].toLowerCase().startsWith(
prefix.toLowerCase()))
{
if( !set.contains(terms[termCount]))
{
wordCount++;
set.add(terms[termCount].toLowerCase());
}
termFreqCount += termFreq[termCount];
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
return termFreqCount;
}