Mark, Here's a small piece of code that outputs a list of all terms
for a given field, in order of decreasing term frequency:
--- Requires Java 1.5 for PriorityQueue, or you can use Doug Lea's version ---
String field = "myfield".intern(); // intern
required for != below
IndexReader reader = IndexReader.open(indexPath);
Term startTerm = new Term(field, ""); // set your field here
TermEnum termEnum = reader.terms(startTerm);
PriorityQueue queue = new PriorityQueue(); // using java 1.5's PriorityQueue
while (termEnum.next()) {
Term term = termEnum.term();
if (term.field() != field) break; // lucene interns fields so != works
int freq = reader.docFreq(term);
queue.add(new TermFreq(term, freq));
}
reader.close();
while (!queue.isEmpty()) {
TermFreq termFreq = (TermFreq) queue.remove();
System.out.println(termFreq.term.text()+": "+termFreq.freq);
}
/* inner class for priority queue */
class TermFreq implements Comparable {
Term term;
int freq;
public TermFreq(Term term, int freq) { this.term = term; this.freq = freq; }
public int compareTo(Object o) {
if (this == o) return 0;
TermFreq other = (TermFreq) o;
return other.freq - this.freq;
}
}
On Apr 7, 2005 9:00 PM, Mark Gunnels <[EMAIL PROTECTED]> wrote:
> Is there a simple way to list all terms in a field?
> The only approach that I see is to use the IndexReader.terms() method
> and then iterate over all the results and build my list by manually
> filtering. This seems inefficient and there must be a better way that
> my newbie eyes don't see.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]