I have this uncommitted class locally (forgot its origins), which you'll like:
$ svn st
? contrib/miscellaneous/src/java/org/apache/lucene/misc/AllTerms.java
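Here it is with the package statement and imports added. Read its output into
some data structure and pick random terms from there.
package org.apache.lucene.misc;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;
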
/**
 * <code>AllTerms</code> class extracts terms and their frequencies out
 * of an existing Lucene index.
 *
 * @version $Id: HighFreqTerms.java 376393 2006-02-09 19:17:14Z otis $
 */
public class AllTerms {

  public static void main(String[] args) throws Exception {
    if (args.length < 1 || args.length > 2) {
      usage();
      System.exit(1);
    }
    // Optional second argument restricts the output to a single field.
    String field = (args.length == 2) ? args[1] : null;

    IndexReader reader = IndexReader.open(args[0]);
    try {
      TermEnum terms = reader.terms();
      try {
        // Prints one "field:text: docFreq" line per term.
        while (terms.next()) {
          if (field == null || terms.term().field().equals(field)) {
            System.out.println(terms.term() + ": " + terms.docFreq());
          }
        }
      } finally {
        terms.close();
      }
    } finally {
      reader.close();
    }
  }

  private static void usage() {
    System.out.println(
        "\n\n"
        + "java org.apache.lucene.misc.AllTerms <index dir> [field]\n\n");
  }
}
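
For the "pick random terms" part, here is a rough sketch of one way to do it.
It assumes you pipe the output of AllTerms into it, one "field:text: docFreq"
line per term. RandomTermPicker, readTerms and pickRandom are names made up
for this example, not anything in Lucene, and it keeps every term in memory,
so it only suits indexes whose term list fits in the heap.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Lucene): picks a random term from AllTerms output. */
public class RandomTermPicker {

  /** Parses lines of the form "field:text: docFreq" and returns the "field:text" parts. */
  public static List<String> readTerms(Reader in) throws IOException {
    List<String> terms = new ArrayList<String>();
    BufferedReader lines = new BufferedReader(in);
    String line;
    while ((line = lines.readLine()) != null) {
      // The doc frequency follows the last ": "; the term text may itself contain colons.
      int sep = line.lastIndexOf(": ");
      if (sep > 0) {
        terms.add(line.substring(0, sep));
      }
    }
    return terms;
  }

  /** Returns one uniformly chosen term; the list must not be empty. */
  public static String pickRandom(List<String> terms, Random random) {
    return terms.get(random.nextInt(terms.size()));
  }

  public static void main(String[] args) throws IOException {
    List<String> terms = readTerms(new InputStreamReader(System.in));
    if (terms.isEmpty()) {
      System.err.println("No terms read");
      return;
    }
    System.out.println(pickRandom(terms, new Random()));
  }
}
```

Something like "java org.apache.lucene.misc.AllTerms <index dir> [field] | java RandomTermPicker"
should then print one random term. The list from readTerms also doubles as
the list of unique terms, since a TermEnum returns each term once.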
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
> From: Erick Erickson <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, June 24, 2008 9:26:03 AM
> Subject: Re: uniqueWords, and termDocs
>
> Isn't asking for unique words (actually tokens) equivalent to enumerating
> all the terms in a field?
>
> I have no idea how to select a random word. Seems like you'd have to
> somehow use a TermEnum, but I don't think there's anything built in.
>
> Best
> Erick
>
> On Mon, Jun 23, 2008 at 6:03 PM, Cam Bazz wrote:
>
> > Hello,
> >
> > I need to be able to select a random word out of all the words in my index.
> > How can I do this through termDocs()?
> >
> > Also, I need to get a list of unique words as well. Is there a way to ask
> > Lucene for this?
> >
> > Best Regards,
> > -C.B.
> >