On May 12, 2005, at 10:24 AM, Goel, Nikhil wrote:
1) Lucene builds an inverted index, by which we mean it keeps track of how
many times a particular token is used. Is there a way to get the list of
the most frequently used words in descending order?
Have a look at Luke's code to see how it does this - it has a view of the most frequent terms. http://www.getopt.org/luke/
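As a rough illustration of the general idea (not Luke's actual code), the ranking step can be sketched in plain Java. The class `TopTerms` and the method `topN` are hypothetical names, and the sketch assumes the term-to-count pairs have already been collected into a map; with Lucene, those pairs would typically come from walking the index's term enumeration (`IndexReader.terms()` with `TermEnum.docFreq()`, which counts documents containing the term rather than total occurrences).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper for illustration; not part of Lucene or Luke.
class TopTerms {

    // Returns the n entries with the highest counts, most frequent first.
    // Assumes freqs maps each term to a count gathered elsewhere.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> freqs, int n) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        Comparator<Map.Entry<String, Integer>> byCount =
            (a, b) -> Integer.compare(a.getValue(), b.getValue());
        // Min-heap on count: the least frequent of the kept terms sits on top.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the current least frequent term
            }
        }
        result.addAll(heap);
        result.sort(byCount.reversed()); // descending by count
        return result;
    }
}
```

Keeping only n entries in the heap avoids sorting the whole term dictionary, which matters when the index has many distinct terms. The order among terms with equal counts is left unspecified in this sketch.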
2) I have a number of documents that contain a BTN (a 10-digit numeric string) in their content. I want to do the following:
a) What query can I write to find the documents that contain a BTN? I think a wildcard search will help, but I have not been able to work out the exact query.
b) More importantly, will it tell us exactly which BTN is in the document? For example, let's say I search with java* and 2 documents match. One of the documents has "javaspace" in it and the second has "javaworld" in it. Is it possible to get these matched phrases through some API?
One option to consider is extracting these BTN values from the original documents and indexing them into a separate field.
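The extraction step might look something like the sketch below. It covers only pulling the values out of the text, in plain Java; `BtnExtractor` and `extract` are hypothetical names, and the pattern assumes a BTN is a run of exactly 10 digits that is not part of a longer number, which may not match the real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene.
class BtnExtractor {

    // Assumed BTN format: exactly 10 digits, with no digit directly
    // before or after (so 11-digit numbers are skipped).
    private static final Pattern BTN = Pattern.compile("(?<!\\d)\\d{10}(?!\\d)");

    // Returns every BTN-like value in the text, in order of appearance.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BTN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

Each extracted value would then be added to the document under its own field (the field name is up to you) when indexing. If that field is stored and not tokenized, the BTN can be read straight back from a matching document, which addresses part (b) without needing to recover what a wildcard matched.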
Erik