Hi, I have two questions regarding the search capability of Lucene.

1) Lucene builds an inverted index, which, as I understand it, records how many times each token is used. Is there a way to get the list of the most frequently used words in descending order of frequency?

For example, suppose I have two docs in my index. The first doc contains "Lucene" 6 times (the highest count of any word in it). The second doc contains "Lucene" once and "index" 6 times. So across the index the most frequently used word is "lucene" (used 7 times), followed by "index" (used 6 times). Is there a way to find out this information?

2) I have a number of documents that contain a BTN (a 10-digit numeric string) in their content. I want to do the following:

a) What query can I write to find the documents that have a BTN in them? I think a wildcard search will help, but I have not been able to work out the exact query.

b) More importantly, will the search tell us exactly which BTN is in each document? For example, say I search with java* and 2 documents match. One of the documents has "javaspace" in it and the other has "javaworld". Is it possible to get these matched terms through some API?

Thanks a lot.
Nikhil
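
P.S. To make question 1 concrete, here is a rough plain-Java sketch of the result I am after. It does not use any Lucene API. The class name TermCounts and the method topTerms are made up for illustration, and it simply splits on whitespace instead of using a real analyzer. What I am looking for is the equivalent of this, computed from the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts tokens across raw document text.
public class TermCounts {

    // Returns (term, total count) pairs sorted by count, highest first.
    public static List<Map.Entry<String, Integer>> topTerms(List<String> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Assumption: lower-casing and whitespace splitting stand in for an analyzer.
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> b.getValue() - a.getValue());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("Lucene Lucene Lucene Lucene Lucene Lucene");
        docs.add("Lucene index index index index index index");
        // Prints "lucene 7" and then "index 6".
        for (Map.Entry<String, Integer> e : topTerms(docs)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```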