Lucene Search Capabilities.

Goel, Nikhil Thu, 12 May 2005 07:48:50 -0700

Hi, 

I have two questions regarding the search capability of Lucene.


1) Lucene builds an inverted index, so it already keeps track of how many
times each token is used. Is there a way to get the list of the most
frequently used words, in descending order of frequency?

For example: suppose I have two docs in my index. The first doc has "Lucene"
6 times in it (and that's the maximum of any word). The second doc has
"Lucene" once and "index" 6 times.

So the most frequently used word is "lucene", used 7 times in total, and
"index" comes next, used 6 times.

Is there a way to find out this information? 
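
To make clear what result I am after, here is a brute-force sketch in plain
Java. It does not use any Lucene classes; the class and method names
(TermCounts, countTerms, mostFrequent) are just made up for illustration, and
it assumes tokens are separated by whitespace. It only shows the output I
would like to get from the index:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermCounts {
    // Count how often each lower-cased, whitespace-separated token
    // occurs across all documents.
    static Map<String, Integer> countTerms(List<String> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Sort the (term, count) pairs by count, highest first;
    // ties are broken alphabetically so the order is stable.
    static List<Map.Entry<String, Integer>> mostFrequent(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // The two docs from my example.
        List<String> docs = Arrays.asList(
            "Lucene Lucene Lucene Lucene Lucene Lucene",
            "Lucene index index index index index index");
        for (Map.Entry<String, Integer> e : mostFrequent(countTerms(docs))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the two docs above this lists "lucene" with 7 first and "index" with 6
second. I am hoping the index can give me the same list without re-reading
every document.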

2) I have a number of documents with a BTN (a 10-digit numeric string) in
their content. I want to do the following:
a) What query can I write to find the documents that contain a BTN?
I think a wildcard search will help, but I have not been able to work out
the exact query.
b) More importantly, will it tell us exactly which BTN is in the
document? For example, say I search with java* and 2 documents
match. One of the documents has "javaspace" in it and the second has
"javaworld" in it.
Is it possible to get these matched words through some API?
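
Again, to show the kind of answer I am looking for, here is a sketch in plain
Java with no Lucene calls. The names (MatchedTerms, findBtns, matchedWords)
are hypothetical, and I am assuming a BTN is a run of exactly 10 digits with
no other digit directly before or after it:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchedTerms {
    // Assumption: a BTN is exactly 10 digits, not part of a longer number.
    private static final Pattern BTN = Pattern.compile("(?<!\\d)\\d{10}(?!\\d)");

    // Return every BTN-like number in the text, in order of appearance.
    static List<String> findBtns(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BTN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Map each document position to the distinct lower-cased words that
    // start with the prefix. Documents with no match are left out.
    static Map<Integer, Set<String>> matchedWords(List<String> docs, String prefix) {
        Map<Integer, Set<String>> result = new TreeMap<>();
        String p = prefix.toLowerCase();
        for (int i = 0; i < docs.size(); i++) {
            for (String token : docs.get(i).toLowerCase().split("\\W+")) {
                if (!token.isEmpty() && token.startsWith(p)) {
                    result.computeIfAbsent(i, k -> new TreeSet<>()).add(token);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "I read the javaspace docs",
            "an article from javaworld");
        // Searching with java* corresponds to the prefix "java" here.
        System.out.println(matchedWords(docs, "java"));
        System.out.println(findBtns("Account BTN 4155551234 on file"));
    }
}
```

So for part b) I would like to get back something like "javaspace" for the
first document and "javaworld" for the second, and for the BTN case the
actual 10-digit number found in each matching document.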


Thanks a lot.
Nikhil
