Hi,
We are using doc.add(Field.Text("keywords", keywords)); to add keywords to
the document, where keywords is a comma-separated string of keywords.
Lucene seems to tokenize multi-word keywords such as MAIN BOARD into
separate terms (i.e. MAIN and BOARD); tokenization appears to happen on both
commas and spaces. So if we search for "MAIN BOARD", documents having keywords
like "MAIN LOGIC", "MAIN PARTS", etc. also show up.
When someone searches for "MAIN BOARD", we want to get only the documents that
have the keyword "MAIN BOARD". How can we do this?
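To make the problem concrete, here is a minimal sketch in plain Java of what we think is happening. It does not call Lucene at all; the class and method names are made up for illustration, and the rule that a document matches when it shares any single token with the query is our assumption about the behaviour we observe.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; nothing here is Lucene API.
class TokenizedKeywordsDemo {

    // Split on commas and whitespace, which is roughly what we observe
    // happening to the keywords field.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<String>();
        for (String t : text.trim().split("[,\\s]+")) {
            if (!t.isEmpty()) {
                result.add(t.toUpperCase(Locale.ROOT));
            }
        }
        return result;
    }

    // Assumption: a document matches if it shares at least one token
    // with the query.
    static boolean matchesAnyToken(String docKeywords, String query) {
        Set<String> docTokens = tokens(docKeywords);
        for (String q : tokens(query)) {
            if (docTokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Unwanted hit: the document only has MAIN LOGIC, but shares the token MAIN.
        System.out.println(matchesAnyToken("MAIN LOGIC,POWER SUPPLY", "MAIN BOARD")); // prints true
        // Wanted hit.
        System.out.println(matchesAnyToken("MAIN BOARD,FAN", "MAIN BOARD")); // prints true
    }
}
```

The first call is the kind of result we want to get rid of.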
To achieve this we switched to doc.add(Field.Keyword("keywords", keywords));.
While searching we cannot use the standard analyzer, because it splits the
query whenever the keyword we search for contains a space. So we wrote a
KeywordAnalyzer (it basically returns the whole input as one single token),
as given below.
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

/**
 * Tokenizes the entire stream as a single token.
 */
public class KeywordAnalyzer extends Analyzer
{
    public TokenStream tokenStream(String fieldName, final Reader reader)
    {
        return new TokenStream() {
            private boolean done;
            private final char[] chunk = new char[1024];

            public Token next() throws IOException
            {
                if (!done)
                {
                    done = true;
                    // Read the whole input into one string.
                    StringBuffer text = new StringBuffer();
                    int length;
                    while ((length = reader.read(chunk)) != -1)
                    {
                        text.append(chunk, 0, length);
                    }
                    // Emit the upper-cased input as the only token.
                    return new Token(text.toString().toUpperCase(), 0, text.length());
                }
                return null; // no more tokens
            }
        };
    }
}
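For anyone who wants to try the core of this without a Lucene jar, the read loop can be sketched on its own in plain Java. The class and method names here are hypothetical and not part of our real code; the sketch only shows that the whole input, spaces included, comes back as one upper-cased string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;

// Stand-alone version of the analyzer's read loop; no Lucene classes involved.
class SingleTokenDemo {

    // Read everything from the reader and return it as one upper-cased string.
    static String readAsSingleToken(Reader reader) throws IOException {
        char[] chunk = new char[1024];
        StringBuilder text = new StringBuilder();
        int length;
        while ((length = reader.read(chunk)) != -1) {
            text.append(chunk, 0, length);
        }
        return text.toString().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        // The space is kept, so the query stays one term.
        System.out.println(readAsSingleToken(new StringReader("main board"))); // prints MAIN BOARD
    }
}
```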
This solves the problem described above, but now we are not able to do wildcard
searches like MAIN*, etc.
We need both pieces of functionality, i.e.
1. If the user searches for MAIN BOARD, they should get only documents that
contain MAIN BOARD, and not MAIN LOGIC, MAIN, MAIN PART, etc.
2. The user should be able to do wildcard searches like MAIN*, etc. and get the
desired documents.
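To pin down exactly what we mean by these two requirements, here is a small plain-Java sketch of the desired behaviour. It is only a specification of the results we want, not a suggestion for how Lucene should implement them; the class and method names are invented, and treating a trailing * as a prefix match over whole keywords is our assumption about what MAIN* should mean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Describes the desired results only; nothing here is Lucene API.
class DesiredKeywordMatchDemo {

    // Treat each comma-separated keyword as one whole term, spaces included.
    static List<String> wholeKeywords(String keywords) {
        List<String> terms = new ArrayList<String>();
        for (String k : keywords.split(",")) {
            String term = k.trim().toUpperCase(Locale.ROOT);
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Exact match on a whole keyword, or prefix match if the query ends in '*'.
    static boolean matches(String docKeywords, String query) {
        String q = query.trim().toUpperCase(Locale.ROOT);
        boolean prefix = q.endsWith("*");
        if (prefix) {
            q = q.substring(0, q.length() - 1);
        }
        for (String term : wholeKeywords(docKeywords)) {
            if (prefix ? term.startsWith(q) : term.equals(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("MAIN BOARD,FAN", "MAIN BOARD")); // prints true
        System.out.println(matches("MAIN LOGIC,FAN", "MAIN BOARD")); // prints false
        System.out.println(matches("MAIN LOGIC,FAN", "MAIN*"));      // prints true
    }
}
```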
Please let us know how we should do the indexing, and which analyzer we should
use for searching.
thanks
Rahul...