On 17 Apr 2006, at 18:59, John Powers wrote:
> Hello,
> If I have a user search for "b-trunk" I would like them to be able to
> find "b-trunk" (with hyphen). I would also like someone searching for
> "b trunk" to also find "b-trunk".
If you don't care about spans, make a filter that rebuilds the token
at index time. It's a bit quick and dirty, but I do things like this
in some cases without any major problems. Note that every extra token
takes its own position, so phrase and span queries over that text can
become unreliable.
The code below builds [btrunk] [b-trunk] [b] [trunk].
I would not recommend doing this without considering the consequences.
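import java.io.IOException;
import java.util.LinkedList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class LowASCIIDashWordFilter extends TokenFilter {

  private static final Pattern p = Pattern.compile("(\\w+)-(\\w+)");

  /** Tokens built from the last dash word that are still waiting to be emitted. */
  private final LinkedList<Token> buf = new LinkedList<Token>();

  /** Construct a token stream filtering the given input. */
  public LowASCIIDashWordFilter(TokenStream input) {
    super(input);
  }

  /** Returns the next token in the stream, or null at EOS. */
  public Token next() throws IOException {
    // Drain tokens left over from the previous dash word first.
    if (!buf.isEmpty()) {
      return buf.removeFirst();
    }
    Token next = input.next();
    if (next == null) {
      return null;
    }
    Matcher m = p.matcher(next.termText());
    if (!m.matches()) {
      return next; // not a dash word: pass it through unchanged
    }
    // Matcher offsets are relative to the term text, so shift them by
    // the token's start offset in the field.
    int base = next.startOffset();
    buf.add(next); // the original token, e.g. [b-trunk]
    buf.add(new Token(m.group(1), base + m.start(1), base + m.end(1),
        "left dashword"));
    buf.add(new Token(m.group(2), base + m.start(2), base + m.end(2),
        "right dashword"));
    return new Token(m.group(1) + m.group(2), base + m.start(1),
        base + m.end(2), "composite dashword");
  }
}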
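
To see what the filter does to a single term without setting up Lucene, here is a minimal stand-alone sketch of the same expansion. DashWordExpander and expand are hypothetical names used only for illustration; they are not part of Lucene, and the sketch ignores offsets, token types and positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DashWordExpander {

    // Same pattern as the filter: two runs of word characters joined by one dash.
    private static final Pattern DASH_WORD = Pattern.compile("(\\w+)-(\\w+)");

    /** Returns the terms that would be emitted for a single input term. */
    public static List<String> expand(String term) {
        Matcher m = DASH_WORD.matcher(term);
        if (!m.matches()) {
            // Not a dash word: the term is kept as the only output.
            return Collections.singletonList(term);
        }
        List<String> out = new ArrayList<String>();
        out.add(m.group(1) + m.group(2)); // composite, e.g. btrunk
        out.add(term);                    // original, e.g. b-trunk
        out.add(m.group(1));              // left part, e.g. b
        out.add(m.group(2));              // right part, e.g. trunk
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("b-trunk")); // prints [btrunk, b-trunk, b, trunk]
        System.out.println(expand("trunk"));   // prints [trunk]
        System.out.println(expand("a-b-c"));   // prints [a-b-c]
    }
}
```

Because matches() has to cover the whole term and \w does not include the dash, a term with two or more dashes such as "a-b-c" is left alone. If you need those handled as well, the pattern has to change.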