17 apr 2006 kl. 18.59 skrev John Powers:

Hello,

If I have a user search for "b-trunk"  I would like them to be able to
find "b-trunk" (with hypen).   I would also like someone searching for
"b trunk" to also find "b-trunk".

If you don't care about spans, make a filter that rebuilds the token at index time. It's a bit quick and dirty, but I do things like this in some cases without any major problems.

Below code builds [btrunk] [b-trunk] [b] [trunk].

I would not recommend you to do this without considering what you do.



public class LowASCIIDashWordFilter extends TokenFilter {

    private static Pattern p = Pattern.compile("(\\w+)-(\\w+)");

    /** Construct a token stream filtering the given input. */
    public LowASCIIDashWordFilter(TokenStream input) {
        super(input);
    }

    private LinkedList<Token> buf;

    /** Returns the next token in the stream, or null at EOS. */
    public Token next() throws IOException {
        Token next;
        if (buf == null) {
            buf = new LinkedList<Token>();
            next = input.next();
            while (next != null) {
                Matcher m = p.matcher(next.termText());
                if (m.matches()) {
buf.add(new Token(m.group(1) + m.group(2), m.start(1), m.end(2), "composite dashword")); buf.add(new Token(m.group(1), m.start(1), m.end (1), "left dashword")); buf.add(new Token(m.group(2), m.start(2), m.end (2), "right dashword"));
                }
                next = input.next();
            }
        }

        if (buf.size() > 0) {
            return buf.removeFirst();
        } else {
            return null;
        }
    }

}



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to