Hi Max,

On 02/05/2010 at 10:18 AM, Grant Ingersoll wrote:
> On Feb 3, 2010, at 8:57 PM, Max Lynch wrote:
> > Hi, I would like to do a search for "Microsoft Windows" as a span, but
> > not match if words before or after "Microsoft Windows" are upper cased.
> >
> > For example, I want this to match:
> >     another crash for Microsoft Windows today
> > But not this:
> >     another crash for Microsoft Windows Server today
> >
> > Is this possible? My first attempt started with the SpanRegexQuery
> > from the regex contrib package, but I can't figure out how to put in a
> > term I do want to match but don't want to include in the final
> > highlighting match. Does that make sense?
> >
> > My example (using WhitespaceAnalyzer since I care about case):
> >
> > SpanRegexQuery srq1 = new SpanRegexQuery(new Term("contents", "Chase"));
> > SpanRegexQuery srq2 = new SpanRegexQuery(new Term("contents",
> > "Bank[\\.]*"));
> > SpanRegexQuery srq3 = new SpanRegexQuery(new Term("contents", "[^A-Z]*"));
>
> I'm not sure it supports it, but I wonder if you could use a negative
> lookahead assertion? Most regex languages support it.

I don't think this would work, since the input to a SpanRegexQuery regex is a
single Term; following Terms are not included in the input.

I *think* you can get what you want using SpanNotQuery - something like the
following, using your "Microsoft Windows" example:

    SpanNot:
        include:
            SpanNear(in-order=true, slop=0):
                SpanTerm: "Microsoft"
                SpanTerm: "Windows"
        exclude:
            SpanNear(in-order=true, slop=0):
                SpanTerm: "Microsoft"
                SpanTerm: "Windows"
                SpanRegex: "^\\p{Lu}.*"

Steve
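
To make the overlap rule behind the SpanNot suggestion concrete, here is a minimal plain-Java model of it. It is only a sketch and deliberately does not call Lucene: the class `SpanNotSketch` and its methods `tokenize`, `includeSpans`, `excludeSpans`, `spanNot`, and `matches` are hypothetical names invented for illustration. It assumes whitespace-only tokenization with case preserved (as with WhitespaceAnalyzer) and that an include match is dropped whenever it overlaps an exclude match, which is how SpanNotQuery is documented to behave.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the SpanNot structure above; it does not use Lucene.
final class SpanNotSketch {

    // A match over token positions [start, end).
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Two half-open ranges overlap if each starts before the other ends.
        boolean overlaps(Span other) {
            return start < other.end && other.start < end;
        }
    }

    // Split on whitespace only and keep case (assumption: mirrors WhitespaceAnalyzer).
    static String[] tokenize(String text) {
        return text.trim().split("\\s+");
    }

    // include: "Microsoft" immediately followed by "Windows" (in order, slop 0).
    static List<Span> includeSpans(String[] tokens) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].equals("Microsoft") && tokens[i + 1].equals("Windows")) {
                spans.add(new Span(i, i + 2));
            }
        }
        return spans;
    }

    // exclude: the same phrase followed by a token that starts with an uppercase letter.
    static List<Span> excludeSpans(String[] tokens) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i + 2 < tokens.length; i++) {
            if (tokens[i].equals("Microsoft")
                    && tokens[i + 1].equals("Windows")
                    && tokens[i + 2].matches("^\\p{Lu}.*")) {
                spans.add(new Span(i, i + 3));
            }
        }
        return spans;
    }

    // Keep only the include spans that overlap no exclude span.
    static List<Span> spanNot(List<Span> include, List<Span> exclude) {
        List<Span> kept = new ArrayList<>();
        for (Span in : include) {
            boolean overlapped = false;
            for (Span ex : exclude) {
                if (in.overlaps(ex)) {
                    overlapped = true;
                    break;
                }
            }
            if (!overlapped) {
                kept.add(in);
            }
        }
        return kept;
    }

    // True if at least one "Microsoft Windows" span survives the exclusion.
    static boolean matches(String text) {
        String[] tokens = tokenize(text);
        return !spanNot(includeSpans(tokens), excludeSpans(tokens)).isEmpty();
    }
}
```

Calling `SpanNotSketch.matches(...)` on the two sentences from the question accepts the first and rejects the second. Like the structure above, the model only checks the word after the phrase; rejecting a capitalized word before it would need a second exclude clause with the regex term placed first.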