I implemented this concept for my ends with query. It works very well! ----- Original Message ----- From: "Chris Hostetter" <[EMAIL PROTECTED]> To: "Lucene Users List" <lucene-user@jakarta.apache.org> Sent: Friday, February 04, 2005 9:37 PM Subject: Re: Starts With x and Ends With x Queries
> > : Also keep in mind that QueryParser only allows a trailing asterisk, > : creating a PrefixQuery. However, if you use a WildcardQuery directly, > : you can use an asterisk as the starting character (at the risk of > : performance). > > On the issue of "ends with" wildcard queries, I wanted to throw out and > idea that i've seen used to deal with matches like this in other systems. > I've never acctually tried this with Lucene, but I've seen it used > effectively with other systems where the goal is to "sort" strings by the > least significant (ie: right most) characters first. I think it could > apply nicely to people who have compelling needs for efficent 'ends with' > queries. > > > > Imagine you have a field call name, which you can already do efficient > prefix matching on using the PrefixQuery class. Your docs and query may > look something like this... > > D1> name:"Adam Smith" age:13 state:CA ... > D2> name:"Joe Bob" age:42 state:WA ... > D3> name:"John Adams" age:35 state:NV ... > D3> name:"Sue Smith" age:33 state:CA ... > > ...and your queries may look something like... > > Query q1 = new PrefixQuery(new Term("name","J*")); > Query q2 = new PrefixQuery(new Term("name","Sue*")); > > If you want to start doing suffix queries (ie: all names ending with > "s", or all names ending with "Smith") one approach would be to use > WildcarQuery, which as Erik mentioned, will allow you to use a quey Term > that starts with a "*". ie... > > Query q3 = new WildcardQuery(new Term("name","*s")); > Query q4 = new WildcardQuery(new Term("name","*Smith")); > > (NOTE: Erik says you can do this, but the docs for WildcardQuery say you > can't I'll assume the docs are wrong and Erik is correct.) > > The problem is that this is horrendously inefficient. In order to find > the docs that contain Terms which match your suffix, WildcardQuery must > first identify what all of those Terms are, by iterating over every Term > in your index to see if they match the suffix. This is much slower then a > PrefixQuery, or even a WildcardQuery that has just 1 initial character > before a "*" (ie: "s*foobar"), because it can then seek to directly to the > first Term that starts with that character, and also stop iterating as > soon as it encounters a Term that no longer begins with that character. > > Which leads me to my point: if you denormalize your data so that you store > both the Term you want, and the *reverse* of the term you want, then a > Suffix query is just a Prefix query on a reversed field -- by sacrificing > space, you can get all the speed efficiencies of a PrefixQuery when doing > a SuffixQuery... > > D1> name:"Adam Smith" rname:"htimS madA" age:13 state:CA ... > D2> name:"Joe Bob" rname:"boB oeJ" age:42 state:WA ... > D3> name:"John Adams" rname:"smadA nhoJ" age:35 state:NV ... > D3> name:"Sue Smith" rname:"htimS euS" age:33 state:CA ... > > Query q1 = new PrefixQuery(new Term("name","J*")); > Query q2 = new PrefixQuery(new Term("name","Sue*")); > Query q3 = new PrefixQuery(new Term("rname","s*")); > Query q4 = new PrefixQuery(new Term("rname","htimS*")); > > > (If anyone sees a flaw in my theory, please chime in) > > > -Hoss > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]