Re: SpanQuery parser? Update (ugly hack inside...)

Sean O'Connor Mon, 07 Nov 2005 14:42:43 -0800

Erik Hatcher wrote:

On 4 Nov 2005, at 18:32, Sean O'Connor wrote:
I'm posting this primarily hoping to give back a tiny bit to a very helpful community. More likely however, someone else will open my eyes to an easier approach than what I outline below...
I've come up with a very ugly conversion approach from regular Query objects into SpanQuery objects. I then use the converted SpanQuery to get span positions (currently both token #, and start/end position). In effect, I have highlighting for simple queries with a very inefficient approach (yea for me!).
As you and I have talked about on a couple of face to face occasions, this is the approach I am taking on a current consulting project. My conversion code is slightly different than yours in that I don't rewrite the query, but translate it as-is into comparable SpanQuery subclasses - and this is because I have a RegexQuery and SpanRegexQuery that are comparable. But rewriting is a good pragmatic way to go for general query types that don't have a comparable SpanQuery subclass.
The goal(s) I am trying to accomplish are rather specific I think, so I imagine the use of my hacking is rather limited (i.e. just to me).
At the moment my code:

   * parses the search text (i.e. user entered query)
Are you using QueryParser? If so, you'll also want to account for BooleanQuery, recursively.

I am using QueryParser. So far I have taken the easy route, and just deal with 'Or' BooleanQueries. The additional aspects of Boolean query (required and prohibited) should not be much of a stretch.

   * rewrites the resulting query to expand wildcards and such against
     the index
   * calls a recursive conversion function with very basic conversion
     understanding
         o TermQuery -> SpanTermQuery
         o PhraseQuery -> SpanNearQuery
         o others in progress as time permits
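
For readers following along, the recursive conversion outlined in the list above might look roughly like the sketch below. It is only an illustration, not the code from either poster: the node classes (SketchQuery, TermNode, PhraseNode, OrNode) and SpanConversionSketch.convert are hypothetical stand-ins so the example compiles without Lucene on the classpath, and the result is a plain string describing the span structure rather than a real SpanQuery. OrNode stands for a BooleanQuery made only of optional ('Or') clauses; zero slop and in-order matching for phrases are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for a parsed (and already rewritten) query tree.
// These are NOT Lucene classes; they only mirror the shapes discussed above.
abstract class SketchQuery {}

class TermNode extends SketchQuery {
    final String text;
    TermNode(String text) { this.text = text; }
}

class PhraseNode extends SketchQuery {
    final List<String> words;
    PhraseNode(String... words) { this.words = Arrays.asList(words); }
}

// Models a BooleanQuery whose clauses are all optional ('Or').
class OrNode extends SketchQuery {
    final List<SketchQuery> clauses;
    OrNode(SketchQuery... clauses) { this.clauses = Arrays.asList(clauses); }
}

class SpanConversionSketch {
    // Recursively maps each node to a description of its span equivalent.
    static String convert(SketchQuery q) {
        if (q instanceof TermNode) {
            // TermQuery -> span term
            return "spanTerm(" + ((TermNode) q).text + ")";
        }
        if (q instanceof PhraseNode) {
            // PhraseQuery -> span near over one span term per word
            List<String> parts = new ArrayList<>();
            for (String word : ((PhraseNode) q).words) {
                parts.add("spanTerm(" + word + ")");
            }
            return "spanNear([" + String.join(", ", parts) + "], slop=0, inOrder=true)";
        }
        if (q instanceof OrNode) {
            // 'Or' BooleanQuery -> span or; each clause is converted recursively
            List<String> parts = new ArrayList<>();
            for (SketchQuery clause : ((OrNode) q).clauses) {
                parts.add(convert(clause));
            }
            return "spanOr([" + String.join(", ", parts) + "])";
        }
        // Query types without a conversion yet ("others in progress")
        throw new IllegalArgumentException(
            "no span conversion for " + q.getClass().getSimpleName());
    }
}
```

With real Lucene classes, the three branches would presumably build SpanTermQuery, SpanNearQuery and SpanOrQuery instances instead of strings; required and prohibited Boolean clauses would need extra branches that this sketch leaves out.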

Currently, I only process simple query strings like:
"blue green yellow" => SpanOrQuery
"luce* acti*" => SpanOrQuery with wild cards expanded
e.g.: lucene lucent action acting ... all or'ed together in a braindead fashion
"luce* acti* \"book rocks\"" => SpanOrQuery combining SpanTerms and SpanNear (no slop)
er, hopefully you get the picture, I'm not up to showing a vector of this one... :-)
I would be happy to discuss my approach if there is anyone interested. I assume I am pretty much alone in finding this inefficient approach useful. For me, it is the functionality that overrides performance issues.
What is inefficient about it? The rewrite stuff is the main difference, and perhaps that is the issue you're encountering. Where do you see the performance issues?
Converting a query, for me at least, is fast - perhaps because there is no rewriting involved.

Good question. I haven't done any performance testing, nor am I seeing any performance problems with Lucene. I just assumed that my approach was adding an extra (unoptimized) layer. So for now, forget I mentioned that :-).

I have something which can take user search strings and do hit highlighting for the exact hit found. This is really only useful for "termA near 'some phrase'" at the moment, but might become more advanced in the next 2-3 months.
I'm basically implementing this very thing. I will likely be enhancing the contrib/highlighter code in the next month to use SpanQuery for highlighting, as well as adding field-aware highlighting.

Excellent. I will keep an eye out for it. Thanks for the heads up.

    Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Sean


