Having switched the highlighter over from lots of
Query-specific code to the generic
Query.extractTerms API, I realize I have both gained
something (support for all query types) and lost
something (detailed boost info for each term in the
tree, e.g. fuzzy spelling variants). The boost info was
useful for selecting snippets and grading highlight
intensity.

This exercise has led me to the conclusion that
extractTerms is not the greatest way to provide
information about queries.

I see a clear analogy with the way exceptions are/were
implemented in Java - there used to be no standard way
of unravelling nested exceptions and this was solved
in JDK 1.4 by adding a "getCause()" method to
Throwable, allowing progressive unravelling of all
exception types.
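
To make the analogy concrete, here is a minimal sketch of that
unravelling using the standard Throwable.getCause() method. The
CauseChain class and its messages() helper are names made up for
this illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not part of any library.
class CauseChain {
    // Collects the message of a throwable and of each nested cause,
    // outermost first, by following getCause() until it returns null.
    static List<String> messages(Throwable t) {
        List<String> out = new ArrayList<>();
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            out.add(cur.getMessage());
        }
        return out;
    }

    public static void main(String[] args) {
        Throwable nested = new RuntimeException("search failed",
                new IllegalStateException("index closed"));
        System.out.println(messages(nested));
    }
}
```

The point is that one small accessor on the base type is enough for
generic code to walk any depth of nesting without knowing the
concrete exception classes.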

Unfortunately, Query.extractTerms(Set) is a bit like
solving the Java nested exceptions problem by
providing a method like
Throwable.getMessageStrings(Set): it only gives part
of the information about the tree elements (i.e. no
boost info) and provides no indication of the nested
structure.

Maybe we should have as a standard part of Query:

  // immediate child queries only
  Query[] getNestedQueries();

and...

  // immediate terms only
  Term[] getTerms();


A generic highlighter implementation could then:

a) work with any query type
b) more accurately assess the score contribution each
term provides based on its position in the stack and
the boosts applied to each parent query on that branch
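
As a rough sketch of how a highlighter might use such an API, the
code below walks a query tree and multiplies boosts down each
branch to weight every term. Everything here is hypothetical:
IntrospectableQuery stands in for the proposed additions to Query
(it is not Lucene's class), terms are plain Strings rather than
Term objects, and treating the product of boosts as the term's
weight is a simplifying assumption, not how Lucene scores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed introspection methods on Query.
interface IntrospectableQuery {
    float getBoost();
    IntrospectableQuery[] getNestedQueries(); // immediate child queries only
    String[] getTerms();                      // immediate terms only
}

// Minimal node type used only to build example trees.
final class SimpleQuery implements IntrospectableQuery {
    private final float boost;
    private final String[] terms;
    private final IntrospectableQuery[] children;

    SimpleQuery(float boost, String[] terms, IntrospectableQuery... children) {
        this.boost = boost;
        this.terms = terms;
        this.children = children;
    }

    public float getBoost() { return boost; }
    public IntrospectableQuery[] getNestedQueries() { return children; }
    public String[] getTerms() { return terms; }
}

final class TermWeights {
    // Returns each term mapped to the product of the boosts on its branch.
    static Map<String, Float> collect(IntrospectableQuery root) {
        Map<String, Float> weights = new LinkedHashMap<>();
        walk(root, 1.0f, weights);
        return weights;
    }

    private static void walk(IntrospectableQuery q, float parentBoost,
                             Map<String, Float> weights) {
        float boost = parentBoost * q.getBoost();
        for (String term : q.getTerms()) {
            // Assumption: if a term occurs on several branches, keep the largest weight.
            Float previous = weights.get(term);
            if (previous == null || boost > previous) {
                weights.put(term, boost);
            }
        }
        for (IntrospectableQuery child : q.getNestedQueries()) {
            walk(child, boost, weights);
        }
    }

    public static void main(String[] args) {
        IntrospectableQuery exact = new SimpleQuery(1.0f, new String[] {"lucene"});
        IntrospectableQuery fuzzy = new SimpleQuery(0.5f, new String[] {"lucine", "lucent"});
        IntrospectableQuery root = new SimpleQuery(2.0f, new String[0], exact, fuzzy);
        System.out.println(collect(root));
    }
}
```

With weights like these, the fuzzy variants can be highlighted less
intensely than the exact term, which is the information
extractTerms currently throws away.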

This doesn't seem a particularly onerous API to
implement and a more feature-rich Query introspection
API may well enable other applications such as Query
optimizers.

Cheers,
Mark