Hi Grant,
I'll try to isolate the relevant parts of the project in order to make a patch. It
should not take long, but as I'm really busy, don't expect it too soon ;) BTW, it
would be simpler for me to get some help, because there are things that seem hard
to understand (the problem I left at work yesterday was a mysterious next()
method on NearSpans that has only 1 submatch and still returns true, while I
thought that was not possible ;)).
As for the extra collect() parameter: a HitCollector (a class, btw; I think an
interface would be great there too) has only a collect(int doc, float score)
method. What I propose is an extra
collect(int doc, float score, DocumentMatchesHolder matches) method (if
matches==null, fall back on the default collect). I thought about an "Object"
because other people could need different data, but sure, this makes more
sense with a strong type.
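
A minimal sketch of what such an overload could look like, under stated assumptions: HitCollector below is a local stand-in for the Lucene class (only its two-argument collect is modeled), DocumentMatchesHolder is reduced to a placeholder, and MatchAwareCollector, collectWithMatches and RecordingCollector are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Lucene's HitCollector (assumption: only collect(int, float) is modeled).
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Hypothetical placeholder for the per-document match data.
class DocumentMatchesHolder {
    final int matchCount;

    DocumentMatchesHolder(int matchCount) {
        this.matchCount = matchCount;
    }
}

// Hypothetical collector that adds the proposed three-argument overload.
abstract class MatchAwareCollector extends HitCollector {
    // If no matches are supplied, fall back on the default two-argument collect.
    public final void collect(int doc, float score, DocumentMatchesHolder matches) {
        if (matches == null) {
            collect(doc, score);
        } else {
            collectWithMatches(doc, score, matches);
        }
    }

    protected abstract void collectWithMatches(int doc, float score, DocumentMatchesHolder matches);
}

// Hypothetical implementation that records which path each document took.
class RecordingCollector extends MatchAwareCollector {
    final List<Integer> plainHits = new ArrayList<Integer>();
    final List<Integer> matchHits = new ArrayList<Integer>();

    @Override
    public void collect(int doc, float score) {
        plainHits.add(doc);
    }

    @Override
    protected void collectWithMatches(int doc, float score, DocumentMatchesHolder matches) {
        matchHits.add(doc);
    }
}
```

With this shape, existing collectors that only know collect(int, float) keep working unchanged, and a scorer that has match data calls the three-argument method without any instanceof test.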
Cedric
Grant Ingersoll-5 wrote:
>
> Hi Cedric,
>
> Thanks for the detailed response. My suggestion would be to write up
> a set of patches that demonstrate what you want for the SpanQuery
> stuff, and the BooleanQuery stuff, preferably as separate patches.
> The SpanQuery stuff makes the most sense to me and since I am slowly,
> but surely, working on it, I could try to incorporate it.
>
> As for the HitCollector, I am not exactly sure what you are trying to
> get at there. What Object is going to be passed in? Is it the Match
> object? What would it mean for other implementations that aren't
> using a Match object? How would it be incorporated into Lucene for a
> general case? Again, a patch here may make it obvious.
>
> -Grant
>
> On Sep 22, 2007, at 5:45 AM, melix wrote:
>
>>
>> Hi all,
>>
>> Sorry for the late response, I've been quite busy (working on my Lucene
>> tweak, and still not finished ;)). Basically, I need to be able to find out,
>> on a per-document basis, what matched in a complex query. For example, in an
>> OR clause, I need to know which of the subclauses have matched, and,
>> going deeper in the query tree, for each subclause itself, find out what
>> matched. The goal is to be able to score documents with semantic reasoning.
>>
>> As I want to limit breaking Lucene compatibility, I've decided to try, as
>> much as possible, to subclass Lucene classes. This is where it starts to be
>> difficult. So I've subclassed (most of) the span query classes so that the
>> getSpans() method returns my own span interface:
>>
>> public interface IExtendedSpans extends Spans, IMatcher {
>> }
>>
>> public interface IMatcher {
>>     Match match();
>> }
>>
>> The reason why I have a separate IMatcher interface is that span queries are
>> not the only queries which may "return" matches. We'll see this later. So I
>> implemented my own SpanNearQuery, which inherits from the Lucene SNQ, so that
>> when a span is found, I can return the corresponding match. A match is a
>> collection of submatches, and I've decided to subclass the Match class for
>> each query type (this makes algorithms more readable and easier to write).
>> For a span near query, the match() method will basically return a
>> SpanNearMatch, and so on.
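
The match hierarchy described above could be sketched roughly as follows. This is an illustration only, not the code from the project: the fields, the add() and leafCount() helpers and the TermMatch class are assumptions, and only the names Match, SpanNearMatch and IMatcher come from the message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical base type: a match is a collection of submatches.
class Match {
    private final List<Match> subMatches = new ArrayList<Match>();

    // Adds a submatch and returns this node so calls can be chained.
    Match add(Match sub) {
        subMatches.add(sub);
        return this;
    }

    List<Match> subMatches() {
        return Collections.unmodifiableList(subMatches);
    }

    // Number of leaf matches below this node; a node without submatches counts as one leaf.
    int leafCount() {
        if (subMatches.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (Match sub : subMatches) {
            total += sub.leafCount();
        }
        return total;
    }
}

// Hypothetical leaf match: one term found at one position.
class TermMatch extends Match {
    final String term;
    final int position;

    TermMatch(String term, int position) {
        this.term = term;
        this.position = position;
    }
}

// One Match subclass per query type; this one stands for a span near query.
class SpanNearMatch extends Match {
    final int start;
    final int end;

    SpanNearMatch(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Same shape as the IMatcher interface quoted above.
interface IMatcher {
    Match match();
}
```

A scoring algorithm can then walk subMatches() and branch on the concrete subclass, which is the readability benefit the per-query-type subclasses are meant to give.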
>>
>> Problem: the members of the Lucene span queries are private (not protected), so
>> subclasses cannot use them. For example, my subclass needs access to the
>> clauses, and I have to use the getter where I could directly use the member
>> (performance implication). Next, the Spans subclasses are private static
>> classes, and I have to rewrite them to return *my* spans. This is really
>> annoying, because I have to copy the exact inner classes (if not
>> anonymous...) just to add my match() method, and by doing this I'm
>> potentially breaking compatibility with future releases of Lucene.
>>
>> The problem was even harder when I had to add the match() method to
>> BooleanQuery: this class is so complex, and uses so many protected or inner
>> classes (for optimization purposes, I assume) that I would have to
>> copy a lot of the original source code just to add my method. As
>> documentation on how it works is really hard to find, I decided it would be
>> simpler to write my own boolean queries (which is what I've done now). I
>> know they must be much slower, but it makes the task much easier.
>>
>> By the way, it would be really great if you could extract an interface
>> from the Query class. As all my queries implement an interface (to be sure
>> that you don't mix queries which support the match feature with ones that
>> don't), it would avoid many casts (the other solution would be that I
>> extract the interface myself and make my IMatchAwareQuery interface have
>> those methods, but I'm sure it would be cleaner if this was directly in
>> Lucene).
>>
>> Last but not least, it would be great if the HitCollector class had a
>> collect() method with an Object parameter: the scoring I'm using cannot
>> work on a collection of floats alone. It requires the matches, so I'm passing
>> a DocMatchesHolder instance to my HitCollector so that it can work on it.
>> This leads to the following (not really clean) code, copied into my top-level
>> Scorer implementations:
>>
>> public void score(HitCollector aHitCollector) throws IOException {
>>     if (aHitCollector instanceof SearchingContext) {
>>         SearchingContext ctx = (SearchingContext) aHitCollector;
>>         while (next()) {
>>             final DocMatchesHolder doc = docMatches();
>>             final float score = score();
>>             ctx.addHit(doc, score);
>>             ctx.collect(doc(), score);
>>         }
>>     } else {
>>         super.score(aHitCollector);
>>     }
>> }
>>
>> Thanks for reading ;)
>>
>> Cedric
>> --
>> View this message in context:
>> http://www.nabble.com/Span-queries%2C-API-and-difficulties-tf4500460.html#a12835063
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>
> ------------------------------------------------------
> Grant Ingersoll
> http://www.grantingersoll.com/
> http://lucene.grantingersoll.com
> http://www.paperoftheweek.com/
>
>
>
--
View this message in context:
http://www.nabble.com/Span-queries%2C-API-and-difficulties-tf4500460.html#a12836259
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.