Hi Uwe,
Thanks a lot for the code. I'm digging into it now!
Cheers,
Cuong
On Mon, Mar 24, 2008 at 7:41 PM, Uwe Goetzke <[EMAIL PROTECTED]>
wrote:
> Hi Cuong ,
>
> I have written a TolerantPhraseScorer starting with the code from
> PhraseScorer but I think I have modified it to much to be generally useful.
> We use it with bigramm clusters and therefore does not need the slop factor
> for scoring but have a tolerance factor (depending on the length of the
> phrase). Here are the most relevant code fragments to start with...
> So the idea is to keep queue ordered (calling firstToLast2 and moveLast).
> I have not yet checked the code for optimisations. If you find one, I would
> be glad to hear about it... ;-)
>
>
> protected TolerantPhrasePositions first, last, reallast; // last point to
> the last tpp for the doc varying from tolerance to phrase size (reallast)
>
> protected int tolerance;
>
> /**
> * similar to PhraseScorer but with a tolerance factor
> *
> * @see PhraseScorer
> */
> TolerantPhraseScorer(Weight weight, TermPositions[] tps, int[]
> positions, Similarity similarity,
> byte[] norms, int tolerance)
> {
> super(similarity);
> this.norms = norms;
> this.weight = weight;
> this.value = weight.getValue();
> this.tolerance = tolerance;
> termsize = 0;
> // convert tps to a list
> for (int i = 0; i < tps.length; i++) {
> if (tps[i] != null) {
> TolerantPhrasePositions pp = new
> TolerantPhrasePositions(tps[i], positions[i]);
> termsize++;
> if (reallast != null) {
> // add next to end of list
> reallast.next = pp;
> pp.previous = reallast;
> }
> else
> first = pp;
> reallast = pp;
> if ((termsize >= tolerance) && (last ==
> null))
> last = pp;
> }
> }
> pq = new TolerantPhraseQueue(termsize); //
> construct empty pq
> }
>
>
> public boolean next() throws IOException
> {
> if (firstTime) {
> init();
> firstTime = false;
> }
> else if (more) {
> int doc = last.doc;
> while (doc == last.doc) {
> more = last.next(); //
> trigger further scanning
> moveLast();
> }
> }
> return doNext();
> }
>
> // next without initial increment
> private boolean doNext() throws IOException
> {
> while (more) {
> while (more && first.doc < last.doc) { // find
> doc w/ all the terms
> more = first.skipTo(last.doc);
> // skip first upto last
> firstToLast2();
> // and move it to the end
> }
> if (more) {
> // found a doc with all of the terms
> freq = phraseFreq();
> // check for phrase
> if (freq == 0.0f) {
> // no match
> int doc = last.doc;
> while (doc == last.doc) {
> more = last.next();
> // trigger further scanning
> moveLast();
> }
> }
> else
> return true;
> // found a match
> }
> }
> return false; // no more
> matches
> }
>
>
> private void firstToLast2()
> {
> TolerantPhrasePositions newfirst = first.next;
> TolerantPhrasePositions test = last;
> TolerantPhrasePositions insertp = test;
> while ((test != null) && (first.doc >= test.doc)) {
> insertp = test;
> test = test.next;
> }
> if (insertp == null) { // last elem should not happen
> System.out.println("firstToLast2->insertp==null");
> }
> else {
> first.previous = insertp; // einkoppeln
> first.next = insertp.next;
> if (first.next != null)
> first.next.previous = first;
> insertp.next = first;
> if (test == null) {
> reallast = first;
> reallast.next = null;
> }
> }
> last = last.next;
> first = newfirst;
> first.previous = null;
> }
>
> private void moveLast()
> {
> TolerantPhrasePositions test = last;
> TolerantPhrasePositions insertp = null;
> while ((test != null) && (last.doc >= test.doc)) {
> insertp = test;
> test = test.next;
> }
> if (insertp == null) { // last elem should not happen
> System.out.println("insertp==null");
> }
> else {
> if (insertp != last) {
> TolerantPhrasePositions prev =
> last.previous; // dequeue
> if (prev != null) { // if only 1 character!
> prev.next = last.next;
> prev.next.previous = prev;
> }
> last.previous = insertp; // enqueue
> last.next = insertp.next;
> if (last.next != null)
> last.next.previous = last;
> insertp.next = last;
>
> if (test == null) {
> reallast = last;
> reallast.next = null;
> }
> if (prev != null) { // if only 1 character!
> last = prev.next;
> }
> }
> }
> }
>
> Best Regards
>
> Uwe
>
>
> -----Ursprüngliche Nachricht-----
> Von: climbingrose [mailto:[EMAIL PROTECTED]
> Gesendet: Montag, 24. März 2008 00:37
> An: java-user
> Betreff: Implement a relaxed PhraseQuery?
>
> Hi all,
>
> I posted this in Solr mailing but then I thought it would be more
> appropriate to have it here.
>
> I thought many people would encounter the situation I'm having here.
> Basically, we'd like to have a PhraseQuery with "minimum should match"
> property similar to BooleanQuery. Consider the query "Senior Java
> Developer":
>
> 1) I'd like to do a PhraseQuery on "Senior Java Developer" with a slop of
> say 2, so that the query only matches documents with these words located
> in
> proximity. I don't want to match documents like "Senior <Huge block of
> text>
> Java <Huge block of Text> Developer".
> 2) I also want to relax PhraseQuery a bit so that it not only match
> "Senior
> Java Developer"~2 but also matches "Java Developer"~2 but of course with a
> lower score. I can programmatically generate on the combination but it's
> not
> gonna be efficient if user issues query with many terms.
>
> It looks like the only solution is to hack PhraseScorer and its
> subclasses.
> Has anyone done this before? If yes, please share your experience.
>
>
> --
> Regards,
>
> Cuong Hoang
>
> -----------------------------------------------------------------------
> Healy Hudson GmbH - D-55252 Mainz Kastel
> Geschäftsführer Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
>
> Diese Email ist vertraulich. Wenn Sie nicht der beabsichtigte Empfänger
> sind, dürfen Sie die Informationen nicht offen legen oder benutzen. Wenn Sie
> diese Email durch einen Fehler bekommen haben, teilen Sie uns dies bitte
> umgehend mit, indem Sie diese Email an den Absender zurückschicken. Bitte
> löschen Sie danach diese Email.
> This email is confidential. If you are not the intended recipient, you
> must not disclose or use this information contained in it. If you have
> received this email in error please tell us immediately by return email and
> delete the document.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>