Re: check my SpanPayloadCheckQuery

Erik Hatcher Tue, 25 Apr 2017 17:40:41 -0700

Actually, I think I know the issue.  My Solr tests were subject to caching, and 
while the queries were all intended to be different, I think .equals and .hashCode 
are broken on SpanPayloadCheckQuery: they compare only the payloads and ignore the 
wrapped query.   In my stumbling around, I had also made this change, which I now 
suspect is the fix I needed:


$ git diff lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java
diff --git a/lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java b/lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java
index 0ff594b7ec..e33eb184ce 100644
--- a/lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java
+++ b/lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java
@@ -18,6 +18,7 @@ package org.apache.lucene.queries.payloads;
 import java.io.IOException;
 import java.util.List;
 import java.util.Map;
+import java.util.Objects;
 import java.util.Set;
 
 import org.apache.lucene.index.IndexReader;
@@ -182,11 +183,15 @@ public class SpanPayloadCheckQuery extends SpanQuery {
   @Override
   public boolean equals(Object other) {
     return sameClassAs(other) &&
-           payloadToMatch.equals(((SpanPayloadCheckQuery) other).payloadToMatch);
+           payloadToMatch.equals(((SpanPayloadCheckQuery) other).payloadToMatch) &&
+           match.equals(((SpanPayloadCheckQuery) other).match);
   }
 
   @Override
   public int hashCode() {
-    return classHash() ^ payloadToMatch.hashCode();
+    int result = classHash();
+    result = 31 * result + Objects.hashCode(match);
+    result = 31 * result + Objects.hashCode(payloadToMatch);
+    return result;
   }
 }
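
To illustrate why this would show up only under caching, here is a minimal, self-contained sketch in plain Java (no Lucene). BrokenCheck and FixedCheck are hypothetical stand-ins, not Lucene classes: "match" stands in for the wrapped SpanQuery and "payloads" for payloadToMatch, and a HashMap plays the role of the cache. It is a model of the suspected problem under those assumptions, not a reproduction of Solr's caches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class QueryCacheSketch {

  /** Hypothetical pre-patch behaviour: equality looks at the payloads only. */
  static final class BrokenCheck {
    final String match;    // stands in for the wrapped SpanQuery
    final String payloads; // stands in for payloadToMatch

    BrokenCheck(String match, String payloads) {
      this.match = match;
      this.payloads = payloads;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof BrokenCheck
          && payloads.equals(((BrokenCheck) other).payloads);
    }

    @Override
    public int hashCode() {
      return payloads.hashCode();
    }
  }

  /** Hypothetical post-patch behaviour: the wrapped query takes part as well. */
  static final class FixedCheck {
    final String match;
    final String payloads;

    FixedCheck(String match, String payloads) {
      this.match = match;
      this.payloads = payloads;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof FixedCheck
          && payloads.equals(((FixedCheck) other).payloads)
          && match.equals(((FixedCheck) other).match);
    }

    @Override
    public int hashCode() {
      int result = Objects.hashCode(match);
      result = 31 * result + Objects.hashCode(payloads);
      return result;
    }
  }

  /** Caches a count of 1 for "one two", then looks up "two three" (true count 0). */
  static int lookupWithBrokenKey() {
    Map<BrokenCheck, Integer> cache = new HashMap<>();
    cache.computeIfAbsent(new BrokenCheck("one two", "A B"), q -> 1);
    // Equal to the first key, so the stale count is returned and 0 is never computed.
    return cache.computeIfAbsent(new BrokenCheck("two three", "A B"), q -> 0);
  }

  /** Same sequence with keys that also compare the wrapped query. */
  static int lookupWithFixedKey() {
    Map<FixedCheck, Integer> cache = new HashMap<>();
    cache.computeIfAbsent(new FixedCheck("one two", "A B"), q -> 1);
    return cache.computeIfAbsent(new FixedCheck("two three", "A B"), q -> 0);
  }

  public static void main(String[] args) {
    System.out.println("broken key: " + lookupWithBrokenKey()); // prints "broken key: 1"
    System.out.println("fixed key:  " + lookupWithFixedKey());  // prints "fixed key:  0"
  }
}
```

In this model the second lookup with the broken key returns the first query's count, which is consistent with the spurious "two-three-AB": 1 in the earlier facet output quoted below.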



> On Apr 25, 2017, at 7:50 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> 
> Alan,
> 
> Thanks again for the quick replies and assistance.  As these things go, I did 
> a clean build and my query parser started working properly.  Actually I wrote 
> some tests to test out the situation I reported and my tests confirmed how I 
> thought it should have worked and that got me to do a clean build.
> 
> Now the numbers look like this:
> 
>     "facet_queries":{
>       "one-two-A":0,
>       "one-two-AB":1,
>       "one-two-ABC":0,
>       "two-three-A":0,
>       "two-three-AB":0,
>       "two-three-ABC":0,
>       "one-two-three-A":0,
>       "one-two-three-AB":0,
>       "one-two-three-ABC":1},
> 
> Ahh, much better!  One of the two that match has three clauses and three 
> payloads that match those clauses. 
> 
> My tests are below, at the Lucene level.
> 
>       Erik
> 
> 
> // I don’t think these tests add anything to TestPayloadCheckQuery, but 
> // helped me understand things better:
>   public void testTesting() throws Exception {
> 
>     Analyzer simplePayloadAnalyzer = new Analyzer() {
>       @Override
>       public TokenStreamComponents createComponents(String fieldName) {
>         Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
>         return new TokenStreamComponents(tokenizer,
>             new DelimitedPayloadTokenFilter(tokenizer, '|', new IdentityEncoder()));
>       }
>     };
> 
>     directory = newDirectory();
>     RandomIndexWriter writer = new RandomIndexWriter(random(), directory,
>         newIndexWriterConfig(simplePayloadAnalyzer)
>             .setMaxBufferedDocs(TestUtil.nextInt(random(), 100, 1000))
>             .setMergePolicy(newLogMergePolicy()));
>     Document doc = new Document();
>     doc.add(newTextField("field", "one|A two|B three|C", Field.Store.YES));
>     writer.addDocument(doc);
>     reader = writer.getReader();
>     searcher = newSearcher(reader);
>     writer.close();
> 
>     checkMatch("one two", new String[] {"A"}, false);
>     checkMatch("one two", new String[] {"A", "B"}, true);
>     checkMatch("one two", new String[] {"A", "B", "C"}, false);
> 
>     checkMatch("two three", new String[] {"A"}, false);
>     checkMatch("two three", new String[] {"A", "B"}, false);
>     checkMatch("two three", new String[] {"A", "B", "C"}, false);
> 
>     // extra check just to make sure we can match on “two three” with the 
>     // right payloads
>     checkMatch("two three", new String[] {"B", "C"}, true);
> 
>     checkMatch("one two three", new String[] {"A"}, false);
>     checkMatch("one two three", new String[] {"A", "B"}, false);
>     checkMatch("one two three", new String[] {"A", "B", "C"}, true);
>   }
> 
>   private void checkMatch(String phrase, String[] payloadArray, boolean willMatch) throws IOException {
>     String[] terms = phrase.split(" ");
>     List<SpanQuery> stqs = new ArrayList<SpanQuery>();
>     for (String term : terms) {
>       stqs.add(new SpanTermQuery(new Term("field", term)));
>     }
>     SpanNearQuery snq = new SpanNearQuery(stqs.toArray(new SpanQuery[stqs.size()]), 0, true);
> 
>     IdentityEncoder encoder = new IdentityEncoder();
>     List<BytesRef> payloads = new ArrayList<>();
>     for (String rawPayload : payloadArray) {
>       payloads.add(encoder.encode(rawPayload.toCharArray()));
>     }
> 
>     SpanPayloadCheckQuery spcq = new SpanPayloadCheckQuery(snq, payloads);
>     System.out.println("spcq = " + spcq);
> 
>     checkHits(spcq, willMatch ? new int[] {0} : new int[] {});
>   }
> 
> 
> 
>> On Apr 25, 2017, at 5:44 AM, Alan Woodward <a...@flax.co.uk 
>> <mailto:a...@flax.co.uk>> wrote:
>> 
>> Hm, maybe - a quick look at the tests suggests that we don’t have anything 
>> that explicitly checks more than 2 clauses.  Can you open an issue and add 
>> something to TestPayloadCheckQuery?
>> 
>> 
>>> On 25 Apr 2017, at 10:23, Erik Hatcher <erik.hatc...@gmail.com 
>>> <mailto:erik.hatc...@gmail.com>> wrote:
>>> 
>>> Alan - thanks for the reply.  Given your explanation is there an off by one 
>>> term issue?   The matches I'm seeing would happen if the last term weren't 
>>> considered. 
>>> 
>>> Do you have an example of multiple payloads too?
>>> 
>>>     Erik
>>> 
>>> On Apr 25, 2017, at 04:16, Alan Woodward <a...@flax.co.uk 
>>> <mailto:a...@flax.co.uk>> wrote:
>>> 
>>>> The query will only match a particular span if all the payloads in that 
>>>> span match the passed-in array.  So for example, in your first query, the 
>>>> inner spanNear query matches two terms (words_dps:one and words_dps:two), 
>>>> so it needs to have an array of two payloads to match.
>>>> 
>>>> You can use it for, for example, parts-of-speech tagging; 
>>>> spanPayCheck(spanTerm(text:run), payloadRef:noun) would only match 
>>>> instances of ‘run’ that are tagged as a noun, rather than a verb.
>>>> 
>>>> I can see a case for a separate query that only matches when all of a 
>>>> span’s payloads match a single payload value
>>>> 
>>>> Alan Woodward
>>>> www.flax.co.uk <http://www.flax.co.uk/>
>>>> 
>>>> 
>>>>> On 25 Apr 2017, at 01:40, Erik Hatcher <erik.hatc...@gmail.com 
>>>>> <mailto:erik.hatc...@gmail.com>> wrote:
>>>>> 
>>>>> I’ve started a belated mission to leverage payloads from Solr 
>>>>> (SOLR-1485), mainly for float payload decoding to weight scoring, 
>>>>> but while digging in I’m exploring everything payloads now have to offer, 
>>>>> including the SpanPayloadCheckQuery.   However, I don’t yet understand 
>>>>> how to use it effectively, or what kinds of use cases it is _really_ 
>>>>> suited for.  
>>>>> 
>>>>> I think it isn’t working as it should, or at least I’m not understanding 
>>>>> its behavior.    Here’s what I’m indexing, by way of the 
>>>>> DelimitedPayloadTokenFilter:
>>>>> 
>>>>>    one|A two|B three|C
>>>>> 
>>>>> and making the following queries (these translate to SpanNearQuery with 
>>>>> zero slop and inOrder=true):
>>>>> 
>>>>>  spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true), payloadRef: A;)
>>>>>  *spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true), payloadRef: A;B;)
>>>>>  spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true), payloadRef: A;B;C;)
>>>>>  spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true), payloadRef: A;)
>>>>>  *spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true), payloadRef: A;B;)
>>>>>  spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true), payloadRef: A;B;C;)
>>>>>  spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three], 0, true), payloadRef: A;)
>>>>>  *spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three], 0, true), payloadRef: A;B;)
>>>>>  spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three], 0, true), payloadRef: A;B;C;)
>>>>> 
>>>>> Only the ones marked (*), with the payloads array set to “A” and “B”, matched; 
>>>>> all the others failed to match.   Is that expected?   I’m confused on how 
>>>>> the SpanPayloadCheckQuery uses this payloads array to further filter the 
>>>>> matches on the associated SpanQuery.
>>>>> 
>>>>> Could/would someone explain how this query works and why these matches 
>>>>> are working as they are?  Thanks!
>>>>> 
>>>>> Here’s my test platform below:
>>>>> 
>>>>> ——
>>>>> 
>>>>> bin/post -c payloads -type text/csv -out yes -d $'id,words_dps\n1,one|A two|B three|C'
>>>>> curl http://localhost:8983/solr/payloads/config/params -H 'Content-type:application/json'  -d '{
>>>>> "set" : {
>>>>>  "payload-checks": {
>>>>>    "wt":"json",
>>>>>    "indent":"on",
>>>>>    "debug":"query",
>>>>>    "echoParams":"all",
>>>>>    "facet":"on",
>>>>>    "facet.query": [ 
>>>>>         "{!payload_check key=one-two-A f=words_dps payloads=\"A\"}one two",
>>>>>         "{!payload_check key=one-two-AB f=words_dps payloads=\"A B\"}one two",
>>>>>         "{!payload_check key=one-two-ABC f=words_dps payloads=\"A B C\"}one two",
>>>>>         "{!payload_check key=two-three-A f=words_dps payloads=\"A\"}two three",
>>>>>         "{!payload_check key=two-three-AB f=words_dps payloads=\"A B\"}two three",
>>>>>         "{!payload_check key=two-three-ABC f=words_dps payloads=\"A B C\"}two three",
>>>>>         "{!payload_check key=one-two-three-A f=words_dps payloads=\"A\"}one two three",
>>>>>         "{!payload_check key=one-two-three-AB f=words_dps payloads=\"A B\"}one two three",
>>>>>         "{!payload_check key=one-two-three-ABC f=words_dps payloads=\"A B C\"}one two three"
>>>>>      ]
>>>>>    }
>>>>>  }
>>>>> }'   
>>>>> curl "http://localhost:8983/solr/payloads/select?q=*:*&useParams=payload-checks"
>>>>> 
>>>>>   • facet_queries: {
>>>>>           • one-two-A: 0,
>>>>>           • one-two-AB: 1,
>>>>>           • one-two-ABC: 0,
>>>>>           • two-three-A: 0,
>>>>>           • two-three-AB: 1,
>>>>>           • two-three-ABC: 0,
>>>>>           • one-two-three-A: 0,
>>>>>           • one-two-three-AB: 1,
>>>>>           • one-two-three-ABC: 0
>>>>> },
>>>>> 
>>>>> —
>>>>> 
>>>>> // not necessarily the latest code on SOLR-1485 - construction zone
>>>>>      public Query parse() throws SyntaxError {
>>>>>        String field = localParams.get(QueryParsing.F);
>>>>>        String value = localParams.get(QueryParsing.V);
>>>>>        String pStr = localParams.get("payloads","");
>>>>> 
>>>>>        IdentityEncoder encoder = new IdentityEncoder();
>>>>>        List<BytesRef> payloads = new ArrayList<>();
>>>>>        String[] rawPayloads = pStr.split(" ");
>>>>>        for (String rawPayload : rawPayloads) {
>>>>>          payloads.add(encoder.encode(rawPayload.toCharArray()));
>>>>>        }
>>>>> 
>>>>>        String[] terms = value.split(" ");
>>>>>        List<SpanQuery> stqs = new ArrayList<SpanQuery>();
>>>>>        for (String term : terms) {
>>>>>          stqs.add(new SpanTermQuery(new Term(field, term)));
>>>>>        }
>>>>>        SpanNearQuery snq = new SpanNearQuery(stqs.toArray(new SpanQuery[0]), 0, true);
>>>>> 
>>>>>        Query spcq = new SpanPayloadCheckQuery(snq, payloads);
>>>>> 
>>>>>        return spcq;
>>>>>      }
>>>>>    };
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>
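
For reference, the matching rule Alan describes in the quoted thread (a span matches only when its payloads equal the passed-in array, one payload per matched term, in order) can be modeled in a few lines of plain Java. This is a simplified sketch and not Lucene's implementation: the class and method names are hypothetical, payloads are plain strings, and only exact in-order phrases with zero slop are considered, against the single document "one|A two|B three|C" from the thread.

```java
import java.util.Arrays;
import java.util.List;

public class PayloadCheckSketch {

  // The indexed document from the thread: one|A two|B three|C
  static final List<String> TERMS = Arrays.asList("one", "two", "three");
  static final List<String> PAYLOADS = Arrays.asList("A", "B", "C");

  /**
   * Returns true if the phrase occurs in TERMS (in order, zero slop) and the
   * payloads of the matched terms equal the expected payloads exactly.
   */
  static boolean matches(String phrase, String... expectedPayloads) {
    List<String> query = Arrays.asList(phrase.split(" "));
    List<String> expected = Arrays.asList(expectedPayloads);
    for (int start = 0; start + query.size() <= TERMS.size(); start++) {
      int end = start + query.size();
      if (!TERMS.subList(start, end).equals(query)) {
        continue; // the phrase does not start at this position
      }
      // List equality checks both the length and each element in order.
      if (PAYLOADS.subList(start, end).equals(expected)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(matches("one two", "A"));                 // false
    System.out.println(matches("one two", "A", "B"));            // true
    System.out.println(matches("two three", "A", "B"));          // false
    System.out.println(matches("two three", "B", "C"));          // true
    System.out.println(matches("one two three", "A", "B", "C")); // true
  }
}
```

The results agree with the expectations in the checkMatch calls of the test quoted above.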
