Actually, I think I know the issue. My Solr tests were subject to caching:
the queries were all intended to be different, but I think .equals and
.hashCode are broken on SpanPayloadCheckQuery, so distinct queries could be
treated as the same cache entry. In my stumbling around, I had also made this
change, which I now suspect is the fix I needed:
$ git diff lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java
diff --git a/lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java b/lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java
index 0ff594b7ec..e33eb184ce 100644
--- a/lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java
+++ b/lucene/queries/src/java/org/apache/lucene/queries/payloads/SpanPayloadCheckQuery.java
@@ -18,6 +18,7 @@ package org.apache.lucene.queries.payloads;
 import java.io.IOException;
 import java.util.List;
 import java.util.Map;
+import java.util.Objects;
 import java.util.Set;

 import org.apache.lucene.index.IndexReader;
@@ -182,11 +183,15 @@ public class SpanPayloadCheckQuery extends SpanQuery {
   @Override
   public boolean equals(Object other) {
     return sameClassAs(other) &&
-        payloadToMatch.equals(((SpanPayloadCheckQuery) other).payloadToMatch);
+        payloadToMatch.equals(((SpanPayloadCheckQuery) other).payloadToMatch) &&
+        match.equals(((SpanPayloadCheckQuery) other).match);
   }

   @Override
   public int hashCode() {
-    return classHash() ^ payloadToMatch.hashCode();
+    int result = classHash();
+    result = 31 * result + Objects.hashCode(match);
+    result = 31 * result + Objects.hashCode(payloadToMatch);
+    return result;
   }
 }
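
To make the caching angle concrete, here is a minimal sketch in plain Java
(no Lucene involved). The class and method names (CacheKeyDemo, BrokenKey,
FixedKey, brokenLookup, fixedLookup) are made up for illustration, and the
HashMap only stands in for a query cache; this is not how Solr's caches are
implemented. It shows how a key whose equals/hashCode ignore the wrapped
query lets "two three" pick up the entry cached for "one two" when the
payload lists happen to be equal:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class CacheKeyDemo {

  // Hypothetical stand-in for the query: 'match' plays the role of the
  // wrapped span query, 'payloads' the role of payloadToMatch.
  static final class BrokenKey {
    final String match;
    final List<String> payloads;

    BrokenKey(String match, List<String> payloads) {
      this.match = match;
      this.payloads = payloads;
    }

    // Mirrors the pre-patch behaviour: only the payloads are compared.
    @Override
    public boolean equals(Object other) {
      return other instanceof BrokenKey
          && payloads.equals(((BrokenKey) other).payloads);
    }

    @Override
    public int hashCode() {
      return payloads.hashCode();
    }
  }

  // Same fields, but both take part in equals and hashCode.
  static final class FixedKey {
    final String match;
    final List<String> payloads;

    FixedKey(String match, List<String> payloads) {
      this.match = match;
      this.payloads = payloads;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof FixedKey)) {
        return false;
      }
      FixedKey that = (FixedKey) other;
      return match.equals(that.match) && payloads.equals(that.payloads);
    }

    @Override
    public int hashCode() {
      return Objects.hash(match, payloads);
    }
  }

  // Cache a count for "one two", then look up "two three" with equal payloads.
  static Integer brokenLookup() {
    Map<BrokenKey, Integer> cache = new HashMap<>();
    cache.put(new BrokenKey("one two", Arrays.asList("A", "B")), 1);
    return cache.get(new BrokenKey("two three", Arrays.asList("A", "B")));
  }

  static Integer fixedLookup() {
    Map<FixedKey, Integer> cache = new HashMap<>();
    cache.put(new FixedKey("one two", Arrays.asList("A", "B")), 1);
    return cache.get(new FixedKey("two three", Arrays.asList("A", "B")));
  }

  public static void main(String[] args) {
    System.out.println("broken key lookup: " + brokenLookup()); // 1 (stale hit)
    System.out.println("fixed key lookup:  " + fixedLookup());  // null (miss)
  }
}
```

The stale hit in the first lookup is the same shape as the two-three-AB
count of 1 in my original Solr numbers.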
> On Apr 25, 2017, at 7:50 PM, Erik Hatcher <[email protected]> wrote:
>
> Alan,
>
> Thanks again for the quick replies and assistance. As these things go, I did
> a clean build and my query parser started working properly. I had written
> some tests for the situation I reported; they confirmed how I thought it
> should work, and that prompted me to do a clean build.
>
> Now the numbers look like this:
>
> "facet_queries":{
> "one-two-A":0,
> "one-two-AB":1,
> "one-two-ABC":0,
> "two-three-A":0,
> "two-three-AB":0,
> "two-three-ABC":0,
> "one-two-three-A":0,
> "one-two-three-AB":0,
> "one-two-three-ABC":1},
>
> Ahh, much better! One of the two that match has three clauses and three
> payloads that match those clauses.
>
> My tests are below, at the Lucene level.
>
> Erik
>
>
> // I don’t think these tests add anything to TestPayloadCheckQuery, but
> // helped me understand things better:
> public void testTesting() throws Exception {
>
>   Analyzer simplePayloadAnalyzer = new Analyzer() {
>     @Override
>     public TokenStreamComponents createComponents(String fieldName) {
>       Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
>       return new TokenStreamComponents(tokenizer,
>           new DelimitedPayloadTokenFilter(tokenizer, '|', new IdentityEncoder()));
>     }
>   };
>
>   directory = newDirectory();
>   RandomIndexWriter writer = new RandomIndexWriter(random(), directory,
>       newIndexWriterConfig(simplePayloadAnalyzer)
>           .setMaxBufferedDocs(TestUtil.nextInt(random(), 100, 1000))
>           .setMergePolicy(newLogMergePolicy()));
>   Document doc = new Document();
>   doc.add(newTextField("field", "one|A two|B three|C", Field.Store.YES));
>   writer.addDocument(doc);
>   reader = writer.getReader();
>   searcher = newSearcher(reader);
>   writer.close();
>
>   checkMatch("one two", new String[] {"A"}, false);
>   checkMatch("one two", new String[] {"A", "B"}, true);
>   checkMatch("one two", new String[] {"A", "B", "C"}, false);
>
>   checkMatch("two three", new String[] {"A"}, false);
>   checkMatch("two three", new String[] {"A", "B"}, false);
>   checkMatch("two three", new String[] {"A", "B", "C"}, false);
>
>   // extra check just to make sure we can match on “two three” with the
>   // right payloads
>   checkMatch("two three", new String[] {"B", "C"}, true);
>
>   checkMatch("one two three", new String[] {"A"}, false);
>   checkMatch("one two three", new String[] {"A", "B"}, false);
>   checkMatch("one two three", new String[] {"A", "B", "C"}, true);
> }
>
> private void checkMatch(String phrase, String[] payloadArray, boolean willMatch)
>     throws IOException {
>   String[] terms = phrase.split(" ");
>   List<SpanQuery> stqs = new ArrayList<SpanQuery>();
>   for (String term : terms) {
>     stqs.add(new SpanTermQuery(new Term("field", term)));
>   }
>   SpanNearQuery snq =
>       new SpanNearQuery(stqs.toArray(new SpanQuery[stqs.size()]), 0, true);
>
>   IdentityEncoder encoder = new IdentityEncoder();
>   List<BytesRef> payloads = new ArrayList<>();
>   for (String rawPayload : payloadArray) {
>     payloads.add(encoder.encode(rawPayload.toCharArray()));
>   }
>
>   SpanPayloadCheckQuery spcq = new SpanPayloadCheckQuery(snq, payloads);
>   System.out.println("spcq = " + spcq);
>
>   checkHits(spcq, willMatch ? new int[] {0} : new int[] {});
> }
>
>
>
>> On Apr 25, 2017, at 5:44 AM, Alan Woodward <[email protected]> wrote:
>>
>> Hm, maybe - a quick look at the tests suggests that we don’t have anything
>> that explicitly checks more than 2 clauses. Can you open an issue and add
>> something to TestPayloadCheckQuery?
>>
>>
>>> On 25 Apr 2017, at 10:23, Erik Hatcher <[email protected]> wrote:
>>>
>>> Alan - thanks for the reply. Given your explanation, is there an
>>> off-by-one term issue? The matches I'm seeing would happen if the last
>>> term weren't considered.
>>>
>>> Do you have an example of multiple payloads too?
>>>
>>> Erik
>>>
>>> On Apr 25, 2017, at 04:16, Alan Woodward <[email protected]> wrote:
>>>
>>>> The query will only match a particular span if all the payloads in that
>>>> span match the passed-in array. So for example, in your first query, the
>>>> inner spanNear query matches two terms (words_dps:one and words_dps:two),
>>>> so it needs to have an array of two payloads to match.
>>>>
>>>> You can use it for part-of-speech tagging, for example:
>>>> spanPayCheck(spanTerm(text:run), payloadRef:noun) would only match
>>>> instances of ‘run’ that are tagged as a noun, rather than a verb.
>>>>
>>>> I can see a case for a separate query that only matches when all of a
>>>> span’s payloads match a single payload value.
>>>>
>>>> Alan Woodward
>>>> www.flax.co.uk
>>>>
>>>>
>>>>> On 25 Apr 2017, at 01:40, Erik Hatcher <[email protected]> wrote:
>>>>>
>>>>> I’ve started a belated mission to leverage payloads from Solr
>>>>> (SOLR-1485), mainly for float payload decoding for weighting in scoring,
>>>>> but while digging in I’m exploring everything payloads now have to offer,
>>>>> including the SpanPayloadCheckQuery. However, I don’t yet understand how
>>>>> to use it effectively, or what kinds of use cases it _really_ is, and
>>>>> can be, used for.
>>>>>
>>>>> I think it isn’t working as it should, or at least I’m not understanding
>>>>> its behavior. Here’s what I’m indexing, by way of the
>>>>> DelimitedPayloadTokenFilter:
>>>>>
>>>>> one|A two|B three|C
>>>>>
>>>>> and making the following queries (these translate to SpanNearQuery with
>>>>> zero slop and inOrder=true):
>>>>>
>>>>> spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true), payloadRef: A;)
>>>>> *spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true), payloadRef: A;B;)
>>>>> spanPayCheck(spanNear([words_dps:one, words_dps:two], 0, true), payloadRef: A;B;C;)
>>>>> spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true), payloadRef: A;)
>>>>> *spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true), payloadRef: A;B;)
>>>>> spanPayCheck(spanNear([words_dps:two, words_dps:three], 0, true), payloadRef: A;B;C;)
>>>>> spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three], 0, true), payloadRef: A;)
>>>>> *spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three], 0, true), payloadRef: A;B;)
>>>>> spanPayCheck(spanNear([words_dps:one, words_dps:two, words_dps:three], 0, true), payloadRef: A;B;C;)
>>>>>
>>>>> Only the ones marked (*), with the payloads array set to “A” and “B”,
>>>>> matched; all the others failed to match. Is that expected? I’m confused
>>>>> about how the SpanPayloadCheckQuery uses this payloads array to further
>>>>> filter the matches on the associated SpanQuery.
>>>>>
>>>>> Could/would someone explain how this query works and why these matches
>>>>> are working as they are? Thanks!
>>>>>
>>>>> Here’s my test platform below:
>>>>>
>>>>> ——
>>>>>
>>>>> bin/post -c payloads -type text/csv -out yes -d $'id,words_dps\n1,one|A two|B three|C'
>>>>> curl http://localhost:8983/solr/payloads/config/params -H 'Content-type:application/json' -d '{
>>>>>   "set" : {
>>>>>     "payload-checks": {
>>>>>       "wt":"json",
>>>>>       "indent":"on",
>>>>>       "debug":"query",
>>>>>       "echoParams":"all",
>>>>>       "facet":"on",
>>>>>       "facet.query": [
>>>>>         "{!payload_check key=one-two-A f=words_dps payloads=\"A\"}one two",
>>>>>         "{!payload_check key=one-two-AB f=words_dps payloads=\"A B\"}one two",
>>>>>         "{!payload_check key=one-two-ABC f=words_dps payloads=\"A B C\"}one two",
>>>>>         "{!payload_check key=two-three-A f=words_dps payloads=\"A\"}two three",
>>>>>         "{!payload_check key=two-three-AB f=words_dps payloads=\"A B\"}two three",
>>>>>         "{!payload_check key=two-three-ABC f=words_dps payloads=\"A B C\"}two three",
>>>>>         "{!payload_check key=one-two-three-A f=words_dps payloads=\"A\"}one two three",
>>>>>         "{!payload_check key=one-two-three-AB f=words_dps payloads=\"A B\"}one two three",
>>>>>         "{!payload_check key=one-two-three-ABC f=words_dps payloads=\"A B C\"}one two three"
>>>>>       ]
>>>>>     }
>>>>>   }
>>>>> }'
>>>>> curl "http://localhost:8983/solr/payloads/select?q=*:*&useParams=payload-checks"
>>>>>
>>>>> • facet_queries: {
>>>>> • one-two-A: 0,
>>>>> • one-two-AB: 1,
>>>>> • one-two-ABC: 0,
>>>>> • two-three-A: 0,
>>>>> • two-three-AB: 1,
>>>>> • two-three-ABC: 0,
>>>>> • one-two-three-A: 0,
>>>>> • one-two-three-AB: 1,
>>>>> • one-two-three-ABC: 0
>>>>> },
>>>>>
>>>>> —
>>>>>
>>>>> // not necessarily the latest code on SOLR-1485 - construction zone
>>>>>   public Query parse() throws SyntaxError {
>>>>>     String field = localParams.get(QueryParsing.F);
>>>>>     String value = localParams.get(QueryParsing.V);
>>>>>     String pStr = localParams.get("payloads","");
>>>>>
>>>>>     IdentityEncoder encoder = new IdentityEncoder();
>>>>>     List<BytesRef> payloads = new ArrayList<>();
>>>>>     String[] rawPayloads = pStr.split(" ");
>>>>>     for (String rawPayload : rawPayloads) {
>>>>>       payloads.add(encoder.encode(rawPayload.toCharArray()));
>>>>>     }
>>>>>
>>>>>     String[] terms = value.split(" ");
>>>>>     List<SpanQuery> stqs = new ArrayList<SpanQuery>();
>>>>>     for (String term : terms) {
>>>>>       stqs.add(new SpanTermQuery(new Term(field, term)));
>>>>>     }
>>>>>     SpanNearQuery snq = new SpanNearQuery(stqs.toArray(new SpanQuery[0]), 0, true);
>>>>>
>>>>>     Query spcq = new SpanPayloadCheckQuery(snq, payloads);
>>>>>
>>>>>     return spcq;
>>>>>   }
>>>>> };
>>>>>
>>>>>
>>>>
>>
>
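
For anyone skimming the quoted thread: the matching rule Alan describes, and
that my checkMatch calls confirm, can be modelled in a few lines of plain
Java. This is only a simplified sketch under my own assumptions, not the
Lucene implementation; PayloadCheckSketch and payloadsMatch are made-up
names, and payloads are plain strings here rather than BytesRef. The point
is that the payloads collected from a span must equal the expected list both
in order and in number:

```java
import java.util.Arrays;
import java.util.List;

class PayloadCheckSketch {

  // Simplified model: a span passes the check only when the payloads found
  // at its positions equal the expected list, element by element, and the
  // two lists have the same length.
  static boolean payloadsMatch(List<String> spanPayloads, List<String> expected) {
    if (spanPayloads.size() != expected.size()) {
      return false;
    }
    for (int i = 0; i < expected.size(); i++) {
      if (!spanPayloads.get(i).equals(expected.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Payloads carried by each span in the document "one|A two|B three|C".
    List<String> oneTwo = Arrays.asList("A", "B");
    List<String> twoThree = Arrays.asList("B", "C");
    List<String> oneTwoThree = Arrays.asList("A", "B", "C");

    System.out.println(payloadsMatch(oneTwo, Arrays.asList("A")));                // false
    System.out.println(payloadsMatch(oneTwo, Arrays.asList("A", "B")));           // true
    System.out.println(payloadsMatch(twoThree, Arrays.asList("A", "B")));         // false
    System.out.println(payloadsMatch(twoThree, Arrays.asList("B", "C")));         // true
    System.out.println(payloadsMatch(oneTwoThree, Arrays.asList("A", "B")));      // false
    System.out.println(payloadsMatch(oneTwoThree, Arrays.asList("A", "B", "C"))); // true
  }
}
```

The expected values line up with the checkMatch expectations in the test
quoted above.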