[GitHub] [solr] dsmiley commented on a change in pull request #129: SOLR-15407 untokenized field type with sow=false fix + tests

GitBox Mon, 24 May 2021 08:59:16 -0700


dsmiley commented on a change in pull request #129:
URL: https://github.com/apache/solr/pull/129#discussion_r637963928




##########
File path: solr/core/src/test/org/apache/solr/search/TestExtendedDismaxParser.java
##########
@@ -1771,6 +1787,35 @@ public void testSplitOnWhitespace_Basic() throws Exception {
     assertThat(parsedquery, anyOf(containsString("((name:stigma | title:stigma))"), containsString("((title:stigma | name:stigma))")));
   }
 
+    @Test
+    public void testSplitOnWhitespace_stringField_shouldBuildSingleClause() throws Exception
+    {
+        assertJQ(req("qf", "trait_ss", "defType", "edismax", "q", "multi term", "sow", "false"),
+            "/response/numFound==1", "/response/docs/[0]/id=='75'");
+
+        String parsedquery = getParsedQuery(
+            req("qf", "trait_ss", "q", "multi term", "defType", "edismax", "sow", "false", "debugQuery", "true"));
+        assertThat(parsedquery, anyOf(containsString("((trait_ss:multi term))")));
+    }
+
+    @Test
+    public void testSplitOnWhitespace_numericField_shouldBuildAlwaysMultiClause() throws Exception

Review comment:
       Again, just drop "testSplitOnWhitespace_" from the method name, I think.

##########
File path: solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java
##########
@@ -1149,18 +1151,26 @@ protected Query getFieldQuery(String field, List<String> queryTerms, boolean raw
         return newFieldQuery
             (getAnalyzer(), field, queryText, false, fieldAutoGenPhraseQueries, fieldEnableGraphQueries, synonymQueryStyle);
       } else {
-        if (raw) {
+        if (raw) {// assumption: raw = false only when called from ExtendedDismaxQueryParser.getQuery()
           return new RawQuery(sf, queryTerms);
         } else {
           if (queryTerms.size() == 1) {
             return ft.getFieldQuery(parser, sf, queryTerms.get(0));
+          } else if(ft instanceof StrField){
+            String queryText = String.join(" ", queryTerms);
+            return ft.getFieldQuery(parser, sf, queryText);
           } else {
             List<Query> subqs = new ArrayList<>();
             for (String queryTerm : queryTerms) {
               try {
                 subqs.add(ft.getFieldQuery(parser, sf, queryTerm));
-              } catch (Exception e) { // assumption: raw = false only when called from ExtendedDismaxQueryParser.getQuery()
-                // for edismax: ignore parsing failures
+              } catch (Exception e) {
+                /*
+                This happens when a field tries to parse a query term of incompatible type
+                e.g.
+                a numerical field trying to parse a textual query term
+                 */
+                subqs.add(new MatchNoDocsQuery(queryTerm + " is not compatible with " + field));

Review comment:
       It appears this change (the addition of MatchNoDocsQuery here) has no 
effect but maybe I'm mistaken?

##########
File path: solr/core/src/test/org/apache/solr/search/TestExtendedDismaxParser.java
##########
@@ -1771,6 +1787,35 @@ public void testSplitOnWhitespace_Basic() throws Exception {
     assertThat(parsedquery, anyOf(containsString("((name:stigma | title:stigma))"), containsString("((title:stigma | name:stigma))")));
   }
 
+    @Test
+    public void testSplitOnWhitespace_stringField_shouldBuildSingleClause() throws Exception

Review comment:
       Based on the test name, I'd expect sow=true each time.  Maybe just drop 
the "testSplitOnWhitespace_" part of the method name.

##########
File path: solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java
##########
@@ -1149,18 +1151,21 @@ protected Query getFieldQuery(String field, List<String> queryTerms, boolean raw
         return newFieldQuery
             (getAnalyzer(), field, queryText, false, fieldAutoGenPhraseQueries, fieldEnableGraphQueries, synonymQueryStyle);
       } else {
-        if (raw) {
+        if (raw) {// assumption: raw = false only when called from ExtendedDismaxQueryParser.getQuery()
           return new RawQuery(sf, queryTerms);
         } else {
           if (queryTerms.size() == 1) {
             return ft.getFieldQuery(parser, sf, queryTerms.get(0));
+          } else if(ft instanceof StrField){

Review comment:
       In essence, I think the behavior I see here was correct *before* -- no 
special case for either StrField or numerics. By the time the logic reaches 
this point, the field type is already known to be non-tokenized 
(ft.isTokenized() == false).

##########
File path: solr/core/src/test/org/apache/solr/search/TestExtendedDismaxParser.java
##########
@@ -1771,6 +1787,35 @@ public void testSplitOnWhitespace_Basic() throws Exception {
     assertThat(parsedquery, anyOf(containsString("((name:stigma | title:stigma))"), containsString("((title:stigma | name:stigma))")));
   }
 
+    @Test
+    public void testSplitOnWhitespace_stringField_shouldBuildSingleClause() throws Exception
+    {
+        assertJQ(req("qf", "trait_ss", "defType", "edismax", "q", "multi term", "sow", "false"),

Review comment:
       This is a change in behavior, and I think it's not a good change.  For a 
non-tokenized field (StrField in this case), I think we should ignore whatever 
"sow" is and split on whitespace anyway, so that here there are two terms to 
match.  It would be straightforward to document this (no difference between 
numbers and StrField).
   
   I think it could be reasonable to try both ways (both split and don't split) 
and then put a DisjunctionMaxQuery over the two, though I'd prefer not.
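
   To make the three options in this comment concrete, here is a minimal toy 
sketch. It does not use Solr or Lucene at all: clauses are modelled as plain 
strings, and the class and method names (SowSketch, splitClauses, wholeClause, 
tryBoth) are hypothetical, invented only for illustration. The "|" notation 
merely mimics how a disjunction appears in the parsed-query strings asserted 
in the tests above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: clauses are plain strings, not Lucene Query objects.
// All names here are hypothetical and exist only for illustration.
public class SowSketch {

    // Option 1 (preferred in the comment): always split on whitespace for a
    // non-tokenized field, producing one clause per term regardless of "sow".
    public static List<String> splitClauses(String field, String queryText) {
        List<String> clauses = new ArrayList<>();
        for (String term : queryText.trim().split("\\s+")) {
            clauses.add(field + ":" + term);
        }
        return clauses;
    }

    // Option 2 (what the PR does for StrField with sow=false): a single
    // clause over the whole, unsplit input.
    public static String wholeClause(String field, String queryText) {
        return field + ":" + queryText;
    }

    // Option 3 (the "try both" alternative): a disjunction over the split
    // form and the unsplit form.
    public static String tryBoth(String field, String queryText) {
        String split = String.join(" ", splitClauses(field, queryText));
        return "(" + split + " | " + wholeClause(field, queryText) + ")";
    }

    public static void main(String[] args) {
        System.out.println(splitClauses("trait_ss", "multi term"));
        System.out.println(wholeClause("trait_ss", "multi term"));
        System.out.println(tryBoth("trait_ss", "multi term"));
    }
}
```

   With the input from the test ("multi term" against trait_ss), option 1 
yields two clauses and option 2 yields one; how real StrField queries are 
built and scored is of course up to the actual parser code, not this sketch.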




