[GitHub] [jackrabbit-oak] nfsantos commented on a diff in pull request #649: OAK-9875 - Prefix query on a Long, analyzed field fails when executed over Elastic

GitBox Fri, 19 Aug 2022 08:32:42 -0700


nfsantos commented on code in PR #649:
URL: https://github.com/apache/jackrabbit-oak/pull/649#discussion_r950319566



##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/index/ElasticIndexHelper.java:
##########
@@ -207,21 +222,57 @@ private static void mapIndexRules(ElasticIndexDefinition 
indexDefinition, XConte
                     useInSuggest = true;
                 }
             }
-
+            /* TODO: (Add JIRA issue # once available) Potential future 
optimization:
+             *  If a property has propertyIndex=false, we could in principle 
create an ES mapping with index=false to
+             *  avoid creating an index for this property, thus saving time 
and space. However, there is an issue
+             *  with function indexes, which incorrectly rely on the index 
being created even in the case where
+             *  propertyIndex=false. Once this issue is solved, revise the 
logic below to set index=false in the ES
+             *  field if propertyIndex=false.
+             */
             mappingBuilder.startObject(name);
             {
                 // 
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
                 if (Type.BINARY.equals(type)) {
                     mappingBuilder.field("type", "binary");
+                    // ignore_malformed not allowed for binary fields but also 
not needed as there is no type conversion
+                    if (indexDefinition.isAnalyzed(propertyDefinitions)) {
+                        createNestedAnalyzed(mappingBuilder);
+                    }
                 } else if (Type.LONG.equals(type)) {
                     mappingBuilder.field("type", "long");
+                    mappingBuilder.field("ignore_malformed", true);
+                    if (indexDefinition.isAnalyzed(propertyDefinitions)) {
+                        createNestedAnalyzed(mappingBuilder);
+                    }
                 } else if (Type.DOUBLE.equals(type) || 
Type.DECIMAL.equals(type)) {
                     mappingBuilder.field("type", "double");
+                    mappingBuilder.field("ignore_malformed", true);
+                    if (indexDefinition.isAnalyzed(propertyDefinitions)) {
+                        createNestedAnalyzed(mappingBuilder);
+                    }
                 } else if (Type.DATE.equals(type)) {
                     mappingBuilder.field("type", "date");
+                    mappingBuilder.field("ignore_malformed", true);
+                    if (indexDefinition.isAnalyzed(propertyDefinitions)) {
+                        createNestedAnalyzed(mappingBuilder);
+                    }
                 } else if (Type.BOOLEAN.equals(type)) {
                     mappingBuilder.field("type", "boolean");
+                    // ES does not support "ignore_malformed" on boolean 
properties, so all values added to the index
+                    // must parse as valid booleans or else the document is 
rejected. We therefore also do not support
+                    // the nested full text field, because to do so we would 
need to provide a value for the top level
+                    // field, which must be a valid value. As in this case we 
cannot set ignore_malformed=true to still
+                    // index as full-text the invalid values, we cannot have a 
full-text index. This breaks compatibility
+                    // with Lucene, which accepts any value, even if it is not 
a valid boolean, but this is unlikely to
+                    // have much impact because setting analyzed=true in a 
boolean property should have no real world
+                    // use case.
+                    if (indexDefinition.isAnalyzed(propertyDefinitions)) {
+                        LOG.warn("Property {} of type boolean does not support 
analyzed=true.", name);
+                    }
                 } else {
+                    // OAK-9875 - The ES field with the same name as the 
property should not be used for the full text
+                    // index, it should be used only for the keyword index. 
The full text index is done separately below.
+                    // We preserve the old mapping in case we find some 
problem in production with the new mapping.

Review Comment:
   The comment was out of date. I replaced it by the following:
   ```
   // OAK-9875 - For String properties, the full-text index is stored at the 
top-level field. This is
   // the opposite of what is done for non-String properties, where the full 
text field is nested, as
   // propa.text. We do this to keep backwards compatibility with pre OAK=9875 
indexes.
   ```
   The comment about backwards compatibility is still valid. However, if we 
want to clean the code by breaking backwards compatibility, there will not be a 
better time than now when this module is still not in widespread usage. In the 
future it will be more costly and we will likely not do it. Or do you see any 
plan where we could change the format of the indexes without causing much 
disruption in the future?
   
   I think adding a flag indicating that this is just for backwards 
compatibility is in principle a good idea, but I'm afraid that we will never 
get around to remove this behaviour and this logic will remain forever.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [jackrabbit-oak] nfsantos commented on a diff in pull request #649: OAK-9875 - Prefix query on a Long, analyzed field fails when executed over Elastic

Reply via email to