github-actions[bot] commented on code in PR #64794:
URL: https://github.com/apache/doris/pull/64794#discussion_r3502395460


##########
fe/fe-core/src/main/java/org/apache/doris/indexpolicy/CharReplaceCharFilterValidator.java:
##########
@@ -39,25 +42,20 @@ protected String getTypeName() {
 
     @Override
     protected void validateSpecific(Map<String, String> props) throws 
DdlException {
+        Map<String, String> charFilterProperties = new HashMap<>();
+        
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_TYPE,
+                InvertedIndexUtil.INVERTED_INDEX_CHAR_FILTER_CHAR_REPLACE);
         if (props.containsKey("pattern")) {
-            String pattern = props.get("pattern");
-            if (pattern != null && !pattern.isEmpty()) {
-                for (int i = 0; i < pattern.length(); i++) {
-                    if (pattern.charAt(i) > 255) {
-                        throw new DdlException(
-                                "pattern must contain only single-byte 
characters in [0,255]");
-                    }
-                }
-            }
+            
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_PATTERN,
 props.get("pattern"));
         }
         if (props.containsKey("replacement")) {
-            String replacement = props.get("replacement");
-            if (replacement == null || replacement.length() != 1) {
-                throw new DdlException("replacement must be exactly one byte");
-            }
-            if (replacement.charAt(0) > 255) {
-                throw new DdlException("replacement must be in [0,255]");
-            }
+            
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_REPLACEMENT,
+                    props.get("replacement"));
+        }
+        try {

Review Comment:
   Reusing `checkCharFilterProperties()` here changes the custom-analyzer 
policy contract for `char_replace`: a policy such as `CREATE INVERTED INDEX 
CHAR_FILTER cf PROPERTIES("type"="char_replace")` used to be valid, because FE 
stores the policy properties unchanged, BE strips only `type` when building 
`Settings`, and `CharReplaceCharFilterFactory` defaults a missing `pattern` to 
`,._`. After this bridge map always supplies legacy `char_filter_type`, the 
shared helper rejects the same policy with `Missing 'char_filter_pattern'...` 
before BE can apply that default. Please keep the stricter byte/ASCII 
replacement checks, but do not require `pattern` for the policy path unless the 
BE default is also removed intentionally and existing custom-analyzer usages 
are migrated/tested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to