thomasmueller commented on code in PR #2193:
URL: https://github.com/apache/jackrabbit-oak/pull/2193#discussion_r2021238402
##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/index/ElasticCustomAnalyzer.java:
##########
@@ -145,49 +157,93 @@ public static IndexSettingsAnalysis.Builder buildCustomAnalyzers(NodeState state
@NotNull
private static TokenizerDefinition loadTokenizer(NodeState state) {
- String name = normalize(Objects.requireNonNull(state.getString(FulltextIndexConstants.ANL_NAME)));
- Map<String, Object> args = convertNodeState(state);
+ String name;
+ Map<String, Object> args;
+ if (!state.exists()) {
+ LOG.warn("No tokenizer specified; the standard with an empty configuration");
+ name = "Standard";
+ args = new HashMap<String, Object>();
+ } else {
+ name = Objects.requireNonNull(state.getString(FulltextIndexConstants.ANL_NAME));
+ try {
+ args = convertNodeState(state);
+ } catch (IOException e) {
+ LOG.warn("Can not load tokenizer; using an empty configuration", e);
+ args = new HashMap<String, Object>();
+ }
+ }
+ name = normalize(name);
+ if ("n_gram".equals(name)) {
+ // OAK-11568
+ // https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
+ Integer minGramSize = getIntegerSetting(args, "minGramSize", 2);
+ Integer maxGramSize = getIntegerSetting(args, "maxGramSize", 3);
+ TokenizerDefinition ngram = TokenizerDefinition.of(t -> t.ngram(
+ NGramTokenizer.of(n -> n.minGram(minGramSize).maxGram(maxGramSize))));
+ return ngram;
+ }
Review Comment:
Yes, I agree!