alessandrobenedetti commented on code in PR #4259:
URL: https://github.com/apache/solr/pull/4259#discussion_r3122738302


##########
solr/modules/language-models/src/java/org/apache/solr/languagemodels/documentenrichment/update/processor/DocumentEnrichmentUpdateProcessorFactory.java:
##########
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.languagemodels.documentenrichment.update.processor;
+
+import dev.langchain4j.model.chat.request.ResponseFormat;
+import dev.langchain4j.model.chat.request.ResponseFormatType;
+import dev.langchain4j.model.chat.request.json.JsonArraySchema;
+import dev.langchain4j.model.chat.request.json.JsonBooleanSchema;
+import dev.langchain4j.model.chat.request.json.JsonIntegerSchema;
+import dev.langchain4j.model.chat.request.json.JsonNumberSchema;
+import dev.langchain4j.model.chat.request.json.JsonObjectSchema;
+import dev.langchain4j.model.chat.request.json.JsonSchema;
+import dev.langchain4j.model.chat.request.json.JsonSchemaElement;
+import dev.langchain4j.model.chat.request.json.JsonStringSchema;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.StandardCharsets;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.RequiredSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.core.SolrResourceLoader;
+import org.apache.solr.languagemodels.documentenrichment.model.SolrChatModel;
+import org.apache.solr.languagemodels.documentenrichment.store.rest.ManagedChatModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.rest.ManagedResource;
+import org.apache.solr.rest.ManagedResourceObserver;
+import org.apache.solr.schema.BoolField;
+import org.apache.solr.schema.DatePointField;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.DoublePointField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.FloatPointField;
+import org.apache.solr.schema.IndexSchema;
+import org.apache.solr.schema.IntPointField;
+import org.apache.solr.schema.LongPointField;
+import org.apache.solr.schema.NestPathField;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.schema.StrField;
+import org.apache.solr.schema.TextField;
+import org.apache.solr.schema.UUIDField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+import org.apache.solr.util.plugin.SolrCoreAware;
+
+/**
+ * Generate the content of a field based on other fields specified as input.
+ *
+ * <p>One or more {@code inputField} parameters specify the Solr fields to use as input. Each field
+ * name must appear as a {@code {fieldName}} placeholder in the prompt. Exactly one of {@code
+ * prompt} or {@code promptFile} must be provided.
+ *
+ * <pre class="prettyprint" >
+ * &lt;processor class=&quot;solr.llm.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory&quot;&gt;
+ *   &lt;str name=&quot;inputField&quot;&gt;title_field&lt;/str&gt;
+ *   &lt;str name=&quot;inputField&quot;&gt;body_field&lt;/str&gt;
+ *   &lt;str name=&quot;outputField&quot;&gt;enriched_field&lt;/str&gt;
+ *   &lt;str name=&quot;prompt&quot;&gt;Title: {title_field}. Body: {body_field}.&lt;/str&gt;
+ *   &lt;str name=&quot;model&quot;&gt;ChatModel&lt;/str&gt;
+ * &lt;/processor&gt;
+ * </pre>
+ *
+ * <p>Multiple {@code inputField} values can also be declared as an array using {@code arr}:
+ *
+ * <pre class="prettyprint" >
+ * &lt;processor class=&quot;solr.llm.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory&quot;&gt;
+ *   &lt;arr name=&quot;inputField&quot;&gt;
+ *     &lt;str&gt;title_field&lt;/str&gt;
+ *     &lt;str&gt;body_field&lt;/str&gt;
+ *   &lt;/arr&gt;
+ *   &lt;str name=&quot;outputField&quot;&gt;enriched_field&lt;/str&gt;
+ *   &lt;str name=&quot;prompt&quot;&gt;Title: {title_field}. Body: {body_field}.&lt;/str&gt;
+ *   &lt;str name=&quot;model&quot;&gt;ChatModel&lt;/str&gt;
+ * &lt;/processor&gt;
+ * </pre>
+ *
+ * <p>Alternatively, the prompt can be loaded from a text file using {@code promptFile}:
+ *
+ * <pre class="prettyprint" >
+ * &lt;processor class=&quot;solr.llm.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory&quot;&gt;
+ *   &lt;str name=&quot;inputField&quot;&gt;title_field&lt;/str&gt;
+ *   &lt;str name=&quot;outputField&quot;&gt;enriched_field&lt;/str&gt;
+ *   &lt;str name=&quot;promptFile&quot;&gt;prompt.txt&lt;/str&gt;
+ *   &lt;str name=&quot;model&quot;&gt;ChatModel&lt;/str&gt;
+ * &lt;/processor&gt;
+ * </pre>
+ *
+ * <p>Validation rules:
+ *
+ * <ul>
+ *   <li>At least one {@code inputField} must be declared.
+ *   <li>Exactly one of {@code prompt} or {@code promptFile} must be provided.
+ *   <li>Every declared {@code inputField} must have a corresponding {@code {fieldName}} placeholder
+ *       in the prompt.
+ *   <li>Every {@code {placeholder}} in the prompt must correspond to a declared {@code inputField}.
+ * </ul>
+ */

Review Comment:
   The comment is full of encoded symbols and is basically unreadable. It was possibly written for the generated Javadoc only, but in general, if a comment is there, it should also be readable directly in the code.



##########
solr/modules/language-models/src/java/org/apache/solr/languagemodels/documentenrichment/update/processor/DocumentEnrichmentUpdateProcessorFactory.java:
##########
@@ -0,0 +1,338 @@
+public class DocumentEnrichmentUpdateProcessorFactory extends UpdateRequestProcessorFactory
+    implements SolrCoreAware, ManagedResourceObserver {
+  private static final String INPUT_FIELD_PARAM = "inputField";
+  private static final String OUTPUT_FIELD_PARAM = "outputField";
+  private static final String PROMPT = "prompt";
+  private static final String PROMPT_FILE = "promptFile";
+  private static final String MODEL_NAME = "model";
+  private static final Pattern PLACEHOLDER_PATTERN = Pattern.compile("\\{([^}]+)\\}");
+
+  private List<String> inputFields;
+  private String outputField;
+  private String promptText;
+  private String promptFile;
+  private String modelName;
+
+  @Override
+  public void init(final NamedList<?> args) {
+    // removeConfigArgs handles both multiple <str name="inputField"> and <arr name="inputField">
+    // and must be called before toSolrParams() since it mutates args in place
+    Collection<String> fieldNames = args.removeConfigArgs(INPUT_FIELD_PARAM);
+    if (fieldNames.isEmpty()) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR, "At least one 'inputField' must be provided");
+    }
+    inputFields = List.copyOf(fieldNames);
+
+    Collection<String> outputFields = args.removeConfigArgs(OUTPUT_FIELD_PARAM);
+    if (outputFields.isEmpty()) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR, "Exactly one 'outputField' must be provided");
+    }
+    if (outputFields.size() > 1) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "Only one 'outputField' can be provided, but found: " + outputFields);
+    }
+    outputField = outputFields.iterator().next();
+
+    SolrParams params = args.toSolrParams();
+    RequiredSolrParams required = params.required();
+    modelName = required.get(MODEL_NAME);
+
+    String inlinePrompt = params.get(PROMPT);
+    String promptFilePath = params.get(PROMPT_FILE);
+
+    if (inlinePrompt == null && promptFilePath == null) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR, "Either 'prompt' or 'promptFile' must be provided");
+    }
+    if (inlinePrompt != null && promptFilePath != null) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "Only one of 'prompt' or 'promptFile' can be provided, not both");
+    }
+    if (inlinePrompt != null) {
+      validatePromptPlaceholders(inlinePrompt, inputFields);
+      this.promptText = inlinePrompt;
+    }
+    this.promptFile = promptFilePath;
+  }
+
+  @Override
+  public void inform(SolrCore core) {
+    final SolrResourceLoader solrResourceLoader = core.getResourceLoader();
+    ManagedChatModelStore.registerManagedChatModelStore(solrResourceLoader, this);
+    if (promptFile != null) {
+      try (InputStream is = solrResourceLoader.openResource(promptFile)) {
+        promptText = new String(is.readAllBytes(), StandardCharsets.UTF_8).trim();
+      } catch (IOException e) {
+        throw new SolrException(
+            SolrException.ErrorCode.SERVER_ERROR, "Cannot read prompt file: " + promptFile, e);
+      }
+      validatePromptPlaceholders(promptText, inputFields);
+    }
+  }
+
+  @Override
+  public void onManagedResourceInitialized(NamedList<?> args, ManagedResource res)
+      throws SolrException {
+    if (res instanceof ManagedChatModelStore store) {
+      store.loadStoredModels();
+    }
+  }
+
+  @Override
+  public UpdateRequestProcessor getInstance(
+      SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
+    IndexSchema latestSchema = req.getCore().getLatestSchema();
+
+    for (String fieldName : inputFields) {
+      if (!latestSchema.isDynamicField(fieldName) && !latestSchema.hasExplicitField(fieldName)) {
+        throw new SolrException(
+            SolrException.ErrorCode.SERVER_ERROR, "undefined field: \"" + fieldName + "\"");
+      }
+    }
+
+    final SchemaField outputFieldSchema = latestSchema.getField(outputField);
+
+    ResponseFormat responseFormat = buildResponseFormat(outputFieldSchema);
+    boolean multiValued = outputFieldSchema.multiValued();
+
+    ManagedChatModelStore store = ManagedChatModelStore.getManagedModelStore(req.getCore());
+    SolrChatModel chatModel = store.getModel(modelName);
+    if (chatModel == null) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "The model configured in the Update Request Processor '"
+              + modelName
+              + "' can't be found in the store: "
+              + ManagedChatModelStore.REST_END_POINT);
+    }
+
+    return new DocumentEnrichmentUpdateProcessor(
+        inputFields, outputField, promptText, chatModel, multiValued, responseFormat, req, next);
+  }
+
+  /**
+   * Builds a {@link ResponseFormat} that instructs the model to return a JSON object {@code
+   * {"value": ...}} whose value type matches the Solr field type. For multivalued fields the value
+   * is wrapped in a {@link JsonArraySchema} nested inside the root {@link JsonObjectSchema}.
+   *
+   * <p>Nesting {@link JsonArraySchema} inside a {@link JsonObjectSchema} property is supported by
+   * all langchain4j providers that implement structured outputs with {@link JsonObjectSchema}
+   * (OpenAI, Azure OpenAI, Google AI, Gemini, Mistral, Ollama, Amazon Bedrock, Watsonx).
+   */
+  static ResponseFormat buildResponseFormat(SchemaField schemaField) {
+    JsonSchemaElement valueElement = toJsonSchemaElement(schemaField.getType());
+    JsonSchemaElement valueSchema =
+        schemaField.multiValued()
+            ? JsonArraySchema.builder().items(valueElement).build()
+            : valueElement;
+    return ResponseFormat.builder()
+        .type(ResponseFormatType.JSON)
+        .jsonSchema(
+            JsonSchema.builder()
+                .name("output")
+                .rootElement(
+                    JsonObjectSchema.builder()
+                        .addProperty("value", valueSchema)
+                        .required("value")
+                        .build())
+                .build())
+        .build();
+  }

Review Comment:
   will need to clarify this via a call
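To have something concrete for that call: the snippet below is a dependency-free sketch (no langchain4j) that renders, as a plain JSON Schema string, the shape described in the `buildResponseFormat` Javadoc: a root object with a required `value` property, wrapped in an array for multivalued fields. `ResponseShapeSketch` and `schemaFor` are hypothetical names. The type table mirrors `toJsonSchemaElement` from this PR, but it matches on the exact simple class name instead of using the `instanceof` chain, so subclasses are rejected here (which also keeps `DenseVectorField` out without a special case).

```java
import java.util.Map;

final class ResponseShapeSketch {
  // Mirrors the mapping in toJsonSchemaElement: Solr field type simple name -> JSON Schema type.
  private static final Map<String, String> JSON_TYPES =
      Map.of(
          "StrField", "string",
          "TextField", "string",
          "DatePointField", "string",
          "IntPointField", "integer",
          "LongPointField", "integer",
          "FloatPointField", "number",
          "DoublePointField", "number",
          "BoolField", "boolean");

  private ResponseShapeSketch() {}

  /** Returns the JSON Schema the model would be asked to follow, as a compact JSON string. */
  static String schemaFor(String fieldTypeSimpleName, boolean multiValued) {
    String type = JSON_TYPES.get(fieldTypeSimpleName);
    if (type == null) {
      // Exact-name lookup: anything not listed (e.g. DenseVectorField, UUIDField) is unsupported.
      throw new IllegalArgumentException(
          "field type is not supported by Document Enrichment: " + fieldTypeSimpleName);
    }
    String value = "{\"type\":\"" + type + "\"}";
    if (multiValued) {
      // Multivalued output field: the "value" property becomes an array of the element type.
      value = "{\"type\":\"array\",\"items\":" + value + "}";
    }
    return "{\"type\":\"object\",\"properties\":{\"value\":" + value + "},\"required\":[\"value\"]}";
  }
}
```

For a multivalued int field the model would therefore be expected to answer with something like `{"value": [1, 2, 3]}`, and with `{"value": "..."}` for a single-valued string field.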



##########
solr/modules/language-models/src/test-files/modelChatExamples/dummy-chat-model-unsupported.json:
##########
@@ -0,0 +1,8 @@
+{
+  "class": "org.apache.solr.languagemodels.documentenrichment.model.DummyChatModel",
+  "name": "dummy-chat-1",
+  "params": {
+    "response": "enriched content",
+    "unsupported": 10

Review Comment:
   unsupported?
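If the intent of this fixture is to exercise the failure path for a parameter the model does not accept, a more descriptive key or a short comment in the test would answer the question. How the chat model store actually validates `params` is not visible in this excerpt; the following is only a hypothetical illustration of the kind of check such a fixture would trigger, with `ModelParamsCheck` and `rejectUnknownParams` as made-up names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ModelParamsCheck {
  private ModelParamsCheck() {}

  // Hypothetical check: fail fast if the "params" object contains keys the model does not support.
  static void rejectUnknownParams(Map<String, Object> params, Set<String> supportedParams) {
    // TreeSet keeps the reported keys sorted, so the error message is deterministic.
    Set<String> unknown = new TreeSet<>(params.keySet());
    unknown.removeAll(supportedParams);
    if (!unknown.isEmpty()) {
      throw new IllegalArgumentException("unsupported model params: " + unknown);
    }
  }
}
```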



##########
solr/modules/language-models/src/java/org/apache/solr/languagemodels/documentenrichment/update/processor/DocumentEnrichmentUpdateProcessorFactory.java:
##########
@@ -0,0 +1,338 @@
+
+  private static JsonSchemaElement toJsonSchemaElement(FieldType fieldType) {
+    // DenseVectorField extends FloatPointField, so it must be rejected before the numeric checks
+    if (fieldType instanceof DenseVectorField
+        || fieldType instanceof UUIDField
+        || fieldType instanceof NestPathField) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "field type is not supported by Document Enrichment: "
+              + fieldType.getClass().getSimpleName());
+    }
+    if (fieldType instanceof StrField
+        || fieldType instanceof TextField
+        || fieldType instanceof DatePointField) {
+      return new JsonStringSchema();
+    } else if (fieldType instanceof IntPointField || fieldType instanceof LongPointField) {
+      return new JsonIntegerSchema();
+    } else if (fieldType instanceof FloatPointField || fieldType instanceof DoublePointField) {
+      return new JsonNumberSchema();
+    } else if (fieldType instanceof BoolField) {
+      return new JsonBooleanSchema();
+    } else {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "field type is not supported by Document Enrichment: "
+              + fieldType.getClass().getSimpleName());
+    }
+  }
+
+  private static void validatePromptPlaceholders(String prompt, List<String> fieldNames) {

Review Comment:
   fieldNames -> inputFields
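For reference, a minimal sketch of what the placeholder check could look like with the suggested `inputFields` name. The method body is not part of this excerpt, so this is a reconstruction from the validation rules in the class Javadoc (every declared `inputField` needs a `{fieldName}` placeholder, and every placeholder must match a declared `inputField`) plus the `PLACEHOLDER_PATTERN` regex. `IllegalArgumentException` stands in for the factory's `SolrException`; the class name `PromptPlaceholderCheck` is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PromptPlaceholderCheck {
  // Same regex as the factory's PLACEHOLDER_PATTERN: captures the name between '{' and '}'.
  private static final Pattern PLACEHOLDER_PATTERN = Pattern.compile("\\{([^}]+)\\}");

  /** Sketch only: enforces a one-to-one match between prompt placeholders and inputFields. */
  static void validatePromptPlaceholders(String prompt, List<String> inputFields) {
    Set<String> placeholders = new LinkedHashSet<>();
    Matcher matcher = PLACEHOLDER_PATTERN.matcher(prompt);
    while (matcher.find()) {
      placeholders.add(matcher.group(1));
    }
    // Rule 1: every declared inputField must appear as {fieldName} in the prompt.
    for (String inputField : inputFields) {
      if (!placeholders.contains(inputField)) {
        throw new IllegalArgumentException(
            "Missing placeholder {" + inputField + "} for declared inputField");
      }
    }
    // Rule 2: every {placeholder} in the prompt must be a declared inputField.
    for (String placeholder : placeholders) {
      if (!inputFields.contains(placeholder)) {
        throw new IllegalArgumentException(
            "Placeholder {" + placeholder + "} does not match any declared inputField");
      }
    }
  }
}
```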



##########
solr/modules/language-models/src/test/org/apache/solr/languagemodels/documentenrichment/model/DummyChatModel.java:
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.languagemodels.documentenrichment.model;
+
+import dev.langchain4j.data.message.AiMessage;
+import dev.langchain4j.data.message.UserMessage;
+import dev.langchain4j.model.chat.ChatModel;
+import dev.langchain4j.model.chat.request.ChatRequest;
+import dev.langchain4j.model.chat.response.ChatResponse;
+
+/**
+ * A deterministic {@link ChatModel} for testing. It returns a fixed response string regardless of
+ * the input, allowing tests to assert exact enriched-field values without real API calls.
+ *
+ * <p>The builder also exposes {@code unsupported} and {@code ambiguous} setter methods to exercise
+ * the reflection-based parameter handling in {@link
+ * org.apache.solr.languagemodels.documentenrichment.model.SolrChatModel#getInstance}.
+ */
+public class DummyChatModel implements ChatModel {
+
+  /** The text of the last prompt received by any instance. Useful for test assertions. */
+  public static String lastReceivedPrompt;
+
+  private final String response;
+
+  public DummyChatModel(String response) {
+    this.response = response;
+  }
+
+  @Override
+  public ChatResponse chat(ChatRequest chatRequest) {
+    lastReceivedPrompt = ((UserMessage) chatRequest.messages().getFirst()).singleText();
+    return ChatResponse.builder().aiMessage(AiMessage.from(response)).build();
+  }
+
+  public static DummyChatModelBuilder builder() {
+    return new DummyChatModelBuilder();
+  }
+
+  public static class DummyChatModelBuilder {
+    private String response = "dummy response";
+    private int intValue;
+
+    public DummyChatModelBuilder() {}
+
+    public DummyChatModelBuilder response(String response) {
+      this.response = response;
+      return this;
+    }
+
+    /** Intentionally has no String overload so the reflection code raises a BAD_REQUEST error. */
+    public DummyChatModelBuilder unsupported(Integer input) {
+      return this;
+    }
+
+    /** Two overloads make this param "ambiguous": the reflection code should default to String. */
+    public DummyChatModelBuilder ambiguous(int input) {
+      this.intValue = input;
+      return this;
+    }
+
+    public DummyChatModelBuilder ambiguous(String input) {
+      this.intValue = Integer.valueOf(input);
+      return this;
+    }

Review Comment:
   will have to elaborate this
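One way to elaborate: the intent documented on the two test setters can be read as the resolution rule sketched below. `SolrChatModel#getInstance` is not part of this excerpt, so this is a hypothetical reconstruction, not its actual code. In particular, the set of directly supported parameter types (`String` plus a few primitives) is an assumption, chosen so that `unsupported(Integer)` is rejected while `ambiguous(int)`/`ambiguous(String)` resolves to the `String` overload. `SetterResolution` and its nested `Builder` are illustrative stand-ins.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SetterResolution {
  // Hypothetical stand-in mirroring the overloads on DummyChatModelBuilder.
  static class Builder {
    int intValue;

    Builder ambiguous(int input) {
      this.intValue = input;
      return this;
    }

    Builder ambiguous(String input) {
      this.intValue = Integer.parseInt(input);
      return this;
    }

    Builder unsupported(Integer input) {
      return this;
    }
  }

  // Assumption: parameter types the reflection code knows how to convert a JSON param into.
  static final Set<Class<?>> SUPPORTED_TYPES =
      Set.of(String.class, int.class, long.class, double.class, boolean.class);

  /** Picks the single-argument setter for a model param, preferring String when overloaded. */
  static Method resolveSetter(Class<?> builderClass, String paramName) {
    List<Method> candidates = new ArrayList<>();
    for (Method method : builderClass.getDeclaredMethods()) {
      if (method.getName().equals(paramName) && method.getParameterCount() == 1) {
        candidates.add(method);
      }
    }
    if (candidates.size() == 1
        && SUPPORTED_TYPES.contains(candidates.get(0).getParameterTypes()[0])) {
      return candidates.get(0); // unambiguous: use the declared parameter type
    }
    if (candidates.size() > 1) {
      for (Method method : candidates) {
        if (method.getParameterTypes()[0] == String.class) {
          return method; // ambiguous: default to the String overload
        }
      }
    }
    // In Solr this would presumably surface as a BAD_REQUEST SolrException.
    throw new IllegalArgumentException("No supported setter for param '" + paramName + "'");
  }
}
```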



##########
solr/modules/language-models/src/test/org/apache/solr/languagemodels/documentenrichment/store/rest/TestChatModelManager.java:
##########


Review Comment:
   not sure we need a separate model manager, but tests are ok and may need to be relocated



##########
solr/solr-ref-guide/modules/configuration-guide/pages/update-request-processors.adoc:
##########
@@ -421,6 +421,10 @@ The {solr-javadocs}/modules/language-models/index.html[`language-models`] module
 It uses external text to vectors language models to perform the vectorisation for each processed document.
 For more information: xref:query-guide:text-to-vector.adoc[Update Request Processor]
 
+{solr-javadocs}/modules/language-models/org/apache/solr/languagemodels/documentenrichment/update/processor/DocumentEnrichmentUpdateProcessorFactory.html[DocumentEnrichmentUpdateProcessorFactory]::
 Update processor which, starting from one or more fields in input and a given prompt, adds the output of an LLM as the value of a new field.
+It uses external chat language models to perform the enrichment of each processed document.

Review Comment:
   It uses external Large Language Model services to perform the enrichment of each processed document.



##########
solr/modules/language-models/src/test-files/modelChatExamples/dummy-chat-model-ambiguous.json:
##########
@@ -0,0 +1,8 @@
+{
+  "class": "org.apache.solr.languagemodels.documentenrichment.model.DummyChatModel",
+  "name": "dummy-chat-1",
+  "params": {
+    "response": "enriched content",
+    "ambiguous": 10

Review Comment:
   ambiguous?



##########
solr/modules/language-models/src/test-files/solr/collection1/conf/enumsConfig.xml:
##########


Review Comment:
   What's this file?



##########
solr/modules/language-models/src/test-files/solr/collection1/conf/solrconfig-document-enrichment.xml:
##########
@@ -0,0 +1,258 @@
+<?xml version="1.0" ?>
+<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
+ license agreements. See the NOTICE file distributed with this work for additional
+ information regarding copyright ownership. The ASF licenses this file to
+ You under the Apache License, Version 2.0 (the "License"); you may not use
+ this file except in compliance with the License. You may obtain a copy of
+ the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
+ by applicable law or agreed to in writing, software distributed under the
+ License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
+ OF ANY KIND, either express or implied. See the License for the specific
+ language governing permissions and limitations under the License. -->
+
+<config>
+    <luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion>
+ <dataDir>${solr.data.dir:}</dataDir>
+ <directoryFactory name="DirectoryFactory"
+                   class="${solr.directoryFactory:solr.MockDirectoryFactory}" />
+ <schemaFactory class="ClassicIndexSchemaFactory" />
+
+ <requestDispatcher>
+   <requestParsers />
+ </requestDispatcher>
+
+ <query>
+  <filterCache class="solr.CaffeineCache" size="4096"
+   initialSize="2048" autowarmCount="0" />
+ </query>
+ <requestHandler name="/select" class="solr.SearchHandler" />
+
+ <updateHandler class="solr.DirectUpdateHandler2">
+  <autoCommit>
+   <maxTime>15000</maxTime>
+   <openSearcher>false</openSearcher>
+  </autoCommit>
+  <autoSoftCommit>
+   <maxTime>1000</maxTime>
+  </autoSoftCommit>
+  <updateLog>
+   <str name="dir">${solr.data.dir:}</str>
+  </updateLog>
+ </updateHandler>
+
+ <requestHandler name="/query" class="solr.SearchHandler">
+  <lst name="defaults">
+   <str name="echoParams">explicit</str>
+   <str name="wt">json</str>
+   <str name="indent">true</str>
+   <str name="df">id</str>
+  </lst>
+ </requestHandler>
+
+ <updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+   <str name="inputField">string_field</str>
+   <str name="outputField">enriched_field</str>
+   <str name="prompt">Summarize this content: {string_field}</str>
+   <str name="model">dummy-chat-1</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+ </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="documentEnrichmentArrInputField">
+    <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+      <arr name="inputField">
+        <str>string_field</str>
+        <str>body_field</str>
+      </arr>
+      <str name="outputField">enriched_field</str>
+      <str name="prompt">Title: {string_field}. Body: {body_field}.</str>

Review Comment:
   I don't understand this prompt: what type of enrichment do we expect?
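To make the question concrete: assuming the processor fills each `{field}` placeholder with the document's value for that field (the processor class itself is not in this excerpt), a minimal sketch of that substitution shows the test prompt turns into plain field text with no task for the model. `PromptRendering.render` is a hypothetical helper, not the PR's code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PromptRendering {
  // Same placeholder syntax as the factory's PLACEHOLDER_PATTERN.
  private static final Pattern PLACEHOLDER_PATTERN = Pattern.compile("\\{([^}]+)\\}");

  /** Hypothetical helper: replaces each {field} with that field's value. */
  static String render(String prompt, Map<String, String> fieldValues) {
    Matcher matcher = PLACEHOLDER_PATTERN.matcher(prompt);
    StringBuilder rendered = new StringBuilder();
    while (matcher.find()) {
      // Missing values become "" here for brevity; per the ref guide, the real processor
      // skips enrichment when an inputField value is absent or empty.
      String value = fieldValues.getOrDefault(matcher.group(1), "");
      // quoteReplacement stops '$' or '\' in field values being read as group references.
      matcher.appendReplacement(rendered, Matcher.quoteReplacement(value));
    }
    matcher.appendTail(rendered);
    return rendered.toString();
  }

  public static void main(String[] args) {
    String prompt = "Title: {string_field}. Body: {body_field}.";
    System.out.println(
        render(prompt, Map.of("string_field", "Solr", "body_field", "A search platform")));
  }
}
```

The rendered string ("Title: Solr. Body: A search platform.") gives the model no instruction, so its output is unconstrained; the prompt in the `documentEnrichment` chain (`Summarize this content: {string_field}`) states the task explicitly.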



##########
solr/modules/language-models/src/java/org/apache/solr/languagemodels/documentenrichment/update/processor/DocumentEnrichmentUpdateProcessorFactory.java:
##########
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.languagemodels.documentenrichment.update.processor;
+
+import dev.langchain4j.model.chat.request.ResponseFormat;
+import dev.langchain4j.model.chat.request.ResponseFormatType;
+import dev.langchain4j.model.chat.request.json.JsonArraySchema;
+import dev.langchain4j.model.chat.request.json.JsonBooleanSchema;
+import dev.langchain4j.model.chat.request.json.JsonIntegerSchema;
+import dev.langchain4j.model.chat.request.json.JsonNumberSchema;
+import dev.langchain4j.model.chat.request.json.JsonObjectSchema;
+import dev.langchain4j.model.chat.request.json.JsonSchema;
+import dev.langchain4j.model.chat.request.json.JsonSchemaElement;
+import dev.langchain4j.model.chat.request.json.JsonStringSchema;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.StandardCharsets;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.RequiredSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.core.SolrResourceLoader;
+import org.apache.solr.languagemodels.documentenrichment.model.SolrChatModel;
+import org.apache.solr.languagemodels.documentenrichment.store.rest.ManagedChatModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.rest.ManagedResource;
+import org.apache.solr.rest.ManagedResourceObserver;
+import org.apache.solr.schema.BoolField;
+import org.apache.solr.schema.DatePointField;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.DoublePointField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.FloatPointField;
+import org.apache.solr.schema.IndexSchema;
+import org.apache.solr.schema.IntPointField;
+import org.apache.solr.schema.LongPointField;
+import org.apache.solr.schema.NestPathField;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.schema.StrField;
+import org.apache.solr.schema.TextField;
+import org.apache.solr.schema.UUIDField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+import org.apache.solr.util.plugin.SolrCoreAware;
+
+/**
+ * Generate the content of a field based on other fields specified as input.
+ *
+ * <p>One or more {@code inputField} parameters specify the Solr fields to use as input. Each field
+ * name must appear as a {@code {fieldName}} placeholder in the prompt. Exactly one of {@code
+ * prompt} or {@code promptFile} must be provided.
+ *
+ * <pre class="prettyprint" >
+ * &lt;processor class=&quot;solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory&quot;&gt;
+ *   &lt;str name=&quot;inputField&quot;&gt;title_field&lt;/str&gt;
+ *   &lt;str name=&quot;inputField&quot;&gt;body_field&lt;/str&gt;
+ *   &lt;str name=&quot;outputField&quot;&gt;enriched_field&lt;/str&gt;
+ *   &lt;str name=&quot;prompt&quot;&gt;Title: {title_field}. Body: {body_field}.&lt;/str&gt;
+ *   &lt;str name=&quot;model&quot;&gt;ChatModel&lt;/str&gt;
+ * &lt;/processor&gt;
+ * </pre>
+ *
+ * <p>Multiple {@code inputField} values can also be declared as an array using {@code arr}:
+ *
+ * <pre class="prettyprint" >
+ * &lt;processor class=&quot;solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory&quot;&gt;
+ *   &lt;arr name=&quot;inputField&quot;&gt;
+ *     &lt;str&gt;title_field&lt;/str&gt;
+ *     &lt;str&gt;body_field&lt;/str&gt;
+ *   &lt;/arr&gt;
+ *   &lt;str name=&quot;outputField&quot;&gt;enriched_field&lt;/str&gt;
+ *   &lt;str name=&quot;prompt&quot;&gt;Title: {title_field}. Body: {body_field}.&lt;/str&gt;
+ *   &lt;str name=&quot;model&quot;&gt;ChatModel&lt;/str&gt;
+ * &lt;/processor&gt;
+ * </pre>
+ *
+ * <p>Alternatively, the prompt can be loaded from a text file using {@code promptFile}:
+ *
+ * <pre class="prettyprint" >
+ * &lt;processor class=&quot;solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory&quot;&gt;
+ *   &lt;str name=&quot;inputField&quot;&gt;title_field&lt;/str&gt;
+ *   &lt;str name=&quot;outputField&quot;&gt;enriched_field&lt;/str&gt;
+ *   &lt;str name=&quot;promptFile&quot;&gt;prompt.txt&lt;/str&gt;
+ *   &lt;str name=&quot;model&quot;&gt;ChatModel&lt;/str&gt;
+ * &lt;/processor&gt;
+ * </pre>
+ *
+ * <p>Validation rules:
+ *
+ * <ul>
+ *   <li>At least one {@code inputField} must be declared.
+ *   <li>Exactly one of {@code prompt} or {@code promptFile} must be provided.
+ *   <li>Every declared {@code inputField} must have a corresponding {@code {fieldName}} placeholder
+ *       in the prompt.
+ *   <li>Every {@code {placeholder}} in the prompt must correspond to a declared {@code inputField}.
+ * </ul>
+ */
+public class DocumentEnrichmentUpdateProcessorFactory extends UpdateRequestProcessorFactory
+    implements SolrCoreAware, ManagedResourceObserver {
+  private static final String INPUT_FIELD_PARAM = "inputField";
+  private static final String OUTPUT_FIELD_PARAM = "outputField";
+  private static final String PROMPT = "prompt";
+  private static final String PROMPT_FILE = "promptFile";
+  private static final String MODEL_NAME = "model";
+  private static final Pattern PLACEHOLDER_PATTERN = Pattern.compile("\\{([^}]+)\\}");
+
+  private List<String> inputFields;
+  private String outputField;
+  private String promptText;
+  private String promptFile;
+  private String modelName;
+
+  @Override
+  public void init(final NamedList<?> args) {
+    // removeConfigArgs handles both multiple <str name="inputField"> and <arr name="inputField">
+    // and must be called before toSolrParams() since it mutates args in place
+    Collection<String> fieldNames = args.removeConfigArgs(INPUT_FIELD_PARAM);
+    if (fieldNames.isEmpty()) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR, "At least one 'inputField' must be provided");
+    }
+    inputFields = List.copyOf(fieldNames);
+
+    Collection<String> outputFields = args.removeConfigArgs(OUTPUT_FIELD_PARAM);
+    if (outputFields.isEmpty()) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR, "Exactly one 'outputField' must be provided");
+    }
+    if (outputFields.size() > 1) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "Only one 'outputField' can be provided, but found: " + outputFields);
+    }
+    outputField = outputFields.iterator().next();
+
+    SolrParams params = args.toSolrParams();
+    RequiredSolrParams required = params.required();
+    modelName = required.get(MODEL_NAME);
+
+    String inlinePrompt = params.get(PROMPT);
+    String promptFilePath = params.get(PROMPT_FILE);
+
+    if (inlinePrompt == null && promptFilePath == null) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR, "Either 'prompt' or 'promptFile' must be provided");
+    }
+    if (inlinePrompt != null && promptFilePath != null) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "Only one of 'prompt' or 'promptFile' can be provided, not both");
+    }
+    if (inlinePrompt != null) {
+      validatePromptPlaceholders(inlinePrompt, inputFields);
+      this.promptText = inlinePrompt;
+    }
+    this.promptFile = promptFilePath;
+  }
+
+  @Override
+  public void inform(SolrCore core) {
+    final SolrResourceLoader solrResourceLoader = core.getResourceLoader();
+    ManagedChatModelStore.registerManagedChatModelStore(solrResourceLoader, this);
+    if (promptFile != null) {
+      try (InputStream is = solrResourceLoader.openResource(promptFile)) {
+        promptText = new String(is.readAllBytes(), StandardCharsets.UTF_8).trim();
+      } catch (IOException e) {
+        throw new SolrException(
+            SolrException.ErrorCode.SERVER_ERROR, "Cannot read prompt file: " + promptFile, e);
+      }
+      validatePromptPlaceholders(promptText, inputFields);
+    }
+  }
+
+  @Override
+  public void onManagedResourceInitialized(NamedList<?> args, ManagedResource res)
+      throws SolrException {
+    if (res instanceof ManagedChatModelStore store) {
+      store.loadStoredModels();
+    }
+  }
+
+  @Override
+  public UpdateRequestProcessor getInstance(
+      SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
+    IndexSchema latestSchema = req.getCore().getLatestSchema();
+
+    for (String fieldName : inputFields) {
+      if (!latestSchema.isDynamicField(fieldName) && !latestSchema.hasExplicitField(fieldName)) {
+        throw new SolrException(
+            SolrException.ErrorCode.SERVER_ERROR, "undefined field: \"" + fieldName + "\"");
+      }
+    }
+
+    final SchemaField outputFieldSchema = latestSchema.getField(outputField);
+
+    ResponseFormat responseFormat = buildResponseFormat(outputFieldSchema);
+    boolean multiValued = outputFieldSchema.multiValued();
+
+    ManagedChatModelStore store = ManagedChatModelStore.getManagedModelStore(req.getCore());
+    SolrChatModel chatModel = store.getModel(modelName);
+    if (chatModel == null) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "The model configured in the Update Request Processor '"
+              + modelName
+              + "' can't be found in the store: "
+              + ManagedChatModelStore.REST_END_POINT);
+    }
+
+    return new DocumentEnrichmentUpdateProcessor(
+        inputFields, outputField, promptText, chatModel, multiValued, responseFormat, req, next);
+  }
+
+  /**
+   * Builds a {@link ResponseFormat} that instructs the model to return a JSON object {@code
+   * {"value": ...}} whose value type matches the Solr field type. For multivalued fields the value
+   * is wrapped in a {@link JsonArraySchema} nested inside the root {@link JsonObjectSchema}.
+   *
+   * <p>Nesting {@link JsonArraySchema} inside a {@link JsonObjectSchema} property is supported by
+   * all langchain4j providers that implement structured outputs with {@link JsonObjectSchema}
+   * (OpenAI, Azure OpenAI, Google AI, Gemini, Mistral, Ollama, Amazon Bedrock, Watsonx).
+   */
+  static ResponseFormat buildResponseFormat(SchemaField schemaField) {
+    JsonSchemaElement valueElement = toJsonSchemaElement(schemaField.getType());
+    JsonSchemaElement valueSchema =
+        schemaField.multiValued()
+            ? JsonArraySchema.builder().items(valueElement).build()
+            : valueElement;
+    return ResponseFormat.builder()
+        .type(ResponseFormatType.JSON)
+        .jsonSchema(
+            JsonSchema.builder()
+                .name("output")
+                .rootElement(
+                    JsonObjectSchema.builder()
+                        .addProperty("value", valueSchema)
+                        .required("value")
+                        .build())
+                .build())
+        .build();
+  }
+
+  private static JsonSchemaElement toJsonSchemaElement(FieldType fieldType) {
+    // DenseVectorField extends FloatPointField, so it must be rejected before the numeric checks
+    if (fieldType instanceof DenseVectorField
+        || fieldType instanceof UUIDField
+        || fieldType instanceof NestPathField) {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "field type is not supported by Document Enrichment: "
+              + fieldType.getClass().getSimpleName());
+    }
+    if (fieldType instanceof StrField
+        || fieldType instanceof TextField
+        || fieldType instanceof DatePointField) {
+      return new JsonStringSchema();
+    } else if (fieldType instanceof IntPointField || fieldType instanceof LongPointField) {
+      return new JsonIntegerSchema();
+    } else if (fieldType instanceof FloatPointField || fieldType instanceof DoublePointField) {
+      return new JsonNumberSchema();
+    } else if (fieldType instanceof BoolField) {
+      return new JsonBooleanSchema();
+    } else {
+      throw new SolrException(
+          SolrException.ErrorCode.SERVER_ERROR,
+          "field type is not supported by Document Enrichment: "
+              + fieldType.getClass().getSimpleName());
+    }

Review Comment:
   I think this part would be more readable with a Java switch-case construct.
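A sketch of the switch-based shape suggested here, using Java 21 pattern matching for `switch` (JEP 441). To keep it self-contained and runnable, the Solr field types are replaced by hypothetical stand-in classes, and the method returns the JSON Schema type name rather than a langchain4j `JsonSchemaElement`. Case order matters: javac rejects a case label dominated by an earlier supertype label, so rejected subtypes must come first. The PR's own comment states that `DenseVectorField` extends `FloatPointField`; the stand-ins also assume `UUIDField` sits under `StrField` and `NestPathField` under `TextField`, which would explain why the original rejects them up front.

```java
class JsonSchemaTypeMapping {
  // Hypothetical stand-ins for Solr's field types; only the inheritance edges matter here.
  static class FieldType {}
  static class StrField extends FieldType {}
  static class UUIDField extends StrField {} // assumed subtype of StrField
  static class TextField extends FieldType {}
  static class NestPathField extends TextField {} // assumed subtype of TextField
  static class DatePointField extends FieldType {}
  static class IntPointField extends FieldType {}
  static class LongPointField extends FieldType {}
  static class FloatPointField extends FieldType {}
  static class DenseVectorField extends FloatPointField {} // per the comment in the PR
  static class DoublePointField extends FieldType {}
  static class BoolField extends FieldType {}

  static String toJsonSchemaType(FieldType fieldType) {
    return switch (fieldType) {
      // Unsupported subtypes first: listing them after their supertype is a compile error.
      case DenseVectorField f -> throw unsupported(f);
      case UUIDField f -> throw unsupported(f);
      case NestPathField f -> throw unsupported(f);
      case StrField f -> "string";
      case TextField f -> "string";
      case DatePointField f -> "string";
      case IntPointField f -> "integer";
      case LongPointField f -> "integer";
      case FloatPointField f -> "number";
      case DoublePointField f -> "number";
      case BoolField f -> "boolean";
      default -> throw unsupported(fieldType);
    };
  }

  private static IllegalArgumentException unsupported(FieldType fieldType) {
    // Stand-in for the SolrException(SERVER_ERROR, ...) thrown in the PR.
    return new IllegalArgumentException(
        "field type is not supported by Document Enrichment: "
            + fieldType.getClass().getSimpleName());
  }
}
```

Compared with the `if`/`else if` chain, the separate rejection block and the final `else` fold into one `switch`, and the compiler enforces the ordering constraint that the current code documents only in a comment.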



##########
solr/modules/language-models/src/test-files/modelEmbeddingExamples/dummy-model.json:
##########


Review Comment:
   not sure we need this relocation, but in case it is:
   embeddingModelExamples



##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
@@ -0,0 +1,534 @@
+= Document Enrichment with LLMs
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+This module brings the power of *Large Language Models* to Solr.
+
+More specifically, it enables calling an LLM at indexing time to enrich documents with additional/generated/extracted
+data. Given a prompt and a set of input fields, for each document, the LLM is invoked through
+https://github.com/langchain4j/langchain4j[LangChain4j], and the result is stored in an output field, which can support
+multiple types and may also be multivalued.
+
+_Without_ this module, the LLM calls to enrich documents must be done _outside_ Solr, before indexing.
+
+[IMPORTANT]
+====
+This module sends your documents off to some hosted service on the internet.
+There are cost, privacy, performance, and service availability implications on such a strong dependency that should be
+diligently examined before employing this module in a serious way.
+
+====
+
+At the moment, Solr supports a subset of the LLM providers available in LangChain4j.
+
+*Disclaimer*: Apache Solr is *in no way* affiliated with any of these corporations or services.
+
+If you want to add support for additional services or improve the support for the existing ones, feel free to
+contribute:
+
+* https://github.com/apache/solr/blob/main/CONTRIBUTING.md[Contributing to Solr]
+
+== Module
+
+This is provided via the `language-models` xref:configuration-guide:solr-modules.adoc[Solr Module] that needs to be
+enabled before use.
+
+== Language Model Configuration
+
+Language Models is a module and therefore its plugins must be configured in `solrconfig.xml`.
+
+=== Minimum Requirements
+
+* Enable the `language-models` module to make the Language Models classes available on Solr's classpath.
+See xref:configuration-guide:solr-modules.adoc[Solr Module] for more details.
+
+* An {solr-javadocs}/core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html[UpdateRequestProcessorChain]
+that includes at least one `DocumentEnrichmentUpdateProcessor` update processor.
+
+=== Update Processor Chain Design
+
+To properly design the Update Processor Chain for Document Enrichment, several parameters must be defined:
+
+`inputField`::
++
+[%autowidth,frame=none]
+|===
+s|Required |Default: none
+|===
++
+The field whose content is passed to the LLM to enrich the documents. Every declared `inputField` must be referenced in
+the prompt.
+
++
+Multiple `inputField` parameters are supported and can be defined using one of the following notations:
+
+* Add more than one `inputField` string element
++
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+   <str name="inputField">title</str>
+   <str name="inputField">body</str>
+   <str name="outputField">summary</str>
+   <str name="prompt">Summarize with the following information. Title: {title}. Body: {body}.</str>
+   <str name="model">chat-model</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+ </updateRequestProcessorChain>
+----
+
+* Substitute the `inputField` string element with an array of string elements with the same name
++
+[source,xml]
+----
+<arr name="inputField">
+    <str>title</str>
+    <str>body</str>
+</arr>
+----
+
+
+`outputField`::
++
+[%autowidth,frame=none]
+|===
+s|Required |Default: none
+|===
++
+The LLM response is mapped to the specified `outputField`, and only one field is supported as output. Note that this
+module only supports a subset of Solr's available field types, which includes:
+
+* *String/Text*: `StrField`, `TextField`, `SortableTextField`
+* *Date*: `DatePointField` (the LLM must return an ISO-8601 date string; consider tuning your prompt accordingly to avoid indexing errors)
+* *Numeric*: `IntPointField`, `LongPointField`, `FloatPointField`, `DoublePointField`
+* *Boolean*: `BoolField`
+
+
+These fields _can_ be multivalued. Solr uses structured output from LangChain4j to deal with LLMs' responses.
+
+
+`prompt` or `promptFile`::
++
+[%autowidth,frame=none]
+|===
+s|Exactly one of these parameters is required |Default: none
+|===
++
+Two different ways to define a prompt are available: one directly in `solrconfig.xml` and one through a dedicated file.
+Either way, the prompt _must_ contain a placeholder for each declared `inputField`, consisting of the field name
+surrounded by curly brackets (e.g., `{string_field}` in the example below), and every placeholder must match a declared
+`inputField`. Solr will throw an error if the parameters are not properly defined.
++
+These parameters can be defined in one of the following ways:
+
+* Update processor definition with the `prompt` parameter
++
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+   <str name="inputField">string_field</str>
+   <str name="outputField">summary</str>
+   <str name="prompt">Summarize this content: {string_field}</str>
+   <str name="model">model-name</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+ </updateRequestProcessorChain>
+----
+
+* Update processor definition with the `promptFile` parameter: in this case, the file `prompt.txt` must be
+uploaded to Solr inside the config folder of the collection (similarly to `solrconfig.xml`, `synonyms.txt`, etc.)
++
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+   <str name="inputField">string_field</str>
+   <str name="outputField">summary</str>
+   <str name="promptFile">prompt.txt</str>
+   <str name="model">model-name</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+ </updateRequestProcessorChain>
+----
+
+`model`::
++
+[%autowidth,frame=none]
+|===
+s|Required |Default: none
+|===
++
+
+The name of the model that will be uploaded via REST. See xref:document-enrichment-with-llms.adoc#chat-model-setup[] for
+more information.
+
+
+For more details on how to work with update request processors in Apache Solr, please refer to the dedicated page:
+xref:configuration-guide:update-request-processors.adoc[Update Request Processor]
+
+[IMPORTANT]
+====
+This update processor sends your document field content off to some hosted service on the internet.
+There are serious performance implications that should be diligently examined before employing this component in production.
+It will substantially slow down your indexing pipeline, so make sure to stress test your solution before going live.
+
+====
+
+[NOTE]
+====
+If any `inputField` value is absent or empty for a given document, enrichment is silently skipped for that document:
+the `outputField` is not added and the document is indexed as-is.
+
+If the LLM call fails at runtime (e.g., network error, model timeout), the exception is caught and logged but is
+*non-fatal*: the document is still indexed without the `outputField`.
+Monitor your indexing logs to detect documents that were not enriched as expected.
+====
+
+== Chat Model Setup

Review Comment:
   Chat Model is LangChain4j naming; please remove it entirely from the doc and from Solr where possible.
   Furthermore, we don't offer any chat-style interaction, so it can be misleading.
   
   Let's just use 'general purpose LLM'.



##########
solr/solr-ref-guide/modules/configuration-guide/pages/update-request-processors.adoc:
##########
@@ -421,6 +421,10 @@ The {solr-javadocs}/modules/language-models/index.html[`language-models`] module
 It uses external text to vectors language models to perform the vectorisation for each processed document.
 For more information: xref:query-guide:text-to-vector.adoc[Update Request Processor]
 
+{solr-javadocs}/modules/language-models/org/apache/solr/languagemodels/documentenrichment/update/processor/DocumentEnrichmentUpdateProcessorFactory.html[DocumentEnrichmentUpdateProcessorFactory]::
 Update processor which, starting from one or more fields in input and a given prompt, adds the output of an LLM as the value of a new field.

Review Comment:
   Update processor that takes one or more fields and a given prompt in input and returns the output of an LLM as the value of a new field.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

