aruggero commented on code in PR #4259:
URL: https://github.com/apache/solr/pull/4259#discussion_r3084761971

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
@@ -0,0 +1,500 @@
+= Document Enrichment with LLMs
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+This module brings the power of *Large Language Models* to Solr.
+
+More specifically, it provides the capability, at indexing time, given a prompt and a set of input fields, of calling an

Review Comment:
More specifically, it enables calling an LLM at indexing time to enrich documents with additional/generated/extracted data. Given a prompt and a set of input fields, for each document, the LLM is invoked through https://github.com/langchain4j/langchain4j[LangChain4j], and the result is stored in an `outputField`, which can support multiple types and may also be multivalued.

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
+LLM through https://github.com/langchain4j/langchain4j[LangChain4j] for each document and store the result of the call
+in an `outputField`, that can be of multiple types and even multivalued.
+
+_Without_ this module, the LLM calls must be done _outside_ Solr, before indexing.
+
+[IMPORTANT]
+====
+This module sends your documents off to some hosted service on the internet.
+There are cost, privacy, performance, and service availability implications on such a strong dependency that should be
+diligently examined before employing this module in a serious way.
+
+====
+
+At the moment a subset of LLM providers supported by LangChain4j is supported by Solr.

Review Comment:
At the moment, Solr supports a subset of the LLM providers available in LangChain4j.

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
+_Without_ this module, the LLM calls must be done _outside_ Solr, before indexing.

Review Comment:
_Without_ this module, the LLM calls to enrich documents must be done _outside_ Solr, before indexing.

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
+*Disclaimer*: Apache Solr is *in no way* affiliated to any of these corporations or services.
+
+If you want to add support for additional services or improve the support for the existing ones, feel free to
+contribute:
+
+* https://github.com/apache/solr/blob/main/CONTRIBUTING.md[Contributing to Solr]
+
+== Module
+
+This is provided via the `language-models` xref:configuration-guide:solr-modules.adoc[Solr Module] that needs to be
+enabled before use.
+
+== Language Model Configuration
+
+Language Models is a module and therefore its plugins must be configured in `solrconfig.xml`.
+
+=== Minimum Requirements
+
+* Enable the `language-models` module to make the Language Models classes available on Solr's classpath.
+See xref:configuration-guide:solr-modules.adoc[Solr Module] for more details.
+
+* An update processor, similar to the one below, must be declared in `solrconfig.xml`:
++
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+    <str name="inputField">string_field</str>
+    <str name="outputField">summary</str>
+    <str name="prompt">Summarize this content: {string_field}</str>
+    <str name="model">model-name</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+  </updateRequestProcessorChain>
+----
+[NOTE]
+====
+If no component is configured in `solrconfig.xml`, the `ChatModel` store will not be registered and requests to

Review Comment:
Remove

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
+`/schema/chat-model-store` will return an error.
+====
+
+== Chat Model Configuration

Review Comment:
Mmmm.. maybe "Chat Model setup"?

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
+=== Models
+
+* A model in this module is a chat model, that answers with text given a prompt.

Review Comment:
A model is a chat model that generates a text response given a prompt.

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
+* A model in this Solr module is a reference to an external API that runs the Large Language Model responsible for chat
+completion.
+
+[IMPORTANT]
+====
+The Solr chat model specifies the parameters to access the APIs, the LLM doesn't run internally in Solr
+
+====
+
+A model is described by these parameters:
+
+
+`class`::
++
+[%autowidth,frame=none]
+|===
+s|Required |Default: none
+|===
++
+The model implementation.
+Accepted values:
+
+* `dev.langchain4j.model.ollama.OllamaChatModel`
+* `dev.langchain4j.model.mistralai.MistralAiChatModel`
+* `dev.langchain4j.model.anthropic.AnthropicChatModel`
+* `dev.langchain4j.model.openai.OpenAiChatModel`
+* `dev.langchain4j.model.googleai.GoogleAiGeminiChatModel`
+
+`name`::
++
+[%autowidth,frame=none]
+|===
+s|Required |Default: none
+|===
++
+The identifier of your model, this is used by any component that intends to use the model (e.g., `DocumentEnrichmentUpdateProcessorFactory` update processor).
+
+`params`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Each model class has potentially different params.
+Many are shared but for the full set of parameters of the model you are interested in please refer to the official documentation of the LangChain4j version included in Solr: https://docs.langchain4j.dev/category/language-models[Chat Models in LangChain4j].
+
+=== Supported Models
+Apache Solr uses https://github.com/langchain4j/langchain4j[LangChain4j] to support document enrichment with LLMs.
+The models currently supported are:
+
+[tabs#supported-chat-models]
+======
+Ollama::
++
+====
+
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.ollama.OllamaChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "http://localhost:11434",
+    "modelName": "<a-local/hosted-chat-model>",
+    "timeout": 300,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+
+MistralAI::
++
+====
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.mistralai.MistralAiChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "https://api.mistral.ai/v1",
+    "apiKey": "<your-mistralAI-api-key>",
+    "modelName": "<a-mistralAI-chat-model>",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+OpenAI::
++
+====
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.openai.OpenAiChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "https://api.openai.com/v1",
+    "apiKey": "<your-openAI-api-key>",
+    "modelName": "<a-openAI-chat-model>",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+
+Anthropic::
++
+====
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.anthropic.AnthropicChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "https://api.anthropic.com/v1/",
+    "apiKey": "<your-anthropic-api-key>",
+    "modelName": "<a-anthropic-chat-model>",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+
+Gemini::
++
+====
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.googleai.GoogleAiGeminiChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "https://generativelanguage.googleapis.com/v1beta/",
+    "apiKey": "<your-geminiAi-api-key>",
+    "modelName": "<a-geminiAi-chat-model>",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+======
+
+=== Uploading a Model
+
+To upload the model in a `/path/myModel.json` file, please run:
+
+[source,bash]
+----
+curl -XPUT 'http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json'
+----
+
+To delete the `currentModel` model:
+
+[source,bash]
+----
+curl -XDELETE 'http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store/currentModel'
+----
+
+To view all models:
+
+[source,text]
+http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store
+
+
+.Example: /path/myOpenAIModel.json
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.openai.OpenAiChatModel",
+  "name": "openai-1",
+  "params": {
+    "baseUrl": "https://api.openai.com/v1",
+    "apiKey": "apiKey-openAI",
+    "modelName": "gpt-5.4-nano",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+
+== How to Trigger Document Enrichment during Indexing
+To create new fields starting from existent ones in your documents at indexing time you need to configure an {solr-javadocs}/core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html[Update Request Processor Chain] that includes at least one `DocumentEnrichmentUpdateProcessor` update request processor in one of the 2 following way:

Review Comment:
To create new fields from existing document fields at indexing time, configure a {solr-javadocs}/core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html[UpdateRequestProcessorChain] that includes at least one `DocumentEnrichmentUpdateProcessor` update request processor. This can be done in one of the following two ways:

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
+* Update processor with parameter `prompt`
++
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+    <str name="inputField">string_field</str>
+    <str name="outputField">summary</str>
+    <str name="prompt">Summarize this content: {string_field}</str>
+    <str name="model">model-name</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+  </updateRequestProcessorChain>
+----
+
+* Update processor with parameter `promptFile`: in this case, the file `prompt.txt` must be uploaded to Solr similarly to any other configuration file (e.g., `solrconfig.xml`, `synonyms.txt`, etc.)
++
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+    <str name="inputField">string_field</str>
+    <str name="outputField">summary</str>
+    <str name="promptFile">prompt.txt</str>
+    <str name="model">model-name</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+  </updateRequestProcessorChain>
+----
+
+Exactly one of the following parameters is required: `prompt` or `promptFile`.
+
+Another important feature of this module is that one (or more) `inputField` needs to be injected in the prompt. This is

Review Comment:
I would structure this part as "These are the available parameters:", followed by one bullet point per parameter with its explanation (meaning, usage, and whether it is required or not), similar to the ones you put in the LangChain4j part.
...
+Another important feature of this module is that one (or more) `inputField` needs to be injected in the prompt. This is
+done by some special tokens, that are the `fieldName` surrounded by curly brackets (e.g., `{string_field}`, in the
+example above). These tokens are _mandatory_ for this module to work properly. Solr will throw an error if the
+parameters are not properly defined.
+For example, both the prompt and the content of the file prompt.txt, must contain the text '{string_field}', which
+will be substituted with the content of the `string_field` field for each document. An example of a valid prompt with
+multiple input fields is as follows:
+
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+    <str name="inputField">title</str>
+    <str name="inputField">body</str>
+    <str name="outputField">summary</str>
+    <str name="prompt">Summarize with the following information. Title: {title}. Body: {body}.</str>
+    <str name="model">chat-model</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+</updateRequestProcessorChain>
+----
+
+Another way of using more than one `inputField` is by using the following notation, instead of more than one parameter

Review Comment:
   Multiple `inputField` values could also be defined using the following notation:

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
@@ -0,0 +1,500 @@
...
+== How to Trigger Document Enrichment during Indexing
+To create new fields starting from existent ones in your documents at indexing time you need to configure an {solr-javadocs}/core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html[Update Request Processor Chain] that includes at least one `DocumentEnrichmentUpdateProcessor` update request processor in one of the 2 following way:

Review Comment:
   I would move this part to the section where you talk about configuring the model, since it explains how to write that part.

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
@@ -0,0 +1,500 @@
...
+* Update processor with parameter `promptFile`: in this case, the file `prompt.txt` must be uploaded to Solr similarly to any other configuration file (e.g., `solrconfig.xml`, `synonyms.txt`, etc.)

Review Comment:
   Do you mean that the file needs to be put inside the config folder of the collection?

##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
@@ -0,0 +1,500 @@
...
+=== Uploading a Model + +To upload the model in a `/path/myModel.json` file, please run: + +[source,bash] +---- +curl -XPUT 'http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json' +---- + +To delete the `currentModel` model: + +[source,bash] +---- +curl -XDELETE 'http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store/currentModel' +---- + +To view all models: + +[source,text] +http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store + + +.Example: /path/myOpenAIModel.json +[source,json] +---- +{ + "class": "dev.langchain4j.model.openai.OpenAiChatModel", + "name": "openai-1", + "params": { + "baseUrl": "https://api.openai.com/v1", + "apiKey": "apiKey-openAI", + "modelName": "gpt-5.4-nano", + "timeout": 60, + "logRequests": true, + "logResponses": true, + "maxRetries": 5 + } +} +---- + +== How to Trigger Document Enrichment during Indexing +To create new fields starting from existent ones in your documents at indexing time you need to configure an {solr-javadocs}/core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html[Update Request Processor Chain] that includes at least one `DocumentEnrichmentUpdateProcessor` update request processor in one of the 2 following way: + +* Update processor with parameter `prompt` ++ +[source,xml] +---- +<updateRequestProcessorChain name="documentEnrichment"> + <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory"> + <str name="inputField">string_field</str> + <str name="outputField">summary</str> + <str name="prompt">Summarize this content: {string_field}</str> + <str name="model">model-name</str> + </processor> + <processor class="solr.RunUpdateProcessorFactory"/> + </updateRequestProcessorChain> +---- + +* Update processor with parameter `promptFile`: in this case, the file `prompt.txt` must be uploaded to Solr similarly to any other configuration file 
(e.g., `solrconfig.xml`, `synonyms.txt`, etc.) ++ +[source,xml] +---- +<updateRequestProcessorChain name="documentEnrichment"> + <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory"> + <str name="inputField">string_field</str> + <str name="outputField">summary</str> + <str name="promptFile">prompt.txt</str> + <str name="model">model-name</str> + </processor> + <processor class="solr.RunUpdateProcessorFactory"/> + </updateRequestProcessorChain> +---- + +Exactly one of the following parameters is required: `prompt` or `promptFile`. + +Another important feature of this module is that one (or more) `inputField` needs to be injected in the prompt. This is +done by some special tokens, that are the `fieldName` surrounded by curly brackets (e.g., `{string_field}`, in the +example above). These tokens are _mandatory_ for this module to work properly. Solr will throw an error if the +parameters are not properly defined. +For example, both the prompt and the content of the file prompt.txt, must contain the text '{string_field}', which +will be substituted with the content of the `string_field` field for each document. An example of a valid prompt with Review Comment: I think the part so far could be explained in a more schematic and better understandable way. ########## solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc: ########## @@ -0,0 +1,500 @@ +* A model in this Solr module is a reference to an external API that runs the Large Language Model responsible for chat Review Comment: A model is a reference to an external API that runs the Large Language Model responsible for chat completion. ########## solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc: ########## @@ -0,0 +1,500 @@ +* Update processor with parameter `promptFile`: in this case, the file `prompt.txt` must be uploaded to Solr similarly to any other configuration file
(e.g., `solrconfig.xml`, `synonyms.txt`, etc.) Review Comment: Update processor definition with the `promptFile` parameter ########## solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc: ########## @@ -0,0 +1,500 @@ += Document Enrichment with LLMs +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +This module brings the power of *Large Language Models* to Solr. + +More specifically, it provides the capability, at indexing time, given a prompt and a set of input fields, of calling an +LLM through https://github.com/langchain4j/langchain4j[LangChain4j] for each document and store the result of the call +in an `outputField`, that can be of multiple types and even multivalued. + +_Without_ this module, the LLM calls must be done _outside_ Solr, before indexing. + +[IMPORTANT] +==== +This module sends your documents off to some hosted service on the internet. +There are cost, privacy, performance, and service availability implications on such a strong dependency that should be +diligently examined before employing this module in a serious way. + +==== + +At the moment a subset of LLM providers supported by LangChain4j is supported by Solr. 
+ +*Disclaimer*: Apache Solr is *in no way* affiliated to any of these corporations or services. + +If you want to add support for additional services or improve the support for the existing ones, feel free to +contribute: + +* https://github.com/apache/solr/blob/main/CONTRIBUTING.md[Contributing to Solr] + +== Module + +This is provided via the `language-models` xref:configuration-guide:solr-modules.adoc[Solr Module] that needs to be +enabled before use. + +== Language Model Configuration + +Language Models is a module and therefore its plugins must be configured in `solrconfig.xml`. + +=== Minimum Requirements + +* Enable the `language-models` module to make the Language Models classes available on Solr's classpath. +See xref:configuration-guide:solr-modules.adoc[Solr Module] for more details. + +* An update processor, similar to the one below, must be declared in `solrconfig.xml`: ++ +[source,xml] +---- +<updateRequestProcessorChain name="documentEnrichment"> + <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory"> + <str name="inputField">string_field</str> + <str name="outputField">summary</str> + <str name="prompt">Summarize this content: {string_field}</str> + <str name="model">model-name</str> + </processor> + <processor class="solr.RunUpdateProcessorFactory"/> + </updateRequestProcessorChain> +---- +[NOTE] +==== +If no component is configured in `solrconfig.xml`, the `ChatModel` store will not be registered and requests to +`/schema/chat-model-store` will return an error. +==== + +== Chat Model Configuration + +=== Models + +* A model in this module is a chat model, that answers with text given a prompt. +* A model in this Solr module is a reference to an external API that runs the Large Language Model responsible for chat +completion. 
+ +[IMPORTANT] +==== +The Solr chat model specifies the parameters to access the APIs, the LLM doesn't run internally in Solr + +==== + +A model is described by these parameters: + + +`class`:: ++ +[%autowidth,frame=none] +|=== +s|Required |Default: none +|=== ++ +The model implementation. +Accepted values: + +* `dev.langchain4j.model.ollama.OllamaChatModel` +* `dev.langchain4j.model.mistralai.MistralAiChatModel` +* `dev.langchain4j.model.anthropic.AnthropicChatModel` +* `dev.langchain4j.model.openai.OpenAiChatModel` +* `dev.langchain4j.model.googleai.GoogleAiGeminiChatModel` + +`name`:: ++ +[%autowidth,frame=none] +|=== +s|Required |Default: none +|=== ++ +The identifier of your model, this is used by any component that intends to use the model (e.g., `DocumentEnrichmentUpdateProcessorFactory` update processor). + +`params`:: ++ +[%autowidth,frame=none] +|=== +|Optional |Default: none +|=== ++ +Each model class has potentially different params. +Many are shared but for the full set of parameters of the model you are interested in please refer to the official documentation of the LangChain4j version included in Solr: https://docs.langchain4j.dev/category/language-models[Chat Models in LangChain4j]. + +=== Supported Models +Apache Solr uses https://github.com/langchain4j/langchain4j[LangChain4j] to support document enrichment with LLMs. 
+The models currently supported are: + +[tabs#supported-chat-models] +====== +Ollama:: ++ +==== + +[source,json] +---- +{ + "class": "dev.langchain4j.model.ollama.OllamaChatModel", + "name": "<a-name-for-your-model>", + "params": { + "baseUrl": "http://localhost:11434", + "modelName": "<a-local/hosted-chat-model>", + "timeout": 300, + "logRequests": true, + "logResponses": true, + "maxRetries": 5 + } +} +---- +==== + +MistralAI:: ++ +==== +[source,json] +---- +{ + "class": "dev.langchain4j.model.mistralai.MistralAiChatModel", + "name": "<a-name-for-your-model>", + "params": { + "baseUrl": "https://api.mistral.ai/v1", + "apiKey": "<your-mistralAI-api-key>", + "modelName": "<a-mistralAI-chat-model>", + "timeout": 60, + "logRequests": true, + "logResponses": true, + "maxRetries": 5 + } +} +---- +==== +OpenAI:: ++ +==== +[source,json] +---- +{ + "class": "dev.langchain4j.model.openai.OpenAiChatModel", + "name": "<a-name-for-your-model>", + "params": { + "baseUrl": "https://api.openai.com/v1", + "apiKey": "<your-openAI-api-key>", + "modelName": "<a-openAI-chat-model>", + "timeout": 60, + "logRequests": true, + "logResponses": true, + "maxRetries": 5 + } +} +---- +==== + +Anthropic:: ++ +==== +[source,json] +---- +{ + "class": "dev.langchain4j.model.anthropic.AnthropicChatModel", + "name": "<a-name-for-your-model>", + "params": { + "baseUrl": "https://api.anthropic.com/v1/", + "apiKey": "<your-anthropic-api-key>", + "modelName": "<a-anthropic-chat-model>", + "timeout": 60, + "logRequests": true, + "logResponses": true, + "maxRetries": 5 + } +} +---- +==== + +Gemini:: ++ +==== +[source,json] +---- +{ + "class": "dev.langchain4j.model.googleai.GoogleAiGeminiChatModel", + "name": "<a-name-for-your-model>", + "params": { + "baseUrl": "https://generativelanguage.googleapis.com/v1beta/", + "apiKey": "<your-geminiAi-api-key>", + "modelName": "<a-geminiAi-chat-model>", + "timeout": 60, + "logRequests": true, + "logResponses": true, + "maxRetries": 5 + } +} +---- +==== +====== + 
+=== Uploading a Model + +To upload the model in a `/path/myModel.json` file, please run: + +[source,bash] +---- +curl -XPUT 'http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json' +---- + +To delete the `currentModel` model: + +[source,bash] +---- +curl -XDELETE 'http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store/currentModel' +---- + +To view all models: + +[source,text] +http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store + + +.Example: /path/myOpenAIModel.json +[source,json] +---- +{ + "class": "dev.langchain4j.model.openai.OpenAiChatModel", + "name": "openai-1", + "params": { + "baseUrl": "https://api.openai.com/v1", + "apiKey": "apiKey-openAI", + "modelName": "gpt-5.4-nano", + "timeout": 60, + "logRequests": true, + "logResponses": true, + "maxRetries": 5 + } +} +---- + +== How to Trigger Document Enrichment during Indexing +To create new fields starting from existent ones in your documents at indexing time you need to configure an {solr-javadocs}/core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html[Update Request Processor Chain] that includes at least one `DocumentEnrichmentUpdateProcessor` update request processor in one of the 2 following way: + +* Update processor with parameter `prompt` Review Comment: Update processor definition with the `prompt` parameter ########## solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc: ########## @@ -0,0 +1,500 @@ += Document Enrichment with LLMs +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +This module brings the power of *Large Language Models* to Solr. + +More specifically, it provides the capability, at indexing time, given a prompt and a set of input fields, of calling an +LLM through https://github.com/langchain4j/langchain4j[LangChain4j] for each document and store the result of the call +in an `outputField`, that can be of multiple types and even multivalued. + +_Without_ this module, the LLM calls must be done _outside_ Solr, before indexing. + +[IMPORTANT] +==== +This module sends your documents off to some hosted service on the internet. +There are cost, privacy, performance, and service availability implications on such a strong dependency that should be +diligently examined before employing this module in a serious way. + +==== + +At the moment a subset of LLM providers supported by LangChain4j is supported by Solr. + +*Disclaimer*: Apache Solr is *in no way* affiliated to any of these corporations or services. + +If you want to add support for additional services or improve the support for the existing ones, feel free to +contribute: + +* https://github.com/apache/solr/blob/main/CONTRIBUTING.md[Contributing to Solr] + +== Module + +This is provided via the `language-models` xref:configuration-guide:solr-modules.adoc[Solr Module] that needs to be +enabled before use. + +== Language Model Configuration + +Language Models is a module and therefore its plugins must be configured in `solrconfig.xml`. 
+ +=== Minimum Requirements + +* Enable the `language-models` module to make the Language Models classes available on Solr's classpath. +See xref:configuration-guide:solr-modules.adoc[Solr Module] for more details. + +* An update processor, similar to the one below, must be declared in `solrconfig.xml`: ++ +[source,xml] +---- +<updateRequestProcessorChain name="documentEnrichment"> + <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory"> + <str name="inputField">string_field</str> + <str name="outputField">summary</str> + <str name="prompt">Summarize this content: {string_field}</str> + <str name="model">model-name</str> + </processor> + <processor class="solr.RunUpdateProcessorFactory"/> + </updateRequestProcessorChain> +---- +[NOTE] +==== +If no component is configured in `solrconfig.xml`, the `ChatModel` store will not be registered and requests to +`/schema/chat-model-store` will return an error. +==== + +== Chat Model Configuration + +=== Models + +* A model in this module is a chat model, that answers with text given a prompt. +* A model in this Solr module is a reference to an external API that runs the Large Language Model responsible for chat +completion. + +[IMPORTANT] +==== +The Solr chat model specifies the parameters to access the APIs, the LLM doesn't run internally in Solr + +==== + +A model is described by these parameters: + + +`class`:: ++ +[%autowidth,frame=none] +|=== +s|Required |Default: none +|=== ++ +The model implementation. 
+Accepted values:
+
+* `dev.langchain4j.model.ollama.OllamaChatModel`
+* `dev.langchain4j.model.mistralai.MistralAiChatModel`
+* `dev.langchain4j.model.anthropic.AnthropicChatModel`
+* `dev.langchain4j.model.openai.OpenAiChatModel`
+* `dev.langchain4j.model.googleai.GoogleAiGeminiChatModel`
+
+`name`::
++
+[%autowidth,frame=none]
+|===
+s|Required |Default: none
+|===
++
+The identifier of your model, this is used by any component that intends to use the model (e.g., `DocumentEnrichmentUpdateProcessorFactory` update processor).
+
+`params`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Each model class has potentially different params.
+Many are shared but for the full set of parameters of the model you are interested in please refer to the official documentation of the LangChain4j version included in Solr: https://docs.langchain4j.dev/category/language-models[Chat Models in LangChain4j].
+
+=== Supported Models
+Apache Solr uses https://github.com/langchain4j/langchain4j[LangChain4j] to support document enrichment with LLMs.
+The models currently supported are:
+
+[tabs#supported-chat-models]
+======
+Ollama::
++
+====
+
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.ollama.OllamaChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "http://localhost:11434",
+    "modelName": "<a-local/hosted-chat-model>",
+    "timeout": 300,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+
+MistralAI::
++
+====
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.mistralai.MistralAiChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "https://api.mistral.ai/v1",
+    "apiKey": "<your-mistralAI-api-key>",
+    "modelName": "<a-mistralAI-chat-model>",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+OpenAI::
++
+====
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.openai.OpenAiChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "https://api.openai.com/v1",
+    "apiKey": "<your-openAI-api-key>",
+    "modelName": "<a-openAI-chat-model>",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+
+Anthropic::
++
+====
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.anthropic.AnthropicChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "https://api.anthropic.com/v1/",
+    "apiKey": "<your-anthropic-api-key>",
+    "modelName": "<a-anthropic-chat-model>",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+
+Gemini::
++
+====
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.googleai.GoogleAiGeminiChatModel",
+  "name": "<a-name-for-your-model>",
+  "params": {
+    "baseUrl": "https://generativelanguage.googleapis.com/v1beta/",
+    "apiKey": "<your-geminiAi-api-key>",
+    "modelName": "<a-geminiAi-chat-model>",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+====
+======
+
+=== Uploading a Model
+
+To upload the model in a `/path/myModel.json` file, please run:
+
+[source,bash]
+----
+curl -XPUT 'http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json'
+----
+
+To delete the `currentModel` model:
+
+[source,bash]
+----
+curl -XDELETE 'http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store/currentModel'
+----
+
+To view all models:
+
+[source,text]
+http://localhost:8983/solr/YOUR_COLLECTION/schema/chat-model-store
+
+
+.Example: /path/myOpenAIModel.json
+[source,json]
+----
+{
+  "class": "dev.langchain4j.model.openai.OpenAiChatModel",
+  "name": "openai-1",
+  "params": {
+    "baseUrl": "https://api.openai.com/v1",
+    "apiKey": "apiKey-openAI",
+    "modelName": "gpt-5.4-nano",
+    "timeout": 60,
+    "logRequests": true,
+    "logResponses": true,
+    "maxRetries": 5
+  }
+}
+----
+
+== How to Trigger Document Enrichment during Indexing
+To create new fields starting from existent ones in your documents at indexing time you need to configure an {solr-javadocs}/core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html[Update Request Processor Chain] that includes at least one `DocumentEnrichmentUpdateProcessor` update request processor in one of the 2 following way:
+
+* Update processor with parameter `prompt`
++
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+    <str name="inputField">string_field</str>
+    <str name="outputField">summary</str>
+    <str name="prompt">Summarize this content: {string_field}</str>
+    <str name="model">model-name</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+  </updateRequestProcessorChain>
+----
+
+* Update processor with parameter `promptFile`: in this case, the file `prompt.txt` must be uploaded to Solr similarly to any other configuration file (e.g., `solrconfig.xml`, `synonyms.txt`, etc.)
++
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+    <str name="inputField">string_field</str>
+    <str name="outputField">summary</str>
+    <str name="promptFile">prompt.txt</str>
+    <str name="model">model-name</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+  </updateRequestProcessorChain>
+----
+
+Exactly one of the following parameters is required: `prompt` or `promptFile`.
+
+Another important feature of this module is that one (or more) `inputField` needs to be injected in the prompt. This is
+done by some special tokens, that are the `fieldName` surrounded by curly brackets (e.g., `{string_field}`, in the
+example above). These tokens are _mandatory_ for this module to work properly. Solr will throw an error if the

Review Comment:
   above or below?



##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
@@ -0,0 +1,500 @@
+parameters are not properly defined.
+For example, both the prompt and the content of the file prompt.txt, must contain the text '{string_field}', which
+will be substituted with the content of the `string_field` field for each document. An example of a valid prompt with
+multiple input fields is as follows:
+
+[source,xml]
+----
+<updateRequestProcessorChain name="documentEnrichment">
+  <processor class="solr.languagemodels.documentenrichment.update.processor.DocumentEnrichmentUpdateProcessorFactory">
+    <str name="inputField">title</str>
+    <str name="inputField">body</str>
+    <str name="outputField">summary</str>
+    <str name="prompt">Summarize with the following information. Title: {title}. Body: {body}.</str>
+    <str name="model">chat-model</str>
+  </processor>
+  <processor class="solr.RunUpdateProcessorFactory"/>
+  </updateRequestProcessorChain>
+----
+
+Another way of using more than one `inputField` is by using the following notation, instead of more than one parameter
+with the same name:
+[source,xml]
+----
+<arr name="inputField">
+  <str>title</str>
+  <str>body</str>
+</arr>
+----
+
+The LLM response is mapped to the specified `outputField`. Note that this module only supports a subset of Solr's

Review Comment:
   Maybe we can also specify that only one outputField is supported



##########
solr/solr-ref-guide/modules/indexing-guide/pages/document-enrichment-with-llms.adoc:
##########
@@ -0,0 +1,500 @@
+available field types, which includes:
+
+* *String/Text*: `StrField`, `TextField`, `SortableTextField`
+* *Date*: `DatePointField` (the LLM must return an ISO-8601 date string; it might be useful to tune your prompt accordingly, to avoid indexing errors)
+* *Numeric*: `IntPointField`, `LongPointField`, `FloatPointField`, `DoublePointField`
+* *Boolean*: `BoolField`
+
+
+These fields _can_ be multivalued. Solr uses structured output from LangChain4j to deal with LLMs' responses.
+
+
+For more details on how to work with update request processors in Apache Solr, please refer to the dedicated page:
+xref:configuration-guide:update-request-processors.adoc[Update Request Processor]
+
+[IMPORTANT]
+====
+This update processor sends your document field content off to some hosted service on the internet.
+There are serious performance implications that should be diligently examined before employing this component in production.
+It will slow down substantially your indexing pipeline so make sure to stress test your solution before going live.
+
+====
+
+[NOTE]
+====
+If any `inputField` value is absent or empty for a given document, enrichment is silently skipped for that document:
+the `outputField` is not added and the document is indexed as-is.
+
+If the LLM call fails at runtime (e.g., network error, model timeout), the exception is caught and logged but is
+*non-fatal*: the document is still indexed without the `outputField`.
+Monitor your indexing logs to detect documents that were not enriched as expected.
+====
+
+=== Index first and enrich your documents on a second pass
+LLM calls are usually quite slow, so, depending on your use case it could be a good idea to index first your documents

Review Comment:
   LLM calls are typically slow, so depending on your use case, it may be preferable to first index your documents and enrich them with LLM-generated fields at a later stage.



-- 
This is an automated message from the Apache Git Service.
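The `{fieldName}` placeholder contract discussed in this thread (every configured `inputField` must appear in the prompt as `{fieldName}`, and a document whose input value is missing or empty is indexed without enrichment) can be illustrated with a minimal Java sketch. It is a hypothetical illustration, not the code of `DocumentEnrichmentUpdateProcessorFactory`: the class name `PromptTemplateSketch`, its methods `placeholders`, `validate` and `render`, and the assumption that field names match `[A-Za-z0-9_]+` all belong to this sketch only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: NOT the Solr implementation, only an illustration
// of the {fieldName} prompt contract described in the quoted documentation.
final class PromptTemplateSketch {

  // Assumption: placeholder names consist of letters, digits and underscores.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z0-9_]+)\\}");

  private PromptTemplateSketch() {}

  /** Field names referenced as {fieldName} in the prompt, in order of first appearance. */
  static Set<String> placeholders(String prompt) {
    Set<String> names = new LinkedHashSet<>();
    Matcher m = PLACEHOLDER.matcher(prompt);
    while (m.find()) {
      names.add(m.group(1));
    }
    return names;
  }

  /** Configuration-time check: every inputField must appear in the prompt as {fieldName}. */
  static void validate(String prompt, List<String> inputFields) {
    Set<String> found = placeholders(prompt);
    for (String field : inputFields) {
      if (!found.contains(field)) {
        throw new IllegalArgumentException("prompt does not contain {" + field + "}");
      }
    }
  }

  /**
   * Per-document step: replaces each {inputField} token with the document's value in a
   * single pass. Returns null when any input value is missing or empty, standing in for
   * the documented "skip enrichment, index the document as-is" behaviour.
   */
  static String render(String prompt, List<String> inputFields, Map<String, String> doc) {
    for (String field : inputFields) {
      String value = doc.get(field);
      if (value == null || value.isEmpty()) {
        return null;
      }
    }
    Matcher m = PLACEHOLDER.matcher(prompt);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String field = m.group(1);
      // Tokens that are not configured inputFields are copied through unchanged.
      String replacement = inputFields.contains(field) ? doc.get(field) : m.group();
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    String prompt = "Summarize with the following information. Title: {title}. Body: {body}.";
    List<String> inputFields = List.of("title", "body");
    validate(prompt, inputFields);
    // prints: Summarize with the following information. Title: Solr. Body: An open source search platform.
    System.out.println(render(prompt, inputFields,
        Map.of("title", "Solr", "body", "An open source search platform")));
  }
}
```

Substituting in a single pass with `Matcher.appendReplacement` means a field value that happens to contain text such as `{body}` is inserted literally instead of being expanded a second time, and `Matcher.quoteReplacement` keeps `$` or `\` inside values from being read as group references.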
