Hi everyone,

I’d like to propose an enhancement to the Text to Vector (language-models)
module to support pluggable, custom embedding model implementations.

At the moment, SolrTextToVectorModel is tightly coupled to LangChain4j’s
EmbeddingModel interface. This effectively limits support to the bundled
LangChain4j providers (HuggingFace, OpenAI, etc.). If someone wants to
integrate a custom embedding endpoint, they currently need to implement the
full LangChain4j EmbeddingModel interface, including its builder
conventions — even if they don’t otherwise use LangChain4j.

There’s also no Solr-native abstraction for text-to-vector conversion today.

My proposal is to introduce a Solr-native TextToVectorModel interface and
decouple the module from LangChain4j. For backward compatibility, we could
add a Langchain4jModelAdapter that implements TextToVectorModel by wrapping
a LangChain4j EmbeddingModel. That way, existing configurations would
continue to work unchanged.

With this approach, users could implement TextToVectorModel in their own
JAR, drop it into Solr’s classpath, and register it via the existing REST
API without taking on a LangChain4j dependency.
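
As a rough illustration only (not the actual patch), a user-side
implementation might look like the sketch below. The single-method shape of
TextToVectorModel, the method name vectorise, and the toy ByteHistogramModel
class are all assumptions made for this example; a real implementation would
call a custom embedding endpoint rather than count bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical shape of the proposed Solr-native interface. The real
// signature would be whatever the eventual patch defines.
interface TextToVectorModel {
    float[] vectorise(String text);
}

// Toy model with no LangChain4j dependency: buckets UTF-8 bytes into a
// fixed-size histogram and L2-normalises it. Purely illustrative.
final class ByteHistogramModel implements TextToVectorModel {
    private final int dimension;

    ByteHistogramModel(int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        this.dimension = dimension;
    }

    @Override
    public float[] vectorise(String text) {
        float[] vector = new float[dimension];
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            vector[(b & 0xFF) % dimension] += 1.0f;
        }
        double sumOfSquares = 0.0;
        for (float v : vector) {
            sumOfSquares += v * v;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm > 0.0) {
            for (int i = 0; i < vector.length; i++) {
                vector[i] = (float) (vector[i] / norm);
            }
        }
        return vector; // all zeros for empty input
    }
}
```

The point of the sketch is that the class compiles against one small
interface and nothing else, which is what would let it ship in a standalone
JAR.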

The change would involve:

   - adding TextToVectorModel
   - adding Langchain4jModelAdapter
   - updating SolrTextToVectorModel factory logic to support both paths
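
To make the three pieces above concrete, here is a minimal sketch of how
they could fit together. Everything in it is an assumption for discussion:
the interface shape, the TextToVectorModels helper, and especially
EmbeddingModelStandIn, which is a placeholder for LangChain4j's
EmbeddingModel so the example compiles without the library. A real adapter
would unwrap LangChain4j's response object (something like
embed(text).content().vector()) instead of returning a float[] directly.

```java
import java.util.Objects;

// Hypothetical Solr-native interface (assumed shape).
interface TextToVectorModel {
    float[] vectorise(String text);
}

// Placeholder for LangChain4j's EmbeddingModel, used only so this sketch
// is self-contained. It is NOT the real LangChain4j API.
interface EmbeddingModelStandIn {
    float[] embed(String text);
}

// Backward-compatibility path: wraps a LangChain4j-style model behind the
// Solr-native interface.
final class Langchain4jModelAdapter implements TextToVectorModel {
    private final EmbeddingModelStandIn delegate;

    Langchain4jModelAdapter(EmbeddingModelStandIn delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public float[] vectorise(String text) {
        return delegate.embed(text);
    }
}

// Hypothetical factory helper showing the "both paths" dispatch.
final class TextToVectorModels {
    private TextToVectorModels() {}

    static TextToVectorModel wrap(Object instance) {
        Objects.requireNonNull(instance, "instance");
        if (instance instanceof TextToVectorModel) {
            // New path: a Solr-native implementation is used as-is.
            return (TextToVectorModel) instance;
        }
        if (instance instanceof EmbeddingModelStandIn) {
            // Existing path: LangChain4j models keep working via the adapter.
            return new Langchain4jModelAdapter((EmbeddingModelStandIn) instance);
        }
        throw new IllegalArgumentException(
            "Unsupported model class: " + instance.getClass().getName());
    }
}
```

Checking for the Solr-native interface first means a class that happens to
implement both types is not wrapped needlessly, while existing LangChain4j
configurations would still go through the adapter.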

I’d appreciate feedback on whether this direction makes sense. I’m happy to
open a JIRA and put together a draft PR for discussion. I have a working
implementation locally that demonstrates the approach.


Thanks,

Prathmesh Deshmukh
