gabrielmagno commented on PR #1213: URL: https://github.com/apache/solr/pull/1213#issuecomment-1337440280
@epugh for this version I combined two "example" models (BERT + item2vec), just to serve as an example.

If we are willing to provide instructions on how to create the models and the vectors themselves, I guess it would be better to use a single-model solution, for simplicity. I could recreate the vectors using only BERT (which I believe is good enough for our example).

The easiest way I know to create a vector representation of text data is by using the `sentence_transformers` Python library with a pre-trained BERT model. It is possible to create vectors with 3 lines of code:

```
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
my_vector = model.encode("This is my text")  # array with 768 values
```

The only issue is that the vectors from this model have **768 dimensions**. For the example I simply took the first 5 dimensions and concatenated them with the vector from the other model. This is not an appropriate way to create the vector in real scenarios. There are other techniques (e.g. Model Distillation) that could reduce the number of dimensions.
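For reference, the truncate-and-concatenate shortcut described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `truncate_and_concat` is hypothetical, random values stand in for real model output so the sketch runs without downloading a model, and the item2vec vector size of 5 is a guess, not the size used in the PR.

```python
import random

BERT_DIMS = 768  # output size of all-mpnet-base-v2, as stated in the comment
KEEP_DIMS = 5    # number of leading dimensions kept for the example


def truncate_and_concat(text_vector, item_vector, keep=KEEP_DIMS):
    """Keep the first `keep` values of text_vector and append item_vector.

    Hypothetical helper: it mirrors the shortcut described in the comment,
    which is not a sound dimensionality-reduction technique.
    """
    if len(text_vector) < keep:
        raise ValueError("text_vector has fewer than `keep` dimensions")
    return list(text_vector[:keep]) + list(item_vector)


# Stand-in data (assumption): random values instead of real embeddings.
rng = random.Random(0)
text_vector = [rng.uniform(-1.0, 1.0) for _ in range(BERT_DIMS)]
item_vector = [rng.uniform(-1.0, 1.0) for _ in range(5)]  # assumed item2vec size

combined = truncate_and_concat(text_vector, item_vector)
print(len(combined))
```

With real data, `text_vector` would be the output of `model.encode(...)`. Dropping all but the first few dimensions discards most of the information in the embedding, which is why this is only suitable for a toy example.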