gabrielmagno commented on PR #1213:
URL: https://github.com/apache/solr/pull/1213#issuecomment-1337440280

   @epugh for this version I combined two "example" models (BERT + item2vec), 
just to serve as an example.
   
   If we are willing to provide instructions on how to create the models and 
the vectors themselves, I guess it would be better to use a single-model 
solution, for simplicity. I could recreate the vectors using only BERT (which 
I believe is good enough for our example).
   
   The easiest way I know to create a vector representation of text data is by 
using the `sentence_transformers` Python library with a pre-trained BERT model. 
It is possible to create vectors with 3 lines of code:
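   ```
   from sentence_transformers import SentenceTransformer
   
   # Load the pre-trained model (downloaded on first use)
   model = SentenceTransformer("all-mpnet-base-v2")
   
   # Encode a piece of text into its vector representation
   my_vector = model.encode("This is my text")
   ```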
   
   The only issue is that the vectors from this model have **768 dimensions**. 
For the example I simply took the first 5 dimensions and concatenated them with 
the vector from the other model. This is not really an appropriate way to 
create vectors in real scenarios. There are other techniques (e.g. model 
distillation) that could reduce the number of dimensions.
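
To make the truncate-and-concatenate shortcut concrete, here is a minimal sketch of what it amounts to. It is only an illustration: random numbers stand in for real model output, the helper name `truncate_and_concat` is made up, and the size of the item2vec vector (5) is an assumption, not something taken from the PR.

```python
import random

BERT_DIMS = 768  # dimensionality of all-mpnet-base-v2 vectors
KEEP_DIMS = 5    # number of leading BERT dimensions kept for the example


def truncate_and_concat(bert_vector, other_vector, keep=KEEP_DIMS):
    """Keep the first `keep` BERT dimensions and append the other model's vector."""
    return list(bert_vector[:keep]) + list(other_vector)


# Stand-in data: random values instead of real embeddings.
rng = random.Random(0)
fake_bert_vector = [rng.uniform(-1, 1) for _ in range(BERT_DIMS)]
# Hypothetical item2vec vector; its real size is not stated here.
fake_item2vec_vector = [rng.uniform(-1, 1) for _ in range(5)]

combined = truncate_and_concat(fake_bert_vector, fake_item2vec_vector)
print(len(combined))  # 10
```

Dropping all but the first few dimensions discards most of what the model encoded, which is why this only works as a toy example.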


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

