Dear all,
Good morning. I have some large texts in my tables. On average, each
row contains about 4.2 KB of data, and there are 9.5 million rows. I want to
perform various conceptual searches on technical terms and technical phrases,
and would like to retrieve all texts with the nearest meanings. So I have to
vectorize the data. What is the best approach, please?
I was trying to split the data into small fragments of 4.2 KB and then
generate embeddings with a small vector size with the help of pgvector. Once I
have the embedding vectors for the fragments, I can combine them using some
close relationship model or an average.
This way, we generate an embedding for the full text.
Or would you recommend another approach to generate an embedding for the full
text, please?
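To make the fragment-and-average idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model (it just hashes words into buckets), and the function names, the `max_bytes` fragment size, and the vector dimension are made up for the example.

```python
import hashlib
import math


def split_into_fragments(text, max_bytes=1024):
    """Split text on whitespace into fragments of at most max_bytes (UTF-8).

    A single word longer than max_bytes becomes its own oversized fragment.
    """
    fragments, current, size = [], [], 0
    for word in text.split():
        word_size = len(word.encode("utf-8")) + 1  # +1 for the joining space
        if current and size + word_size > max_bytes:
            fragments.append(" ".join(current))
            current, size = [], 0
        current.append(word)
        size += word_size
    if current:
        fragments.append(" ".join(current))
    return fragments


def toy_embed(fragment, dim=8):
    """Hypothetical placeholder for a real embedding model (assumption)."""
    vec = [0.0] * dim
    for word in fragment.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # count words per hash bucket
    return vec


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed_full_text(text, max_bytes=1024, dim=8):
    """Embed each fragment, then average the unit vectors into one vector."""
    fragments = split_into_fragments(text, max_bytes)
    if not fragments:
        return [0.0] * dim
    vectors = [l2_normalize(toy_embed(f, dim)) for f in fragments]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return l2_normalize(mean)


if __name__ == "__main__":
    sample = "index scan on a partitioned table " * 200
    print(len(split_into_fragments(sample, max_bytes=512)))
    print(embed_full_text(sample, max_bytes=512))
```

In a real setup the placeholder would be replaced by an actual model, and the resulting vector would be stored in a pgvector column. My concern is that a plain average may blur the meaning of long texts.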
Also, I have another question. I have title, abstract, and description fields,
where the description is about 3 KB, and I would like to search across the
title, abstract, and description. Should I merge all the data (and generate
embeddings from the merged text) or keep the embeddings separate?
Have a wonderful day.

Thank you,
Apurba K. Saha
