[
https://issues.apache.org/jira/browse/SOLR-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Khaled Alkhouli updated SOLR-17679:
-----------------------------------
Attachment: Screenshot from 2025-02-20 16-31-48.png
Description:
Hello Apache Solr team,
I am building a hybrid search engine that combines lexical search (traditional
keyword-based search) and vector search (semantic search using embeddings) in a
single request. The request should do the following:
# *Lexical Search:* Using edismax with specified fields and weights.
# *Vector Search:* Using K-Nearest Neighbors (KNN) based on embeddings.
# *Hybrid Score Combination:* The final score is the sum of the normalized
lexical score and the vector search score. If a document appears in only one
search, the other score should be treated as zero.
I have implemented the following logic using Python:
{code:python}
import numpy as np
import requests

# SOLR_URL, HEADERS and embed() are assumed to be defined elsewhere.

def hybrid_search(query, top_k=10):
    # Embed the query and convert the vector to plain Python floats so that
    # it renders as [0.1, 0.2, ...] inside the knn query string.
    embedding = np.array(embed([query]), dtype=np.float32)
    embedding = embedding[0].tolist()
    # Lexical (edismax) query over the 'text' field.
    lxq = rf"""{{!type=edismax
        qf='text'
        q.op=OR
        tie=0.1
        bq=''
        bf=''
        boost=''
        }}({query})"""
    solr_query = {"params": {
        "q": "{!bool filter=$retrievalStage must=$rankingStage}",
        "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
        "retrievalStage": "{!bool should=$lexicalQuery should=$vectorQuery}",  # Union
        "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
        "lexicalQuery": lxq,
        "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
        "fl": "text",
        "rows": top_k,
        "fq": [""],
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
        # Note: $cutoff is referenced here but is not set in this snippet.
        "rqq": "{!frange l=$cutoff}query($rankingStage)",
        "sort": "score desc",
    }}
    response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
    return response.json()
{code}
The response returns documents with a combined score, which I assume is the
sum of:
* *Lexical Search Score:* Normalized between 0 and 1.
* *Vector Search Score:* Already bounded between 0 and 1.
If a document is present in one search but not the other, the score from the
missing part is added as zero. Attached is an image of the current output.
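To make the intended arithmetic concrete, here is a minimal sketch of the combination performed outside Solr. It assumes plain min-max scaling of the lexical scores it is given; Solr's {{scale}} function may compute its minimum and maximum over a different set of documents, so this is a simplified model rather than a description of Solr's behaviour. The function name {{combine_scores}}, the document IDs, and the score values are all hypothetical.

```python
def combine_scores(lexical, vector):
    """Sum min-max-normalised lexical scores with vector scores per document.

    A document missing from either dict contributes 0 for that part.
    """
    if lexical:
        lo, hi = min(lexical.values()), max(lexical.values())
        span = hi - lo
        # If all lexical scores are equal, map them to 0 to avoid dividing by zero.
        normalised = {
            doc: (score - lo) / span if span else 0.0
            for doc, score in lexical.items()
        }
    else:
        normalised = {}
    combined = {
        doc: normalised.get(doc, 0.0) + vector.get(doc, 0.0)
        for doc in set(normalised) | set(vector)
    }
    # Highest combined score first.
    return dict(sorted(combined.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical raw scores: unbounded lexical scores and 0..1 vector similarities.
lexical_scores = {"doc1": 12.0, "doc2": 7.5, "doc3": 3.0}
vector_scores = {"doc2": 0.75, "doc4": 0.5}
ranked = combine_scores(lexical_scores, vector_scores)
```

In this model the lowest-scoring lexical hit is scaled to 0, so a document found only by the lexical query can still end up with a combined score of zero.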
h3. *Request:*
I would like documentation or guidance on the following:
# *View and Return Individual Scores:*
How can I retrieve the following scores in the same request?
** Lexical search score
** Vector search score
** Final combined score (already retrieved)
I would like to display all three scores in the response together for each
document.
# *Cutoff Logic:*
I am using a Python function to calculate a cutoff threshold based on the
scores. Is it possible to implement this cutoff directly in Solr so that only
documents that pass a certain threshold are returned? If so, how can I achieve
this within Solr’s query syntax, without relying on external Python logic?
I appreciate any help or documentation that can assist with:
* Returning separate scores for lexical and vector queries.
* Implementing cutoff logic natively in Solr.
Thank you!
was:
Hello Apache Solr team,
I am building a hybrid search engine that combines lexical search (traditional
keyword-based search) and vector search (semantic search using embeddings) in a
single request. I’m aiming to achieve the following in one request:
# *Lexical Search:* Using edismax with specified fields and weights.
# *Vector Search:* Using K-Nearest Neighbors (KNN) based on embeddings.
# *Hybrid Score Combination:* The final score is the sum of the normalized
lexical score and the vector search score. If a document appears in only one
search, the other score should be treated as zero.
I have implemented the following logic using Python:
{code:python}
def hybrid_search(query, top_k=10):
    embedding = np.array(embed([query]), dtype=np.float32)
    embedding = list(embedding[0])
    lxq = rf"""{{!type=edismax qf='all_txt'
        q.op=OR tie=0.1
        }}({query_terms})"""
    solr_query = {
        "params": {
            "q": "{!bool filter=$retrievalStage must=$rankingStage}",
            "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
            "retrievalStage": "{!bool should=$lexicalQuery should=$vectorQuery}",  # Union
            "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
            "lexicalQuery": lxq,
            "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
            "fl": "post_id,all_txt,score",
            "rows": top_k,
            "fq": [""],
            "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
            "rqq": "{!frange l=$cutoff}query($rankingStage)",
            "sort": "score desc",
            "cutoff": f"{cutoff_ratio}"
        }
    }
    response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
    response = response.json()
    return response
{code}
The response returns documents with a combined score, which I assume is the
addition of:
* *Lexical Search Score:* Normalized between 0 and 1.
* *Vector Search Score:* Already bounded between 0 and 1.
If a document is present in one search but not the other, the score from the
missing part is added as zero.
h3. *Request:*
I would like documentation or guidance on the following:
# *View and Return Individual Scores:*
How can I retrieve the following scores in the same request?
** Lexical search score
** Vector search score
** Final combined score (already retrieved)
I would like to display all three scores in the response together for each
document.
# *Cutoff Logic:*
I am using a Python function to calculate a cutoff threshold based on the
scores. Is it possible to implement this cutoff directly in Solr so that only
documents that pass a certain threshold are returned? If so, how can I achieve
this within Solr’s query syntax, without relying on external Python logic?
I appreciate any help or documentation that can assist with:
* Returning separate scores for lexical and vector queries.
* Implementing cutoff logic natively in Solr.
Thank you!
Labels: hybrid-search search solr vector-based-search (was: )
> Request for Documentation on Hybrid Lexical and Vector Search with Score
> Breakdown and Cutoff Logic
> ---------------------------------------------------------------------------------------------------
>
> Key: SOLR-17679
> URL: https://issues.apache.org/jira/browse/SOLR-17679
> Project: Solr
> Issue Type: Task
> Components: search
> Affects Versions: 9.6.1
> Reporter: Khaled Alkhouli
> Priority: Major
> Labels: hybrid-search, search, solr, vector-based-search
> Attachments: Screenshot from 2025-02-20 16-31-48.png