[ 
https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865107#comment-17865107
 ] 

Eric Pugh commented on SOLR-10359:
----------------------------------

I wanted to share some work that I've been doing in this space. As part of
another project, I've been able to contribute to a standard we are calling
"User Behavior Interactions" (UBI) for tracking what users are doing. This
standard, which is NOT tied to any specific search engine such as Solr, is
documented at [https://github.com/o19s/ubi]. There is a draft PR implementing
UBI for Solr here: [https://github.com/apache/solr/pull/2452]

I hope that in the latter half of 2024 we'll be publishing some Jupyter
notebook-style demonstration code for taking UBI-based data and producing
implicit judgements from that data ;).

> User Interactions Logging Module
> --------------------------------
>
>                 Key: SOLR-10359
>                 URL: https://issues.apache.org/jira/browse/SOLR-10359
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Alessandro Benedetti
>            Priority: Major
>              Labels: CTR, evaluation
>
> *Introduction*
> Being able to evaluate the quality of your search engine is becoming more and
> more important day by day.
> This issue sets a milestone for integrating online evaluation metrics with
> Solr.
> *Scope*
> The scope of this issue is to provide a set of components able to:
> 1) Collect search result impressions (results shown per query)
> 2) Collect user interactions (user interactions on the search results per
> query, e.g. clicks, bookmarking, etc.)
> 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ...
> *Technical Design*
> A SearchComponent can be designed:
> *UsersEventsLoggerComponent*
> A property (such as storeDir) will define where the collected data will be
> stored.
> Different data structures can be explored; to keep it simple, a first
> implementation can be a Lucene index.
> *Data Model*
> The user event can be modelled in the following way:
> <query> - the user query the event is related to
> <result_id> - the ID of the search result involved in the interaction
> <result_position> - the position in the ranking of the search result involved 
> in the interaction
> <timestamp> - time when the interaction happened
> <relevancy_rating> - 0 for impressions, a value between 1 and 5 to identify
> the type of user event; the semantics will depend on the domain and use cases
> <test_group> - this can identify a variant, in A/B testing
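
To make the data model above concrete, here is a minimal sketch of how one event record might look. This is only an illustration, not part of the proposal: the `UserEvent` class, the `is_impression` helper, and the choice of 1-based positions are assumptions of mine, and the issue itself suggests storing events in a Lucene index rather than in Python objects.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UserEvent:
    """One impression or interaction, using the field names from the issue."""

    query: str              # the user query the event is related to
    result_id: str          # ID of the search result involved
    result_position: int    # rank of the result (assumed 1-based here)
    timestamp: datetime     # when the interaction happened
    relevancy_rating: int   # 0 = impression, 1-5 = type of user event
    test_group: str         # A/B testing variant

    def __post_init__(self):
        # Enforce the 0-5 range described in the data model.
        if not 0 <= self.relevancy_rating <= 5:
            raise ValueError("relevancy_rating must be between 0 and 5")
        # 1-based ranking is an assumption of this sketch.
        if self.result_position < 1:
            raise ValueError("result_position must be 1 or greater")

    @property
    def is_impression(self) -> bool:
        return self.relevancy_rating == 0


now = datetime.now(timezone.utc)
impression = UserEvent("solr logging", "doc-42", 1, now, 0, "testA")
click = UserEvent("solr logging", "doc-42", 1, now, 1, "testA")
```

Treating an impression as an event with rating 0 keeps impressions and interactions in a single record type, which is what lets a later aggregation count both from the same store.
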
> *Impressions Logging*
> When the SearchComponent is assigned to a request handler, every time it
> processes a request and returns a result set to the user for a query, the
> component will collect the impressions (results returned) and index them in
> the auxiliary Lucene index.
> This will happen in parallel, as soon as the results are returned, to avoid
> affecting the query time.
> Of course, an impact on CPU load and memory is expected; it will be
> interesting to minimise it.
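
The idea of logging impressions off the request path can be sketched with a queue and a background worker. This is a rough illustration under my own assumptions, not Solr code: `ImpressionLogger`, `log`, `close`, and the `sink` callable (a stand-in for writing to the auxiliary Lucene index) are all hypothetical names.

```python
import queue
import threading


class ImpressionLogger:
    """Queues impressions on the request path; a worker thread persists them."""

    def __init__(self, sink):
        self._sink = sink            # hypothetical callable that stores one record
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, query, result_ids, test_group):
        # Called while handling the query: only enqueues, never writes.
        for position, result_id in enumerate(result_ids, start=1):
            self._queue.put({
                "query": query,
                "result_id": result_id,
                "result_position": position,
                "relevancy_rating": 0,   # 0 marks an impression
                "test_group": test_group,
            })

    def _drain(self):
        # Runs in the background thread until the None sentinel arrives.
        while True:
            record = self._queue.get()
            if record is None:
                return
            self._sink(record)

    def close(self):
        # Flush everything queued so far, then stop the worker.
        self._queue.put(None)
        self._worker.join()


stored = []
logger = ImpressionLogger(stored.append)
logger.log("solr logging", ["doc-42", "doc-7"], "testA")
logger.close()
```

The request thread pays only for the enqueue; the CPU and memory cost mentioned above moves to the worker and to the size of the queue, which a real implementation would probably want to bound.
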
> *User Events Logging*
> An UpdateHandler will be exposed to accept POST requests and collect user
> events.
> Every time a request is sent, the user event will be indexed in the
> underlying auxiliary Lucene index.
> *Stats Calculation*
> A RequestHandler will be exposed to calculate stats and aggregations for the
> metrics:
> /evaluation?metric=ctr&stats=query&compare=testA,testB
> This request could calculate the CTR for testA and testB so that they can be
> compared, showing stats in total and per query (to highlight the queries with
> lower/higher CTR).
> The calculations will be kept separate per <test_group> for an easy
> comparison.
> It will be important to keep it as simple as possible for a first version,
> and then extend it as much as we like.
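
As an illustration of the stats calculation described in the issue, here is a small sketch of computing CTR per test group, optionally broken down per query. It assumes that CTR means interactions divided by impressions and that any rating of 1 or more counts as a click; the issue does not fix either definition. The `click_through_rate` function and the tuple layout are hypothetical.

```python
from collections import defaultdict


def click_through_rate(events, by_query=False):
    """Return CTR keyed by test_group, or by (test_group, query).

    `events` is an iterable of (query, test_group, relevancy_rating) tuples.
    Assumption: rating 0 is an impression, anything higher is a click.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, test_group, rating in events:
        key = (test_group, query) if by_query else test_group
        if rating == 0:
            impressions[key] += 1
        else:
            clicks[key] += 1
    # Keys with clicks but no recorded impressions are left out.
    return {key: clicks[key] / shown for key, shown in impressions.items()}


events = [
    ("solr", "testA", 0), ("solr", "testA", 0), ("solr", "testA", 1),
    ("solr", "testB", 0), ("solr", "testB", 0),
    ("solr", "testB", 0), ("solr", "testB", 0), ("solr", "testB", 3),
]
ctr = click_through_rate(events)
ctr_per_query = click_through_rate(events, by_query=True)
```

With this sample data, testA has 1 click over 2 impressions and testB has 1 click over 4, so the two variants can be compared side by side, in the spirit of `compare=testA,testB` in the request above.
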



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
