[ https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865107#comment-17865107 ]
Eric Pugh commented on SOLR-10359:
----------------------------------

I wanted to share some work that I've been doing in this space. As part of another project, I've been able to contribute to a standard we are calling "User Behavior Interactions" (UBI) for tracking what users are doing. This standard, which is NOT tied to any specific search engine such as Solr, is documented at [https://github.com/o19s/ubi].

There is a draft PR implementing UBI for Solr here: [https://github.com/apache/solr/pull/2452]

I hope that in the latter half of 2024 we'll publish some Jupyter-notebook-style demonstration code for taking UBI-based data and producing implicit judgements from that data ;).

> User Interactions Logging Module
> --------------------------------
>
> Key: SOLR-10359
> URL: https://issues.apache.org/jira/browse/SOLR-10359
> Project: Solr
> Issue Type: New Feature
> Reporter: Alessandro Benedetti
> Priority: Major
> Labels: CTR, evaluation
>
> *Introduction*
> Being able to evaluate the quality of your search engine is becoming more and more important day by day.
> This issue is a milestone towards integrating online evaluation metrics with Solr.
> *Scope*
> The scope of this issue is to provide a set of components able to:
> 1) Collect search result impressions (results shown per query)
> 2) Collect user interactions (user interactions on the search results per query, e.g. clicks, bookmarking, etc.)
> 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ...
> *Technical Design*
> A SearchComponent can be designed:
> *UsersEventsLoggerComponent*
> A property (such as storeDir) will define where the collected data will be stored.
> Different data structures can be explored; to keep it simple, a first implementation can be a Lucene index.
> *Data Model*
> The user event can be modelled in the following way:
> <query> - the user query the event is related to
> <result_id> - the ID of the search result involved in the interaction
> <result_position> - the position in the ranking of the search result involved in the interaction
> <timestamp> - the time when the interaction happened
> <relevancy_rating> - 0 for impressions, a value between 1 and 5 to identify the type of user event; the semantics will depend on the domain and use cases
> <test_group> - this can identify a variant in A/B testing
> *Impressions Logging*
> When the SearchComponent is assigned to a request handler, every time it processes a request and returns a result set to the user for a query, the component will collect the impressions (results returned) and index them in the auxiliary Lucene index.
> This will happen in parallel, as soon as the results are returned, to avoid affecting the query time.
> Of course an impact on CPU load and memory is expected; it will be interesting to minimise it.
> *User Events Logging*
> An UpdateHandler will be exposed to accept POST requests and collect user events.
> Every time a request is sent, the user event will be indexed in the underlying auxiliary Lucene index.
> *Stats Calculation*
> A RequestHandler will be exposed to calculate stats and aggregations for the metrics:
> /evaluation?metric=ctr&stats=query&compare=testA,testB
> This request could calculate the CTR for testA and testB so that they can be compared, showing stats in total and per query (to highlight the queries with lower/higher CTR).
> The calculations will be separated by <test_group> for an easy comparison.
> It will be important to keep it as simple as possible for a first version, and then extend it as much as we like.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
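
For illustration only, here is a minimal Python sketch of how the Data Model and the per-test-group CTR from the Stats Calculation section of the quoted issue might fit together. It is not taken from the issue, the UBI standard, or the Solr PR: the `UserEvent` class and the `ctr_by_group_and_query` function are hypothetical names, the sketch assumes that any event with a relevancy_rating above 0 counts as a click, and it defines CTR as clicks divided by result impressions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UserEvent:
    """One logged event, mirroring the fields listed under *Data Model*."""
    query: str
    result_id: str
    result_position: int
    timestamp: float
    relevancy_rating: int  # 0 = impression, 1-5 = type of user interaction
    test_group: str        # A/B testing variant, e.g. "testA"


def ctr_by_group_and_query(events):
    """Return {test_group: {query: clicks / impressions}}.

    Assumption (not from the issue): every event with relevancy_rating > 0
    is treated as a click. Queries without impressions are skipped to avoid
    dividing by zero.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        key = (event.test_group, event.query)
        if event.relevancy_rating == 0:
            impressions[key] += 1
        else:
            clicks[key] += 1

    result = defaultdict(dict)
    for (group, query), shown in impressions.items():
        result[group][query] = clicks[(group, query)] / shown
    return dict(result)


# Made-up sample data: two impressions per variant, one click in testA.
events = [
    UserEvent("solr", "doc1", 1, 1.0, 0, "testA"),
    UserEvent("solr", "doc2", 2, 1.0, 0, "testA"),
    UserEvent("solr", "doc1", 1, 2.0, 1, "testA"),
    UserEvent("solr", "doc1", 1, 3.0, 0, "testB"),
    UserEvent("solr", "doc2", 2, 3.0, 0, "testB"),
]
ctr = ctr_by_group_and_query(events)
```

With this sample data, `ctr["testA"]["solr"]` is 0.5 and `ctr["testB"]["solr"]` is 0.0, which is the kind of per-query, per-variant comparison the `/evaluation?metric=ctr&stats=query&compare=testA,testB` request describes. A real implementation would aggregate over the auxiliary index rather than an in-memory list.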