[ https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940882#comment-15940882 ]
Alexandre Rafalovitch commented on SOLR-10359:
----------------------------------------------

There seem to be two things mixed in here:
* Logging the search queries and the results returned (either as a count or as specific ids). And, maybe, statistics on that.
* User interactions on the front end

The first item can probably be solved with a SearchComponent, and I would love to see what that could look like, especially if it is flexible enough to also be used for debugging.

The second one seems to happen well outside Solr's control (UI clicks, what the user selected, etc.). I am not sure that fits into Solr itself. Commercial platforms (such as Fusion) might be integrating it, but they control more of the stack.

> User Events Logger Component
> ----------------------------
>
>                 Key: SOLR-10359
>                 URL: https://issues.apache.org/jira/browse/SOLR-10359
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Alessandro Benedetti
>              Labels: CTR, evaluation
>
> *Introduction*
> Being able to evaluate the quality of your search engine is becoming more and
> more important day by day.
> This issue is a first milestone towards integrating online evaluation metrics
> with Solr.
> *Scope*
> The scope of this issue is to provide a set of components able to:
> 1) Collect search result impressions (results shown per query)
> 2) Collect user events (user interactions on the search results per query,
> e.g. clicks, bookmarking, etc.)
> 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ...
> *Technical Design*
> A SearchComponent can be designed:
> *UsersEventsLoggerComponent*
> A property (such as storeDir) will define where the collected data will be
> stored.
> Different data structures can be explored; to keep it simple, a first
> implementation can be a Lucene index.
> *Data Model*
> The user event can be modelled in the following way:
> <query> - the user query the event is related to
> <result_id> - the ID of the search result involved in the interaction
> <result_position> - the position in the ranking of the search result involved
> in the interaction
> <timestamp> - time when the interaction happened
> <relevancy_rating> - 0 for impressions, a value between 1 and 5 to identify
> the type of user event; the semantics will depend on the domain and use cases
> <test_group> - this can identify a variant in A/B testing
> *Impressions Logging*
> When the SearchComponent is assigned to a request handler, every time it
> processes a request and returns a result set for a query to the user, the
> component will collect the impressions (results returned) and index them in
> the auxiliary Lucene index.
> This will happen in parallel, as soon as the results are returned, to avoid
> affecting the query time.
> Of course an impact on CPU load and memory is expected; it will be
> interesting to minimise it.
> *User Events Logging*
> An UpdateHandler will be exposed to accept POST requests and collect user
> events.
> Every time a request is sent, the user event will be indexed in the
> underlying auxiliary Lucene index.
> *Stats Calculation*
> A RequestHandler will be exposed to calculate stats and aggregations for the
> metrics:
> /evaluation?metric=ctr&stats=query&compare=testA,testB
> This request could calculate the CTR for testA and testB so they can be
> compared, showing stats in total and per query (to highlight the queries with
> lower/higher CTR).
> The calculations will be kept separate per <test_group> for easy comparison.
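
To make the data model and the compare=testA,testB request above a bit more concrete, here is a minimal sketch in plain Java with no Solr or Lucene dependency. It is not the proposed component: the class and method names (CtrSketch, UserEvent, ctrByTestGroup) are hypothetical, the events live in an in-memory list rather than in the auxiliary index, and it assumes that every event with a relevancy_rating above 0 counts as a click, which the description leaves open ("the semantics will depend on the domain").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CtrSketch {

    /** One logged row, mirroring the fields listed under "Data Model". */
    static final class UserEvent {
        final String query;
        final String resultId;
        final int resultPosition;
        final long timestamp;
        final int relevancyRating; // 0 = impression, 1-5 = user event
        final String testGroup;

        UserEvent(String query, String resultId, int resultPosition,
                  long timestamp, int relevancyRating, String testGroup) {
            this.query = query;
            this.resultId = resultId;
            this.resultPosition = resultPosition;
            this.timestamp = timestamp;
            this.relevancyRating = relevancyRating;
            this.testGroup = testGroup;
        }
    }

    /**
     * CTR per test group = interactions / impressions.
     * Assumption: any rating above 0 is treated as a click.
     */
    static Map<String, Double> ctrByTestGroup(List<UserEvent> events) {
        // Per group: counts[0] = impressions, counts[1] = interactions.
        Map<String, long[]> counts = new TreeMap<>();
        for (UserEvent e : events) {
            long[] c = counts.computeIfAbsent(e.testGroup, k -> new long[2]);
            if (e.relevancyRating == 0) {
                c[0]++;
            } else {
                c[1]++;
            }
        }
        Map<String, Double> ctr = new TreeMap<>();
        for (Map.Entry<String, long[]> entry : counts.entrySet()) {
            long impressions = entry.getValue()[0];
            long interactions = entry.getValue()[1];
            // Report 0.0 for a group with no impressions to avoid dividing by zero.
            ctr.put(entry.getKey(),
                    impressions == 0 ? 0.0 : (double) interactions / impressions);
        }
        return ctr;
    }

    public static void main(String[] args) {
        List<UserEvent> events = new ArrayList<>();
        // testA: four impressions and one click for the query "solr".
        events.add(new UserEvent("solr", "doc1", 1, 1000L, 0, "testA"));
        events.add(new UserEvent("solr", "doc2", 2, 1000L, 0, "testA"));
        events.add(new UserEvent("solr", "doc3", 3, 1000L, 0, "testA"));
        events.add(new UserEvent("solr", "doc4", 4, 1000L, 0, "testA"));
        events.add(new UserEvent("solr", "doc2", 2, 1500L, 1, "testA"));
        // testB: two impressions and one click for the same query.
        events.add(new UserEvent("solr", "doc1", 1, 2000L, 0, "testB"));
        events.add(new UserEvent("solr", "doc2", 2, 2000L, 0, "testB"));
        events.add(new UserEvent("solr", "doc1", 1, 2500L, 3, "testB"));

        System.out.println(ctrByTestGroup(events)); // {testA=0.25, testB=0.5}
    }
}
```

Grouping by query as well as by test group would give the per-query breakdown mentioned above. A real implementation would presumably run this aggregation against the stored events instead of a list.
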
> It will be important to keep it as simple as possible for a first version,
> and then extend it as much as we like.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)