Alessandro Benedetti created SOLR-10359:
-------------------------------------------
Summary: Users Events Logger Component
Key: SOLR-10359
URL: https://issues.apache.org/jira/browse/SOLR-10359
Project: Solr
Issue Type: New Feature
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Alessandro Benedetti
*Introduction*
Being able to evaluate the quality of a search engine is becoming more and
more important.
This issue is a first milestone towards integrating online evaluation metrics
with Solr.
*Scope*
The scope of this issue is to provide a set of components able to:
1) Collect search result impressions (the results shown per query)
2) Collect user events (user interactions with the search results per query,
e.g. clicks, bookmarking, etc.)
3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG, ...
*Technical Design*
A SearchComponent can be designed:
*UsersEventsLoggerComponent*
A property (such as storeDir) will define where the collected data will be
stored.
Different data structures can be explored; to keep it simple, a first
implementation can be a Lucene index.
*Data Model*
The user event can be modelled in the following way :
<query> - the user query the event is related to
<result_id> - the ID of the search result involved in the interaction
<result_position> - the position in the ranking of the search result involved
in the interaction
<timestamp> - time when the interaction happened
<relevancy_rating> - 0 for impressions, a value from 1 to 5 to identify the
type of user event; the semantics will depend on the domain and use case
<test_group> - identifies a variant in A/B testing
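As an illustration only, the data model above could be sketched as a small
record type. The field names come from the list above; the Python types, the
epoch-seconds timestamp and the is_impression helper are assumptions made for
this sketch, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional

IMPRESSION = 0  # relevancy_rating value reserved for impressions


@dataclass(frozen=True)
class UserEvent:
    query: str              # the user query the event is related to
    result_id: str          # ID of the search result involved
    result_position: int    # position of the result in the ranking
    timestamp: float        # assumed here to be epoch seconds
    relevancy_rating: int   # 0 = impression, 1-5 = type of user event
    test_group: Optional[str] = None  # A/B testing variant, if any

    def __post_init__(self):
        # Reject ratings outside the range described in the data model.
        if not 0 <= self.relevancy_rating <= 5:
            raise ValueError("relevancy_rating must be between 0 and 5")

    @property
    def is_impression(self) -> bool:
        # Hypothetical helper: True when the record is an impression.
        return self.relevancy_rating == IMPRESSION
```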
*Impressions Logging*
When the SearchComponent is assigned to a request handler, every time it
processes a request and returns a result set for a query, the
component will collect the impressions (the results returned) and index them in
the auxiliary Lucene index.
This will happen in parallel, as soon as the results are returned, to avoid
affecting the query time.
Some impact on CPU load and memory is expected; it will be important to
minimise it.
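One possible way to keep impression logging off the request path is a queue
drained by a background worker. The sketch below is language-neutral
illustration in Python, not Solr code: ImpressionLogger, its methods and the
list-like store standing in for the auxiliary index are all hypothetical names.

```python
import queue
import threading
import time


class ImpressionLogger:
    """Hypothetical sketch: records impressions on a background thread."""

    def __init__(self, store):
        self._store = store  # stand-in for the auxiliary index (any list-like)
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, query, result_ids, test_group=None):
        # Only enqueue here, so the request thread is not blocked by indexing.
        self._queue.put((query, list(result_ids), test_group, time.time()))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel placed by close()
                return
            query, result_ids, test_group, shown_at = item
            # One record per result shown, with its 1-based ranking position.
            for position, result_id in enumerate(result_ids, start=1):
                self._store.append({
                    "query": query,
                    "result_id": result_id,
                    "result_position": position,
                    "timestamp": shown_at,
                    "relevancy_rating": 0,  # 0 marks an impression
                    "test_group": test_group,
                })

    def close(self):
        # Flush everything queued so far, then stop the worker.
        self._queue.put(None)
        self._worker.join()
```

The trade-off is the one noted above: query time is protected, but the queue
and the worker still cost memory and CPU.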
*User Events Logging*
An UpdateHandler will be exposed to accept POST requests and collect user
events.
Every time a request is sent, the user event will be indexed in the underlying
auxiliary Lucene index.
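The body format of such a POST request is not defined in this issue. Assuming,
purely for illustration, a JSON object whose keys mirror the data model, the
validation step could look like the following sketch (parse_user_event and
REQUIRED_FIELDS are hypothetical names).

```python
import json

# Assumed payload keys, taken from the data model; test_group is optional.
REQUIRED_FIELDS = ("query", "result_id", "result_position",
                   "timestamp", "relevancy_rating")


def parse_user_event(body: str) -> dict:
    """Parse and validate one user event from a JSON request body."""
    event = json.loads(body)
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in event]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    rating = event["relevancy_rating"]
    # 0 is reserved for impressions, so user events must use 1 to 5.
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("user events need a relevancy_rating from 1 to 5")
    event.setdefault("test_group", None)
    return event
```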
*Stats Calculation*
A RequestHandler will be exposed to calculate stats and aggregations
for the metrics:
/evaluation?metric=ctr&stats=query&compare=testA,testB
This request could calculate the CTR for testA and testB so that the two can be
compared, showing stats both in total and per query (to highlight the queries
with lower/higher CTR).
The calculations will be kept separate per <test_group> for easy comparison.
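To make the calculation concrete, here is a minimal sketch of a per-group CTR
with totals and a per-query breakdown. It assumes CTR = user events /
impressions and that any relevancy_rating from 1 to 5 counts as a click; which
ratings really count is domain dependent, and click_through_rate is a
hypothetical name.

```python
from collections import defaultdict


def click_through_rate(events, compare):
    """Return CTR per test group, in total and per query.

    events: iterable of dicts shaped like the data model.
    compare: the test groups to report, e.g. ["testA", "testB"].
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        group = event["test_group"]
        if group not in compare:
            continue
        # Count each event twice: for the group total and for its query.
        for key in ((group, None), (group, event["query"])):
            if event["relevancy_rating"] == 0:
                impressions[key] += 1
            else:
                clicks[key] += 1  # assumption: any rating 1-5 is a click

    report = {group: {"total": None, "per_query": {}} for group in compare}
    # Queries with user events but no recorded impressions are skipped.
    for (group, query), shown in impressions.items():
        ctr = clicks[(group, query)] / shown
        if query is None:
            report[group]["total"] = ctr
        else:
            report[group]["per_query"][query] = ctr
    return report
```

A group with no impressions keeps a total of None rather than 0, so missing
data is not mistaken for a poor CTR.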
It will be important to keep it as simple as possible for a first version, and
then extend it as needed.