Hi Joaquin

Very neat, thanks for sharing.

Viewing search relevance as something akin to a classification problem is
actually a driving narrative in Taming Search <http://manning.com/turnbull>.
We generalize the relevance problem as one of measuring the similarity
between features of content (locations of restaurants, price of a product,
the words in the body of articles, expanded synonyms in articles, etc.) and
features of a query (the search terms, user usage history, any location,
etc.). What makes search interesting is that, unlike other classification
systems, search has built-in similarity systems (largely TF*IDF).

So we actually cut the other direction from your talk. It appears that you
amend the search engine to change the underlying scoring to be based on
machine learning constructs. In our book, we work the opposite way. We
largely enable feature similarity classifications between document and
query by massaging features into terms and using the built-in TF*IDF or
another relevant similarity approach.
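
As a rough illustration of what "massaging features into terms" can look
like, here is a minimal Python sketch. Everything in it is an assumption
made for illustration: the restaurant documents, the token names
(price_low, loc_downtown), and the simplified TF*IDF formula, which is not
the exact Lucene/Solr/Elasticsearch scoring and is not code from the book.

```python
import math
from collections import Counter

# Hypothetical documents: non-text features (price band, neighborhood)
# have been turned into tokens that sit alongside ordinary words.
DOCS = {
    "cafe_a": ["pizza", "italian", "price_low", "loc_downtown"],
    "cafe_b": ["pizza", "pizza", "price_high", "loc_uptown"],
    "cafe_c": ["sushi", "japanese", "price_low", "loc_downtown"],
}


def idf(term, docs):
    """Smoothed inverse document frequency (an illustrative variant)."""
    df = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(1 + len(docs) / df) if df else 0.0


def score(query_terms, doc_tokens, docs):
    """Sum of sqrt(tf) * idf over the query terms found in the document."""
    tf = Counter(doc_tokens)
    return sum(math.sqrt(tf[t]) * idf(t, docs) for t in query_terms)


def rank(query_terms, docs=DOCS):
    """Return document names ordered from most to least similar."""
    scored = {name: score(query_terms, tokens, docs)
              for name, tokens in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)


# The query features are massaged into the same token vocabulary.
print(rank(["pizza", "price_low", "loc_downtown"]))
```

The point of the sketch is that the query's location and price preference
take part in ordinary term matching, so no custom scoring code is needed.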

We feel this plays to the advantages of a search engine. Search engines
already have some basic text analysis built in. They've also been heavily
optimized for most forms of text-based similarity. If you can massage text
such that your TF*IDF similarity reflects a rough proportion of text-based
features important to your users, this tends to reflect their intuitive
notions of relevance. A lot of this work involves feature selection, or what
we term in the book feature modeling: what features should you introduce to
your documents that can be used to generate good signals at ranking time?

You can read more about our thoughts here
<http://java.dzone.com/articles/solr-and-elasticsearch>.

That all being said, your approach gets really interesting when you have
enough supervised training data over good-enough features. This can be hard
to do for a broad swath of "middle tier" search applications, but it becomes
increasingly useful as scale goes up. I'd be interested to hear your
thoughts on this article I wrote about collecting click tracking and other
relevance feedback data:
<http://opensourceconnections.com/blog/2014/10/08/when-click-scoring-can-hurt-search-relevance-a-roadmap-to-better-signals-processing-in-search/>

Good stuff! Again, thanks for sharing,
-Doug



On Wed, Apr 29, 2015 at 6:58 PM, J. Delgado <joaquin.delg...@gmail.com>
wrote:

> Here is a presentation on the topic:
>
> http://www.slideshare.net/joaquindelgado1/where-search-meets-machine-learning04252015final
>
> Search can be viewed as a combination of a) A problem of constraint
> satisfaction, which is the process of finding a solution to a set of
> constraints (query) that impose conditions that the variables (fields) must
> satisfy with a resulting object (document) being a solution in the feasible
> region (result set), plus b) A scoring/ranking problem of assigning values
> to different alternatives, according to some convenient scale. This
> ultimately provides a mechanism to sort various alternatives in the result
> set in order of importance, value or preference. In particular scoring in
> search has evolved from being a document centric calculation (e.g. TF-IDF)
> proper from its information retrieval roots, to a function that is more
> context sensitive (e.g. include geo-distance ranking) or user centric (e.g.
> takes user parameters for personalization) as well as other factors that
> depend on the domain and task at hand. However, most systems that
> incorporate machine learning techniques to perform classification or
> generate scores for these specialized tasks do so as a post retrieval
> re-ranking function, outside of search! In this talk I show ways of
> incorporating advanced scoring functions, based on supervised learning and
> bid scaling models, into popular search engines such as Elastic Search and
> potentially SOLR. I'll provide practical examples of how to construct such
> "ML Scoring" plugins in search to generalize the application of a search
> engine as a model evaluator for supervised learning tasks. This will
> facilitate the building of systems that can do computational advertising,
> recommendations and specialized search systems, applicable to many domains.
>
> Code to support it (only elastic search for now):
> https://github.com/sdhu/elasticsearch-prediction
>
> -- J
>
>
>
>
>


-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Taming Search <http://manning.com/turnbull> from Manning
Publications
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.
