Re: Solr relevancy tuning
I realize I never responded to this thread, shame on me! Jorge/Giovanni, Kelvin looks pretty cool -- thanks for sharing it. When we use Quepid, we sometimes do it at places with existing relevancy test scripts like Kelvin. Quepid and test scripts tend to satisfy different niches. In addition to testing, Quepid is a GUI that helps you explain, investigate, and sandbox. Sometimes this is nice for fuzzier, more qualitative judgments, especially when you want to collaborate with non-technical stakeholders. It's been our replacement for the spreadsheet that a lot of our clients used before Quepid, where the non-technical folks would list queries and the results they expected. Scripts work very well for getting that pass/fail response. It's nice that Kelvin gives you a temperature instead of necessarily a pass/fail; that level of fuzziness is definitely useful. We certainly see value in both (and will probably be doing more to integrate Quepid with continuous integration/scripting). Cheers, -Doug

On Mon, May 5, 2014 at 2:47 AM, Jorge Luis Betancourt González jlbetanco...@uci.cu wrote: [...]
Re: Solr relevancy tuning
One good thing about Kelvin is that it's more of a programmatic task, so you can execute the scripts after a few changes or a deployment and get a general idea of whether the new changes have impacted the search experience. Sure, the changing catalog is still a problem, but I kind of like being able to execute a few commands and, presto, get it done. This could become a must-run test in the test suite of the app. I kind of do this already, but testing from the user interface, using the test library provided by Symfony2 (the framework I'm using) and its functional tests. It's not test-driven search relevancy per se, but we make sure not to mess up some basic queries we use to test the search feature.

- Original Message - From: Giovanni Bricconi giovanni.bricc...@banzai.it To: solr-user solr-user@lucene.apache.org Cc: Ahmet Arslan iori...@yahoo.com Sent: Friday, April 11, 2014 5:15:56 AM Subject: Re: Solr relevancy tuning [...]
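The must-run test idea above might be sketched roughly like this (the `search` function and the query list are invented stand-ins for illustration; a real suite would call the application's search endpoint instead):

```python
# Sketch of a "must-run" search smoke test for a CI suite. The search()
# function is a hypothetical stand-in for the real search call.

def search(query):
    # Fake index so the sketch is self-contained and runnable.
    catalog = {
        "ethernet cable": ["Ethernet cable Cat5e", "Ethernet cable Cat6"],
        "network cable": ["Ethernet cable Cat5e"],
    }
    return catalog.get(query, [])

SMOKE_TESTS = [
    # (query, word that must appear in every returned title)
    ("ethernet cable", "ethernet"),
    ("network cable", "cable"),
]

def run_smoke_tests():
    failures = []
    for query, required_word in SMOKE_TESTS:
        titles = search(query)
        if not titles or not all(required_word in t.lower() for t in titles):
            failures.append(query)
    return failures  # empty list means all basic queries still work

print(run_smoke_tests())  # prints []
```

Wired into a CI pipeline, a non-empty failure list would fail the build, which is exactly the pass/fail style of check being described.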
Re: Solr relevancy tuning
Hello Doug, I have just watched the Quepid demonstration video, and I strongly agree with your introduction: it is very hard to involve marketing/business people in repeated testing sessions, and spreadsheets or other kinds of files are not the right tool to use. Currently I'm quite alone in my tuning task, and having a visual approach could be beneficial for me; you are giving me many good inputs!

I see that Kelvin (my scripted tool) and Quepid follow the same path. In Quepid someone quickly watches the results and applies colours to them; in Kelvin you enter one or more queries (network cable, ethernet cable) and state that the results must contain ethernet in the title, or must come from a list of product categories. I also do diffs of results, before and after changes, to check what is going on, but I have to do that in a very Unix-scripted way. Have you considered placing a counter of total red/bad results in Quepid? I use this index to get a quick overview of a change's impact across all queries. Actually I repeat tests in production from time to time, and if I see the Kelvin temperature rising (the number of errors going up) I know I have to check what's going on, because new products may be having a bad impact on the index. I also keep counters of products with low-quality images, no images at all, or too-short listings; sometimes these are useful to better understand what will happen if you change some bq/fq in the application.

I see also that after changes in Quepid someone has to check the grey results and assign them a colour; in Kelvin's case the conditions can sometimes do a bit of magic (new product names still contain SM-G900F) but can sometimes introduce false errors (the new product name contains only Galaxy 5 and not the product code SM-G900F). So some checks are needed, but with Quepid everybody can do the check, while with Kelvin you have to change some lines of a script, and not everybody is able/willing to do that.
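The before/after diff described above could be done without Unix plumbing with a few lines of code; this sketch (the ids and field names are invented) compares the ranked result ids for one query under two engine configurations:

```python
# Self-contained sketch of a before/after result diff: report which
# result ids entered, left, or changed position in the ranking.

def diff_results(before, after):
    entered = [d for d in after if d not in before]
    left = [d for d in before if d not in after]
    moved = [d for d in after
             if d in before and before.index(d) != after.index(d)]
    return {"entered": entered, "left": left, "moved": moved}

# Hypothetical top-5 ids for "network cable" before and after a bq change.
before = ["cable-cat6", "cable-cat5e", "hub-8port", "adapter-usb", "cable-flat"]
after  = ["cable-cat5e", "cable-cat6", "cable-flat", "hub-8port", "switch-5p"]

# switch-5p entered, adapter-usb left, four ids changed position.
print(diff_results(before, after))
```

Summing the `entered`/`left` counts across all tracked queries gives a single churn number that can serve as the quick overview of a change's impact.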
The idea of a static index is a good suggestion; I will try to have one in the next round of search-engine improvements. Thank you Doug!

2014-04-09 17:48 GMT+02:00 Doug Turnbull dturnb...@opensourceconnections.com: [...]
Solr relevancy tuning
It is about one year that I have been working on an e-commerce site, and unfortunately I have no information retrieval background, so I am probably missing some important practices about relevance tuning and search engines. During this period I had to fix many bugs about bad search results, which I have solved sometimes by tuning edismax weights, sometimes by creating ad hoc query filters or query boosting; but I am still not able to figure out what the correct process to improve search result relevance should be. These are the practices I am following; I would really appreciate any comments about them, and any hints about what practices you follow in your projects:

- In order to have a measure of search quality I have written many test cases, such as: if the user searches for nike sport watch, the search results should display at least four TomTom products with the words nike and sportwatch in the title. I have written a tool that reads such tests from JSON files, applies them to my application, and then counts the number of results that do not match the criteria stated in the test cases. (For those interested, this tool is available at https://github.com/gibri/kelvin but it is still quite a prototype.)
- I use this count as a quality index; I have tried various times to change the edismax weights to lower the total number of errors, or to add new filters/boostings to the application to try to decrease the error count.
- The pro of this is that at least you have a number to look at, and a quick way of checking the impact of a modification.
- The bad side is that you have to maintain the test cases: I now have about 800 tests and my product catalogue changes often, which means that some products exit the catalog and some test cases can't pass anymore.
- I am populating the test cases using errors reported by users, and I feel that this is driving the test cases too much toward pathological cases. Moreover, I haven't many tests for cases that are working well now.
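The tool described in the first bullet might look something like the following sketch (the JSON field names are illustrative, not Kelvin's actual schema, and `run_query` is a stub in place of a real Solr call):

```python
import json

# Hypothetical test-case format: each case states a query, a word the
# matching titles must contain, and how many such results are required.
TEST_CASES = json.loads("""
[
  {"query": "nike sport watch", "title_must_contain": "sportwatch", "min_matches": 4},
  {"query": "ethernet cable",   "title_must_contain": "ethernet",   "min_matches": 2}
]
""")

def run_query(query):
    # Stand-in for the real search-engine call (e.g. Solr /select).
    fake_index = {
        "nike sport watch": ["TomTom Nike+ SportWatch GPS"] * 4,
        "ethernet cable": ["Ethernet cable Cat5e", "Ethernet cable Cat6"],
    }
    return fake_index.get(query, [])

def temperature(cases):
    """Count failing cases -- the 'temperature': 0 is healthy, higher is worse."""
    errors = 0
    for case in cases:
        titles = run_query(case["query"])
        matches = sum(case["title_must_contain"] in t.lower() for t in titles)
        if matches < case["min_matches"]:
            errors += 1
    return errors

print(temperature(TEST_CASES))  # prints 0
```

Tracking that single error count before and after an edismax change is the quality index the bullets describe.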
I would like to use search logs as drivers to generate tests, but I feel I haven't picked the right path. Using top queries, manually reviewing results, and then writing tests is a slow process; moreover, many top queries are ambiguous or are driven by site ads, and many, many queries are unique per user. How do you deal with these cases? How are you using your logs to find test cases to fix? Are you looking for queries where the user does not open any returned result? Which KPI have you chosen to find queries that are not providing good results? And what are you using as a KPI for the whole search, besides the conversion rate? Can you suggest any other practices you are using in your projects? Thank you very much in advance, Giovanni
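One log-driven KPI of the kind asked about above -- the share of searches for a query where the user opened no result at all -- can be computed in a few lines. The log rows here are invented for illustration:

```python
from collections import defaultdict

# Hypothetical click log: (query, number of results the user clicked).
log = [
    ("nike sport watch", 0),
    ("nike sport watch", 2),
    ("ethernet cable", 1),
    ("sm-g900f", 0),
    ("sm-g900f", 0),
]

def zero_click_rate(rows):
    """Per query: fraction of searches where the user clicked nothing."""
    searches = defaultdict(int)
    zero_clicks = defaultdict(int)
    for query, clicks in rows:
        searches[query] += 1
        if clicks == 0:
            zero_clicks[query] += 1
    return {q: zero_clicks[q] / searches[q] for q in searches}

rates = zero_click_rate(log)
# Queries with a high zero-click rate are candidates for new test cases.
print(sorted(rates.items(), key=lambda kv: -kv[1]))
```

Run over real logs, the queries at the top of this list would point at likely relevance problems without having to review every top query by hand.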
Re: Solr relevancy tuning
Hi Giovanni, here are some relevant pointers:
http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy
http://rosenfeldmedia.com/books/search-analytics/
http://www.sematext.com/search-analytics/index.html
Ahmet

On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote: [...]
Re: Solr relevancy tuning
Thank you for the links. The book is really useful; I will definitely have to spend some time reformatting the logs to get access to the number of results found, the session id, and much more. I'm also quite happy that my test cases produce results similar to the precision reports shown at the beginning of the book. Giovanni

2014-04-09 12:59 GMT+02:00 Ahmet Arslan iori...@yahoo.com: [...]
Re: Solr relevancy tuning
Hey Giovanni, nice to meet you. I'm the person that did the Test-Driven Relevancy talk. We've got a product, Quepid (http://quepid.com), that lets you gather good/bad results for queries and do a sort of test-driven development against search relevancy. Sounds similar to your existing scripted approach. Have you considered keeping a static catalog for testing purposes? We had a project with a lot of updates and date-dependent relevancy; a static catalog lets you create some test scenarios against a static data set. However, one downside is that you can't recreate production problems in your test setup exactly -- you have to find a similar issue that reflects what you're seeing. Cheers, -Doug

On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote: [...]
-- Doug Turnbull Search Big Data Architect OpenSource Connections http://o19s.com