Re: edismax pf2 and ps

2010-08-30 Thread Ron Mayer
Short summary:

 * Multiple simultaneous phrase boosts with different phrase slops
   are working very nicely for me on a QA system with a few million docs.

 * I've submitted an updated patch to Jira incorporating feedback
   from the Jira comments. I'll be testing it more this week.
   https://issues.apache.org/jira/browse/SOLR-2058

On 2010-08-19 Ron Mayer wrote:
> Chris Hostetter wrote:
>> [Yonik Seeley wrote]
>> : Perhaps fold it into the pf/pf2 syntax?
>> : pf=text~1^2  // proposed syntax...
>>
>> Big +1 to this idea ... 
> ...
> I added a ticket here: https://issues.apache.org/jira/browse/SOLR-2058
> and attached my patch to that ticket.

Just wanted to mention that it has been working extremely well for me,
with multiple phrase boosts using different slops at the same time.

I also cleaned up the patch based on comments in Jira and submitted a
newer version.

In particular, I find that if I use the following:
* a high boost (500) on pf with a slop of 0
* a moderate boost (50) on pf with a slop of 50
* a moderate boost (50) on pf2 with a slop of 0
* a low boost (10) on pf2 with a slop of 10
it does a great job of getting the most relevant document
into the #1 spot (thanks to the slop=0 boosts), and a very good
job of filling the entire first page of results with
highly relevant documents (thanks to the shingles and the more
liberal phrase-slop boosts).
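As a rough illustration, the four tiers above could be expressed in the proposed `field~slop^boost` syntax like this. This is a minimal sketch, not code from the patch: the field name `text` is a hypothetical placeholder (the list doesn't name a field), and the class and method names are made up for the example. Multiple entries share one space-separated `pf` or `pf2` value, as in the request URL further down.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class PhraseBoostTiers {
    // One phrase-field entry in the proposed field~slop^boost syntax.
    static String entry(String field, int slop, int boost) {
        return field + "~" + slop + "^" + boost;
    }

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds the query-string part of an edismax request with the four tiers.
    // "field" is a hypothetical placeholder; the tiers above do not name one.
    static String buildQuery(String field, String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");
        params.put("q", userQuery);
        // pf: exact whole-query phrase (high boost) plus a very sloppy one (moderate)
        params.put("pf", entry(field, 0, 500) + " " + entry(field, 50, 50));
        // pf2: exact word pairs (moderate boost) plus nearby word pairs (low)
        params.put("pf2", entry(field, 0, 50) + " " + entry(field, 10, 10));

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints defType=edismax&q=red+baseball+cap&pf=text%7E0%5E500+text%7E50%5E50&pf2=text%7E0%5E50+text%7E10%5E10
        System.out.println(buildQuery("text", "red baseball cap"));
    }
}
```

The `~` and `^` characters end up percent-encoded, which the server decodes back before parsing `pf`/`pf2`.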


I'm even having some luck with a whole bunch of those clauses, as in
the following request, which uses a variety of phrase slops on a variety
of fields:
http://app2.fli:28983/solr/core0/select?pf=source_doc~1^500+text_stem~1^500+source_doc~50^50+text_stem~20^50&defType=edismax&hl.maxAnalyzedChars=50&q.alt=*%3A*&ps=1&qt=fliqs&pf2=text_stem^50+text_stem~10^10+text_unstem~10^10&start=0&q=red+baseball+cap+black+leather+jacket&mm=100%25&debugQuery=on&fl=id,score
It seems to return quickly enough on my collection of 4 million
documents:


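responseHeader: status=0, QTime=287
params:
  mm=100%
  pf2=text_stem^50 text_stem~10^10 text_unstem~10^10
  q.alt=*:*
  hl.maxAnalyzedChars=50
  defType=edismax
  pf=source_doc~1^500 text_stem~1^500 source_doc~50^50 text_stem~20^50
  debugQuery=on
  fl=id,score
  start=0
  q=red baseball cap black leather jacket
  qt=fliqs
  ps=1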


...

+((DisjunctionMaxQuery((text_stem:red^0.5)~0.01)
   DisjunctionMaxQuery((text_stem:basebal^0.5)~0.01)
   DisjunctionMaxQuery((text_stem:cap^0.5)~0.01)
   DisjunctionMaxQuery((text_stem:black^0.5)~0.01)
   DisjunctionMaxQuery((text_stem:leather^0.5)~0.01)
   DisjunctionMaxQuery((text_stem:jacket^0.5)~0.01)
  )~6)
   DisjunctionMaxQuery((source_doc:"red baseball cap black leather jacket"~50^50.0)~0.01)
   DisjunctionMaxQuery((source_doc:"red baseball cap black leather jacket"~1^500.0 | text_stem:"red basebal cap black leather jacket"~1^500.0)~0.01)
   DisjunctionMaxQuery((text_stem:"red basebal cap black leather jacket"~20^50.0)~0.01)
  (DisjunctionMaxQuery((text_stem:"red basebal"~1^50.0)~0.01)
   DisjunctionMaxQuery((text_stem:"basebal cap"~1^50.0)~0.01)
   DisjunctionMaxQuery((text_stem:"cap black"~1^50.0)~0.01)
   DisjunctionMaxQuery((text_stem:"black leather"~1^50.0)~0.01)
   DisjunctionMaxQuery((text_stem:"leather jacket"~1^50.0)~0.01)
  )
  (DisjunctionMaxQuery((text_unstem:"red baseball"~10^10.0 | text_stem:"red basebal"~10^10.0)~0.01)
   DisjunctionMaxQuery((text_unstem:"baseball cap"~10^10.0 | text_stem:"basebal cap"~10^10.0)~0.01)
   DisjunctionMaxQuery((text_unstem:"cap black"~10^10.0 | text_stem:"cap black"~10^10.0)~0.01)
   DisjunctionMaxQuery((text_unstem:"black leather"~10^10.0 | text_stem:"black leather"~10^10.0)~0.01)
   DisjunctionMaxQuery((text_unstem:"leather jacket"~10^10.0 | text_stem:"leather jacket"~10^10.0)~0.01)
  )
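The pf2 clauses in this output are built from adjacent pairs (shingles) of the analyzed query terms, each wrapped as a boosted, sloppy phrase. The sketch below reproduces the shape of the first pf2 group; it assumes the terms are already analyzed (the stemmed forms are copied from the output above) and that `text_stem^50`, which has no `~slop` of its own, falls back to `ps=1`. The class name is hypothetical, and this is not Solr's `addShingledPhraseQueries`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShinglePhrases {
    // Builds one field:"a b"~slop^boost clause per adjacent pair of terms,
    // mirroring the shape of the pf2 clauses in the debugQuery output.
    static List<String> pairClauses(String field, List<String> terms, int slop, float boost) {
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i + 1 < terms.size(); i++) {
            String phrase = terms.get(i) + " " + terms.get(i + 1);
            clauses.add(field + ":\"" + phrase + "\"~" + slop + "^" + boost);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Terms as they appear after stemming in the output above.
        List<String> stemmed = Arrays.asList("red", "basebal", "cap", "black", "leather", "jacket");
        // text_stem^50 has no explicit slop, so it uses the default ps=1.
        for (String clause : pairClauses("text_stem", stemmed, 1, 50f)) {
            System.out.println(clause); // e.g. text_stem:"red basebal"~1^50.0
        }
    }
}
```

Six query terms give five word pairs, matching the five clauses in the first pf2 group.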





Re: edismax pf2 and ps

2010-08-19 Thread Ron Mayer
Chris Hostetter wrote:
> : Perhaps fold it into the pf/pf2 syntax?
> : 
> : pf=text^2    // current syntax... makes phrases with a boost of 2
> : pf=text~1^2  // proposed syntax... makes phrases with a slop of 1 and a boost of 2
> : 
> : That actually seems pretty natural given the lucene query syntax - an
> : actual boosted sloppy phrase query already looks like
> : text:"foo bar"~1^2
> 
> Big +1 to this idea ... the existing "ps" param can stick around as the
> default for any field that doesn't specify its own slop in the pf/pf2/pf3
> fields using the "~" syntax.

I think I have a decent first draft of a patch that implements this.

Hopefully I'm figuring out the right way to submit patches to this community.
I added a ticket here: https://issues.apache.org/jira/browse/SOLR-2058
and attached my patch to that ticket.   Any feedback, either on the patch
or on how best to submit things to this community would be appreciated.


This patch seems to happily turn a query like
  
http://localhost:8983/solr/select?defType=edismax&fl=id,text,score&q=enterprise+search+foobar&ps=5&qf=text&debugQuery=true&pf2=name~0^&pf2=name^12+name~10
into what I believe is the desired parsed query:

+((text:enterpris) (text:search) (text:foobar))
 ((name:"enterprise search"~5^12.0) (name:"search foobar"~5^12.0))
 ((name:"enterprise search"^.0) (name:"search foobar"^.0))
 ((name:"enterprise search"~10) (name:"search foobar"~10))

which looks like it should give a high boost to docs where both words
appear right next to each other, but still substantial boosts to docs
where the pairs of words are a few words apart.


I'll start testing it with real data today.


One question:

* Where might I find documentation and/or test cases for the pf2 and pf3
  parameters? A quick grep of the sources from the tree I got from
  git://git.apache.org/lucene-solr.git
  didn't reveal any obvious docs or tests with those parameters:
  $ git grep pf2 | grep -v 'Binary file'
  solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java:
   U.parseFieldBoostsAndSlop(solrParams.getParams("pf2"));


Am I on the right track?


   Ron


Re: edismax pf2 and ps

2010-08-16 Thread Chris Hostetter

: Perhaps fold it into the pf/pf2 syntax?
: 
: pf=text^2    // current syntax... makes phrases with a boost of 2
: pf=text~1^2  // proposed syntax... makes phrases with a slop of 1 and a boost of 2
: 
: That actually seems pretty natural given the lucene query syntax - an
: actual boosted sloppy phrase query already looks like
: text:"foo bar"~1^2

Big +1 to this idea ... the existing "ps" param can stick around as the
default for any field that doesn't specify its own slop in the pf/pf2/pf3
fields using the "~" syntax.


-Hoss
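A minimal sketch of parsing a pf/pf2/pf3 value in the proposed syntax, with `ps` as the default slop for entries that have no `~`. The class name is hypothetical, the boost default of 1.0 is an assumption, and this is not the patch's own parser.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseFieldSpec {
    final String field;
    final int slop;
    final float boost;

    PhraseFieldSpec(String field, int slop, float boost) {
        this.field = field;
        this.slop = slop;
        this.boost = boost;
    }

    // Parses a whitespace-separated value such as "text~1^2 name^12 title".
    // Entries without "~slop" use defaultSlop (the "ps" value);
    // entries without "^boost" get a boost of 1.0 (an assumption).
    static List<PhraseFieldSpec> parse(String value, int defaultSlop) {
        List<PhraseFieldSpec> specs = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            float boost = 1.0f;
            int caret = token.indexOf('^');
            if (caret >= 0) {
                boost = Float.parseFloat(token.substring(caret + 1));
                token = token.substring(0, caret);
            }
            int slop = defaultSlop;
            int tilde = token.indexOf('~');
            if (tilde >= 0) {
                slop = Integer.parseInt(token.substring(tilde + 1));
                token = token.substring(0, tilde);
            }
            specs.add(new PhraseFieldSpec(token, slop, boost));
        }
        return specs;
    }

    @Override
    public String toString() {
        return field + "~" + slop + "^" + boost;
    }
}
```

For example, `parse("text~1^2 text^2", 5)` yields `text~1^2.0` and `text~5^2.0`: the second entry inherits ps=5.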



Re: edismax pf2 and ps

2010-08-13 Thread Yonik Seeley
On Fri, Aug 13, 2010 at 2:38 PM, Ron Mayer  wrote:
> ...
> Not sure of a good way to express that in config options, tho.

Perhaps fold it into the pf/pf2 syntax?

pf=text^2    // current syntax... makes phrases with a boost of 2
pf=text~1^2  // proposed syntax... makes phrases with a slop of 1 and a boost of 2

That actually seems pretty natural given the lucene query syntax - an
actual boosted sloppy phrase query already looks like
text:"foo bar"~1^2

-Yonik
http://www.lucidimagination.com


Re: edismax pf2 and ps

2010-08-13 Thread Ron Mayer
Yonik Seeley wrote:
> Perhaps a ps2 parameter to match pf2?

That might be nice.

I could try to put together such a patch if people were interested.

One more thing I've been contemplating is whether my results might
be even better if I had a couple of different "pf2"s with different "ps"s
at the same time.

In particular, one with ps=0 to put a high boost on docs that have
the right word order, for example ensuring that
  "red hat black jacket"
boosts only red hats and not black hats.

And another pf2 with a more modest boost and ps=5 or so, so that
the query above also boosts docs with "red baseball hat".
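Lucene's PhraseQuery documentation describes slop as an edit distance in which swapping two words costs two moves. For a two-word phrase with the first word at position p1 and the second at p2, that works out to |(p2 - p1) - 1|. The sketch below is a simplified model of that rule (an assumption: it ignores scoring and is not Lucene's sloppy phrase scorer), applied to the "red hat" pair from the example above; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class TwoTermSlop {
    // Smallest slop at which the phrase "first second" would match the tokens,
    // using the edit-distance model |(p2 - p1) - 1|; -1 if either word is missing.
    static int minSlop(List<String> tokens, String first, String second) {
        int best = -1;
        for (int p1 = 0; p1 < tokens.size(); p1++) {
            if (!tokens.get(p1).equals(first)) continue;
            for (int p2 = 0; p2 < tokens.size(); p2++) {
                if (p2 == p1 || !tokens.get(p2).equals(second)) continue;
                int d = Math.abs((p2 - p1) - 1);
                if (best < 0 || d < best) best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> redHat = Arrays.asList("red", "hat", "black", "jacket");
        List<String> blackHat = Arrays.asList("red", "jacket", "black", "hat");
        List<String> baseball = Arrays.asList("red", "baseball", "hat");
        System.out.println(minSlop(redHat, "red", "hat"));   // 0
        System.out.println(minSlop(blackHat, "red", "hat")); // 2
        System.out.println(minSlop(baseball, "red", "hat")); // 1
    }
}
```

With ps=0 only the first document gets the "red hat" boost. With ps=5 the "red baseball hat" document gets it too, but so does the one with a red jacket and a black hat, which is why a single slop shared by both boosts is a compromise.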


Not sure of a good way to express that in config options, tho.






Re: edismax pf2 and ps

2010-08-13 Thread Yonik Seeley
Perhaps a ps2 parameter to match pf2?

-Yonik
http://www.lucidimagination.com

On Fri, Aug 13, 2010 at 2:11 PM, Ron Mayer  wrote:
> ...
> Is there any way of lobbying to make this change in the official releases?


Re: edismax pf2 and ps

2010-08-13 Thread Ron Mayer
Jayendra Patil wrote:
> We pretty much had the same issue, ended up customizing the ExtendedDismax
> code.
> 
> In your case it's just a change of a single line
> addShingledPhraseQueries(query, normalClauses, phraseFields2, 2,
>  tiebreaker, pslop);
> to
> addShingledPhraseQueries(query, normalClauses, phraseFields2, 2,
>  tiebreaker, 0);

Thanks!!!  Indeed it seems to be providing better results for me (at first
glance on a test system).

Is there any way of lobbying to make this change in the official releases?





Re: edismax pf2 and ps

2010-08-12 Thread Jayendra Patil
We had pretty much the same issue and ended up customizing the ExtendedDismax
code.

In your case it's just a change of a single line:

    // pslop (the "ps" value) is the slop applied to the pf2 word-pair phrases
    addShingledPhraseQueries(query, normalClauses, phraseFields2, 2,
        tiebreaker, pslop);

to

    addShingledPhraseQueries(query, normalClauses, phraseFields2, 2,
        tiebreaker, 0);

Regards,
Jayendra


On Thu, Aug 12, 2010 at 1:04 PM, Ron Mayer  wrote:

> Short summary:
>
>   Is there any way I can specify that I want a lot
>   of phrase slop for the "pf" parameter, but none
>   at all for the "pf2" parameter?
>
> I find the 'pf' parameter with a pretty large 'ps' does a very
> nice job of providing a modest boost to many documents that are
> quite well related to many queries in my system.
>
> In contrast, I find the 'pf2' parameter with zero 'ps' does
> extremely well at providing a high boost to documents that
> are often exactly what someone's searching for.
>
> Is there any way I can get both effects?
>
> Edismax's pf2 parameter is really nice for boosting exact phrases
> in queries like 'black jacket red cap white shoes'.   But as soon
> as even a little phrase slop (ps) is added, it seems like it starts
> boosting documents with red jackets and white caps just as much as
> those with black jackets and red caps.
>
> My gut feeling is that if I could have "pf" with a large phrase
> slop and the pf2 with zero phrase slop, it'd give me better overall
> results than any single phrase slop setting that gets applied to both.
>
> Is there any good way for me to test that?
>
>  Thanks,
>   Ron
>
>