Re: Solutions for Multi-word Synonyms

2016-06-24 Thread Joe Lawson
I rounded up some of the discussion here:
http://opensourceconnections.com/blog/2016/06/23/solr-multi-word-synonym-solutions-2016/

Also, my colleague pointed me to another project, Querqy,
https://github.com/renekrie/querqy which "is a framework for query
preprocessing in Java-based search engines. It comes with a powerful,
rule-based preprocessor named 'Common Rules Preprocessor', which provides
query-time synonyms, query-dependent boosting and down-ranking, and
query-dependent filters. While the Common Rules
Preprocessor is not specific to any search engine, Querqy provides a plugin
to run it within the Solr search engine."
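
To make "query-time synonyms" a bit more concrete, here is a minimal Python sketch of rule-based expansion that handles multi-word inputs. This is not Querqy's API or rule syntax; the rule table `RULES` and the function `expand_query` are hypothetical names made up for illustration.

```python
# Hypothetical rule table: input phrase (tuple of tokens) -> list of synonyms.
RULES = {
    ("notebook",): ["laptop"],
    ("sea", "biscuit"): ["seabiscuit"],
}


def expand_query(query, rules=RULES):
    """Rewrite a query string into a list of clauses.

    Each clause is a list of alternatives that would be OR'ed together;
    the clauses themselves would be combined by the main query parser.
    """
    tokens = query.lower().split()
    max_len = max(len(key) for key in rules)
    clauses = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so multi-word rules win over single words.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in rules:
                clauses.append([" ".join(phrase)] + rules[phrase])
                i += n
                break
        else:
            # No rule matched at this position: keep the token unchanged.
            clauses.append([tokens[i]])
            i += 1
    return clauses
```

Because the rewrite happens before the query reaches the engine, changing a rule only affects new queries and needs no reindexing.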

On Fri, Jun 10, 2016 at 2:25 AM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> As Doug said,
> you should really try to build your own solution for Multi-word Synonyms
> because every need is different and you can customize it for your special
> use case, like adding a Thesaurus.
>
>
> http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html
>
> Regards
> Bernd
>
> Am 09.06.2016 um 17:06 schrieb Doug Turnbull:
> > Mary Jo,
> >
> > Honestly half the time I run into this problem, I end up creating a
> > QParserPlugin because I need to do something specific. With a QParserPlugin
> > I can run whatever analysis, slicing and dicing of the query string to
> > manually construct whatever I need to
> >
> > http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit
> >
> > One thing I often do is repeat the functionality of Elasticsearch's match
> > query. Elasticsearch's match query does the following:
> >
> > - Analyze the query string using the field's query-time analyzer
> > - Create an OR query with the tokens that come out of the analysis
> >
> > You can look at the field query parser as something of a starting point for
> > this.
> >
> > I usually do this in the context of a boost query, not as the main edismax
> > query.
> >
> > If I have time, this is something I've been meaning to open source.
> >
> > Best
> > -Doug
> >
> > On Tue, Jun 7, 2016 at 2:51 PM Joe Lawson <jlaw...@opensourceconnections.com>
> > wrote:
> >
> >> I'm sorry I wasn't more specific, I meant we were hijacking the thread with
> >> the question, "Anyone used a different method of
> >> handling multi-term synonyms that isn't as global?" as the original thread
> >> was about getting synonym_edismax running.
> >>
> >> On Tue, Jun 7, 2016 at 2:24 PM, MaryJo Sminkey
> >> wrote:
> >>
> >>>> MaryJo you might want to start a new thread, I think we kinda hijacked this
> >>>> one. Also if you are interested in tuning queries check out
> >>>> http://splainer.io/ and https://www.quepid.com which are interactive tools
> >>>> (both of which my company makes) to tune for search relevancy.
> >>>>
> >>>
> >>>
> >>> Okay I changed the subject. But I don't need a tuning tool, I already know
> >>> WHY I'm not getting the results I need, the problem is how to fix it or get
> >>> around what the plugin is doing. Which is why I was inquiring if people
> >>> have had success with something other than this particular plugin for
> >>> more advanced queries that it messes around with. It seems to do a good job
> >>> if you aren't doing anything particularly complicated with your search
> >>> logic, but I don't see a good way to solve the issue I'm having, and a
> >>> tuning tool isn't really going to help with that. We were pretty happy with
> >>> our search relevancy for the most part *other* than the problem with the
> >>> multi-term synonyms not working reliably but I definitely can't lose
> >>> relevancy that we had just to get those working.
> >>>
> >>> In reviewing your tools previously, the problem as I recall is that they
> >>> rely on querying Solr directly, while our searches go through multiple
> >>> levels of an application which includes a lot of additional logic in terms
> >>> of what data gets sent to Solr, so they just aren't going to
> >>> be much use for us. It was easier for me to just write my own tool that
> >>> essentially does the same kind of thing, but with my application logic
> >>> built in.
> >>>
> >>> Mary Jo
> >>>
> >>
> >
>
> --
> *
> Bernd Fehling          Bielefeld University Library
> Dipl.-Inform. (FH)     LibTec - Library Technology
> Universitätsstr. 25    and Knowledge Management
> 33615 Bielefeld
> Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *
>


Re: Solutions for Multi-word Synonyms

2016-06-10 Thread Bernd Fehling
As Doug said,
you should really try to build your own solution for Multi-word Synonyms
because every need is different and you can customize it for your special
use case, like adding a Thesaurus.

http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html

Regards
Bernd

Am 09.06.2016 um 17:06 schrieb Doug Turnbull:
> Mary Jo,
> 
> Honestly half the time I run into this problem, I end up creating a
> QParserPlugin because I need to do something specific. With a QParserPlugin
> I can run whatever analysis, slicing and dicing of the query string to
> manually construct whatever I need to
> 
> http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit
> 
> One thing I often do is repeat the functionality of Elasticsearch's match
> query. Elasticsearch's match query does the following:
> 
> - Analyze the query string using the field's query-time analyzer
> - Create an OR query with the tokens that come out of the analysis
> 
> You can look at the field query parser as something of a starting point for
> this.
> 
> I usually do this in the context of a boost query, not as the main edismax
> query.
> 
> If I have time, this is something I've been meaning to open source.
> 
> Best
> -Doug
> 
> On Tue, Jun 7, 2016 at 2:51 PM Joe Lawson 
> wrote:
> 
>> I'm sorry I wasn't more specific, I meant we were hijacking the thread with
>> the question, "Anyone used a different method of
>> handling multi-term synonyms that isn't as global?" as the original thread
>> was about getting synonym_edismax running.
>>
>> On Tue, Jun 7, 2016 at 2:24 PM, MaryJo Sminkey 
>> wrote:
>>
>>>> MaryJo you might want to start a new thread, I think we kinda hijacked this
>>>> one. Also if you are interested in tuning queries check out
>>>> http://splainer.io/ and https://www.quepid.com which are interactive tools
>>>> (both of which my company makes) to tune for search relevancy.
>>>>
>>>
>>>
>>> Okay I changed the subject. But I don't need a tuning tool, I already know
>>> WHY I'm not getting the results I need, the problem is how to fix it or get
>>> around what the plugin is doing. Which is why I was inquiring if people
>>> have had success with something other than this particular plugin for
>>> more advanced queries that it messes around with. It seems to do a good job
>>> if you aren't doing anything particularly complicated with your search
>>> logic, but I don't see a good way to solve the issue I'm having, and a
>>> tuning tool isn't really going to help with that. We were pretty happy with
>>> our search relevancy for the most part *other* than the problem with the
>>> multi-term synonyms not working reliably but I definitely can't lose
>>> relevancy that we had just to get those working.
>>>
>>> In reviewing your tools previously, the problem as I recall is that they
>>> rely on querying Solr directly, while our searches go through multiple
>>> levels of an application which includes a lot of additional logic in terms
>>> of what data gets sent to Solr, so they just aren't going to
>>> be much use for us. It was easier for me to just write my own tool that
>>> essentially does the same kind of thing, but with my application logic
>>> built in.
>>>
>>> Mary Jo
>>>
>>
> 

-- 
*
Bernd Fehling          Bielefeld University Library
Dipl.-Inform. (FH)     LibTec - Library Technology
Universitätsstr. 25    and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Solutions for Multi-word Synonyms

2016-06-09 Thread MaryJo Sminkey
Thanks, added my vote (which threw an error but looks like it did get
added).

MJ



On Thu, Jun 9, 2016 at 5:41 PM, Upayavira  wrote:

> Here's a recently created ticket that covers this issue:
>
> https://issues.apache.org/jira/browse/SOLR-9185
>
> Let's hope we see some traction on it soon, as many people suffer from
> this issue.
>
> Upayavira
>
> On Thu, 9 Jun 2016, at 09:10 PM, MaryJo Sminkey wrote:
> > On Thu, Jun 9, 2016 at 1:50 PM, Joe Lawson <
> > jlaw...@opensourceconnections.com> wrote:
> >
> > > The auto-phrasing-token (APT) filter is a two-pronged solution that
> > > requires index and query time processes versus hon-lucene-synonyms (HLS)
> > > which is strictly a query time implementation. The primary takeaway from
> > > that is, APT requires reindexing your data when you update the autophrases
> > > and synonyms while HLS does not.
> > >
> >
> >
> > Yup, understood about the indexing, that is not a big issue for us as we
> > rarely change the synonym list and re-index frequently.
> >
> > MJ
> >
> >
>


Re: Solutions for Multi-word Synonyms

2016-06-09 Thread Upayavira
Here's a recently created ticket that covers this issue:

https://issues.apache.org/jira/browse/SOLR-9185

Let's hope we see some traction on it soon, as many people suffer from
this issue.

Upayavira

On Thu, 9 Jun 2016, at 09:10 PM, MaryJo Sminkey wrote:
> On Thu, Jun 9, 2016 at 1:50 PM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
> 
> > The auto-phrasing-token (APT) filter is a two-pronged solution that
> > requires index and query time processes versus hon-lucene-synonyms (HLS)
> > which is strictly a query time implementation. The primary takeaway from
> > that is, APT requires reindexing your data when you update the autophrases
> > and synonyms while HLS does not.
> >
> 
> 
> Yup, understood about the indexing, that is not a big issue for us as we
> rarely change the synonym list and re-index frequently.
> 
> MJ
> 
> 
> 


Re: Solutions for Multi-word Synonyms

2016-06-09 Thread MaryJo Sminkey
On Thu, Jun 9, 2016 at 1:50 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> The auto-phrasing-token (APT) filter is a two-pronged solution that
> requires index and query time processes versus hon-lucene-synonyms (HLS)
> which is strictly a query time implementation. The primary takeaway from
> that is, APT requires reindexing your data when you update the autophrases
> and synonyms while HLS does not.
>


Yup, understood about the indexing, that is not a big issue for us as we
rarely change the synonym list and re-index frequently.

MJ





Re: Solutions for Multi-word Synonyms

2016-06-09 Thread Joe Lawson
>
> I'm wondering if anyone has experience using the autophrasing solution on
> the Lucidworks blog:
>
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
>
The auto-phrasing-token (APT) filter is a two-pronged solution that
requires index and query time processes versus hon-lucene-synonyms (HLS)
which is strictly a query time implementation. The primary takeaway from
that is, APT requires reindexing your data when you update the autophrases
and synonyms while HLS does not.

APT is more precise while HLS is more flexible.
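
To illustrate why the auto-phrasing approach needs both index-time and query-time processing, here is a rough Python sketch of the general idea: known multi-word phrases are collapsed into single tokens, and the same phrase list has to be applied on both sides so the tokens line up. This is not the actual filter code; `auto_phrase`, the underscore joiner, and the phrase list are assumptions made for illustration.

```python
def auto_phrase(tokens, phrases, joiner="_"):
    """Collapse known multi-word phrases into single tokens, longest match first."""
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    out = []
    i = 0
    while i < len(tokens):
        # Only spans of two or more tokens can be a multi-word phrase.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in phrase_set:
                out.append(joiner.join(candidate))
                i += n
                break
        else:
            # No phrase starts here: emit the lowercased token as-is.
            out.append(tokens[i].lower())
            i += 1
    return out


# Hypothetical phrase list, applied identically at index and query time.
PHRASES = ["new york", "new york city"]
indexed = auto_phrase("Hotels in New York City".split(), PHRASES)
queried = auto_phrase("new york city hotels".split(), PHRASES)
```

Here both sides produce the single token `new_york_city`, so they match exactly. If the phrase list changes, documents indexed with the old list still carry the old tokens, which is where the reindexing requirement comes from.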

-Joe


Re: Solutions for Multi-word Synonyms

2016-06-09 Thread MaryJo Sminkey
On Thu, Jun 9, 2016 at 11:06 AM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Honestly half the time I run into this problem, I end up creating a
> QParserPlugin because I need to do something specific. With a QParserPlugin
> I can run whatever analysis, slicing and dicing of the query string to
> manually construct whatever I need to
>
>
> http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit
>
> One thing I often do is repeat the functionality of Elasticsearch's match
> query. Elasticsearch's match query does the following:
>


Thanks Doug... I was surprised at the lack of response on this as it seems
like it would be a much more common issue. Looking over that page though, I
am not sure I would be able to figure out how to do that kind of custom
query parser on my own, without something fairly similar in respect to
adding synonym support to work from. I'm just a lowly self-taught web
developer after all, not a java programmer or someone with a lot of
experience writing source code, etc.

We did consider switching to Elasticsearch due to its support out of the
box for multi-term synonyms, but that would be a lot of work, and I'm not
sure it can support everything else we are doing, like all the nested
facets and grouping, etc. and it would take a fair amount of work to
convert everything we have to the point of finding that out.

I'm wondering if anyone has experience using the autophrasing solution on
the Lucidworks blog:

https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

I know I tried this one as well some months ago and couldn't seem to get it
to work but it's probably the one I'll be trying next and hopefully can
figure it out this time. Since it works as a filter, it should work better
for us in terms of being able to apply it selectively only to certain
fields.





Re: Solutions for Multi-word Synonyms

2016-06-09 Thread Doug Turnbull
Mary Jo,

Honestly half the time I run into this problem, I end up creating a
QParserPlugin because I need to do something specific. With a QParserPlugin
I can run whatever analysis, slicing and dicing of the query string to
manually construct whatever I need to

http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit

One thing I often do is repeat the functionality of Elasticsearch's match
query. Elasticsearch's match query does the following:

- Analyze the query string using the field's query-time analyzer
- Create an OR query with the tokens that come out of the analysis

You can look at the field query parser as something of a starting point for
this.
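
The two steps above can be sketched roughly as follows. This is a self-contained Python illustration, not Solr or Elasticsearch code: `analyze` is only a stand-in for a field's query-time analyzer (lowercasing, splitting on non-alphanumerics, single-token synonyms), and `match_query` and the synonym map are hypothetical names. In a real QParserPlugin the tokens would come from the field's own analyzer.

```python
import re


def analyze(text, synonyms=None):
    """Stand-in for a query-time analyzer: lowercase, tokenize, add synonyms."""
    synonyms = synonyms or {}
    tokens = []
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        tokens.append(tok)
        # Emit any synonyms right after the token they belong to.
        tokens.extend(synonyms.get(tok, []))
    return tokens


def match_query(field, text, synonyms=None):
    """Build a Lucene-syntax OR query from whatever tokens analysis produces."""
    tokens = analyze(text, synonyms)
    # Drop duplicate tokens while preserving their order.
    unique = list(dict.fromkeys(tokens))
    return " OR ".join(f"{field}:{tok}" for tok in unique)


boost = match_query("title", "Blue Jeans", {"jeans": ["denim"]})
```

The resulting string could then be passed as a boost query alongside the main edismax query, as described above.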

I usually do this in the context of a boost query, not as the main edismax
query.

If I have time, this is something I've been meaning to open source.

Best
-Doug

On Tue, Jun 7, 2016 at 2:51 PM Joe Lawson 
wrote:

> I'm sorry I wasn't more specific, I meant we were hijacking the thread with
> the question, "Anyone used a different method of
> handling multi-term synonyms that isn't as global?" as the original thread
> was about getting synonym_edismax running.
>
> On Tue, Jun 7, 2016 at 2:24 PM, MaryJo Sminkey 
> wrote:
>
> > > MaryJo you might want to start a new thread, I think we kinda hijacked this
> > > one. Also if you are interested in tuning queries check out
> > > http://splainer.io/ and https://www.quepid.com which are interactive tools
> > > (both of which my company makes) to tune for search relevancy.
> > >
> >
> >
> > Okay I changed the subject. But I don't need a tuning tool, I already know
> > WHY I'm not getting the results I need, the problem is how to fix it or get
> > around what the plugin is doing. Which is why I was inquiring if people
> > have had success with something other than this particular plugin for
> > more advanced queries that it messes around with. It seems to do a good job
> > if you aren't doing anything particularly complicated with your search
> > logic, but I don't see a good way to solve the issue I'm having, and a
> > tuning tool isn't really going to help with that. We were pretty happy with
> > our search relevancy for the most part *other* than the problem with the
> > multi-term synonyms not working reliably but I definitely can't lose
> > relevancy that we had just to get those working.
> >
> > In reviewing your tools previously, the problem as I recall is that they
> > rely on querying Solr directly, while our searches go through multiple
> > levels of an application which includes a lot of additional logic in terms
> > of what data gets sent to Solr, so they just aren't going to
> > be much use for us. It was easier for me to just write my own tool that
> > essentially does the same kind of thing, but with my application logic
> > built in.
> >
> > Mary Jo
> >
>


Re: Solutions for Multi-word Synonyms

2016-06-07 Thread Joe Lawson
I'm sorry I wasn't more specific, I meant we were hijacking the thread with
the question, "Anyone used a different method of
handling multi-term synonyms that isn't as global?" as the original thread
was about getting synonym_edismax running.

On Tue, Jun 7, 2016 at 2:24 PM, MaryJo Sminkey  wrote:

> > MaryJo you might want to start a new thread, I think we kinda hijacked this
> > one. Also if you are interested in tuning queries check out
> > http://splainer.io/ and https://www.quepid.com which are interactive tools
> > (both of which my company makes) to tune for search relevancy.
> >
>
>
> Okay I changed the subject. But I don't need a tuning tool, I already know
> WHY I'm not getting the results I need, the problem is how to fix it or get
> around what the plugin is doing. Which is why I was inquiring if people
> have had success with something other than this particular plugin for
> more advanced queries that it messes around with. It seems to do a good job
> if you aren't doing anything particularly complicated with your search
> logic, but I don't see a good way to solve the issue I'm having, and a
> tuning tool isn't really going to help with that. We were pretty happy with
> our search relevancy for the most part *other* than the problem with the
> multi-term synonyms not working reliably but I definitely can't lose
> relevancy that we had just to get those working.
>
> In reviewing your tools previously, the problem as I recall is that they
> rely on querying Solr directly, while our searches go through multiple
> levels of an application which includes a lot of additional logic in terms
> of what data gets sent to Solr, so they just aren't going to
> be much use for us. It was easier for me to just write my own tool that
> essentially does the same kind of thing, but with my application logic
> built in.
>
> Mary Jo
>