[ 
https://issues.apache.org/jira/browse/SOLR-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523482#comment-14523482
 ] 

Timothy Potter commented on SOLR-6878:
--------------------------------------

I started going through this patch and I have some questions about how to 
support the "equivalent" synonyms feature for managed synonyms.

NOTE: I'm using the term "equivalent" synonyms based on the doc here:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Specifically, here are a couple of issues I see with supporting equivalent 
synonyms lists at the managed API level:

1) The default value for expand is true (in the patch), but what if the user 
changes it to false after already having added equivalent synonym lists? Or 
vice-versa. What do we do about existing equivalent mappings? We could store 
the equivalent lists in a separate data structure and then apply the correct 
behavior depending on the expand flag when the managed data is "viewed", i.e. 
either a GET request from the API or when updating the data used to initialize 
the underlying SynonymMap. This is similar to what we do with ignoreCase; 
however, ignoreCase was easy to handle, whereas allowing expand to be changed 
through the API looks much more complicated.

Of course we could punt on this issue altogether and just make the expand flag 
immutable, i.e. you can set it initially to true or false, but cannot change it 
with the API. If we make it immutable, then we can apply the mapping on update 
and not have to maintain any additional data structures to remember the raw 
state of equiv lists.
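
To make the first option concrete, here is a minimal sketch of keeping the raw 
equivalent lists and deriving the mappings only when the data is viewed. The 
class RawEquivalentLists and its methods are hypothetical and are not part of 
Solr or the patch. For expand=false it assumes the SynonymFilterFactory 
convention of reducing every term in a list to the first term, including the 
first term mapping to itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical holder for raw equivalent lists; not part of Solr or the patch. */
class RawEquivalentLists {
  private final List<List<String>> rawLists = new ArrayList<>();
  private boolean expand = true; // the patch defaults expand to true

  void addEquivalentList(List<String> terms) {
    rawLists.add(new ArrayList<>(terms)); // keep the list exactly as sent
  }

  void setExpand(boolean expand) {
    this.expand = expand; // safe to flip, because nothing was expanded yet
  }

  /** Derives term -> synonyms from the raw lists using the current flag. */
  Map<String, TreeSet<String>> view() {
    Map<String, TreeSet<String>> mappings = new TreeMap<>();
    for (List<String> list : rawLists) {
      for (String term : list) {
        TreeSet<String> targets =
            mappings.computeIfAbsent(term, k -> new TreeSet<>());
        if (expand) {
          // all-to-all: each term maps to every other term in its list
          for (String other : list) {
            if (!other.equals(term)) {
              targets.add(other);
            }
          }
        } else {
          // assumed reduce behavior: each term maps to the first term
          targets.add(list.get(0));
        }
      }
    }
    return mappings;
  }
}
```

With the immutable-flag option none of this is needed: the expansion would be 
applied once on update and only the resulting mappings stored.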

2) Let's say we allow users to send in equivalent synonym lists to the API, 
such as:

{code}
curl -v -X PUT \
  -H 'Content-type:application/json' \
  --data-binary '["funny","entertaining","whimsical","jocular"]' \
  'http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english'
{code}

If expand is true, then you end up with the following mappings (pardon the Java 
code syntax as I didn't want to clean that up for this example):
{code}
    assertJQ(endpoint + "/funny",
        "/funny==['entertaining','jocular','whimsical']");
    assertJQ(endpoint + "/entertaining",
        "/entertaining==['funny','jocular','whimsical']");
    assertJQ(endpoint + "/jocular",
        "/jocular==['entertaining','funny','whimsical']");
    assertJQ(endpoint + "/whimsical",
        "/whimsical==['entertaining','funny','jocular']");
{code}

What should the API do if the user then decides to update the specific mappings 
for "funny" by sending in a request such as:

{code}
curl -v -X PUT \
  -H 'Content-type:application/json' \
  --data-binary '{"funny":["hilarious"]}' \
  'http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english'
{code}

Does the API treat explicit mappings as having precedence over equivalent 
lists? Or does it fail with some weird error most users won't understand? Seems 
to get complicated pretty fast ...
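
To illustrate why this gets complicated, here is a small sketch of one possible 
policy, in which an explicit mapping simply overwrites the entry produced by 
the equivalent list. This is an assumption for discussion, not what the patch 
or Solr does, and the class ExplicitOverridesSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration; not part of Solr or the patch. */
class ExplicitOverridesSketch {

  /** Expands one equivalent list into all-to-all mappings (expand=true). */
  static Map<String, TreeSet<String>> expandAll(List<String> terms) {
    Map<String, TreeSet<String>> mappings = new TreeMap<>();
    for (String term : terms) {
      TreeSet<String> others = new TreeSet<>(terms);
      others.remove(term); // a term is not listed as its own synonym
      mappings.put(term, others);
    }
    return mappings;
  }

  public static void main(String[] args) {
    Map<String, TreeSet<String>> mappings =
        expandAll(Arrays.asList("funny", "entertaining", "whimsical", "jocular"));

    // Assumed policy: the explicit mapping replaces the entry for "funny".
    mappings.put("funny", new TreeSet<>(Arrays.asList("hilarious")));

    // Print each term with its synonyms after the overwrite.
    mappings.forEach((term, synonyms) -> System.out.println(term + " -> " + synonyms));
  }
}
```

Under that policy "funny" ends up mapped only to "hilarious", while 
"entertaining", "jocular" and "whimsical" still list "funny" as a synonym, so 
the relation is no longer all-to-all. Any precedence rule would have to decide 
whether that asymmetry is acceptable.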

I didn't go too far down the path of implementing this so there are probably 
more questions that will come up. To reiterate my original design assumption 
for managed synonyms, the API was not intended for humans to interact with 
directly, rather there should be some sort of UI layer on top of this API that 
translates synonym mappings into low-level API calls. For me, it's much clearer 
to send in explicit mappings for each synonym than to send a flat list and 
then interpret that list differently based on some flag.

The only advantage I can see for the flat list is request size: if the synonym 
list is huge, expanding it into explicit mappings makes the request larger. 
Other than that, are there use cases that require this expand functionality and 
cannot be achieved with the current implementation? If so, we need to decide if expand should be 
immutable and what the API should do if an explicit mapping is received for a 
term that is already used in an equivalent synonym list. [~Soolek] your 
thoughts on this?

> solr.ManagedSynonymFilterFactory all-to-all synonym switch (aka. expand)
> ------------------------------------------------------------------------
>
>                 Key: SOLR-6878
>                 URL: https://issues.apache.org/jira/browse/SOLR-6878
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.10.2
>            Reporter: Tomasz Sulkowski
>            Assignee: Timothy Potter
>              Labels: ManagedSynonymFilterFactory, REST, SOLR
>         Attachments: SOLR-6878.patch
>
>
> Hi,
> After switching from SynonymFilterFactory to ManagedSynonymFilterFactory I 
> have found out that there is no way to set an all-to-all synonyms relation. 
> Basically (judging from Google search) there is a need for "expand" 
> functionality switch (known from SynonymFilterFactory) which will treat all 
> synonyms with its keyword as equal.
> For example: if we define a "car":["wagen","ride"] relation it would 
> translate a query that includes one of the synonyms or keyword to "car or 
> wagen or ride" independently of which word was used from those three.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

