[
https://issues.apache.org/jira/browse/SOLR-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018409#comment-15018409
]
Koorosh Vakhshoori commented on SOLR-7136:
------------------------------------------
As for the memory leak issue, I looked at my version and identified a couple
of areas that could cause problems: 1) in AutoPhrasingQParserPlugin I updated
the code so that all resources associated with the AutoPhrasingTokenFilter
instance are released by calling end() and close(); 2) for phraseSets, I now
make sure it is populated only once. In the earlier version, the constructor
repopulated it every time the filter was instantiated. In some cases you do
want that, however, so I have a version of the constructor that
force-populates phraseSets, which I call in the ManagedAutophraseFilterFactory
class.
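To illustrate the populate-once idea from point 2, here is a minimal sketch in plain Java. It is not the code from the patch: the class name PhraseSetSketch, the static sharedPhrases field, the forcePopulate flag, and the buildCount counter are all hypothetical names chosen for illustration, and the real filter works with Lucene types rather than a simple Set<String>.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the filter; not the actual AutoPhrasingTokenFilter.
public class PhraseSetSketch {
    // Shared by all instances; built at most once unless a rebuild is forced.
    private static Set<String> sharedPhrases;
    // Counts how many times the shared set was (re)built, for demonstration only.
    private static int buildCount = 0;

    private final Set<String> phrases;

    // Default constructor path: reuse the shared set if it already exists.
    public PhraseSetSketch(List<String> source) {
        this(source, false);
    }

    // Force-populate variant, e.g. for a factory whose phrase list has changed.
    public PhraseSetSketch(List<String> source, boolean forcePopulate) {
        this.phrases = getOrBuild(source, forcePopulate);
    }

    private static synchronized Set<String> getOrBuild(List<String> source, boolean force) {
        if (sharedPhrases == null || force) {
            sharedPhrases = new HashSet<>(source);
            buildCount++;
        }
        return sharedPhrases;
    }

    public boolean isPhrase(String candidate) {
        return phrases.contains(candidate);
    }

    public static synchronized int buildCount() {
        return buildCount;
    }
}
```

With this shape, creating many instances through the one-argument constructor allocates the set only once, while the two-argument constructor with forcePopulate set to true replaces it on demand.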
As for performance/scaling, no, I have not done any formal evaluation. All I
can say is that we have it running in our QA environment, and the people who
have tested it are satisfied with the speed. However, our speed requirements
are in seconds, not milliseconds. I would love to hear the results of your A/B
testing.
On the acronym topic, you hit the nail on the head. This falls under
personalized or contextual search. In our use case, our content is a
collection of different corpora. This means that different users, depending on
their specialty, want to see different results for the same query. This is a
tough nut to crack. Glad to hear you will be looking into this issue.
> Add an AutoPhrasing TokenFilter
> -------------------------------
>
> Key: SOLR-7136
> URL: https://issues.apache.org/jira/browse/SOLR-7136
> Project: Solr
> Issue Type: New Feature
> Reporter: Ted Sullivan
> Attachments: AutoPhaseFiniteStateDiagram.pdf, SOLR-7136.patch,
> SOLR-7136.patch, SOLR-7136.patch, SOLR-7136.patch
>
>
> Adds an 'autophrasing' token filter which is designed to enable noun phrases
> that represent a single entity to be tokenized in a singular fashion. Adds
> support for ManagedResources and Query parser auto-phrasing support given
> LUCENE-2605.
> The rationale for this Token Filter and its use in solving the long standing
> multi-term synonym problem in Lucene Solr has been documented online.
> http://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/