[jira] [Commented] (SOLR-7136) Add an AutoPhrasing TokenFilter

Koorosh Vakhshoori (JIRA) Fri, 20 Nov 2015 09:47:24 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018409#comment-15018409
 ]


Koorosh Vakhshoori commented on SOLR-7136:
------------------------------------------

As for the memory leak issue, I looked at my version and identified a couple 
of areas that could cause problems: 1) in AutoPhrasingQParserPlugin, I 
updated the code so that all resources associated with the 
AutoPhrasingTokenFilter instance are released by calling end() and close(); 
2) for phraseSets, I make sure it is populated only once. In the earlier 
version, the constructor repopulated it every time the filter was 
instantiated. In some cases you do want that, however, so I have a version of 
the constructor that force-populates phraseSets. I call it in the 
ManagedAutophraseFilterFactory class.
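
To make the two fixes concrete, here is a minimal sketch of the pattern, not 
the code from the patch. Every name in it (PhraseFilterSketch, Filter, 
loadPhrases, matches) is a hypothetical stand-in, and it deliberately avoids 
the Lucene classes so that it runs on its own. In Lucene itself, a 
TokenStream consumer is expected to call reset(), incrementToken(), end() and 
then close(); the sketch only mirrors the end()/close() part, plus the 
populate-once behaviour with a force-reload option.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-ins only; none of these are classes from the SOLR-7136 patch.
final class PhraseFilterSketch {

    // Shared phrase set, populated at most once unless a caller forces a reload.
    private static Set<String> phraseSet;

    // Counts how often the phrase source was actually read (for illustration).
    static int loadCount = 0;

    static synchronized Set<String> loadPhrases(Supplier<List<String>> source,
                                                boolean forceReload) {
        if (phraseSet == null || forceReload) {
            phraseSet = Collections.unmodifiableSet(new HashSet<>(source.get()));
            loadCount++;
        }
        return phraseSet;
    }

    // Stand-in for a filter that holds resources until end() and close() are called.
    static final class Filter implements AutoCloseable {
        // Number of filters created but not yet closed.
        static int open = 0;

        private final Set<String> phrases;
        boolean ended = false;

        Filter(Supplier<List<String>> source, boolean forceReload) {
            this.phrases = loadPhrases(source, forceReload);
            open++;
        }

        boolean isPhrase(String candidate) {
            return phrases.contains(candidate);
        }

        // Signals that the caller has finished consuming the stream.
        void end() {
            ended = true;
        }

        @Override
        public void close() {
            open--;
        }
    }

    // Query-side usage: end() after consuming, close() guaranteed by try-with-resources.
    static boolean matches(String candidate, Supplier<List<String>> source) {
        try (Filter filter = new Filter(source, false)) {
            boolean hit = filter.isPhrase(candidate);
            filter.end();
            return hit;
        }
    }
}
```

With this shape, repeated calls to matches() read the phrase source only 
once, and no filter instance stays open after the call returns. A factory 
that needs fresh data, which is the role I described for 
ManagedAutophraseFilterFactory, would pass forceReload = true instead.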

As for performance and scaling, no, I have not done any formal evaluation. All 
I can say is that we have it running in our QA environment, and the people who 
have tested it are satisfied with the speed. However, our speed requirements 
are in seconds, not milliseconds. I would love to hear the results of your A/B 
testing.

On the acronym topic, you hit the nail on the head. This falls under 
personalized or contextual search. In our use case, our content is a 
collection of different corpora. This means that different users, depending on 
their specialty, want to see different results for the same query. This is a 
tough nut to crack. Glad to hear you will be looking into this issue.


> Add an AutoPhrasing TokenFilter
> -------------------------------
>
>                 Key: SOLR-7136
>                 URL: https://issues.apache.org/jira/browse/SOLR-7136
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ted Sullivan
>         Attachments: AutoPhaseFiniteStateDiagram.pdf, SOLR-7136.patch, 
> SOLR-7136.patch, SOLR-7136.patch, SOLR-7136.patch
>
>
> Adds an 'autophrasing' token filter which is designed to enable noun phrases 
> that represent a single entity to be tokenized in a singular fashion. Adds 
> support for ManagedResources and Query parser auto-phrasing support given 
> LUCENE-2605.
> The rationale for this Token Filter and its use in solving the long-standing 
> multi-term synonym problem in Lucene/Solr has been documented online. 
> http://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

