Re: Extracting fuzzy match terms

Graham Turner Wed, 29 Apr 2015 06:40:02 -0700

That sounds interesting - I'll have a look and see if we can pull something 
together.


Cheers!

On Tuesday, 28 April 2015 21:26:12 UTC+1, ma...@elastic.co wrote:
>
> All Lucene queries implement extractTerms [1] and this API is used by 
> highlighter implementations to get the expanded set of terms for 
> wildcard, fuzzy and similar queries.
> This set of terms isn't exposed directly in Elasticsearch today but you 
> may be able to hack something together using scripts or a custom Java 
> plugin - look at SearchContext.current().query().extractTerms().
>
> Cheers
> Mark
>
> [1] 
> http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/Query.html#extractTerms(java.util.Set)
>
>
> On Tuesday, April 28, 2015 at 12:00:49 PM UTC+1, Graham Turner wrote:
>>
>> Thanks Mark.
>>
>> I did wonder about the highlighter, but using it would mean potentially 
>> retrieving every hit and parsing it, which feels pretty impractical for 
>> large searches.  
>>
>> Presumably the fuzzy query has to identify a full list of matching terms 
>> internally - is there any way we could somehow hook into this, or retrieve 
>> the list separately to the query results?  A mechanism similar to the 
>> suggester, just accepting a single fuzzy term or a wildcard term would be 
>> perfect.  I appreciate this probably isn't a common request, but I'm sure 
>> it would have other use cases.  Something to consider for a future release 
>> perhaps?  :-)
>>
>> Cheers
>>
>> Graham
>>
>>
>> On Monday, 27 April 2015 17:41:17 UTC+1, ma...@elastic.co wrote:
>>>
>>> Hi Graham,
>>> If you were to use the highlighter functionality you would essentially 
>>> "see what the search engine saw".
>>> With some client-side coding you could parse out the expanded search 
>>> terms because they would be surrounded by tags in matching docs.
>>> Of course this wouldn't provide a de-duped list of terms, and it would be 
>>> an inefficient way to return an exhaustive list of all expansions used, 
>>> but it may be an approach to investigate.
>>>
>>> Cheers
>>> Mark
>>>
>>> On Monday, April 27, 2015 at 5:08:55 PM UTC+1, Graham Turner wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm working on a proof-of-concept for a client, replacing an existing 
>>>> legacy search system with an elastic based alternative.  One of the 
>>>> requirements that comes from the existing system is that, when performing 
>>>> a fuzzy or wildcard search, the user can view all the matching terms, and 
>>>> include/exclude them manually from the subsequent search.
>>>>
>>>> Thus, if a fuzzy search for 'graham' is submitted (or a wildcard like 
>>>> 'gr*m*'), it might match grayam, graeme, grahum, grahem, etc.  The users 
>>>> want to be able to see this list of matched terms, then, for instance, 
>>>> exclude 'grayam' from the expanded terms list, so that all the other 
>>>> expansions are used, but not the specifically excluded one. 
>>>>
>>>> I’m struggling to retrieve this list of terms in the first place.  
>>>> Ideally I’d like to submit a simple query for a fuzzy or wildcard term, 
>>>> and have it return just the possible matching terms (up to a given limit).
>>>>
>>>> I’ve had reasonable success using the term suggester for fuzzy-type 
>>>> responses, but can’t use this for wildcard expansions. 
>>>>
>>>> Is there a good way to do this using 'out-of-the-box' elastic 
>>>> functionality?  
>>>>
>>>> Any advice / hints gratefully accepted!
>>>>
>>>> Thanks
>>>>
>>>> Graham
>>>>
>>>
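
The highlighter approach Mark describes, combined with the include/exclude 
step Graham asks for, could be sketched client-side roughly as follows. This 
is a minimal Python sketch and not something posted in the thread. It assumes 
highlights were requested with the default <em> tags, that the field is 
called "name" (a hypothetical name), and that the field's analyzer lowercases 
tokens, so lowercased highlighted text lines up with the indexed terms. The 
response is made-up sample data, trimmed to the keys the parser reads.

```python
import re
from collections import Counter

# Assumption: the search used the highlighter's default <em>...</em> tags.
TAG_RE = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def extract_highlighted_terms(response, field):
    """Count each distinct highlighted term for `field` across all hits."""
    counts = Counter()
    for hit in response.get("hits", {}).get("hits", []):
        for fragment in hit.get("highlight", {}).get(field, []):
            for term in TAG_RE.findall(fragment):
                # Assumption: the analyzer lowercases, so we de-dupe on lowercase.
                counts[term.lower()] += 1
    return counts


def build_terms_query(field, terms, excluded=()):
    """Build a `terms` query from the expansions the user chose to keep."""
    excluded_set = {t.lower() for t in excluded}
    kept = sorted(t for t in terms if t not in excluded_set)
    return {"query": {"terms": {field: kept}}}


# Hypothetical search response, reduced to the parts read above.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"name": ["<em>Graeme</em> Smith"]}},
            {
                "_id": "2",
                "highlight": {
                    "name": [
                        "<em>grayam</em> jones",
                        "<em>grahum</em> and <em>graeme</em>",
                    ]
                },
            },
        ]
    }
}

# De-duplicated expansions, then a follow-up query with 'grayam' excluded.
terms = extract_highlighted_terms(sample_response, "name")
query = build_terms_query("name", terms, excluded=["grayam"])
```

As noted in the thread, this only sees expansions that occur in the hits 
actually returned, so it does not give an exhaustive list for large result 
sets.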

