I have a large collection of strings that each contain information about a certain product. For example:
wine Bardolo red 1L 12b 12% La Tulipe, 13* box 3 bottles, 2005
Great Johnny Walker 7CL 22% red label Wisky Jonny Walken .7 Red limited editon

The number of product names is limited, as are most other properties, but they might be misspelled. I would like to extract keywords from all those strings: product name, product type, volume, etc. But I'm not sure what the best approach would be, or whether Elasticsearch would be the tool of choice. I've looked at PostgreSQL's trigram extension (pg_trgm), since all the data sits in a PostgreSQL database at the moment, but that seems limited.

I was thinking about creating some kind of master list of proper keywords and trying to match words from each string against those keywords. These words could be misspelled, meaning they would have to be:

1. fuzzy matched
2. matched by hand
3. matched by some sort of neural network trained with existing data

Someone suggested "analyzing the entire string as an ngram using the ngram tokenizer", but I'm not sure. Any pointers where I should direct my effort would be highly appreciated!
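To make the "master list + fuzzy match" idea concrete, here is a minimal sketch using only Python's standard-library difflib. The keyword list and the 0.75 cutoff are hypothetical placeholders; in practice you would build the master list from your real product data and tune the cutoff against known misspellings:

```python
from difflib import get_close_matches

# Hypothetical master list of canonical keywords (would come from your data).
MASTER_KEYWORDS = ["wine", "whisky", "red", "white", "label", "bardolino"]

def match_token(token, keywords=MASTER_KEYWORDS, cutoff=0.75):
    """Return the closest fuzzy match for a (possibly misspelled) token,
    or None if nothing clears the similarity cutoff."""
    hits = get_close_matches(token.lower(), keywords, n=1, cutoff=cutoff)
    return hits[0] if hits else None

# Misspelled tokens from the example strings resolve to canonical keywords:
print(match_token("Wisky"))    # prints: whisky
print(match_token("Bardolo"))  # prints: bardolino
```

This only handles option 1 (fuzzy matching) and works token-by-token, so multi-word names like "Johnny Walker" would need an extra pass over token pairs; pg_trgm's `similarity()` function or Elasticsearch's fuzzy queries implement the same idea server-side at scale.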
