Hi all,
We have an internal Java library for keyword extraction that seems like a
good candidate to turn into an integrated Elasticsearch function.
Ultimately, I want the library's output to be available from search
results and the like, but first I'd like a sanity check on my approach, or
to hear whether I should be writing a custom analyzer or something similar
instead.
Given a string field, the type would become a multi-field with {name} and
a keywords/phrases field as subfields. A plugin would handle the keywords
subfield: it would run the string through the library and return a list of
strings, like:
"my_data":"Jack and Jill went up the hill, Jack fell down and bumped his
crown, and Jill came tumbling after."
"my_data.keywords":["Jack", "Jack fell"]
That's a trivial example, of course, and the algorithm is more complex than
the standard stopword filtering.
The goal is to expose the my_data.keywords field as an actual list, as
above, so that we can use it for other things like facets down the line.
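To make the shape I'm after a bit more concrete, here is a rough sketch in
plain Java with no Elasticsearch APIs involved. Everything named here is
hypothetical: KeywordExtractor stands in for our internal library,
RepeatedWordExtractor is a throwaway placeholder algorithm (not what the
library actually does), and expand() only illustrates the field layout a
plugin would need to produce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface standing in for the internal keyword-extraction library.
interface KeywordExtractor {
    List<String> extract(String text);
}

// Placeholder algorithm (an assumption, NOT the real library): keep words longer
// than three letters that occur at least twice, in order of first appearance.
class RepeatedWordExtractor implements KeywordExtractor {
    @Override
    public List<String> extract(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("[^A-Za-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<String> keywords = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getKey().length() > 3 && entry.getValue() >= 2) {
                keywords.add(entry.getKey());
            }
        }
        return keywords;
    }
}

class KeywordFieldSketch {
    // Builds a flattened view of the source field plus its keywords subfield.
    static Map<String, Object> expand(String field, String value, KeywordExtractor extractor) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(field, value);
        doc.put(field + ".keywords", extractor.extract(value));
        return doc;
    }

    public static void main(String[] args) {
        String text = "Jack and Jill went up the hill, Jack fell down and bumped his "
                + "crown, and Jill came tumbling after.";
        System.out.println(expand("my_data", text, new RepeatedWordExtractor()));
    }
}
```

With the placeholder extractor, my_data.keywords comes out as the list
"Jack", "Jill" rather than what our library would return. The point is only
that the subfield holds a list of whole strings, not analyzed tokens.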
So is a custom type plugin the right way to go here, or should I be looking
at developing a more complex analyzer/tokenizer/stopword combo?