On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz <jeffrey.steinm...@gmail.com> wrote:
> Is there a better way to do this?
>
> Please see this gist (or even better yet, run the script locally to see the
> issue).
>
> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>
> You must have scripting enabled in your elasticsearch config for this to
> work.
>
> This was originally based on some comments I found here:
>
> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>
> We would like to use a filtered query to only include documents that have a
> small count of items in the list [aka array], filtering where
> values.size() < 10
>
> "script": "doc['titles'].values.size() < 10"
>
> Turns out that values.size() either counts tokenized (analyzed)
> words or, if the mapping turns off analysis, still counts incorrectly if
> there are duplicates.
> If analysis is not turned off, it counts tokenized words, not the number of
> elements in the list.
> If analysis is turned off for a given field, it improves, but duplicates
> are missed.
>
> For example, this comes back as size == 2, should be 3:
> "titles": ["one", "duplicate", "duplicate"]
> This comes back as size == 3, should be 4:
> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def",
> "http://bit.ly/ghi"]
>
> Is this a bug, is there a better way, or is this just something that we
> don't understand about groovy and values.size()?

I think that's just the way doc[] works. Try (but don't actually deploy)
_source['titles'].size() < 10. That should do what you expect. Don't deploy
that because it's too slow.

Try indexing the size and filtering on it. You can use a transform to add the
size of the array as an integer field and just filter on it using a range
filter. That'd probably be the fastest option.

Nik
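
For reference, here is a minimal Python sketch of the "index the size and filter on it" idea. It is not the transform approach itself: it assumes the count is computed on the client before indexing, and the field name titles_count and both helper functions are made up for illustration. The query body follows the filtered query / range filter syntax of the Elasticsearch 1.x era. Nothing here talks to a cluster; it only builds the document and the query.

```python
import json


def add_titles_count(doc):
    """Return a copy of doc with a hypothetical integer field 'titles_count'.

    The count is taken from the original list, so duplicates are kept.
    """
    enriched = dict(doc)
    enriched["titles_count"] = len(doc.get("titles", []))
    return enriched


def small_list_query(max_exclusive):
    """Build a filtered query that keeps documents with fewer than
    max_exclusive titles, using a range filter on the assumed count field."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {"titles_count": {"lt": max_exclusive}}},
            }
        }
    }


# The two example documents from the thread.
docs = [
    {"titles": ["one", "duplicate", "duplicate"]},
    {"titles": ["http://bit.ly/abc", "http://bit.ly/abc",
                "http://bit.ly/def", "http://bit.ly/ghi"]},
]

enriched = [add_titles_count(d) for d in docs]

# Number of distinct values per document. This matches the sizes reported
# in the thread for doc['titles'].values.size() on a non-analyzed field.
unique_counts = [len(set(d["titles"])) for d in docs]

print(json.dumps(enriched[0]))
print(json.dumps(small_list_query(10)))
print(unique_counts)
```

The stored count keeps duplicates (3 and 4 for the two examples), while the distinct-value counts reproduce the 2 and 3 reported above. Filtering on a plain integer field also avoids running a script per document.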