On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz <jeffrey.steinm...@gmail.com>
wrote:

> Is there a better way to do this?
>
> Please see this gist (or, better yet, run the script locally to see the
> issue).
>
> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>
> You must have scripting enabled in your elasticsearch config for this to
> work.
>
> This was originally based on some comments I found here:
>
> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>
> We would like to use a filtered query to include only documents that have a
> small number of items in the list [aka array], filtering where
>  values.size() < 10
>
> "script": "doc['titles'].values.size() < 10"
>
> It turns out that values.size() does not count the elements in the list.
> If the field is analyzed, it counts tokenized words. If analysis is turned
> off in the mapping for that field, the count improves, but duplicates are
> still counted only once.
>
> For example, this comes back as size == 2, should be 3
> "titles": ["one", "duplicate", "duplicate"]
> This comes back as size == 3, should be 4
> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def",
> "http://bit.ly/ghi"]
>
> Is this a bug, is there a better way, or is this just something that we
> don't understand about groovy and values.size()?
>
>
>
I think that's just the way doc[] works.  Try (but don't actually deploy)
_source['titles'].size() < 10.  That should do what you expect, but don't
deploy it because it's too slow.  Instead, try indexing the size: you can use
a transform to add the size of the array as an integer field and then filter
on it with a range filter.  That'd probably be the fastest option.
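
A rough sketch of the "index the size and filter on it" idea, in Python.
Assumptions: the field name titles_count is made up for illustration, and
the count is computed in the client before indexing rather than with a
mapping transform. The query body uses the 1.x-style filtered query with a
range filter; nothing here talks to a cluster, it only builds the JSON.

```python
import json

# Hypothetical field name; it does not appear in the original thread.
COUNT_FIELD = "titles_count"


def with_titles_count(doc):
    """Return a copy of doc with the list length stored as an integer field.

    len() counts every element, duplicates included, which is the number
    the script filter was expected to produce.
    """
    enriched = dict(doc)
    enriched[COUNT_FIELD] = len(doc.get("titles", []))
    return enriched


def fewer_titles_than(limit):
    """Build a filtered query whose range filter keeps counts below limit."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {COUNT_FIELD: {"lt": limit}}},
            }
        }
    }


if __name__ == "__main__":
    doc = with_titles_count({"titles": ["one", "duplicate", "duplicate"]})
    print(json.dumps(doc))                    # body to index
    print(json.dumps(fewer_titles_than(10)))  # body to search with
```

Because the count is a plain integer in the index, the filter needs no
scripting and is not affected by analysis or by duplicate values.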

Nik
