Re: counting items in a list [array] returns (what we think) are incorrect counts via groovy

Nikolas Everett Thu, 08 Jan 2015 22:00:08 -0800

Transform never saves to source. You have to transform on the application
side for that. It was designed for times when you wanted to index something
like this that would just take up extra space in the source document. I
imagine you could use a script field on the query if you need the result to
contain the count. Or just count it on the result side.
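
A rough sketch of those two options in Python, for illustration only. The request body assumes the 1.x `script_fields` syntax with dynamic Groovy scripting enabled. The helper names, the `title_count` key, and the sample hit are hypothetical, and nothing here talks to a cluster.

```python
def script_field_body(field="titles"):
    """Build a search body that asks for the array length as a script field.

    Assumes the 1.x "script_fields" syntax and dynamic Groovy scripting.
    """
    return {
        "query": {"match_all": {}},
        "script_fields": {
            "title_count": {  # hypothetical name for the returned field
                "script": "_source['%s'].size()" % field,
                "lang": "groovy",
            }
        },
    }


def count_on_result_side(hits, field="titles"):
    """Attach the list length to each hit after the search has returned."""
    counted = []
    for hit in hits:
        values = hit.get("_source", {}).get(field, [])
        # A single value may come back as a bare string rather than a list.
        if not isinstance(values, list):
            values = [values]
        counted.append(dict(hit, title_count=len(values)))
    return counted


# Made-up hit shaped like an entry of hits.hits in a search response.
sample_hits = [{"_id": "1", "_source": {"titles": ["one", "duplicate", "duplicate"]}}]
print(count_on_result_side(sample_hits)[0]["title_count"])  # 3
```

Counting on the result side needs no scripting at all, which is probably the simpler route if the count is only needed for display.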


Nik
On Jan 9, 2015 12:43 AM, "Jeff Steinmetz" <jeffrey.steinm...@gmail.com>
wrote:

> Transform worked well.  Nice.
>
> Curious how to get it to save to source?  Tried this below, no go.  (I can
> however do range queries against title_count, so transform was indexed and
> works well)
>
>     "transform" : {
>       "script" : "ctx._source['\'title_count\''] = ctx._source['\'titles\''].size()",
>       "lang": "groovy"
>     },
>      "properties": {
>      "titles": { "type": "string", "index": "not_analyzed" },
>      "title_count" : { "type": "integer", "store": "yes" }
>    }
> }'
>
>
> On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
>>
>> Source is going to be pretty slow, yeah. If it's a one-off then it's
>> probably fine, but if you do it a lot it's probably best to index the count.
>> On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" <jeffrey....@gmail.com> wrote:
>>
>>> Thank you, that worked.
>>>
>>> I was curious about the speed: is running a script using _source slower
>>> than doc[]?
>>>
>>> Totally understand a dynamic script is slower regardless of _source vs
>>> doc[].
>>>
>>> Makes sense that having a count transformed up front during index to
>>> create a materialized value would certainly be much faster.
>>>
>>>
>>> On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz <jeffrey....@gmail.com>
>>>> wrote:
>>>>
>>>>> Is there a better way to do this?
>>>>>
>>>>> Please see this gist (or even better yet, run the script locally see
>>>>> the issue).
>>>>>
>>>>> https://gist.github.com/jeffsteinmetz/2ea8329c667386c80fae
>>>>>
>>>>> You must have scripting enabled in your elasticsearch config for this
>>>>> to work.
>>>>>
>>>>> This was originally based on some comments I found here:
>>>>> http://stackoverflow.com/questions/17314123/search-by-size-of-object-type-field-elastic-search
>>>>>
>>>>> We would like to use a filtered query to only include documents that have
>>>>> a small count of items in the list [aka array], filtering where
>>>>>  values.size() < 10
>>>>>
>>>>> "script": "doc['titles'].values.size() < 10"
>>>>>
>>>>> Turns out the values.size() actually either counts tokenized
>>>>> (analyzed) words, or if the mapping turns off analysis, it still counts
>>>>> incorrectly if there are duplicates.
>>>>> If analyze is not turned off, it counts tokenized words, not the
>>>>> number of elements in the list.
>>>>> If analyze is turned off for a given field, it improves, but
>>>>> duplicates are missed.
>>>>>
>>>>> For example, this comes back as size == 2
>>>>> "titles": ["one", "duplicate", "duplicate"]
>>>>> This comes back as size == 3, should be 4
>>>>> "titles": ["http://bit.ly/abc", "http://bit.ly/abc", "http://bit.ly/def", "http://bit.ly/ghi"]
>>>>>
>>>>> Is this a bug, is there a better way, or is this just something that
>>>>> we don't understand about groovy and values.size()?
>>>>>
>>>>>
>>>>>
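
The counts reported above are consistent with doc[] exposing the distinct indexed terms of a field rather than the original JSON array. A toy Python model of that behaviour follows. It is not Elasticsearch code: `doc_values_count` and the whitespace "analyzer" are stand-ins invented for illustration, and a real analyzer would split the URLs differently.

```python
def source_count(values):
    """Length of the array as it appears in _source."""
    return len(values)


def doc_values_count(values, analyzed=False):
    """Toy model of doc['field'].values.size(): distinct terms only."""
    terms = set()
    for value in values:
        if analyzed:
            # Stand-in for an analyzer: lowercase and split on whitespace.
            terms.update(value.lower().split())
        else:
            # not_analyzed: the whole value is a single term.
            terms.add(value)
    return len(terms)


titles = ["one", "duplicate", "duplicate"]
urls = ["http://bit.ly/abc", "http://bit.ly/abc",
        "http://bit.ly/def", "http://bit.ly/ghi"]

print(source_count(titles), doc_values_count(titles))  # 3 2
print(source_count(urls), doc_values_count(urls))      # 4 3
```

Under this model the duplicates collapse into one term, which matches the sizes of 2 and 3 seen with analysis turned off.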
>>>> I think that's just the way doc[] works.  Try (but don't actually
>>>> deploy) _source['titles'].size() < 10.  That should do what you expect.
>>>> Don't deploy that because it's too slow.  Try indexing the size and
>>>> filtering on it.  You can use a transform to add the size of the array as
>>>> an integer field and just filter on it using a range filter.  That'd
>>>> probably be the fastest option.
>>>>
>>>> Nik
>>>>
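
The transform-plus-range-filter suggestion quoted above might look roughly like this, sketched as Python dictionaries that could be serialized and sent to a 1.x cluster. The mapping mirrors the one posted later in this thread. The type name `doc`, the helper names, and the default limit are assumptions made for the example.

```python
import json


def mapping_with_count(doc_type="doc"):
    """Index settings whose transform adds the array size at index time.

    Assumes the 1.x mapping "transform" feature and Groovy scripting;
    doc_type is a hypothetical type name.
    """
    return {
        "mappings": {
            doc_type: {
                "transform": {
                    "script": "ctx._source['title_count'] = ctx._source['titles'].size()",
                    "lang": "groovy",
                },
                "properties": {
                    "titles": {"type": "string", "index": "not_analyzed"},
                    "title_count": {"type": "integer"},
                },
            }
        }
    }


def small_list_query(limit=10):
    """Filtered query (1.x syntax) keeping documents with fewer than `limit` titles."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {"title_count": {"lt": limit}}},
            }
        }
    }


print(json.dumps(small_list_query(), indent=2))
```

Because the count is computed once at index time, the range filter reads an ordinary integer field and no script runs per document at search time.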
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%
>>> 40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>

