You should sort on doc values (recommended; they will be the default in the
next ES version). Sorting on not_analyzed / keyword-analyzed fields without
doc values is old school.

Doc values for analyzed strings do not make much sense in my opinion and
would lead to unwanted results. If you use a multi-field, you do not have to
worry, because you can set up both a doc values field and an analyzed field.

Example:

https://gist.github.com/jprante/da2980446108b5c112a8
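Independently of the gist, here is a minimal sketch of what such a multi-field setup might look like, written as Python dicts that can be serialized to JSON. It assumes the ES 1.x mapping syntax (`string` type, `"index": "not_analyzed"`, `"doc_values": true`). The type name `doc`, the field name `title`, the sub-field name `raw`, and the `standard` analyzer are all placeholders, not taken from this thread.

```python
import json


def build_multifield_mapping(field, analyzer="standard", doc_type="doc"):
    """Return a mapping body with an analyzed parent field and a
    not_analyzed sub-field backed by doc values (ES 1.x syntax assumed)."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {
                        # Parent field: analyzed, used for queries.
                        "type": "string",
                        "analyzer": analyzer,
                        "fields": {
                            # Sub-field: one untouched term per value,
                            # stored on disk as doc values, used for
                            # sorting and aggregations.
                            "raw": {
                                "type": "string",
                                "index": "not_analyzed",
                                "doc_values": True,
                            }
                        },
                    }
                }
            }
        }
    }


def build_sort(field, order="asc"):
    """Return a search body fragment that sorts on the doc values sub-field."""
    return {"sort": [{field + ".raw": {"order": order}}]}


mapping = build_multifield_mapping("title")
sort_body = build_sort("title")
print(json.dumps(mapping, indent=2))
print(json.dumps(sort_body))
```

Queries would keep targeting `title`, while sorting and aggregations would target `title.raw`, so the analyzed field never has to be loaded into fielddata.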

> The kind of filtering I'm looking for would be something like, "only
> consider terms in field1 from documents where field2=valueA".

With an inverted index, this always requires loading all values of the field
into the field cache. There is no free lunch. That is why doc values
(columnar style) were invented: to avoid this field cache loading, for
example for high cardinality fields.

Jörg

On Mon, Mar 16, 2015 at 3:17 AM, Lindsey Poole <lpo...@gmail.com> wrote:

> Also, if I understand correctly, there are negative implications when
> sorting over a column that has been analyzed - in our case, to remove
> stop-words.
>
> Since the total cardinality of our sort field exceeds the available heap,
> we can't sort a single user's documents when using stop-word analysis, since
> doc_values do not support analyzed fields.
>
> It seems like we'll have to preprocess the field to remove stop-words?
>
> On Sunday, March 15, 2015 at 7:01:21 PM UTC-7, Lindsey Poole wrote:
>>
>> Well, we have a field that is supporting a backward compatibility use
>> case. Clients are executing a partial match query on this field, so we used
>> the keyword tokenizer instead of not_analyzed. Since this is supporting
>> legacy functionality, the clients cannot be updated to change the
>> expectation that a partial match will return results.
>>
>> I can modify the schema and re-index so that we aggregate and sort over a
>> not_analyzed subfield instead, while executing any queries on the parent
>> field, but I wanted to verify that there is no other way to filter out
>> terms prior to loading them into the fielddata cache.
>>
>> The kind of filtering I'm looking for would be something like, "only
>> consider terms in field1 from documents where field2=valueA".
>>
>> -Lindsey
>>
>> On Sunday, March 15, 2015 at 4:43:56 PM UTC-7, Jörg Prante wrote:
>>>
>>> I mean, I do not understand what you mean by "I'm caught up on the
>>> advice to use doc_values where possible, but we have a use case where we do
>>> light analysis on a particular set of fields in our document" - what
>>> exactly prevents you from doc values?
>>>
>>> Jörg
>>>
>>> On Mon, Mar 16, 2015 at 12:41 AM, joerg...@gmail.com <joerg...@gmail.com> wrote:
>>>
>>>> Have you considered doc values?
>>>>
>>>> http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
>>>>
>>>> Jörg
>>>>
>>>> On Sun, Mar 15, 2015 at 11:11 PM, Lindsey Poole <lpo...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey guys,
>>>>>
>>>>> I have a question about the mechanics of aggregation and sorting
>>>>> w.r.t. the fielddata cache. I know this has been covered in some detail
>>>>> previously, and I'm caught up on the advice to use doc_values where
>>>>> possible, but we have a use case where we do light analysis on a
>>>>> particular set of fields in our document, but also allow sorting on
>>>>> those fields.
>>>>>
>>>>> While we'll probably modify our schema to solve the issue, I was first
>>>>> wondering whether it is possible to filter the set of documents that ES
>>>>> aggregates / sorts over *before* pulling them into the fielddata cache? We
>>>>> have extremely high cardinality fields, but very selective queries, and it
>>>>> seems very inefficient to pull multiple gigabytes into the fielddata cache
>>>>> to select relatively few matching documents.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Lindsey
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elasticsearc...@googlegroups.com.
>>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>>> msgid/elasticsearch/e32cf7c3-e2b3-48e9-bc7c-d7f2e0016835%
>>>>> 40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/elasticsearch/e32cf7c3-e2b3-48e9-bc7c-d7f2e0016835%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
