Well, we have a field that supports a backward-compatibility use case. 
Clients run partial-match queries against this field, so we analyze it with 
the keyword tokenizer instead of mapping it as not_analyzed. Because this is 
legacy functionality, the clients cannot be changed to stop expecting a 
partial match to return results.

I can modify the schema and re-index so that we aggregate and sort over a 
not_analyzed subfield instead, while executing any queries on the parent 
field, but I wanted to verify that there is no other way to filter out 
terms prior to loading them into the fielddata cache.
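
To make the re-index option concrete, here is a rough sketch of the kind of 
mapping I have in mind, written as a Python dict in the Elasticsearch 1.x 
style. The type name `doc`, the analyzer name `legacy_partial`, the subfield 
name `raw`, and the lowercase filter are placeholders invented for 
illustration, not our actual schema.

```python
import json


def build_index_body():
    """Return index settings and mappings in the Elasticsearch 1.x style.

    All names here (legacy_partial, doc, raw) are hypothetical placeholders.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # The keyword tokenizer emits the whole value as one token.
                    # The lowercase filter is an assumed "light analysis" step.
                    "legacy_partial": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    "field1": {
                        # Parent field: analyzed, used only for queries.
                        "type": "string",
                        "analyzer": "legacy_partial",
                        "fields": {
                            # Subfield: not_analyzed with doc values, used
                            # for aggregations and sorting.
                            "raw": {
                                "type": "string",
                                "index": "not_analyzed",
                                "doc_values": True,
                            }
                        },
                    },
                    "field2": {"type": "string", "index": "not_analyzed"},
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body that would be sent when creating the index.
    print(json.dumps(build_index_body(), indent=2))
```

Queries would keep targeting `field1`, while aggregations and sorts would 
target `field1.raw`.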

The kind of filtering I'm looking for would be something like, "only 
consider terms in field1 from documents where field2=valueA".
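
In request terms, the result I'm after looks like the sketch below: a 
filtered query on field2 combined with a terms aggregation and a sort on 
field1, using the Elasticsearch 1.x query DSL. The subfield name 
`field1.raw` and the aggregation name `field1_terms` are hypothetical. This 
only restricts which documents contribute at query time; whether the same 
restriction can be applied before terms are loaded into the cache is exactly 
what I'm asking.

```python
import json


def build_search_body(field2_value, size=10):
    """Build a 1.x-style search body restricted to one field2 value.

    field1.raw is a hypothetical not_analyzed subfield of field1.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                # Only documents where field2 equals the given value match.
                "filter": {"term": {"field2": field2_value}},
            }
        },
        # Aggregate over field1 terms from the matching documents only.
        "aggs": {
            "field1_terms": {"terms": {"field": "field1.raw", "size": size}}
        },
        # Sort the matching documents on the same subfield.
        "sort": [{"field1.raw": {"order": "asc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("valueA"), indent=2))
```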

-Lindsey

On Sunday, March 15, 2015 at 4:43:56 PM UTC-7, Jörg Prante wrote:
>
> I mean, I do not understand what you mean by "I'm caught up on the advice 
> to use doc_values where possible, but we have a use case where we do light 
> analysis on a particular set of fields in our document" - what exactly 
> prevents you from doc values?
>
> Jörg
>
> On Mon, Mar 16, 2015 at 12:41 AM, joerg...@gmail.com wrote:
>
>> Have you considered doc values?
>>
>> http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
>>
>> Jörg
>>
>> On Sun, Mar 15, 2015 at 11:11 PM, Lindsey Poole <lpo...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> I have a question about the mechanics of aggregation and sorting w.r.t. 
>>> the fielddata cache. I know this has been covered in some detail 
>>> previously, and I'm caught up on the advice to use doc_values where 
>>> possible, but we have a use case where we do light analysis on a particular 
>>> set of fields in our document, but also allow sorting on those fields.
>>>
>>> While we'll probably modify our schema to solve the issue, I was first 
>>> wondering whether it is possible to filter the set of documents that ES 
>>> aggregates / sorts over *before* pulling them into the fielddata cache? We 
>>> have extremely high cardinality fields, but very selective queries, and it 
>>> seems very inefficient to pull multiple gigabytes into the fielddata cache 
>>> to select relatively few matching documents.
>>>
>>> Thanks,
>>>
>>> Lindsey
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/e32cf7c3-e2b3-48e9-bc7c-d7f2e0016835%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>

