Also, if I understand correctly, there are negative implications when 
sorting on a field that has been analyzed (in our case, to remove 
stop-words).

The total cardinality of our sort field exceeds the available heap, and 
doc_values do not support analyzed fields, so with stop-word analysis in 
place we can't sort even a single user's documents.
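A minimal sketch of the multi-field layout discussed in this thread, assuming Elasticsearch 1.x string mappings (where doc_values are only available on not_analyzed fields). The type name `doc`, the field `title`, and the analyzer `light_keyword` are hypothetical placeholders; the analyzer's `lowercase` filter merely stands in for whatever light analysis the real field uses. Queries keep targeting the analyzed parent, while sorting goes to the `raw` subfield, which is read from on-disk doc values rather than heap fielddata.

```python
import json

# Hypothetical index settings: a custom analyzer built on the keyword
# tokenizer, standing in for the existing "light analysis".
SETTINGS = {
    "analysis": {
        "analyzer": {
            "light_keyword": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        }
    }
}

# Hypothetical mapping (Elasticsearch 1.x syntax assumed).
MAPPING = {
    "doc": {
        "properties": {
            "title": {
                "type": "string",
                "analyzer": "light_keyword",  # queries still hit the analyzed parent
                "fields": {
                    "raw": {
                        "type": "string",
                        "index": "not_analyzed",  # one untouched term per document
                        "doc_values": True,       # column stored on disk, not heap fielddata
                    }
                },
            }
        }
    }
}


def sorted_search(query_text, size=10):
    """Build a search body that matches on the parent field but sorts on the subfield."""
    return {
        "query": {"match": {"title": query_text}},
        "sort": [{"title.raw": {"order": "asc"}}],
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps({"settings": SETTINGS, "mappings": MAPPING}, indent=2))
    print(json.dumps(sorted_search("widget"), indent=2))
```

The trade-off is a reindex and some extra disk for the subfield, in exchange for keeping the sort values off the heap.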

It seems like we'll have to preprocess the field to remove stop-words?
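If the stop-words do have to be stripped before indexing, one option is to compute the sort key on the client and store it in its own not_analyzed, doc_values-enabled field. This is a sketch under stated assumptions: the stop list below is illustrative only (it should be replaced with whatever list the analyzer currently applies, so the sort order matches query-time behaviour), and the field name `title_sort` is hypothetical.

```python
# Illustrative stop list only; substitute the list the analyzer actually uses.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def sort_key(value):
    """Lowercase, split on whitespace, drop stop-words, and rejoin with single spaces."""
    kept = [tok for tok in value.lower().split() if tok not in STOP_WORDS]
    return " ".join(kept)


def to_index_doc(title):
    """Hypothetical document shape: the original text plus a precomputed sort key.

    "title_sort" would be mapped as a not_analyzed string with doc_values enabled.
    """
    return {"title": title, "title_sort": sort_key(title)}


if __name__ == "__main__":
    print(to_index_doc("The Lord of the Rings"))
```

For example, `to_index_doc("The Lord of the Rings")` yields `{"title": "The Lord of the Rings", "title_sort": "lord rings"}`. A value made up entirely of stop-words produces an empty sort key, so those documents will cluster together at one end of the sort.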

On Sunday, March 15, 2015 at 7:01:21 PM UTC-7, Lindsey Poole wrote:
>
> Well, we have a field that is supporting a backward compatibility use 
> case. Clients are executing a partial match query on this field, so we used 
> the keyword tokenizer instead of not_analyzed. Since this is supporting 
> legacy functionality, the clients cannot be updated to change the 
> expectation that a partial match will return results.
>
> I can modify the schema and re-index so that we aggregate and sort over a 
> not_analyzed subfield instead, while executing any queries on the parent 
> field, but I wanted to verify that there is no other way to filter out 
> terms prior to loading them into the fielddata cache.
>
> The kind of filtering I'm looking for would be something like, "only 
> consider terms in field1 from documents where field2=valueA".
>
> -Lindsey
>
> On Sunday, March 15, 2015 at 4:43:56 PM UTC-7, Jörg Prante wrote:
>>
>> I mean, I do not understand what you mean by "I'm caught up on the 
>> advice to use doc_values where possible, but we have a use case where we do 
>> light analysis on a particular set of fields in our document". What 
>> exactly prevents you from using doc values?
>>
>> Jörg
>>
>> On Mon, Mar 16, 2015 at 12:41 AM, joerg...@gmail.com <joerg...@gmail.com> 
>> wrote:
>>
>>> Have you considered doc values?
>>>
>>>
>>> http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
>>>
>>> Jörg
>>>
>>> On Sun, Mar 15, 2015 at 11:11 PM, Lindsey Poole <lpo...@gmail.com> 
>>> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> I have a question about the mechanics of aggregation and sorting w.r.t. 
>>>> the fielddata cache. I know this has been covered in some detail 
>>>> previously, and I'm caught up on the advice to use doc_values where 
>>>> possible, but we have a use case where we do light analysis on a 
>>>> particular set of fields in our document and also allow sorting on 
>>>> those fields.
>>>>
>>>> While we'll probably modify our schema to solve the issue, I was first 
>>>> wondering whether it is possible to filter the set of documents that ES 
>>>> aggregates / sorts over *before* pulling them into the fielddata cache? We 
>>>> have extremely high cardinality fields, but very selective queries, and it 
>>>> seems very inefficient to pull multiple gigabytes into the fielddata cache 
>>>> to select relatively few matching documents.
>>>>
>>>> Thanks,
>>>>
>>>> Lindsey
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/e32cf7c3-e2b3-48e9-bc7c-d7f2e0016835%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
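The filtering Lindsey describes above ("only consider terms in field1 from documents where field2=valueA") can be expressed as a filtered query feeding a terms aggregation. A minimal sketch, assuming Elasticsearch 1.x `filtered` query syntax; `field1`, `field2`, and `valueA` come from the thread, while the aggregation name `top_terms` and the `.raw` subfield are hypothetical. As far as I understand, this narrows which documents are aggregated but does not stop fielddata for the field from being loaded for every document in each segment, which is why pointing the aggregation at a doc_values subfield is what actually reduces heap use.

```python
import json


def filtered_terms_agg(filter_field, filter_value, agg_field, size=10):
    """Aggregate agg_field only over documents where filter_field == filter_value.

    Uses the Elasticsearch 1.x "filtered" query; agg_field is expected to be
    a not_analyzed, doc_values-enabled subfield such as "field1.raw".
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {filter_field: filter_value}},
            }
        },
        "aggs": {
            "top_terms": {"terms": {"field": agg_field, "size": size}}
        },
        "size": 0,  # only the aggregation is needed, not the hits themselves
    }


if __name__ == "__main__":
    print(json.dumps(filtered_terms_agg("field2", "valueA", "field1.raw"), indent=2))
```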