Also, if I understand correctly, there are negative implications to sorting on a field that has been analyzed (in our case, to remove stop-words).
Since the total cardinality of our sort field exceeds the available heap, we can't sort a single user's documents when using stop-word analysis, because doc_values do not support analyzed fields. It seems like we'll have to preprocess the field to remove stop-words?

On Sunday, March 15, 2015 at 7:01:21 PM UTC-7, Lindsey Poole wrote:
>
> Well, we have a field that is supporting a backward compatibility use
> case. Clients are executing a partial match query on this field, so we used
> the keyword tokenizer instead of not_analyzed. Since this is supporting
> legacy functionality, the clients cannot be updated to change the
> expectation that a partial match will return results.
>
> I can modify the schema and re-index so that we aggregate and sort over a
> not_analyzed subfield instead, while executing any queries on the parent
> field, but I wanted to verify that there is no other way to filter out
> terms prior to loading them into the fielddata cache.
>
> The kind of filtering I'm looking for would be something like, "only
> consider terms in field1 from documents where field2=valueA".
>
> -Lindsey
>
> On Sunday, March 15, 2015 at 4:43:56 PM UTC-7, Jörg Prante wrote:
>>
>> I mean, I do not understand what you mean by "I'm caught up on the
>> advice to use doc_values where possible, but we have a use case where we do
>> light analysis on a particular set of fields in our document" - what
>> exactly prevents you from doc values?
>>
>> Jörg
>>
>> On Mon, Mar 16, 2015 at 12:41 AM, joerg...@gmail.com <joerg...@gmail.com>
>> wrote:
>>
>>> Have you considered doc values?
>>>
>>> http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
>>>
>>> Jörg
>>>
>>> On Sun, Mar 15, 2015 at 11:11 PM, Lindsey Poole <lpo...@gmail.com>
>>> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> I have a question about the mechanics of aggregation and sorting w.r.t.
>>>> the fielddata cache. I know this has been covered in some detail
>>>> previously, and I'm caught up on the advice to use doc_values where
>>>> possible, but we have a use case where we do light analysis on a
>>>> particular set of fields in our document, but also allow sorting on
>>>> those fields.
>>>>
>>>> While we'll probably modify our schema to solve the issue, I was first
>>>> wondering whether it is possible to filter the set of documents that ES
>>>> aggregates / sorts over *before* pulling them into the fielddata cache? We
>>>> have extremely high cardinality fields, but very selective queries, and it
>>>> seems very inefficient to pull multiple gigabytes into the fielddata cache
>>>> to select relatively few matching documents.
>>>>
>>>> Thanks,
>>>>
>>>> Lindsey
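
For the preprocessing idea raised at the top of this thread (removing stop-words before indexing so that sorting can run against a not_analyzed field backed by doc_values), a minimal Python sketch might look like the following. Everything named here is an assumption for illustration and does not come from the thread: the field names `title`, `title_sort`, and `user_id`, the stop-word list, and the helper functions are all hypothetical. The mapping and search bodies are written in the Elasticsearch 1.x style that was current at the time of this discussion, and are only built as plain dictionaries, not sent to a cluster.

```python
import json

# Hypothetical stop-word list; the thread does not say which words are removed.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def make_sort_key(text, stop_words=STOP_WORDS):
    """Lowercase the text and drop stop-words, keeping the remaining word order."""
    kept = [word for word in text.lower().split() if word not in stop_words]
    return " ".join(kept)


def prepare_document(doc, source_field="title", sort_field="title_sort"):
    """Return a copy of doc with a precomputed sort key added.

    Field names are hypothetical. The original document is not modified.
    """
    prepared = dict(doc)
    prepared[sort_field] = make_sort_key(doc[source_field])
    return prepared


# Mapping fragment (assumed Elasticsearch 1.x syntax). "title" stands in for
# the existing analyzed field that legacy clients query; its analyzer settings
# are omitted. "title_sort" holds the preprocessed value, is not analyzed, and
# keeps its sort values in doc_values on disk rather than in fielddata on heap.
MAPPING = {
    "properties": {
        "title": {"type": "string"},
        "title_sort": {
            "type": "string",
            "index": "not_analyzed",
            "doc_values": True,
        },
    }
}

# Search body: select one user's documents, then sort on the preprocessed field.
SORTED_SEARCH = {
    "query": {"term": {"user_id": "u1"}},
    "sort": [{"title_sort": {"order": "asc"}}],
}

if __name__ == "__main__":
    print(json.dumps(prepare_document({"title": "The Art of War"})))
    print(json.dumps(MAPPING, indent=2))
    print(json.dumps(SORTED_SEARCH, indent=2))
```

The sort key is stored in its own field rather than in a subfield of `title`, because a subfield would receive the original text, not the preprocessed value. Queries from legacy clients would keep targeting the analyzed field, and only sorting and aggregation would move to the new one.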