Hey, can you test with a more recent version of Elasticsearch first? There have been some dramatic improvements to faceting since 0.20. Also, please explain your setup a bit more. Faceting can need a lot of memory with lots of documents, because it loads the values of the faceted field into memory as so-called fielddata, so you should configure and monitor Elasticsearch appropriately.
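To make the monitoring part a bit more concrete, here is a rough sketch of how you might summarize fielddata usage per node from a nodes stats response. The sample data and the `fielddata_report` helper are made up for illustration, and the field names (`indices.fielddata.memory_size_in_bytes`, `evictions`) are an assumption based on the stats layout of more recent releases, so check them against what your version actually returns.

```python
import json

# Hypothetical, trimmed-down sample of a nodes stats response.
# ASSUMPTION: the layout nodes -> <id> -> indices -> fielddata follows
# recent releases; older versions may report this differently.
SAMPLE_STATS = json.loads("""
{
  "nodes": {
    "node-a": {
      "name": "es1",
      "indices": {"fielddata": {"memory_size_in_bytes": 1073741824, "evictions": 0}}
    },
    "node-b": {
      "name": "es2",
      "indices": {"fielddata": {"memory_size_in_bytes": 536870912, "evictions": 12}}
    }
  }
}
""")


def fielddata_report(stats):
    """Return (node name, fielddata MB, evictions) tuples, largest first."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        fielddata = node.get("indices", {}).get("fielddata", {})
        size_mb = fielddata.get("memory_size_in_bytes", 0) / (1024 * 1024)
        evictions = fielddata.get("evictions", 0)
        # Fall back to the node id when the node has no display name.
        rows.append((node.get("name", node_id), size_mb, evictions))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, size_mb, evictions in fielddata_report(SAMPLE_STATS):
        print(f"{name}: {size_mb:.1f} MB fielddata, {evictions} evictions")
```

In practice you would pass in the parsed JSON from the nodes stats API instead of the sample. If the fielddata size gets close to the heap you have given Elasticsearch, or evictions keep climbing while a facet runs, that would point to memory pressure.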
See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html#field-data

--Alex

On Wed, Dec 18, 2013 at 10:51 PM, Brian Jones <tbrianjo...@gmail.com> wrote:

> I'm using the Terms Facet with Elasticsearch V0.20.2. The server has 8 x
> Intel Xeon E5-2680 v2 processors and 15GB of memory.
>
> My Terms Facet queries work great as long as the number of documents in
> the index is small ( eg. less than 20,000 ). When the system hits more,
> pushing into the hundreds of thousands or millions of documents, my Terms
> Facets never return results. Watching the server, I initially see a few
> Java processes using a lot of CPU, but within a few seconds, this is
> reduced to a half dozen processes each using ~2% cpu. I never see memory
> usage increase on the server as a result of these queries. When these
> queries fail to return results, they also sometimes seem to "freeze"
> Elasticsearch and I often have to restart the ES server or even reboot the
> physical server to get ES back online for other simple queries.
>
> The fields I'm trying to facet exist for nearly every document and can
> have anywhere from 0 to hundreds of different values across the dataset.
> All values are text strings and I'm using a custom analyzer that reduces
> them to lowercase. I realize that increasing the number of potential
> values in a field will dramatically increase the resources needed for the
> Terms Facet Query. In testing, I would expect some of the smaller fields
> should work fine even at scale with millions of documents.
>
> Questions:
>
> 1.) My test field ( industries ), can have no more than 32 unique values.
> Each document could have none or all 32 values. Each value can be from 10
> to 100 characters of text. This Terms Facet never returns a result at
> scale. Any thoughts on what is happening? Is my setup flawed?
>
> 2.) Will I ever be able to run a facet on a field that can have millions of
> unique text values? I have some data analysis cases like this where I'd
> like to use Elasticsearch Facetting.
>
> 3.) Would reducing the fields I'm faceting on to integers ( and then
> translating back to text outside ES ) make a big difference in performance
> and required resources?
>
> Test Query:
>
> curl -X POST "http://remote_host:9200/companies/company/_search?pretty=true" -d '
> {
>   "query" : {
>     "match_all" : { }
>   },
>   "facets" : {
>     "industries" : {
>       "terms" : {
>         "field" : "industries.term.keyword_lowercase",
>         "size" : 100
>       }
>     }
>   },
>   "size" : 0
> }
> '
>
> Index Configuration:
>
> {
>   "index" : {
>     "number_of_shards" : 5,
>     "number_of_replicas" : 1,
>     "analysis" : {
>       "analyzer" : {
>         "default" : {
>           "tokenizer" : "standard",
>           "filter" : ["standard", "word_delimiter", "lowercase", "stop"]
>         },
>         "html_strip" : {
>           "tokenizer" : "standard",
>           "filter" : ["standard", "word_delimiter", "lowercase", "stop"],
>           "char_filter" : "html_strip"
>         },
>         "keyword_lowercase" : {
>           "tokenizer" : "keyword",
>           "filter" : "lowercase"
>         }
>       }
>     }
>   }
> }
>
> Company Document Mapping:
>
> ** i've removed irrelevant fields
>
> {
>   "company" : {
>     "type" : "object",
>     "include_in_all" : false,
>     "path" : "full",
>     "dynamic" : "strict",
>     "properties" : {
>       "name" : {
>         "type" : "multi_field",
>         "fields" : {
>           "name" : { "type" : "string", "index" : "analyzed", "include_in_all" : "true", "boost" : 10.0 },
>           "keyword_lowercase" : { "type" : "string", "index" : "analyzed", "analyzer" : "keyword_lowercase", "include_in_all" : "false" }
>         }
>       },
>       "description" : { "type" : "string", "index" : "analyzed", "include_in_all" : "true", "boost" : 6.0 },
>       "industries" : {
>         "type" : "nested",
>         "include_in_root" : true,
>         "properties" : {
>           "term" : {
>             "type" : "multi_field",
>             "fields" : {
>               "term" : { "type" : "string", "index" : "analyzed", "include_in_all" : true, "boost" : 3.0 },
>               "keyword_lowercase" : { "type" : "string", "index" : "analyzed", "analyzer" : "keyword_lowercase" }
>             }
>           },
>           "description" : { "type" : "string", "index" : "analyzed", "include_in_all" : true },
>           "score" : { "type" : "integer" },
>           "verified" : { "type" : "boolean" }
>         }
>       }
>     }
>   }
> }
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3e608b31-8569-49d3-b9fa-20d3a1e4a597%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGCwEM80zPKeTo%3DrXEBinoatkZmX%2BbWqhx2itE4tuCBg87NEwQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.