Hey,

Can you test with a more recent version of Elasticsearch first? There have
been some dramatic improvements to faceting since 0.20.2.
Also, please describe your setup in a bit more detail. Faceting can need a
lot of memory when there are many documents, because it loads the field
values into so-called fielddata, so you should configure and monitor
Elasticsearch accordingly.

See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html#field-data
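
To make the monitoring part concrete, here is a rough Python sketch that
sums up fielddata memory per node from a node stats response. It is only an
illustration under a few assumptions of mine: the node answers on
localhost:9200, you are on a recent version where the stats are exposed at
_nodes/stats/indices/fielddata (the path may differ on older releases), and
the response carries a memory_size_in_bytes value under indices.fielddata.
The function names are made up for this example.

```python
import json
from urllib.request import urlopen

# Assumption: a node reachable on localhost:9200 and a recent Elasticsearch
# version that exposes fielddata under the node stats API.
STATS_URL = "http://localhost:9200/_nodes/stats/indices/fielddata?fields=*"


def fielddata_bytes_per_node(stats):
    """Map node name -> fielddata memory in bytes from a node stats response.

    Nodes that report no fielddata section are counted as 0 bytes.
    """
    usage = {}
    for node_id, node in stats.get("nodes", {}).items():
        fielddata = node.get("indices", {}).get("fielddata", {})
        # Fall back to the node id when the response has no node name.
        usage[node.get("name", node_id)] = fielddata.get("memory_size_in_bytes", 0)
    return usage


def fetch_stats(url=STATS_URL):
    """Fetch and decode the node stats JSON (hypothetical helper)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, against a running node:
#   for name, size in sorted(fielddata_bytes_per_node(fetch_stats()).items()):
#       print("%s: %.1f MB of fielddata" % (name, size / (1024.0 * 1024.0)))
```

If the number for a node keeps climbing towards the configured heap size
while the facet query runs, that would point to fielddata as the bottleneck.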


--Alex


On Wed, Dec 18, 2013 at 10:51 PM, Brian Jones <tbrianjo...@gmail.com> wrote:

> I'm using the Terms Facet with Elasticsearch V0.20.2.  The server has 8 x
> Intel Xeon E5-2680 v2 processors and 15GB of memory.
>
> My Terms Facet queries work great as long as the number of documents in
> the index is small (e.g. fewer than 20,000).  When the system hits more,
> pushing into the hundreds of thousands or millions of documents, my Terms
> Facets never return results.  Watching the server, I initially see a few
> Java processes using a lot of CPU, but within a few seconds, this is
> reduced to a half dozen processes each using ~2% cpu.  I never see memory
> usage increase on the server as a result of these queries.  When these
> queries fail to return results, they also sometimes seem to "freeze"
> Elasticsearch and I often have to restart the ES server or even reboot the
> physical server to get ES back online for other simple queries.
>
> The fields I'm trying to facet exist for nearly every document and can
> have anywhere from 0 to hundreds of different values across the dataset.
>  All values are text strings and I'm using a custom analyzer that reduces
> them to lowercase.  I realize that increasing the number of potential
> values in a field will dramatically increase the resources needed for the
> Terms Facet Query.  In testing, I would expect some of the smaller fields
> should work fine even at scale with millions of documents.
>
>
>
> Questions:
>
> 1.) My test field ( industries ), can have no more than 32 unique values.
>  Each document could have none or all 32 values.  Each value can be from 10
> to 100 characters of text.  This Terms Facet never returns a result at
> scale.  Any thoughts on what is happening?  Is my setup flawed?
>
> 2.) Will I ever be able to run a facet on a field that can have millions
> of unique text values?  I have some data analysis cases like this where
> I'd like to use Elasticsearch faceting.
>
> 3.) Would reducing the fields I'm faceting on to integers ( and then
> translating back to text outside ES ) make a big difference in performance
> and required resources?
>
>
>
> Test Query:
>
> curl -X POST "http://remote_host:9200/companies/company/_search?pretty=true" -d '
> {
>     "query" : {
>         "match_all" : {  }
>     },
>     "facets" : {
>         "industries" : {
>             "terms" : {
>                 "field" : "industries.term.keyword_lowercase",
>                 "size" : 100
>             }
>         }
>     },
>     "size" : 0
> }
> '
>
>
>
>
> Index Configuration:
>
> {
>     "index" : {
>         "number_of_shards" : 5,
>         "number_of_replicas" : 1,
>         "analysis" : {
>             "analyzer" : {
>                 "default" : {
>                     "tokenizer" : "standard",
>                     "filter" : ["standard", "word_delimiter", "lowercase", "stop"]
>                 },
>                 "html_strip" : {
>                     "tokenizer" : "standard",
>                     "filter" : ["standard", "word_delimiter", "lowercase", "stop"],
>                     "char_filter" : "html_strip"
>                 },
>                 "keyword_lowercase" : {
>                     "tokenizer" : "keyword",
>                     "filter" : "lowercase"
>                 }
>             }
>         }
>     }
> }
>
>
>
>
> Company Document Mapping:
>
> ** I've removed irrelevant fields
>
> {
>     "company" : {
>         "type" : "object",
>         "include_in_all" : false,
>         "path" : "full",
>         "dynamic" : "strict",
>         "properties" : {
>             "name" : {
>                 "type" : "multi_field",
>                 "fields" : {
>                     "name" : { "type" : "string", "index" : "analyzed", "include_in_all" : "true", "boost" : 10.0 },
>                     "keyword_lowercase" : { "type" : "string", "index" : "analyzed", "analyzer" : "keyword_lowercase", "include_in_all" : "false" }
>                 }
>             },
>             "description" : { "type" : "string", "index" : "analyzed", "include_in_all" : "true", "boost" : 6.0 },
>             "industries" : {
>                 "type" : "nested",
>                 "include_in_root" : true,
>                 "properties" : {
>                     "term" : {
>                         "type" : "multi_field",
>                         "fields" : {
>                             "term" : { "type" : "string", "index" : "analyzed", "include_in_all" : true, "boost" : 3.0 },
>                             "keyword_lowercase" : { "type" : "string", "index" : "analyzed", "analyzer" : "keyword_lowercase" }
>                         }
>                     },
>                     "description" : { "type" : "string", "index" : "analyzed", "include_in_all" : true },
>                     "score" : { "type" : "integer" },
>                     "verified" : { "type" : "boolean" }
>                 }
>             }
>         }
>     }
> }
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3e608b31-8569-49d3-b9fa-20d3a1e4a597%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

