"When I kept size as Integer.MAX_VALUE, it caused all the problems"
Are you trying to return up to 2 billion documents at once? Even if that number were only 1 million, you would face problems. Or did I perhaps misunderstand you? Are you sorting the documents by score (the default)? Lucene/Elasticsearch would need to keep all the values in memory in order to sort them, causing memory problems. In general, Lucene is not effective at deep pagination. Use scan/scroll:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

--
Ivan

On Sat, Aug 23, 2014 at 6:46 AM, Narendra Yadala <narendra.yad...@gmail.com> wrote:

Hi Jörg,

This query

{
  "query": {
    "bool": {
      "must": [
        { "match": { "body": "big" } },
        { "match": { "id": 521 } }
      ],
      "must_not": {
        "match": { "body": "data" }
      }
    }
  }
}

and this query perform exactly the same:

{
  "query": {
    "bool": {
      "must": {
        "match": { "body": "big" }
      },
      "must_not": {
        "match": { "body": "data" }
      }
    }
  },
  "filter": {
    "term": { "id": "521" }
  }
}

I am not able to understand what makes a filtered query fast. Is there any place where I can find documentation on the internals of how different queries are processed by Elasticsearch?

On Saturday, 23 August 2014 18:20:23 UTC+5:30, Jörg Prante wrote:

Before firing queries, you should consider whether the index design and query choice are optimal.

Numeric range queries are not straightforward. They have been a major issue for inverted-index engines like Lucene/Elasticsearch, and it has taken some time to introduce efficient implementations. See e.g. https://issues.apache.org/jira/browse/LUCENE-1673

ES tries to compensate for the downsides of massive numeric range queries by loading all the field values into memory. To achieve effective queries, you have to carefully discretize the values you index.
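As a concrete sketch of such discretization (the field names here are hypothetical, not taken from the thread), a document could carry a coarser copy of a millisecond timestamp alongside the raw value:

```json
{
  "timestamp_ms": 1408804000123,
  "timestamp_hour": "2014-08-23T14:00:00"
}
```

Range queries would then target the low-cardinality field:

```json
{
  "query": {
    "range": {
      "timestamp_hour": {
        "gte": "2014-08-23T00:00:00",
        "lt":  "2014-08-24T00:00:00"
      }
    }
  }
}
```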
For example, a few hundred million distinct timestamps at millisecond resolution are a real burden for searching on inverted indices. A good discretization strategy for indexing is to reduce the total number of values in such a field to a few hundred or a few thousand. For timestamps, this means that indexing time-series data at discrete intervals of days, hours, minutes, or maybe seconds is much more efficient than, e.g., millisecond resolution.

Another topic is to use filters for boolean queries. They are much faster.

Jörg

On Sat, Aug 23, 2014 at 2:19 PM, Narendra Yadala <narendr...@gmail.com> wrote:

Hi Ivan,

Thanks for the input about aggregating on strings. I do that; those queries take time, but they do not crash the node.

The queries that caused problems were pretty straightforward (such as a boolean query with two musts: one an exact match and the other a range match on a long), but the real problem was the size. When I set size to Integer.MAX_VALUE, it caused all the problems. When I removed it, everything started working fine. I think this strange (probably expected) behavior is worth documenting somewhere.

I did double up on the RAM, though, and have now allocated 5*10 GB RAM to the cluster. Things are looking OK as of now, except that the aggregations (on strings) are quite slow. Maybe I will run these aggregations as a batch, cache the outputs in a different type, and move on for now.

Thanks
NY

On Fri, Aug 22, 2014 at 10:34 PM, Ivan Brusic <iv...@brusic.com> wrote:

How expensive are your queries? Are you using aggregations or sorting on string fields that could use up your field data cache? Are you using the defaults for the cache? Post the current usage.

If you post an example query and mapping, perhaps the community can help optimize it.
Cheers,

Ivan

On Fri, Aug 22, 2014 at 12:28 AM, Narendra Yadala <narendr...@gmail.com> wrote:

I have a cluster of 240 GB, including replicas, with 5 nodes in it. I allocated 5 GB RAM to each node (5*5 GB total) and started the cluster. When I continuously fire queries at the cluster, GC starts kicking in and eventually a node goes down with an OutOfMemory exception. I add up to 200k documents every day. The indexing part works fine, but the querying part is causing trouble. I have the cluster on EC2 and I use the ec2 discovery mode.

What is the ideal RAM size, and are there any other parameters I need to tune to get this cluster going?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5b659d11-d757-4f8e-b347-60b3807c2dfe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
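For reference, the scan/scroll approach Ivan recommends would look roughly like this against the ES 1.x REST API (the index name and scroll window below are placeholders):

```json
POST /myindex/_search?search_type=scan&scroll=1m
{
  "query": { "match_all": {} },
  "size": 100
}
```

Each response carries a _scroll_id; pass it back to fetch the next batch, renewing the scroll window each time:

```json
POST /_search/scroll?scroll=1m
<scroll_id from the previous response>
```

Repeat until a request returns no hits; this streams the result set in fixed-size batches instead of materializing it in memory all at once.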