But I can run the aggregation on the 'banner' field on both clusters. Is that because the values of 'banner' are far less unique than the values of the 'ip' field?
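To make the question concrete, here is a minimal Python sketch of the intuition. It is only a toy model and an assumption, not how Elasticsearch actually lays out field data: it just sums the bytes needed to keep each distinct value of a field once. The function name `unique_term_bytes` and the sample values are made up for illustration.

```python
# Toy model: bytes needed to hold the distinct terms of a field.
# This is NOT Elasticsearch's real field data layout; the data is made up.

def unique_term_bytes(values):
    """Total UTF-8 bytes needed to store each distinct value once."""
    return sum(len(v.encode("utf-8")) for v in set(values))

# Hypothetical sample: 10,000 docs, each with its own ip, but only 3 banners.
ips = ["10.0.%d.%d" % (i // 256, i % 256) for i in range(10000)]
banners = [("cisco-IOS", "apache", "nginx")[i % 3] for i in range(10000)]

ip_bytes = unique_term_bytes(ips)
banner_bytes = unique_term_bytes(banners)

print("distinct ips:", len(set(ips)), "bytes:", ip_bytes)
print("distinct banners:", len(set(banners)), "bytes:", banner_bytes)
```

In this toy model the cost for 'banner' stays tiny no matter how many documents there are, while the cost for 'ip' grows with the number of distinct addresses.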
2014-04-02 16:27 GMT+08:00 Adrien Grand <adrien.gr...@elasticsearch.com>:

> Given your description of the problem, I think the issue is that your
> Elasticsearch cluster doesn't have enough memory to load field data for the
> ip field (which needs to be done for all documents, not only those that
> match your query). So you either need to give more nodes to your cluster,
> more memory to your nodes, or use doc values for your ip field[1] (the
> latter option requires reindexing).
>
> [1] http://www.elasticsearch.org/blog/disk-based-field-data-a-k-a-doc-values/
>
> On Wed, Apr 2, 2014 at 10:09 AM, <vir.ca...@gmail.com> wrote:
>
>> The smaller index has 1 million lines of data. They are the lines
>> filtered by "prefix":{"ip":"100.1"} from the bigger one.
>>
>> On Wednesday, April 2, 2014 at 4:04:27 PM UTC+8, vir....@gmail.com wrote:
>>
>>> I run an *aggregation* search on my index (*6 nodes*). There are about
>>> *200 million lines* of data (port scanning). Each line looks like this:
>>>
>>> {"ip":"85.18.68.5", "banner":"cisco-IOS", "country":"IT", "_type":"port-80"}
>>>
>>> So you can imagine that I have sorted the data into different types by
>>> the port being scanned. Now I want to know which hosts have a lot of
>>> ports open at the same time. So I chose to aggregate on the ip field, and
>>> I get an OOM error. I guess that may be reasonable: most hosts have only
>>> one port open, so there are too many buckets?
>>>
>>> Then I used a filter aggregation:
>>>
>>> {
>>>   "aggs":{
>>>     "just_name1":{
>>>       "filter":{
>>>         "prefix":{
>>>           "ip":"100.1"
>>>         }
>>>       },
>>>       "aggs":{
>>>         "just_name2":{
>>>           "terms":{
>>>             "field":"ip",
>>>             "execution_hint":"map"
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> (Yes, my ip field is mapped as a string.)
>>>
>>> I thought that this time ES would narrow down the set to aggregate on,
>>> but I still get an OOM error, while the same request works on a smaller
>>> index (another cluster, one node). Why would this happen? After
>>> filtering, the 2 clusters should have sets of equal size. Why does the
>>> bigger one fail?
>
> --
> Adrien Grand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAJp1%3DtwM3KJ1QYvsKGcXi4bDfjwDF-bRviSsYX6jUBEg6w5qgQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.