Re: Elasticsearch shuts down for no reason
Thanks Mark. auth.log doesn't show any login or sudo activity at the time Elasticsearch stopped, and nothing else runs on that machine - it is a dedicated ES server. What I did find in the auth log is that someone is trying to break into the system, but I don't see how that relates to Elasticsearch stopping.

On Sunday, August 24, 2014 4:35:41 AM UTC+3, Mark Walkom wrote:

Something is stopping the service. If you are on Linux, check the auth log; if anyone is using sudo to stop it, you will see that logged. Otherwise, what else runs on the machine?

Regards,
Mark Walkom
Infrastructure Engineer, Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 24 August 2014 06:15, Eitan Vesely eita...@gmail.com wrote:

Hi guys, I installed ES a month ago and it has been working just fine. Today, for no visible reason, ES went down. Here is what I see in the log file:

[2014-08-23 16:47:11,272][DEBUG][action.search.type] [Plunderer] [g30nm0bi2j663tgu6ud][1], node[Vc4xSuh1S1qQOvQdv-wD_A], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@5531dfad] lastShard [true]
org.elasticsearch.search.SearchParseException: [g30nm0bi2j663tgu6ud][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"facets":{"0":{"date_histogram":{"key_field":"@timestamp","value_field":"user_count","interval":"1h"},"global":true,"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"*"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1407602785182,"to":1408812385182}}},{"range":{"@timestamp":{"from":1408516424602,"to":1408811520255}}}]}}}}}}}},"size":0}]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:649)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:511)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:483)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
    at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
    at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [0]: (key) field [@timestamp] not found
    at org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:160)
    at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:633)
    ... 9 more
[2014-08-23 16:47:11,273][DEBUG][action.search.type] [Plunderer] [g30nm0bi2j663tgu6ud][0], node[Vc4xSuh1S1qQOvQdv-wD_A], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@5531dfad]
(the same SearchParseException and FacetPhaseExecutionException stack trace follows for shard [0])
Re: Elasticsearch shuts down for no reason
I did find the shutdown request in the syslog:

Aug 23 16:49:01 medisafelog2 kernel: [3361057.489168] hv_utils: Shutdown request received - graceful shutdown initiated

Yet I have no idea who or what initiated it. How can I dig further?

On Sunday, August 24, 2014 12:01:50 PM UTC+3, Eitan Vesely wrote:

Thanks Mark. auth.log doesn't show any login or sudo activity at the time Elasticsearch stopped, and nothing else runs on that machine - it is a dedicated ES server. ...
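For reference, a few commands that can help trace who or what initiated a shutdown on a typical Debian/Ubuntu box (a sketch; exact log paths and retention vary by distribution):

    # look for shutdown/poweroff events around the incident
    grep -iE 'shutdown|poweroff|halt' /var/log/syslog /var/log/auth.log

    # recent shutdown/reboot records from wtmp
    last -x shutdown reboot | head

    # hv_utils messages come from the Hyper-V integration services, i.e.
    # the request arrived from the hypervisor, not from a local user session
    dmesg | grep -i hv_utils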
Re: Elasticsearch shuts down for no reason
The company providing the hosting service shut down the virtual machine; hv_utils is a message from the Hyper-V hypervisor integration driver. This is not related to Elasticsearch at all.

Jörg

On Sun, Aug 24, 2014 at 11:19 AM, Mark Walkom ma...@campaignmonitor.com wrote:

What version of ES are you running? Are you running on a hosting service, and if so do you have a firewall protecting the host - i.e. it's not open to the entire internet?

Regards,
Mark Walkom
Infrastructure Engineer, Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 24 August 2014 19:05, Eitan Vesely eitan...@gmail.com wrote:

I did find the shutdown request in the syslog: Aug 23 16:49:01 medisafelog2 kernel: [3361057.489168] hv_utils: Shutdown request received - graceful shutdown initiated. Yet I have no idea who or what initiated it... ...
date_histogram facet float possible overflow
Hi all, I am using the ELK stack to visualise our monitoring data. Yesterday I came across a weird problem: an Elasticsearch date_histogram facet returned floating-point results that look like an overflow (min: 4.604480259023595E18). Our dataflow is: collectd (cpu/memory) -> riemann -> logstash -> elasticsearch. At first the values were correct; after a few days the values became huge (see the attached snapshot of the Kibana graph).

Filtered query:

curl -XGET 'http://localhost:9200/logstash-2014.08.24/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            { "query_string": { "query": "subservice.raw:\"processes-cpu_percent/gauge-collectd\" AND (plugin_instance:\"cpu_percent\")" } }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "range": { "@timestamp": { "from": 1408884312966, "to": 1408884612966 } } },
            { "range": { "@timestamp": { "from": 1408884311948, "to": 1408884327941 } } },
            { "fquery": { "query": { "query_string": { "query": "subservice:(\"processes-cpu_percent/gauge-collectd\")" } }, "_cache": false } }
          ]
        }
      }
    }
  },
  "size": 500,
  "sort": [
    { "metric": { "order": "desc", "ignore_unmapped": false } },
    { "@timestamp": { "order": "desc", "ignore_unmapped": false } }
  ]
}'

Result:

{
  "took": 47,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "logstash-2014.08.24",
        "_type": "gauge",
        "_id": "SlzG8bGJQziU0LMoN7nrbQ",
        "_score": null,
        "_source": { "host": "host1", "service": "instance-2014-08-24T1106/processes-cpu_percent/gauge-collectd", "state": null, "description": null, "metric": 0.7, "tags": ["collectd"], "time": "2014-08-24T12:45:25.000Z", "ttl": 20.0, "type": "gauge", "source": "host1", "ds_type": "gauge", "plugin_instance": "cpu_percent", "ds_name": "value", "type_instance": "collectd", "plugin": "processes", "ds_index": 0, "@version": "1", "@timestamp": "2014-08-24T12:45:15.079Z" },
        "sort": [ 4604480259023595110, 1408884325088 ]
      },
      {
        "_index": "logstash-2014.08.24",
        "_type": "gauge",
        "_id": "8hxToMjpQ5WQIw15DQqIGA",
        "_score": null,
        "_source": { "host": "host1", "service": "instance-2014-08-24T1106/processes-cpu_percent/gauge-collectd", "state": null, "description": null, "metric": 0.5, "tags": ["collectd"], "time": "2014-08-24T12:45:15.000Z", "ttl": 20.0, "type": "gauge", "source": "host1", "ds_type": "gauge", "plugin_instance": "cpu_percent", "ds_name": "value", "type_instance": "collectd", "plugin": "processes", "ds_index": 0, "@version": "1", "@timestamp": "2014-08-24T12:45:15.079Z" },
        "sort": [ 4602678819172646912, 1408884315079 ]
      }
    ]
  }
}

Date histogram facet query:

curl -XGET 'http://localhost:9200/logstash-2014.08.24/_search?pretty' -d '{
  "facets": {
    "0": {
      "date_histogram": { "key_field": "@timestamp", "value_field": "metric", "interval": "1s" },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": { "query_string": { "query": "subservice.raw:\"processes-cpu_percent/gauge-collectd\" AND (plugin_instance:cpu_percent) AND *" } },
              "filter": {
                "bool": {
                  "must": [
                    { "range": { "@timestamp": { "from": 1408884199622, "to": 1408884499623 } } },
                    { "range": { "@timestamp": { "from": 1408884311948, "to": 1408884327941 } } },
                    { "fquery": { "query": { "query_string": { "query": "subservice:(\"processes-cpu_percent/gauge-collectd\")" } }, "_cache": true } }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'

Result:

{
  "took": 24,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 1197141, "max_score": 0.0, "hits": [] },
  "facets": { "0": { "_type": "date_histogram", "entries": [ {
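A note on the numbers above: 4602678819172646912 is exactly the IEEE-754 bit pattern of the double 0.5, and 4604480259023595110 is the bit pattern of 0.7, so the sort values are the raw long encodings of the metric values rather than a genuine arithmetic overflow. A first diagnostic step would be to check how the field is actually mapped (a sketch; index and field names taken from the queries above):

    # inspect the mapping of the daily index; check whether "metric" comes
    # back as "double"/"float" or was dynamically mapped to "long"
    curl -XGET 'http://localhost:9200/logstash-2014.08.24/_mapping?pretty'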
Re: Json Data not getting parsed when sent to Elasticsearch
What is your logstash configuration? Did you try the json codec (http://logstash.net/docs/1.4.2/codecs/json)?

On Sunday, August 24, 2014 4:54:08 PM UTC+3, Didjit wrote:

Hi, the following is a debug output from Logstash:

{
  "message" => "{\"EventTime\":\"2014-08-24T09:44:46-0400\",\"URI\":\"http://ME/rest/venue/ME/hours/2014-08-24\",\"uri_payload\":{\"value\":[{\"open\":\"2014-08-24T13:00:00.000+\",\"close\":\"2014-08-24T23:00:00.000+\",\"isOpen\":true,\"date\":\"2014-08-24\"}],\"Count\":1}}\r",
  "@version" => "1",
  "@timestamp" => "2014-08-24T13:44:48.036Z",
  "host" => "127.0.0.1:60778",
  "type" => "MY_Detail",
  "EventTime" => "2014-08-24T09:44:46-0400",
  "URI" => "http://ME/rest/venue/ME//hours/2014-08-24",
  "uri_payload" => {
    "value" => [
      [0] {
        "open" => "2014-08-24T13:00:00.000+",
        "close" => "2014-08-24T23:00:00.000+",
        "isOpen" => true,
        "date" => "2014-08-24"
      }
    ],
    "Count" => 1,
    "0" => {}
  },
  "MYId" => "ME"
}

When I look into Elasticsearch, the fields under uri_payload are not parsed. It shows uri_payload.value as a single field containing {"open":"2014-08-21T13:00:00.000+","close":"2014-08-21T23:00:00.000+","isOpen":true,"date":"2014-08-21"}. How can I get all the parsed values as fields in Elasticsearch - in my example, the fields open, close, and isOpen? Initially I thought Logstash was not parsing all the JSON, but looking at the debug output it is.

Thank you,
Chris
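For reference, the usual way to expand a JSON string into top-level event fields is the json filter rather than a codec on the output (a sketch, assuming the payload arrives in the message field as in the debug output above; syntax per Logstash 1.4):

    filter {
      json {
        # parse the JSON string in "message" and merge the resulting keys
        # (EventTime, URI, uri_payload, ...) into the event as real fields
        source => "message"
      }
    }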
Re: Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)
I ran into the same issue when using Integer.MAX_VALUE as the size parameter (migrating from a DB-based search). Perhaps someone can come up with a proper reference, I cannot, but according to a comment on this SO question (http://stackoverflow.com/questions/8829468/elasticsearch-query-to-return-all-records), Elasticsearch/Lucene tries to allocate memory for that many scores. When I switched those queries to a count/search duo, things improved dramatically, as you've already noticed.

On Saturday, August 23, 2014 12:17:47 PM UTC-4, Narendra Yadala wrote:

I am not returning 2 billion documents :) I am returning all documents that match; the actual number can be anywhere between 0 and 50k. I am just fetching documents in a given time interval such as one hour or one day and then batch-processing them. I fixed this by making two queries, one to fetch the count and the other for the actual data. It is mentioned in some other thread that the scroll API is performance-intensive, so I did not go for it.

On Saturday, 23 August 2014 21:32:59 UTC+5:30, Ivan Brusic wrote:

"When I kept size as Integer.MAX_VALUE, it caused all the problems"

Are you trying to return up to 2 billion documents at once? Even if that number was only 1 million, you would face problems. Or did I perhaps misunderstand you? Are you sorting the documents based on the score (the default)? Lucene/Elasticsearch would need to keep all the values in memory in order to sort them, causing memory problems. In general, Lucene is not effective at deep pagination. Use scan/scroll: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

--
Ivan

On Sat, Aug 23, 2014 at 6:46 AM, Narendra Yadala narendr...@gmail.com wrote:

Hi Jörg,

This query

{ "query": { "bool": { "must": { "match": { "body": "big" } }, "must_not": { "match": { "body": "data" } }, "must": { "match": { "id": 521 } } } } }

and this query are performing exactly the same:

{ "query": { "bool": { "must": { "match": { "body": "big" } }, "must_not": { "match": { "body": "data" } } } }, "filter": { "term": { "id": 521 } } }

I am not able to understand what makes a filtered query fast. Is there any place where I can find documentation on the internals of how different queries are processed by Elasticsearch?

On Saturday, 23 August 2014 18:20:23 UTC+5:30, Jörg Prante wrote:

Before firing queries, you should consider whether the index design and query choice are optimal. Numeric range queries are not straightforward. They were a major issue on inverted-index engines like Lucene/Elasticsearch, and it has taken some time to introduce efficient implementations. See e.g. https://issues.apache.org/jira/browse/LUCENE-1673

ES tries to compensate for the downsides of massive numeric range queries by loading all the field values into memory. To achieve effective queries, you have to carefully discretize the values you index. For example, a few hundred million different timestamps, with millisecond resolution, are a real burden for searching on inverted indices. A good discretization strategy for indexing is to reduce the total number of distinct values in such a field to a few hundred or thousand. For timestamps, this means that indexing time-based series data in discrete intervals of days, hours, minutes, maybe seconds is much more efficient than e.g. millisecond resolution. Another topic is to use filters for boolean queries. They are much faster.

Jörg

On Sat, Aug 23, 2014 at 2:19 PM, Narendra Yadala narendr...@gmail.com wrote:

Hi Ivan,

Thanks for the input about aggregating on strings. I do that, and those queries take time, but they do not crash a node. The queries which caused problems were pretty straightforward (such as a boolean query with two musts, one an exact match and the other a range match on a long) - the real problem was the size. When I kept size as Integer.MAX_VALUE, it caused all the problems; when I removed it, things started working fine. I think this strange (probably expected, but strange) behavior is worth mentioning somewhere. I did double up on the RAM though, and now I have allocated 5*10G RAM to the cluster. Things are looking OK as of now, except that the aggregations (on strings) are quite slow. Maybe I will run these aggregations as a batch, cache the outputs in a different type, and move on for now.

Thanks,
NY

On Fri, Aug 22, 2014 at 10:34 PM, Ivan Brusic iv...@brusic.com wrote:

How expensive are your queries? Are you using aggregations or sorting on string fields that could use up your field data cache? Are you using the defaults for the cache? Post the current usage. If you
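For completeness, the scan/scroll pattern Ivan recommends avoids allocating a giant result window up front (a sketch against ES 1.x; index name and query are illustrative):

    # open a scan-type scroll: no scoring or sorting, results come in batches
    curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m' -d '{
      "query": { "range": { "@timestamp": { "gte": 1407602785182, "lte": 1408812385182 } } },
      "size": 500
    }'

    # then repeatedly fetch the next batch (in 1.x the scroll id returned by
    # the previous call is sent as the request body) until hits come back empty
    curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<the _scroll_id from the previous response>'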
Re: Json Data not getting parsed when sent to Elasticsearch
Pretty simple (below). I just added the json codec and tried again, and received the same results. Thank you!

    elasticsearch {
      host => "localhost"
      cluster => "cjceswin"
      node_name => "cjcnode"
      codec => json
      index => "logstash-dwhse-%{+.MM.dd}"
      workers => 3
    }

On Sunday, August 24, 2014 10:11:44 AM UTC-4, moshe zada wrote:

What is your logstash configuration? Did you try the json codec (http://logstash.net/docs/1.4.2/codecs/json)? ...
Re: Optimizing queries for a 5 node cluster with 250 M documents (causes OutOfMemory exceptions and GC pauses)
Exactly. Filters do not use scores. They also use bit sets, which makes them reusable and fast. I wasn't talking about a filter added to a query, I mean filtered queries. This is a huge difference.

This query

{ "query": { "bool": { "must": { "match": { "body": "big" } }, "must_not": { "match": { "body": "data" } }, "must": { "match": { "id": 521 } } } } }

can be turned into this filtered query

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "query": { "match": { "body": "big" } } },
            { "query": { "match": { "id": 521 } } }
          ],
          "must_not": { "query": { "match": { "body": "data" } } }
        }
      }
    }
  }
}

(plus fixing the duplicate "must" key, which is a potential source of errors; note that match is a query, not a filter, so inside a bool filter it has to be wrapped in a query filter)

Jörg

On Sun, Aug 24, 2014 at 4:30 PM, Jonathan Foy the...@gmail.com wrote:

I ran into the same issue when using Integer.MAX_VALUE as the size parameter (migrating from a DB-based search). ...
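For contrast, the same conditions written as a filtered query, which applies the filter during search rather than after it (a sketch; fields taken from the thread):

    {
      "query": {
        "filtered": {
          "query": {
            "bool": {
              "must": { "match": { "body": "big" } },
              "must_not": { "match": { "body": "data" } }
            }
          },
          "filter": { "term": { "id": 521 } }
        }
      }
    }

A top-level "filter" element, as in the second query Narendra posted, only filters the hits after the query has run, which is why his two queries performed the same.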
indices.memory.index_buffer_size
Hi,

Is the indices.memory.index_buffer_size configuration a cluster-wide setting or a per-node setting? Do I need to set it on every node, or just on the master-eligible nodes?

Thanks,
Yongtao
Re: Topics/Entities with relevancy scores and searching
Interesting. So, set a payload on the term - in this case the topic/entity - where the payload is the relevancy value. Then you can do your function score on the query of the main documents themselves, with no need for parent/child. Have you done this? Any concerns about performance with this sort of scoring, or is it just as fast as base Lucene scoring if we override the score function and use our own? We will of course try it and run our own performance tests; just looking to see if you already have any insights. Super helpful!

Scott

On Saturday, August 23, 2014 7:50:18 AM UTC-7, Clinton Gormley wrote:

Have a look at:
* http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-delimited-payload-tokenfilter.html
* http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

On 23 August 2014 15:04, Scott Decker sc...@publishthis.com wrote:

Hey all, a question on possible search paths/structure. If we have a text document, and we have run our magic over it and come away with topics and entities (like "Barack Obama" and "Apple Inc.") and we have a relevancy score for each one, what would be the best way to store and query against them?

We are currently trying a parent/child relationship, where the children are the terms with their relevancy scores and the scoring of the parent text document gets done from the relevancy scores of the children. That works; we are just worried about the speed of parent/child against millions of documents.

Another way we could think of was to build our own scorer/analyzer. If we are reading in tokens like BarackObama.93345|AppleInc.0034, where each token carries the topic and its relevancy score for the document, I can build an analyzer to read those sorts of tokens - but is there any way to build a scorer that can use that token match data to score?

And third, is there any other way to normalize this data into one document so we can score on it? That seems like it would be the fastest way to query, but my #2 option here is the only way I can think of doing it.

Is anyone else tagging their documents with per-document relevancy scores for topics, and then letting people search for those topics and pulling back the relevant docs based on those scores?

Thanks,
Scott
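Putting Clinton's two links together, the stored-payload approach might look like this (a sketch against ES 1.x; the index name, field name, and topic|score token format are illustrative assumptions):

    curl -XPUT 'localhost:9200/docs' -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "topic_payloads": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": ["delimited_payload_filter"]
            }
          }
        }
      },
      "mappings": {
        "doc": {
          "properties": {
            "topics": {
              "type": "string",
              "analyzer": "topic_payloads",
              "term_vector": "with_positions_offsets_payloads"
            }
          }
        }
      }
    }'

Indexing "topics": "BarackObama|0.93345 AppleInc|0.0034" then stores each term with its relevancy as a float payload (the filter's defaults are a | delimiter and float encoding); a function_score script can read the payload back through the advanced scripting (_index) API and use it as the document's score for that topic.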
Re: Boost the first word in a multi-word query
Thanks Vineeth, I can certainly build something with the query string :-)

On Fri, Aug 22, 2014 at 8:50 PM, vineeth mohan vm.vineethmo...@gmail.com wrote:

Hello Jeremy,

You can try query_string then. Query as "Brown^2 dog": http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query

Thanks,
Vineeth

On Sat, Aug 23, 2014 at 12:11 AM, Jérémy mer...@gmail.com wrote:

Thanks for your answer! Unfortunately the phrase query is not enough, because I still want to keep words optional. In my understanding, the phrase query requires all the words of the query to be present.

Cheers,
Jeremy

On Fri, Aug 22, 2014 at 8:20 PM, vineeth mohan vm.vineethmo...@gmail.com wrote:

Hello Jeremy,

I feel what you are looking for is a phrase query. It takes into consideration the order of words: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase

Thanks,
Vineeth

On Fri, Aug 22, 2014 at 3:28 PM, Jeremy mer...@gmail.com wrote:

In the case of a multi-word query, is there a way to boost the first terms of the query? For example, in the following query:

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": "BROWN DOG!"
    }
  }
}

"Brown" should be prioritized over "dog", so that searching for "brown dog" does not return the same scores as searching for "dog brown". I'm ideally looking for a solution which works with N words and weights them according to their position.

Regards,
Jeremy
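For the two-word case, Vineeth's suggestion would look like this (a sketch; for N words the client would generate descending weights per position):

    GET /my_index/my_type/_search
    {
      "query": {
        "query_string": {
          "default_field": "title",
          "query": "brown^2 dog"
        }
      }
    }

Both terms stay optional (the default operator is OR), but a match on "brown" contributes twice the weight to the score.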
Re: One large index vs. many smaller indexes
Adrien,

Thanks so much for the response, it was very helpful. I will check out those links on capacity planning for sure. One follow-up question: you mention that tens of shards per node would be OK. Do you mean tens of shards from tens of indexes, or tens of shards for a single index? Right now I have two servers configured, with the index getting 2 shards (one per server) and 1 replica (per server).

Chris

On Fri, Aug 22, 2014 at 5:58 PM, Adrien Grand adrien.gr...@elasticsearch.com wrote:

Hi Chris,

Usually the problem is not so much the number of indices but the number of shards, which are the physical units of data storage (an index being a logical view over several shards). Something to beware of is that shards typically have some constant overhead (disk space, file descriptors, memory usage) that does not depend on the amount of data they store. Although it would be OK to have up to a few tens of shards per node, you should avoid having, e.g., thousands of shards per node.

If you plan on always adding a filter for a specific application in your search requests, then splitting by application makes sense, since it makes that filter unnecessary at search time: you just query the application-specific index. On the other hand, if you don't filter by application, then splitting the data yourself into smaller indices would be pretty much equivalent to storing everything in a single index with a higher number of shards.

You might want to check out the following resources on capacity planning:
- http://www.elasticsearch.org/videos/big-data-search-and-analytics/
- http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html

On Fri, Aug 22, 2014 at 9:08 PM, Chris Neal chris.n...@derbysoft.net wrote:

Hi all,

As the subject says, I'm wondering about index size vs. number of indexes. I'm indexing many application log files, currently with an index per day for all logs, which makes a very large index. For just a few applications in development, the index is 55GB a day (across 2 servers). In prod with all applications, it will be much more than that - 1TB a day, maybe?

I'm wondering if there is value in splitting the indexes by day and by application, which would produce more but smaller indexes per day, versus having a single mammoth index per day. Is it just a resource question - if I have enough RAM/disk/CPU to support a mammoth index, then I'm fine? Or are there other reasons to (or not to) split up indexes?

Very much appreciate your time.
Chris

--
Adrien Grand
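Adrien's point about splitting by application translates directly into index naming (a sketch; the index names are illustrative):

    # one index per application per day
    curl -XPUT 'localhost:9200/logs-app1-2014.08.24'

    # a search scoped to one application touches only that application's
    # shards, so no application filter is needed in the query body
    curl -XGET 'localhost:9200/logs-app1-2014.08.24/_search?pretty' -d '{
      "query": { "match": { "message": "error" } }
    }'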
Re: indices.memory.index_buffer_size
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-indices.html states: "It is a global setting that bubbles down to all the different shards allocated on a specific node."

Regards,
Mark Walkom
Infrastructure Engineer, Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 25 August 2014 03:12, Yongtao You yongtao@gmail.com wrote:

Hi, is the indices.memory.index_buffer_size configuration a cluster-wide setting or a per-node setting? Do I need to set it on every node, or just on the master-eligible nodes? ...
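In practice that means it is configured per node, in each node's elasticsearch.yml (a sketch; the 10% shown is the documented default, not a recommendation):

    # elasticsearch.yml - sizes the indexing buffer shared by all shards
    # allocated on this node; set it on every node, not only the master
    indices.memory.index_buffer_size: 10%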
Re: What fields does ElasticSearch map by default?
Hello Albert,

A few things here:

1. Yes, you can tell Elasticsearch which fields to index and which not to index. You can use the "index": "yes"/"no" property for each field in the mapping to specify this: http://stackoverflow.com/questions/13626617/specify-which-fields-are-indexed-in-elasticsearch

2. There is the concept of _all in Elasticsearch. This is a superset of all field values; to search across the entire document, you can simply search the _all field: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html#mapping-all-field

Thanks,
Vineeth

On Sun, Aug 24, 2014 at 4:07 AM, Albert Lim albertlim...@gmail.com wrote:

I'm trying to create an image metadata store, and obviously a single image can have 20 or more metadata fields. If I put such a document into Elasticsearch, will it index/map all those fields, such that I can query on every field? Or can I tell Elasticsearch what to index or not?
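By default, dynamic mapping will index every field it sees; a sketch of opting individual fields out (field names are illustrative):

    curl -XPUT 'localhost:9200/images' -d '{
      "mappings": {
        "image": {
          "properties": {
            "camera_model": { "type": "string" },
            "raw_exif_blob": { "type": "string", "index": "no" }
          }
        }
      }
    }'

Here raw_exif_blob is kept in _source but is not searchable; camera_model (and any new field that shows up later) is indexed as usual.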
Elasticsearch Function Score not working with object type
Hey guys,

I am trying to use function score, but I am getting the following error:

ElasticsearchIllegalArgumentException[No field found for [fsot] in mapping with types [tst]];

I have used function score before and it worked like a charm, so I started digging into what was wrong. I found out that it does not work with the object type. Am I doing something wrong? What am I missing here? The following gist contains an example and the error I received: https://gist.github.com/pmusa/ef9a02210d736ee020d9

Thanks in advance,
Pablo Musa
Re: DOS attack Elasticsearch with Mappings
If the cluster is that open to users, I don't think it would be easy to prevent a malicious user from intentionally DOSing it. But in this case I think you could make the default for all fields non-dynamic. That way users have to send all mapping updates intentionally, which would prevent this sort of unintentional DOS. I think this is a setting you can change, and I think it would only affect new indexes, but I admit to not having done it and am going from a vague memory of seeing a setting somewhere.

Nik

On Aug 24, 2014 11:08 PM, Joshua Montgomery josh1s4l...@gmail.com wrote:

An Elasticsearch cluster I help run had an interesting issue last week around mappings, and I wanted to get the community's thoughts on how to handle it.

Issue: One morning our cluster went into utter chaos for no apparent reason. We had nodes dropping constantly (master and data nodes) and lots of network exceptions in our log files. The cluster kept going red from all the dropped nodes and was totally unresponsive to external commands.

Some background: Our cluster is fairly open to our users, meaning they can index whatever they want without needing approval (this may have to change based on what happened). The content stored is usually generated from .NET objects serialized with the Newtonsoft JSON serializer.

Cause: After six hours of investigation while trying to get the cluster stable, this is what we found. A new document type (around 30,000 documents) had been indexed into the cluster over a one-hour window, containing the .NET equivalent of a dictionary in JSON format. When a dictionary is serialized to JSON, it becomes a JSON object containing a list of properties and values. The current behavior of Elasticsearch is to generate a mapping definition for each field name in a JSON object, so when you serialize a dictionary, every key in the dictionary gets its own mapping definition. It turns out this can have nasty consequences when indexed into Elasticsearch.

Essentially, every document contained its own list of unique keys, which resulted in Elasticsearch generating mapping definitions for all of them. We found this out by noticing (in the master node log files) that the mappings for the type containing the dictionary were being continuously updated. The continual updating of the mappings (which are part of the overall cluster state) caused the master nodes to lock up on the updates, effectively stopping all other cluster operations. The state file, upon further investigation, was over 70MB by the time we stopped the cluster. Stopping the cluster was the only way to stop the mapping updates. We suspect the large state file was also a major reason for nodes dropping: connections would time out during the large file copy (I'm assuming the state is passed around the nodes in the cluster).

Solution: As mentioned, we had to stop the cluster and make sure all indexing operations were stopped. Upon restarting the cluster, we deleted all documents of the poisonous document type (which took a while). This resulted in a much smaller state file and a stable cluster.

Prevention: So this is my real question for the community: what is the correct way to prevent this in the future (or does one already exist)? We could obviously start reviewing more closely what goes into our cluster, but should there be a feature in Elasticsearch to prevent this (assuming it doesn't already exist)? I'm assuming a number of users have clusters where they don't review everything that goes in. Would it make sense for Elasticsearch to provide some feature to prevent this issue, which is effectively a DOS attack on the cluster?

Thanks for reading, and I look forward to your responses!

-Josh Montgomery
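The setting Nik is half-remembering is most likely dynamic mapping control. A sketch of turning it off for all new indices via an index template (ES 1.x; the template name is illustrative):

    curl -XPUT 'localhost:9200/_template/no_dynamic_fields' -d '{
      "template": "*",
      "mappings": {
        "_default_": {
          "dynamic": false
        }
      }
    }'

With "dynamic": false, unknown fields are silently ignored instead of generating mapping updates; "dynamic": "strict" would reject such documents outright. Either way, new fields only enter the mapping through an explicit put-mapping call.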
Re: Need some advice to build a log central.
Hello Sang,

As this is a question-and-answer forum, we highly recommend you take a shot at it yourself first and post questions if you hit a dead end.

Thanks,
Vineeth

On Mon, Aug 25, 2014 at 7:56 AM, Sang Dang zkid...@gmail.com wrote:

Hi all, I am going to build a log central using Elasticsearch. I need some advice from anyone who has built one already.
Re: Elasticsearch Function Score not working with object type
Hello Pablo,

Lucene (the underlying library on which ES is built) has only a key-value concept and does not keep object-level information. This means that on the Lucene side the data is stored as

fsot.testObjects : [ test1, test2 ]

and there is no field named fsot on the Lucene side. So you need to give the field name as fsot.testObjects rather than fsot.

Thanks,
Vineeth

On Mon, Aug 25, 2014 at 7:57 AM, Pablo Musa pablitom...@gmail.com wrote:

Hey guys, I am trying to use function score but I am getting the following error: ElasticsearchIllegalArgumentException[No field found for [fsot] in mapping with types [tst]]; ...
Re: Elasticsearch Function Score not working with object type
It worked. Thank you very much.

Copying the final code for future reference:

POST test/tst/_search
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": { "exists": { "field": "fsot" } }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "if ( doc.get('fsot.testobj') == null ) 0; else 1;"
          }
        }
      ]
    }
  }
}

On Monday, August 25, 2014 12:33:47 AM UTC-3, vineeth mohan wrote:

Hello Pablo, Lucene (the underlying library on which ES is built) has only a key-value concept and does not keep object-level information. ...
Re: Error running ES DSL in hadoop mapreduce
Hi Adrien,

My Elasticsearch version is elasticsearch-1.2.1. The Maven dependency for Hadoop:

    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop-mr</artifactId>
      <version>2.0.1</version>
    </dependency>

The full stack trace is given below:

[2014-08-25 09:31:58,892][DEBUG][action.search.type] [Thane Ector] [mr][4], node[1ZbXSvkKQC-kDvgMXuC8iQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@6ed78f6d]
org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][4]: query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed [Failed to execute main query]
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:215)
    at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:444)
    at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:441)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 97
    at org.elasticsearch.common.util.BigArrays$IntArrayWrapper.set(BigArrays.java:185)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus$Hashset.values(HyperLogLogPlusPlus.java:499)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.upgradeToHll(HyperLogLogPlusPlus.java:307)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collectLcEncoded(HyperLogLogPlusPlus.java:245)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collectLc(HyperLogLogPlusPlus.java:239)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collect(HyperLogLogPlusPlus.java:231)
    at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator$DirectCollector.collect(CardinalityAggregator.java:204)
    at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.collect(CardinalityAggregator.java:118)
    at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucketNoCounts(BucketsAggregator.java:74)
    at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:63)
    at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.collect(GlobalOrdinalsStringTermsAggregator.java:98)
    at org.elasticsearch.search.aggregations.AggregationPhase$AggregationsCollector.collect(AggregationPhase.java:157)
    at org.elasticsearch.common.lucene.MultiCollector.collect(MultiCollector.java:60)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:193)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
    at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
    at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:116)
    ... 7 more
[2014-08-25 09:31:58,894][DEBUG][action.search.type] [Thane Ector] All shards failed for phase: [init_scan]

Thanks,
Sona

On Friday, August 22, 2014 5:07:33 PM UTC+5:30, Sona Samad wrote:

Hi,

I was trying to run the query below from a Hadoop mapreduce job:

{
  "aggs": {
    "group_by_body_part": {
      "terms": {
        "field": "body_part",
        "size": 5,
        "order": { "examcount": "desc" }
      },
      "aggs": {
        "examcount": {
          "cardinality": { "field": "ExamRowKey" }
        }
      }
    }
  }
}

The query returns more than 5 records, even though the size is given as 5. Also, the result is not aggregated; instead, the entire matching records from the index are passed as values to the mapper. And the following error is logged:

[2014-08-22 16:06:21,459][DEBUG][action.search.type] [Algrim the Strong] All shards failed for phase: [init_scan]
[2014-08-22 16:26:38,875][DEBUG][action.search.type] [Algrim the Strong] [mr][0], node[r9u9daW_TkqTBBeazKJQNw], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@31b5b771]
org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][0]: query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed [Failed to execute main query]
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
    at ...
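One way to narrow this down is to run the same aggregation directly against the cluster, outside Hadoop, to check whether the cardinality failure is server-side (a sketch; the index name mr is taken from the log lines above):

    curl -XGET 'http://localhost:9200/mr/_search?pretty&search_type=count' -d '{
      "aggs": {
        "group_by_body_part": {
          "terms": { "field": "body_part", "size": 5, "order": { "examcount": "desc" } },
          "aggs": { "examcount": { "cardinality": { "field": "ExamRowKey" } } }
        }
      }
    }'

If this fails with the same HyperLogLogPlusPlus ArrayIndexOutOfBoundsException, the problem is in the Elasticsearch node rather than in elasticsearch-hadoop. Note also that es-hadoop's MapReduce input reads documents via scan/scroll; it streams matching records to the mappers rather than pushing aggregation results down, which would explain seeing whole records instead of five aggregated buckets.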