Upgrade to 1.4.2 from 1.1.1 causes spike in Disk IO
Hi there,

This past Monday we completed an ES upgrade from 1.1.1 to 1.4.2 and a Java upgrade from 7 to 8. No other changes: same boxes, same specs, no new queries or filters, no increase in requests.

What I see in Marvel since the upgrade is a large spike in disk IO: https://lh6.googleusercontent.com/-i6OYBiqputk/VNuJC-47eII/A_g/TWxTcres-Uk/s1600/Screen%2BShot%2B2015-02-11%2Bat%2B11.37.58%2BAM.png

and a large dip in field data, filter cache and overall index memory consumption: https://lh5.googleusercontent.com/-2DRu7af1npc/VNuJNqsvOQI/A_o/56trKYYMiQw/s1600/Screen%2BShot%2B2015-02-11%2Bat%2B11.37.47%2BAM.png

Should I be looking at the new circuit breaker settings (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html#circuit-breaker)? Any pointers are welcome!

-GS
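If you want to rule the breakers in or out before changing anything, a minimal check (a sketch, assuming the 1.4.x defaults and a node reachable on localhost:9200) is to look at the per-node breaker stats and only raise the fielddata limit if it is actually tripping:

    # How often has each breaker tripped, and what are the current limits?
    curl 'http://localhost:9200/_nodes/stats/breaker?pretty'

    # Only if the fielddata breaker is genuinely tripping, raise its limit, e.g.:
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "persistent": { "indices.breaker.fielddata.limit": "70%" }
    }'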
Marvel - empty node stats section in Dashboard - JS error in console
- Running 0.90.11
- The Marvel monitoring machine and the cluster it monitors are separate
- Installed Marvel the day after it was released; it was originally running fine (all panels filled with data)

I checked the dashboard again today (nothing has changed since it was installed) and some panels, such as node stats and indices stats, are blank. I tried re-installing Marvel on the monitoring box, cleaned out all marvel-* indexes, and even wiped out the data path directory and let it rebuild, but the problem persisted.

So I took a look at the JS console and it shows the following errors (host name redacted):

event.returnValue is deprecated. Please use the standard event.preventDefault() instead.   app.js?r=face658:5
POST http://some_internal_host_name_/.marvel-2014.02.20/_search?search_type=count 400 (Bad Request)   app.js?r=face658:9
TypeError: Cannot read property 'timestamp' of undefined
    at http://some_internal_host_name_/_plugin/marvel/kibana/app/panels/marvel/stats_table/module.js?r=face658:4:10741
    at i (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:458)
    at http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:1014
    at Object.f.$eval (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:6963)
    at Object.f.$digest (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:5755)
    at Object.f.$apply (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:7111)
    at f (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:11507)
    at r (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:13216)
    at XMLHttpRequest.v.onreadystatechange (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:13851)
    app.js?r=face658:8
POST http://some_internal_host_name_/.marvel-2014.02.20/_search?search_type=count 400 (Bad Request)   app.js?r=face658:9
TypeError: Cannot read property 'timestamp' of undefined
    at http://some_internal_host_name_/_plugin/marvel/kibana/app/panels/marvel/stats_table/module.js?r=face658:4:10741
    at i (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:458)
    at http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:1014
    at Object.f.$eval (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:6963)
    at Object.f.$digest (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:5755)
    at Object.f.$apply (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:7111)
    at f (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:11507)
    at r (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:13216)
    at XMLHttpRequest.v.onreadystatechange (http://some_internal_host_name_/_plugin/marvel/kibana/app/app.js?r=face658:9:13851)
    app.js?r=face658:8
Retrying call from Thu Feb 20 2014 11:46:56 GMT-0500 (EST)   module.js?r=face658:4
Retrying call from Thu Feb 20 2014 11:46:56 GMT-0500 (EST)   module.js?r=face658:4
Retrying call from Thu Feb 20 2014 11:47:16 GMT-0500 (EST)   module.js?r=face658:4
Retrying call from Thu Feb 20 2014 11:47:16 GMT-0500 (EST)   module.js?r=face658:4

I don't understand how this error just showed up without anything having changed on the monitoring machine or the cluster it monitors. Any clues are welcome.
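For reference, the 400s here come back from Elasticsearch itself, so the JS console only shows the symptom; the underlying parse error is logged server-side on the cluster serving the .marvel index. A rough way to surface it (the log path is an assumption for a default package install; adjust to your setup):

    grep -B2 -A20 'SearchParseException' /var/log/elasticsearch/*.log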
Re: Marvel - empty node stats section in Dashboard - JS error in console
Found the issue. The monitoring machine log was filled with these:

[2014-02-20 14:05:07,137][DEBUG][action.search.type ] [elasticsearch.marvel.1.traackr.com] [.marvel-2014.02.20][0], node[SwW74zsHRSGZd7OIpukWVQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@516c99db]
org.elasticsearch.search.SearchParseException: [.marvel-2014.02.20][0]: query[ConstantScore(BooleanFilter(+no_cache(@timestamp:[1392919507058 TO 1392923107136]) +cache(_type:index_stats) +cache(@timestamp:[139292250 TO 139292316])))],from[-1],size[0]: Parse Failure [Failed to parse source [{size:0,query:{filtered:{query:{match_all:{}},filter:{bool:{must:[{range:{@timestamp:{from:1392919507058,to:now}}},{term:{_type:index_stats}},{range:{@timestamp:{from:now-10m/m,to:now/m}}}],facets:{timestamp:{terms_stats:{key_field:index.raw,value_field:@timestamp,order:term,size:2000}},primaries.docs.count:{terms_stats:{key_field:index.raw,value_field:primaries.docs.count,order:term,size:2000}},primaries.indexing.index_total:{terms_stats:{key_field:index.raw,value_field:primaries.indexing.index_total,order:term,size:2000}},total.search.query_total:{terms_stats:{key_field:index.raw,value_field:total.search.query_total,order:term,size:2000}},total.merges.total_size_in_bytes:{terms_stats:{key_field:index.raw,value_field:total.merges.total_size_in_bytes,order:term,size:2000}},total.fielddata.memory_size_in_bytes:{terms_stats:{key_field:index.raw,value_field:total.fielddata.memory_size_in_bytes,order:term,size:2000]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:581)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:484)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:469)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:462)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:234)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
    at org.elasticsearch.action.search.type.TransportSearchCountAction$AsyncAction.sendExecuteFirstPhase(TransportSearchCountAction.java:70)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [timestamp]: failed to find mapping for index.raw
    at org.elasticsearch.search.facet.termsstats.TermsStatsFacetParser.parse(TermsStatsFacetParser.java:119)
    at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:94)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:569)
    ... 12 more
[2014-02-20 14:05:07,137][DEBUG][action.search.type ] [elasticsearch.marvel.1.traackr.com] All shards failed for phase: [query]

Looking at "failed to find mapping" made me think of the Marvel mappings.
Sure enough, I did overwrite the settings, but I messed up: instead of renaming my custom template, I accidentally named it "marvel", essentially clobbering the default template. No wonder it wasn't able to index the fields properly. I restored the template and it all works fine now. Should have looked at the logs first. Thanks for pointing that out.

On Thursday, February 20, 2014 1:31:00 PM UTC-5, Boaz Leskes wrote:

Hi George,

I'm wondering why these search requests return 400 Bad Request. My guess is that it's missing data. Can you look at the ES logs on your server and see whether you have errors?

Also, please run the following (note that it doesn't have a body, that's intentional):

    curl http://...:9200/.marvel-2014.02.20/_search?search_type=count

Does this return correctly?

Cheers,
Boaz
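For anyone else who trips over this: the telltale sign is that the day's marvel index was created from the wrong template, so it no longer has the index.raw multi-field the dashboard facets on. A rough sketch of the checks and the fix, assuming the monitoring cluster is on localhost:9200 (the template and file names below are illustrative, and the exact endpoints may differ slightly on 0.90.x):

    # Does the marvel index still have the expected index / index.raw multi-field?
    curl 'http://localhost:9200/.marvel-2014.02.20/_mapping?pretty'

    # What does the template named "marvel" currently contain?
    curl 'http://localhost:9200/_template/marvel?pretty'

    # If it has been clobbered: remove it, re-register your overrides under their own name,
    # and let Marvel put its stock template back (as far as I recall the agent re-adds it
    # when missing). Then drop the bad daily index so it is rebuilt with correct mappings.
    curl -XDELETE 'http://localhost:9200/_template/marvel'
    curl -XPUT 'http://localhost:9200/_template/my_marvel_overrides' -d @my_marvel_overrides.json
    curl -XDELETE 'http://localhost:9200/.marvel-2014.02.20'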
Optimizing shard size and count per node
I'm going through a cluster sizing exercise and I'm looking for some pointers. I have tested the target hardware with a single shard, loading production data and simulating some of our actual queries using Gatling. Our heaviest queries are a blend of top_children with filters, match_all with query_string filters, and query_string queries with has_parent query_string filters. Based on those tests I've narrowed down an optimal shard size, which should give me the target number of shards for my index (num_shards = index_size / max_shard_size).

Some other notes:

- Memory on the box is split 50/50 between JVM heap and OS.
- The available OS page cache can maybe fit two shards' worth of data.
- I have plenty of room left for field and filter caches.
- Heap and GC seem to be behaving nicely, with a slow and consistent step curve in Bigdesk.
- Tests seem to show the box is CPU bound rather than disk-IO bound: iostat does not show any disk reads during the high-load periods, Bigdesk shows most time spent in user space, and the highest activity seems to be around young-generation collections.

At this point, what are the most important things to consider in deciding how many of these shards to co-locate on the same node?

- Would adding more shards on the same box invalidate the tests I just ran?
- Should I limit the number of shards to what can fit in the page cache on the node? That really blows up the number of nodes I will need, which is pretty costly. Is this approach really worth it?
- Since I'm CPU bound, shouldn't I be looking at more cores if I want more shards per box?

Any pointers are welcome.
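To make the sizing arithmetic concrete, here is the kind of back-of-the-envelope calculation meant above (the numbers are made up for illustration, and "myindex" stands in for the real index name); the primary store size comes from the index stats API:

    # Primary store size for the index:
    curl 'http://localhost:9200/myindex/_stats?pretty'

    # Say the single-shard tests topped out at ~30 GB per shard and the index holds
    # ~240 GB of primary data:
    #   num_shards = ceil(240 GB / 30 GB) = 8 primary shards
    # plus some headroom for growth, since the primary shard count cannot be changed
    # later without reindexing.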