I think the biggest difference between Marvel and bigdesk and its relatives is that they lack history, which is why Marvel stores data - so you can always go back and find out what went wrong during the night.
If you don't mind me chasing this further (I do want to know what went wrong :) ) - in your production cluster, how many nodes and indices do you have? I'm asking to get a grip on your 37GB of data (if you prefer to share it privately, you should be able to do so via the groups interface; otherwise I'm bleskes on freenode in #elasticsearch, where I'm online for most European waking hours).

Cheers,
Boaz

On Mon, Apr 21, 2014 at 9:45 AM, T Vinod Gupta <tvi...@readypulse.com> wrote:
> Thanks Boaz for the reply. I was using the latest Marvel 1.1, by the way.
> Looks like you need Marvel for Marvel!
> Actually, my Marvel cluster got so messed up that no matter what I did it
> would show shard failures in the dashboard and nothing was functional. I
> actually had a 2-node cluster for Marvel monitoring, and after a restart
> the nodes never got out of red state.
> So I just gave up on my experimentation with Marvel and abandoned it
> fully.
>
> I'll probably go back to bigdesk. Are there any other good alternatives?
>
> Thanks
>
> PS - my feedback to the Marvel team would be to provide Marvel as a
> service - that would be huge! I noticed that the size of the data dir on my
> Marvel node was 37G from just a few days of monitoring. That's heavy.
>
> On Sat, Apr 19, 2014 at 1:05 AM, Boaz Leskes <b.les...@gmail.com> wrote:
>> Hi,
>>
>> Regarding monitoring node sizing - you have to go through pretty much the
>> same procedure as with your main cluster. See how much data it generates
>> per day and monitor the memory usage of the node while using Marvel on a
>> single day's index. That is the basis for your calculation. Based on
>> that and the number of days of data you want to retain, you can decide how
>> many nodes you need and how much memory each should get. BTW - make sure
>> you use the latest version of Marvel (1.1) - it has a much smaller data
>> footprint.
>>
>> Regarding the errors on your main production cluster:
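The sizing procedure quoted above boils down to back-of-the-envelope arithmetic. A minimal sketch, assuming the 37G figure accumulated over roughly three days (the thread only says "a few days") and a hypothetical 7-day retention window - the function name and all numbers are illustrative, not from the thread:

```python
def retention_disk_gb(observed_gb, observed_days, retention_days):
    """Scale the observed daily data rate to the desired retention window."""
    per_day_gb = observed_gb / observed_days
    return per_day_gb * retention_days

# Assumed figures: 37 GB accumulated over ~3 days, retained for 7 days.
estimate = retention_disk_gb(37, 3, 7)
print(f"~{estimate:.0f} GB of Marvel data on disk")
```

The same per-day rate, multiplied out, is what you would then weigh against per-node disk and heap to pick a node count.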
>> I'm a bit puzzled by the log output, as the events are pretty far apart.
>> It starts with a timeout of the Marvel agent; 6 hours later it failed to
>> connect (in between, everything seems fine). Almost 13 hours later the
>> node had an OOM (after which you restarted it, right? It has a different
>> name). Then 40m later the log shows that another node (10.183.42.216) is
>> under pressure and rejecting searches.
>>
>> I'm not sure the first part is related to the second. Can you share
>> your Marvel chart of JVM memory for the Darkoth node? It seems your
>> main cluster is also under memory pressure.
>>
>> Cheers,
>> Boaz
>>
>> On Thursday, April 17, 2014 10:08:04 PM UTC+2, T Vinod Gupta wrote:
>>>
>>> Hi,
>>> In my setup, the Marvel node is separate from the production cluster.
>>> The production nodes send data to the Marvel node, and the Marvel node
>>> had an OOM exception. This brings me to the question: how much heap does
>>> it need? I ran with the default config.
>>>
>>> In my prod cluster, I have a load balancer which is a no-data node. It
>>> runs with just 2GB heap. Due to the Marvel failure, this node was getting
>>> timeouts and for some strange reason went down.
>>>
>>> What are the best practices here? How can I avoid this in the future?
>>>
>>> Marvel node -
>>> [2014-04-17 09:13:33,715][WARN ][index.engine.internal ] [Gorilla-Man] [.marvel-2014.04.17][0] failed engine
>>> java.lang.OutOfMemoryError: Java heap space
>>> [2014-04-17 09:13:46,890][ERROR][index.engine.internal ] [Gorilla-Man] [.marvel-2014.04.17][0] failed to acquire searcher, source search_factory
>>> org.apache.lucene.store.AlreadyClosedException: this ReferenceManager is closed
>>>     at org.apache.lucene.search.ReferenceManager.acquire(ReferenceManager.java:98)
>>> ...
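On the "how much heap does it need?" question above: the OOM in the Marvel-node log suggests the default heap was too small for the data it held. In Elasticsearch 1.x the usual knob is the ES_HEAP_SIZE environment variable, which the bin/elasticsearch startup script uses to set both -Xms and -Xmx; the 4g value here is purely illustrative, not a recommendation from the thread:

```shell
# Illustrative only: give the monitoring node a larger heap before starting it.
# ES_HEAP_SIZE sets both -Xms and -Xmx in the 1.x bin/elasticsearch script.
export ES_HEAP_SIZE=4g
echo "heap set to $ES_HEAP_SIZE"
```

The right value has to come from the sizing exercise Boaz describes, observed against a single day's index.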
>>>
>>> ES LB node -
>>> [2014-04-17 00:01:00,567][ERROR][marvel.agent.exporter ] [Darkoth] create failure (index:[.marvel-2014.04.16] type: [node_stats]): UnavailableShardsException[[.marvel-2014.04.16][0] [2] shardIt, [0] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5d9be928]
>>> [2014-04-17 06:41:46,975][ERROR][marvel.agent.exporter ] [Darkoth] error connecting to [ip-10-68-145-124.ec2.internal:9200]
>>> java.net.SocketTimeoutException: connect timed out
>>> [2014-04-17 18:53:09,969][DEBUG][action.admin.cluster.node.info] [Darkoth] failed to execute on node [L1f57myxQLK1SSRHRFcvFQ]
>>> java.lang.OutOfMemoryError: Java heap space
>>> [2014-04-17 19:35:05,805][DEBUG][action.search.type ] [Witchfire] [twitter_072013][0], node[5GNeFfbPTGi-1EccVvR7Nw], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@2f94d571] lastShard [true]
>>> org.elasticsearch.transport.RemoteTransportException: [Mauvais][inet[/10.183.42.216:9300]][search/phase/query]
>>> Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler@4c75d754
>>>     at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/614c3f0e-6aa4-4848-9f47-1a9b93e536f5%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
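A side note on the 37G of Marvel data mentioned above: since Marvel writes one .marvel-YYYY.MM.DD index per day, disk usage can be capped by deleting daily indices older than the chosen retention window (curator or a plain DELETE request does the actual deletion). A hypothetical sketch of selecting which indices to prune - the names, function, and 3-day window are illustrative, not from the thread:

```python
from datetime import date, timedelta

def indices_to_prune(index_names, today, keep_days):
    """Return the .marvel-YYYY.MM.DD index names older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in index_names:
        # ".marvel-2014.04.17" -> date(2014, 4, 17)
        year, month, day = map(int, name.rsplit("-", 1)[1].split("."))
        if date(year, month, day) < cutoff:
            stale.append(name)
    return stale

names = [".marvel-2014.04.10", ".marvel-2014.04.16", ".marvel-2014.04.17"]
print(indices_to_prune(names, today=date(2014, 4, 17), keep_days=3))
```

Dropping a whole daily index is far cheaper than deleting individual documents, which is precisely why Marvel uses time-based indices.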