Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
"nc - Total number of nodes in the grid" counts both server and client nodes. I can't find a metric for server nodes only.

I checked the heap dump in MAT and found that TcpCommunicationSpi#recoveryDescs was very large. Does anyone have an idea why this happened?

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-tp12409p12446.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
Thanks, Andrey. Is there an option to monitor the number of server nodes in the grid? I found "nc - Total number of nodes in the grid", which seems to count both server and client nodes. Is that correct?
Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
Thanks, Evgenii!

Running `${IGNITE_HOME}/bin/ignitevisorcmd.sh -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'"` shows "Ignite node stopped OK" at the end. Does that indicate that visor stopped properly?

We use the visor output to check the number of Ignite servers running. The check is triggered by a cron job and a shell script, so it starts a new visor each time. How could a shell script use an already started visor?
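A minimal sketch of such a check, assuming the "Ignite node stopped OK" line is printed on a clean shutdown as observed above. The function names `run_visor_once` and `visor_stopped_ok` are hypothetical, and the message only shows that the local visor node logged its stop, not that the server nodes released everything associated with it:

```shell
#!/bin/bash
# IGNITE_HOME default is the path from the original post; override as needed.
IGNITE_HOME="${IGNITE_HOME:-/root/apache-ignite-fabric-1.9.0-bin}"

# Hypothetical helper: runs visor once with the same command as in the thread
# and saves everything it prints to the file given as the first argument.
run_visor_once() {
    "${IGNITE_HOME}/bin/ignitevisorcmd.sh" \
        -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'" > "$1" 2>&1
}

# Hypothetical helper: returns 0 if the captured output contains the shutdown
# message quoted in the thread, non-zero otherwise.
visor_stopped_ok() {
    grep -q "Ignite node stopped OK" "$1"
}
```

A cron script could call `run_visor_once /tmp/visor.out` and then `visor_stopped_ok /tmp/visor.out || echo "visor did not report a clean stop" >&2` to flag runs where the message is missing.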
Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
Hi tysli2016,

You can run visorcmd connected to the cluster in the background and register an alert. When the alert is triggered, it can call a custom user script. More info can be found here: https://apacheignite-tools.readme.io/docs/alerts-configuration
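As a sketch of the kind of script an alert could call, the handler below only appends a timestamped line with whatever arguments it receives to a log file. The function name `log_alert`, the `ALERT_LOG` variable, and the log path are assumptions for illustration; the exact `alert` options and the arguments passed to the script are described in the linked documentation and are not assumed here:

```shell
#!/bin/bash
# Hypothetical alert handler: the name, log path, and argument handling are
# assumptions for illustration, not part of the Ignite distribution.
ALERT_LOG="${ALERT_LOG:-/tmp/visor-alerts.log}"

# Appends one timestamped line containing all arguments passed to the handler.
log_alert() {
    printf '%s visor alert: %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >> "$ALERT_LOG"
}
```

A script registered with the alert would end with `log_alert "$@"`, so that a monitoring job can watch the log file instead of starting a new visor on every check.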
Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
Hi,

As far as I can see, you run visor in internal mode, so it creates a node each time. Are you sure that you stop them properly? Why do you need to start a new visor each time? Just use an already started visor.

Evgenii

2017-05-04 11:57 GMT+03:00 tysli2016:

> Got "OutOfMemoryError: Java heap space" on a 2-node cluster with `visor`
> running repeatedly.
>
> The server nodes are running on CentOS 7 inside Oracle VirtualBox VMs with
> the same config:
> - 2 vCPUs
> - 3.5GB memory
> - Oracle JDK 1.8.0_121
>
> `default-config.xml` was modified to use a non-default multicast group and
> 1 backup:
> http://www.springframework.org/schema/beans"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="
> http://www.springframework.org/schema/beans
> http://www.springframework.org/schema/beans/spring-beans.xsd">
> class="org.apache.ignite.configuration.IgniteConfiguration">
> class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
> class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
> value="228.10.10.158"/>
> class="org.apache.ignite.configuration.CacheConfiguration">
>
> `visor` was run repeatedly on one of the nodes by a shell script:
> #!/bin/bash
> IGNITE_HOME=/root/apache-ignite-fabric-1.9.0-bin
> while true
> do
>   ${IGNITE_HOME}/bin/ignitevisorcmd.sh -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'"
> done
>
> The OOME was thrown after running with the above settings for 1 day.
> I have put the ignite log, gc log, and heap dump in `dee657c8.tgz`, which
> can be downloaded from
> https://drive.google.com/drive/folders/0BwY2dxDlRYhBSFJhS0ZWOVBiNk0?usp=sharing.
> `507f0201.tgz` contains the ignite log and gc log from the other node in
> the cluster, for reference just in case.
>
> Running `visor` repeatedly is just to reproduce the OOME more quickly. In
> production we run `visor` once every 10 minutes to monitor the health of
> the cluster.
>
> Questions:
> 1. Is anything wrong with the configuration? Can anything be tuned to
>    avoid the OOME?
> 2. Are there any other built-in tools that allow one to monitor the
>    cluster? Showing the number of server nodes is good enough.
OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
Got "OutOfMemoryError: Java heap space" on a 2-node cluster with `visor` running repeatedly.

The server nodes are running on CentOS 7 inside Oracle VirtualBox VMs with the same config:
- 2 vCPUs
- 3.5GB memory
- Oracle JDK 1.8.0_121

`default-config.xml` was modified to use a non-default multicast group and 1 backup:
http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">

`visor` was run repeatedly on one of the nodes by a shell script:
#!/bin/bash
IGNITE_HOME=/root/apache-ignite-fabric-1.9.0-bin
while true
do
  ${IGNITE_HOME}/bin/ignitevisorcmd.sh -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'"
done

The OOME was thrown after running with the above settings for 1 day. I have put the ignite log, gc log, and heap dump in `dee657c8.tgz`, which can be downloaded from https://drive.google.com/drive/folders/0BwY2dxDlRYhBSFJhS0ZWOVBiNk0?usp=sharing. `507f0201.tgz` contains the ignite log and gc log from the other node in the cluster, for reference just in case.

Running `visor` repeatedly is just to reproduce the OOME more quickly. In production we run `visor` once every 10 minutes to monitor the health of the cluster.

Questions:
1. Is anything wrong with the configuration? Can anything be tuned to avoid the OOME?
2. Are there any other built-in tools that allow one to monitor the cluster? Showing the number of server nodes is good enough.