Re: frequent disconnection in Ignite cluster
Hi Rishi, it seems it's not a good idea to connect to Ignite repeatedly; I observed a similar memory issue. Would you mind sharing your server configuration (cores, memory)?

http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-td9443i20.html
http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-tt12409.html

Tom

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/frequet-disconnection-in-ignite-cluster-tp14411p14442.html Sent from the Apache Ignite Users mailing list archive at Nabble.com.
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
Hi Val, sorry for the late reply. Yes, we have a client node constantly joining and leaving the topology. Do the newer versions of Ignite improve on this issue? The reason a client node constantly joins and leaves the topology is twofold:

1. We use a visor CLI called by a cron job to check that there are more than 2 server nodes; it sends out an email alert otherwise. It runs every 10 minutes and collects the server count like this:

IGNITE_COUNT=`/usr/local/apache-ignite-fabric-1.6.0-bin/bin/ignitevisorcmd.sh -e="'open -cpath=/usr/local/apache-ignite-fabric-1.6.0-bin/config/default-config.xml;node;c'" |grep -e "Server " |wc -l`

2. We have another cron job that does some cleanup for the application; it also connects to the topology and leaves when done, and also runs every few minutes.

Many thanks, Tom

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-tp9443p14370.html
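A self-contained sketch of how such a cron check can hang together. The sample visor output lines below are an illustrative stand-in (not verbatim Ignite 1.6 output); only the "Server " pattern matters, matching the grep used in the real check, and the mail step is left as a comment:

```shell
#!/bin/bash
# Minimal sketch of the monitoring cron job described above.

count_servers() {
    # count lines that mention a server node, like the real check's grep
    grep -c "Server "
}

# illustrative stand-in for `ignitevisorcmd.sh ... node` output
sample_visor_output='| 0 | 1BC72100 | Server  | 10.0.0.1 |
| 1 | 9AD41F22 | Server  | 10.0.0.2 |
| 2 | 77E0C3AB | Client  | 10.0.0.3 |'

IGNITE_COUNT=$(printf '%s\n' "$sample_visor_output" | count_servers)

if [ "$IGNITE_COUNT" -lt 2 ]; then
    # in the real job this would be a mail command, e.g. mailx
    echo "ALERT: only $IGNITE_COUNT Ignite server node(s) visible"
fi
echo "server count: $IGNITE_COUNT"
```

Note that this spawns a fresh visor JVM on every run, which is exactly the pattern suspected of leaking memory in this thread.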
Re: How to monitor and alert for server counts
We run the visor every 10 mins; the Ignite servers would OOME in a couple of weeks. Each Ignite server is allocated 1 GB of memory. Would you mind sharing your server's config.xml? Any JVM parameter changes?

Thanks, Tom

ignite_user2016 wrote
> yes you are correct, we run visor in the batch mode.
>
> we have 2 hosts hosting 2 clients and 2 servers for ignite, our conf is bare
> minimal: 4 cores 8 GB RAM.
>
> Now with OOME, how often do you run the visor command? we run it every 5
> mins. should you try that?
>
> On Mon, May 8, 2017 at 9:01 PM, tysli2016 [via Apache Ignite Users] wrote:
>
>> thanks Rishi, can you share more about that?
>> what's the version of Ignite? how many Ignite servers? how many
>> CPU/memory?
>> are you using the Visor in batch mode
>> (https://apacheignite-tools.readme.io/v1.9/docs/batch-mode)?
>> or Visor alert?
>>
>> I have tried Visor batch mode, but it led to OOME eventually
>> (http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-td12409.html).
>
> --
> Rishi Yagnik

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/How-to-monitor-and-alert-for-server-counts-tp12533p12582.html
Re: How to monitor and alert for server counts
Thanks Rishi, can you share more about that? What's the version of Ignite? How many Ignite servers? How much CPU/memory? Are you using the Visor in batch mode (https://apacheignite-tools.readme.io/v1.9/docs/batch-mode)? Or Visor alert?

I have tried Visor batch mode, but it led to OOME eventually (http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-td12409.html).

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/How-to-monitor-and-alert-for-server-counts-tp12533p12557.html
Re: How to monitor and alert for server counts
Thanks Denis, can you share which Ignite version you are using? Can the Visor GUI be started with the alert setting automatically as a background job?

I have tried to use visor in batch mode (https://apacheignite-tools.readme.io/v1.9/docs/batch-mode), so that a visor process was created each time a cron job ran. However, this led to OOME eventually (http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-td12409.html).

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/How-to-monitor-and-alert-for-server-counts-tp12533p12556.html
Re: How to monitor and alert for server counts
Thanks Andrew, can you share more about how to use Nagios/Icinga or Zabbix to monitor? Which kinds of metrics, and how to connect and set them up?

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/How-to-monitor-and-alert-for-server-counts-tp12533p12555.html
How to monitor and alert for server counts
We have a couple of Ignite servers serving as a key-value store, and we want to get an email notification when any server goes down. Is anyone else having the same need? What is your solution?

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/How-to-monitor-and-alert-for-server-counts-tp12533.html
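One direction that avoids starting a visor JVM per check, sketched under assumptions: Ignite ships an HTTP REST module whose `cmd=top` command returns the topology as JSON, so a cron script could count nodes from that. The endpoint usage and the JSON shape below are trimmed, hypothetical illustrations, not verbatim Ignite output; check your version's REST documentation before relying on them:

```shell
#!/bin/bash
# Sketch: count topology nodes from an (assumed) REST topology response
# instead of spawning a visor process. A live query would look like:
#   curl -s 'http://ignite-host:8080/ignite?cmd=top'
# Here we parse a canned, illustrative response so the sketch is
# self-contained and runnable.

sample_response='{"successStatus":0,"response":[{"nodeId":"4b1a"},{"nodeId":"9c2f"}]}'

# crude node count: one "nodeId" key per node entry in the array
node_count=$(printf '%s' "$sample_response" | grep -o '"nodeId"' | wc -l | tr -d ' ')

echo "nodes in topology: $node_count"
```

An external poller like this keeps the monitoring process out of the cluster topology entirely, which sidesteps the join/leave churn discussed elsewhere in this thread.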
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
Reproduced the OOME; the heap dump is here: https://drive.google.com/drive/folders/0BwY2dxDlRYhBMEhmckpWeHg1bjg?usp=sharing

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-tp9443p12529.html
Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
Thanks Andrey, is there an option to monitor the number of server nodes in the grid? I found "nc - Total number of nodes in the grid.", which seems to count server + client nodes together, correct?

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-tp12409p12445.html
Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
Thanks Evgenii! When running `${IGNITE_HOME}/bin/ignitevisorcmd.sh -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'"`, it shows "Ignite node stopped OK" at the end. Is that an indicator that the visor stopped properly?

We use the visor output to check the number of Ignite servers running. This check is triggered by a cron job + shell script, so it starts a new visor each time. How could a shell script use an already started visor?

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-tp12409p12444.html
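For reference, the cron side of such a check is just a standard crontab entry. The wrapper script path and the syslog tag below are hypothetical placeholders, not anything defined in this thread:

```shell
# illustrative crontab entry (edit with `crontab -e`): run the
# visor-based check every 10 minutes and tag its output in syslog.
# /usr/local/bin/check_ignite_servers.sh stands in for a wrapper
# around the ignitevisorcmd.sh invocation shown in this thread.
*/10 * * * * /usr/local/bin/check_ignite_servers.sh 2>&1 | logger -t ignite-check
```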
OOME on 2-node cluster with visor running repeatedly, Ignite 1.9
Got "OutOfMemoryError: Java heap space" on a 2-node cluster with a `visor` running repeatedly.

The server nodes are running on CentOS 7 inside Oracle VirtualBox VMs with the same config:
- 2 vCPUs
- 3.5 GB memory
- Oracle JDK 1.8.0_121

`default-config.xml` was modified to use a non-default multicast group and 1 backup. (Only the Spring `<beans>` schema header of that XML survived in this archive; the bean definitions were lost.)

The `visor` was run repeatedly on one of the nodes by a shell script:

#!/bin/bash
IGNITE_HOME=/root/apache-ignite-fabric-1.9.0-bin
while true
do
    ${IGNITE_HOME}/bin/ignitevisorcmd.sh -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'"
done

The OOME was thrown after the above setup had been running for 1 day. I have put the ignite log, gc log, and heap dump in `dee657c8.tgz`, which can be downloaded from https://drive.google.com/drive/folders/0BwY2dxDlRYhBSFJhS0ZWOVBiNk0?usp=sharing. `507f0201.tgz` contains the ignite log and gc log from the other node in the cluster, for reference just in case.

Running `visor` repeatedly is just to reproduce the OOME more quickly; in production we run the `visor` once per 10 minutes to monitor the health of the cluster.

Questions:
1. Is anything wrong with the configuration? Can anything be tuned to avoid the OOME?
2. Are there any other built-in tools that allow one to monitor the cluster? Showing the number of server nodes is good enough.

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-tp12409.html
Re: Ignite Shutdown Hook
Hemanta, did you call close() on the Ignite instance?

Ignite ignite = Ignition.start();
// do something with ignite
ignite.close();

(Ignite implements AutoCloseable, so a try-with-resources block works too.) I had a similar problem before, and found that Ignite has some threads running which prevent the JVM from stopping even after the main thread ends.

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Re-Ignite-Shutdown-Hook-tp9661p9670.html
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
Thanks for your patient and prompt replies. We are trying to reproduce the issue and will keep you posted.

Tom

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-tp9443p9669.html
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
How about this OOME? http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-GridDhtPartitionMap2-td9504.html

We found it on another client node in the same cluster; however, it seems to exhibit a different pattern of memory leak.

- Tom

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-tp9443p9636.html
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
> Are you sure that all of them are in this map? What is the size of rmtCfgs map?

This map contains about 1/6 of the total CacheConfigurations; the retained size of the rmtCfgs map is 227,306,792 bytes:

Class Name | Shallow Heap | Retained Heap
java.util.HashMap$Node[65536] @ 0xcd657f20 | 262,160 | 227,306,744
'- table java.util.HashMap @ 0x8423c790 | 48 | 227,306,792
  '- rmtCfgs org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8 | 56 | 227,306,848

The Leak Suspects report shows there are 6 DynamicCacheDescriptors, each about 216 MB: https://drive.google.com/open?id=0BwY2dxDlRYhBbllWZ3pEMW1Tc00

All 6 DynamicCacheDescriptors exhibit the same pattern; they all contain a whole lot of CacheConfigurations in a map, and the numbers do add up:

Class Name | Objects | Shallow Heap | Retained Heap
java.util.HashMap$Node[] | 33,252 | 10,886,584 | >= 1,446,877,296
java.util.HashMap | 205,140 | 9,846,720 | >= 1,446,860,856
java.util.HashMap$Node | 1,116,555 | 35,729,760 | >= 1,441,725,448
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor | 6 | 336 | >= 681,000,984
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor | 6 | 336 | >= 680,977,904
org.apache.ignite.configuration.CacheConfiguration | 88,421 | 25,465,248 | >= 677,386,632
org.apache.ignite.configuration.CacheConfiguration | 88,418 | 25,464,384 | >= 677,363,648

> Actually this map can be non-empty only on a node which is not fully
> started yet. Basically, when a new node joins a topology, it collects the
> configuration from all nodes for validation check, does the check and
> cleans the map. All this is part of the join process. Is this the case?
> How many nodes do you have?

There are 2 client nodes associated with the heap dump, but they should have already joined the cluster, because these 2 client nodes could be found in the topology with the visor CLI.

Would these CacheConfiguration objects be created for the disconnect process as well? Because I found they lead to a GridCacheProcessor.cachesOnDisconnect:

Class Name | Shallow Heap | Retained Heap
java.util.HashMap$Node[65536] @ 0xcd657f20 | 262,160 | 227,306,744
'- table java.util.HashMap @ 0x8423c790 | 48 | 227,306,792
  '- rmtCfgs org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8 | 56 | 227,306,848
    '- value java.util.HashMap$Node @ 0xe066c018 | 32 | 32
      '- [11] java.util.HashMap$Node[16] @ 0xe066bf48 | 80 | 272
        '- table java.util.HashMap @ 0xe066bf08 |
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
I see, so the embedded instance is the IgniteKernal object, which implements the Ignite interface.

> So these are instances are not really used by Ignite, but are saved
> somewhere, most likely in your code. Can you use heap dump to trace it?

Yes, there are a whole lot of CacheConfiguration objects created, but not from my code. Frankly, I think Ignite has a very well-designed API with which I don't have to code much to use it, and I have no good reason to create CacheConfiguration objects. What my code does is call Ignition.start() with an IgniteConfiguration object like this:

IgniteConfiguration configuration = new IgniteConfiguration();
configuration.setGridLogger(new Slf4jLogger());
configuration.setDiscoverySpi(discoverySpi);
configuration.setCommunicationSpi(communicationSpi);
Ignition.setClientMode(true);
Ignite ignite = Ignition.start(configuration);

And what I found from the heap dump is that those CacheConfiguration objects are not saved "somewhere" in my code, but in a java.util.HashMap @ 0x8423c790, which is ultimately held by a GridKernalContextImpl:

Class Name | Shallow Heap | Retained Heap
java.util.HashMap$Node[65536] @ 0xcd657f20 | 262,160 | 227,306,744
'- table java.util.HashMap @ 0x8423c790 | 48 | 227,306,792
  '- rmtCfgs org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8 | 56 | 227,306,848
    '- value java.util.HashMap$Node @ 0xe066c018 | 32 | 32
      '- [11] java.util.HashMap$Node[16] @ 0xe066bf48 | 80 | 272
        '- table java.util.HashMap @ 0xe066bf08 | 48 | 320
          '- cachesOnDisconnect org.apache.ignite.internal.processors.cache.GridCacheProcessor @ 0x81bacb08 | 80 | 2,640
            '- cacheProc org.apache.ignite.internal.GridKernalContextImpl @ 0x818f6900 | 248 | 1,596,328
              |- ctx org.apache.ignite.internal.IgniteKernal @ 0x818f68a0 | 96 | 200
              | |- ignite org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi @ 0x819c19c0 | 240 | 2,000
              | | |- this$0 org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker @ 0x81b97128 tcp-comm-worker-#1%null% Thread | 144 | 2,640
              | | |- this$0 org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$3 @ 0x81b97098 | 16 | 16
              | | '- Total: 2 entries
              | |- ignite org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi @ 0x81968500 | 256 | 104,176
              | |- grid org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance @ 0x81967d28 | 88 | 344
              | '- Total: 3 entries
              |- ctx org.apache.ignite.internal.processors.clock.GridCl
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
What do you mean by "embedded instances"? If it's the org.apache.ignite.Ignite object, then we have two, one for each .war application deployed. And what do you mean by "not properly stopped and/or disconnected"? The application calls org.apache.ignite.Ignite.close() only when Tomcat shuts down. What if there are intermittent network failures?

There are 2 GridKernalContextImpl found in the heap:

Class Name | Objects | Shallow Heap | Retained Heap
.*GridKernalContextImpl.* | | |
org.apache.ignite.internal.GridKernalContextImpl | 1 | 248 | >= 1,596,360
org.apache.ignite.internal.GridKernalContextImpl | 1 | 248 | >= 1,599,712
Total: 2 entries (23,992 filtered) | 2 | 496 |

You might find the fully expanded histogram here: https://drive.google.com/open?id=0BwY2dxDlRYhBR3pnYUM1cjZiNDg

thx, Tom

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-tp9443p9551.html
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
And we have iptables open on ports 3-5; I wonder if that's related to the symptom, because the dump shows `cachesOnDisconnect org.apache.ignite.internal.processors.cache.GridCacheProcessor` holding the objects.

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-tp9443p9520.html
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
> How many caches do you have?

There are only 2 caches.

> Any idea why you have so many CacheConfiguration objects? Who holds
> references to them?

The shortest paths to the accumulation point for java.util.HashMap @ 0x8423c790 show who holds the references. As you can see below, it's an org.apache.ignite.internal.GridKernalContextImpl @ 0x818f6900, which is in turn held by various Ignite SPI / internal objects:

Class Name | Shallow Heap | Retained Heap
java.util.HashMap$Node[65536] @ 0xcd657f20 | 262,160 | 227,306,744
'- table java.util.HashMap @ 0x8423c790 | 48 | 227,306,792
  '- rmtCfgs org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8 | 56 | 227,306,848
    '- value java.util.HashMap$Node @ 0xe066c018 | 32 | 32
      '- [11] java.util.HashMap$Node[16] @ 0xe066bf48 | 80 | 272
        '- table java.util.HashMap @ 0xe066bf08 | 48 | 320
          '- cachesOnDisconnect org.apache.ignite.internal.processors.cache.GridCacheProcessor @ 0x81bacb08 | 80 | 2,640
            '- cacheProc org.apache.ignite.internal.GridKernalContextImpl @ 0x818f6900 | 248 | 1,596,328
              |- ctx org.apache.ignite.internal.IgniteKernal @ 0x818f68a0 | 96 | 200
              | |- ignite org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi @ 0x819c19c0 | 240 | 2,000
              | | |- this$0 org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker @ 0x81b97128 tcp-comm-worker-#1%null% Thread | 144 | 2,640
              | | |- this$0 org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$3 @ 0x81b97098 | 16 | 16
              | | '- Total: 2 entries
              | |- ignite org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi @ 0x81968500 | 256 | 104,176
              | |- grid org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance @ 0x81967d28 | 88 | 344
              | '- Total: 3 entries
              |- ctx org.apache.ignite.internal.processors.clock.GridClockServer @ 0x819680e0 | 32 | 32
              |- ctx org.apache.ignite.internal.managers.discovery.GridDiscoveryManager @ 0x81968468 | 128 | 45,944
              |- ctx org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor @ 0x81b97c98 | 32 | 512
              |- ctx org.apache.ignite.internal.processors.rest.GridRestProcessor @ 0x81bad1a0 | 64 | 1,056
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
Hi Val, sorry, I'm afraid I cannot provide the heap dump because it might contain some sensitive data.

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-tp9443p9507.html
Re: Ignite 1.6.0 suspected memory leak from GridDhtPartitionMap2
Shortest paths to the accumulation point:

Class Name | Shallow Heap | Retained Heap
java.util.concurrent.LinkedBlockingDeque @ 0x86cfb670 | 40 | 893,476,328
'- futQ org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker @ 0x86cf6e80 | 56 | 893,476,400
  |- , target org.apache.ignite.thread.IgniteThread @ 0x86cf6eb8 exchange-worker-#67%null% Thread | 128 | 34,856
  |- exchWorker org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager @ 0x86cf6d28 | 96 | 25,800
  |  '- exchMgr org.apache.ignite.internal.processors.cache.GridCacheSharedContext @ 0x818a3fd0 | 72 | 872
  '- Total: 2 entries

-- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-GridDhtPartitionMap2-tp9504p9505.html
Ignite 1.6.0 suspected memory leak from GridDhtPartitionMap2
OOME found on the same cluster as mentioned in http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-td9443.html

Recap of the setup: we have 2 machines (M1, M2). M1 runs an Ignite server (I1) and a Tomcat server (T1), which hosts 2 Ignite clients (C1a, C1b); similarly, M2 runs an Ignite server (I2) and a Tomcat server (T2), which hosts 2 Ignite clients (C2a, C2b).

After restarting I1, I2, T1, T2 on 2016-12-07, we got an OOME from T2 on 2016-12-10. However, the heap dump seems to show a different pattern. The histogram shows the memory was held by some java.util.concurrent.LinkedBlockingDeque objects:

Class Name | Objects | Shallow Heap | Retained Heap
java.util.concurrent.LinkedBlockingDeque | 17 | 680 | >= 1,787,746,904
java.util.concurrent.LinkedBlockingDeque$Node | 50,954 | 1,222,896 | >= 1,426,229,816
java.util.ArrayList | 254,999 | 6,119,976 | >= 1,339,620,328
java.lang.Object[] | 257,817 | 69,919,176 | >= 1,339,571,872
java.util.HashMap | 127,003 | 6,096,144 | >= 1,300,721,992
java.util.HashMap$Node[] | 83,427 | 163,804,808 | >= 1,297,201,592
java.util.HashMap$Node | 19,930,286 | 637,769,152 | >= 1,249,981,888
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker | 2 | 112 | >= 893,476,648
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker | 2 | 112 | >= 893,322,888
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture | 21,656 | 3,464,960 | >= 743,337,608
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture | 21,656 | 3,464,960 | >= 742,465,888
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$5 | 768 | 18,432 | >= 450,735,512
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$5 | 768 | 18,432 | >= 450,735,512
org.apache.ignite.internal.processors.cache.CacheAffinityChangeMessage | 768 | 30,720 | >= 450,717,088
org.apache.ignite.internal.processors.cache.CacheAffinityChangeMessage | 768 | 30,720 | >= 450,717,088
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsFullMessage | 768 | 49,152 | >= 450,594,256
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsFullMessage | 768 | 49,152 | >= 450,594,256
java.util.Collections$UnmodifiableMap | 15,084 | 482,688 | >= 371,291,912
java.lang.String | 3,517,909 | 84,429,816 | >= 353,946,616
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap | 4,613 | 295,232 | >= 307,249,264
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap | 4,613 | 295,232 | >= 307,249,264
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap2 | 9,226 | 369,040 | >= 306,363,640
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap2 | 9,226 | 369,040 | >= 306,363,640
char[] | 3,506,511 | 276,762,784 | >= 276,762,784
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode |
Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
Thx Val for your reply; let me check if I can give you the .hprof file later.

This is the dominator tree showing all 23,850 items from java.util.HashMap$Node[65536] @ 0xcd657f20. There are too many of them, so I just expanded some, and they are all holding CacheConfiguration objects: https://drive.google.com/file/d/0BwY2dxDlRYhBWXR1NHdWelMtY0E/view

We actually found that there are 88,421 + 88,418 CacheConfiguration objects holding 677,386,632 + 677,363,648 bytes of retained heap:

Class Name | Objects | Shallow Heap | Retained Heap
java.util.HashMap$Node[] | 33,252 | 10,886,584 | >= 1,446,877,296
java.util.HashMap | 205,140 | 9,846,720 | >= 1,446,860,856
java.util.HashMap$Node | 1,116,555 | 35,729,760 | >= 1,441,725,448
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor | 6 | 336 | >= 681,000,984
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor | 6 | 336 | >= 680,977,904
org.apache.ignite.configuration.CacheConfiguration | 88,421 | 25,465,248 | >= 677,386,632
org.apache.ignite.configuration.CacheConfiguration | 88,418 | 25,464,384 | >= 677,363,648
org.apache.ignite.internal.processors.cache.GridCacheDefaultAffinityKeyMapper | 88,419 | 2,122,056 | >= 627,421,920
org.apache.ignite.internal.processors.cache.GridCacheDefaultAffinityKeyMapper | 88,416 | 2,121,984 | >= 627,400,632
org.apache.ignite.internal.util.GridReflectionCache | 88,426 | 2,829,632 | >= 625,349,432
org.apache.ignite.internal.util.GridReflectionCache | 88,423 | 2,829,536 | >= 625,328,216
org.apache.ignite.internal.util.GridBoundedConcurrentLinkedHashMap | 176,854 | 14,148,320 | >= 620,153,264
org.apache.ignite.internal.util.GridBoundedConcurrentLinkedHashMap | 176,860 | 14,148,800 | >= 620,098,736
org.jsr166.ConcurrentLinkedHashMap$Segment[] | 176,862 | 14,151,184 | >= 581,610,816
org.jsr166.ConcurrentLinkedHashMap$Segment[] | 176,868 | 14,151,664 | >= 581,554,984
org.jsr166.ConcurrentLinkedHashMap$Segment | 2,830,348 | 158,499,488 | >= 567,459,640
org.jsr166.ConcurrentLinkedHashMap$Segment | 2,830,444 | 158,504,864 | >= 567,403,328
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync | 5,661,690 | 271,761,120 | >= 362,348,560
org.jsr166.ConcurrentLinkedHashMap$HashEntry[] | 2,830,444 | 137,029,536 | >= 137,037,776
org.jsr166.ConcurrentLinkedHashMap$HashEntry[] | 2,830,348 | 137,024,840 | >= 137,033,280
java.lang.String | 979,752 | 23,514,048 | >= 92,539,136
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock | 5,661,691 | 90,587,056 | >= 90,587,064
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock | 5,661,691 | 90,587,056 | >= 90,587,064
java.util.concurrent.locks.ReentrantReadWriteLock$Sync$ThreadLocalHoldCounter | 5,661,691 | 90,587,056 | >= 90,587,056
char[] | 984,330 | 80,884,824 | >= 80,884,824
org.apache.catalina.loader.WebappClassLoader | 8 | 1,088 | >= 58,022,056
java.util.concurrent.ConcurrentSkipListMap$HeadIndex | 171 | 5,472 | >= 52,983,616
java.util.concurrent.ConcurrentSkipListMap | 58 | 2,784 | >= 52,871,176
java.lang.Class | 24,003 | 213,864 | >= 48,210,712
java.util.HashSet | 187,879 | 3,006,064 | >= 46,999,120
java.util.concurrent.ConcurrentSkipListMap$Node | 10,534 | 252,816 | >= 37,540,176
java.util.Collections$UnmodifiableSet | 2,857 | 45,712 | >= 33,773,632
Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor
We have 2 machines (M1, M2). M1 runs an Ignite server (I1) and a Tomcat server (T1), which hosts 2 Ignite clients (C1a, C1b); similarly, M2 runs an Ignite server (I2) and a Tomcat server (T2), which hosts 2 Ignite clients (C2a, C2b).

OutOfMemoryError was found in both T1 and T2 yesterday; we have a heap dump from T1, but failed to get a heap dump from T2. After the error, using visor we found only T1, C1a, and C1b were still in the cluster; all Ignite nodes on M2 were disconnected. We've lost the Tomcat server log for some reason.

From the heap dump of T1 we found 6 org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor objects holding 1.2 GB of memory in total:

Class Name | Shallow Heap | Retained Heap | Percentage
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8 | 56 | 227,306,848 | 14.05%
'- java.util.HashMap @ 0x8423c790 | 48 | 227,306,792 | 14.05%
  '- java.util.HashMap$Node[65536] @ 0xcd657f20 | 262,160 | 227,306,744 | 14.05%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x8449bed0 | 56 | 227,299,144 | 14.05%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81ae5c10 | 56 | 227,071,080 | 14.04%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x8449f158 | 56 | 227,063,384 | 14.04%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81adfa38 | 56 | 226,599,544 | 14.01%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x8449f090 | 56 | 226,591,864 | 14.01%
org.apache.catalina.loader.WebappClassLoader @ 0x80167cd8 | 136 | 29,521,360 | 1.83%
org.apache.catalina.loader.WebappClassLoader @ 0x8363a078 | 136 | 28,471,928 | 1.76%
com.cityline.cps.admin.api.controller.LoginController @ 0x819c54d0 | 48 | 21,042,952 | 1.30%
org.apache.ignite.spi.discovery.tcp.ClientImpl @ 0x84477b20 | 96 | 14,456,360 | 0.89%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0xade21488 | 152 | 10,462,696 | 0.65%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0xae8ec420 | 152 | 9,800,264 | 0.61%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0x86aa66a0 | 152 | 9,677,848 | 0.60%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0x86ac6b58 | 152 | 9,634,288 | 0.60%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0x83013720 | 152 | 6,075,032 | 0.38%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0x86008f08 | 152 | 5,967,304 | 0.37%
class java.beans.ThreadGroupContext @ 0x81070c30 System Class | 8 | 2,634,888 | 0.16%
org.hibernate.internal.SessionFactoryImpl @ 0x811ec9f0 | 136 | 2,406,176 | 0.15%
org.hibernate.internal.SessionFactoryImpl @ 0x85de4c98 | 136 | 2,082,128 | 0.13%
org.hibernate.internal.SessionFactoryImpl @ 0x83b30890 | 136 | 2,036,976 | 0.13%
org.apache.ignite.internal.GridKernalContextImpl @ 0x841d4090 | 248 | 1,599,680 | 0.10%
org.apache.ignite.internal.GridKernalContextImpl @ 0x818f6900 | 248 | 1,596,328 | 0.10%
java.net.URLClassLoader @ 0x800fe890 | 80 | 1,451,640 | 0.09%
org.apache.catalina.webresources.JarResourceSet @ 0x8019d4d0 | 72 | 1,302,400 | 0.08%
org.apache.ignite.internal.processors.cache.GridCacheSharedContext @ 0x8427bbc0 | 72 | 1,242,936 | 0.08%
org.apache.ignite.internal.