Re: frequent disconnection in ignite cluster

2017-07-06 Thread tysli2016
Hi Rishi,

It seems it's not a good idea to connect to Ignite repeatedly; I observed a
similar memory issue.
Would you mind sharing your server configuration (cores, memory)?

http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-td9443i20.html

http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-tt12409.html

Tom





Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2017-07-05 Thread tysli2016
Hi Val,

Sorry for this late reply.
Yes, we have client nodes constantly joining and leaving the topology.
Do the newer versions of Ignite have any improvement on this issue?


The reason client nodes constantly join and leave the topology is twofold:
1. We use the visor CLI, called by a cron job, to check that there are more than
2 server nodes; it sends out an email alert otherwise. It runs every 10
minutes (a rough sketch of the whole check script is shown after this list).

It collects the server count like this:

IGNITE_COUNT=`/usr/local/apache-ignite-fabric-1.6.0-bin/bin/ignitevisorcmd.sh \
  -e="'open -cpath=/usr/local/apache-ignite-fabric-1.6.0-bin/config/default-config.xml;node;c'" \
  | grep -e "Server  " | wc -l`

2. We have another cron job that does some cleanup for the application; it
also connects to the topology and leaves when done, and also runs every few
minutes.
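
For reference, here is a minimal sketch of the whole check script. The alert
address, the MIN_COUNT threshold and the mailx usage are placeholders / assumptions,
not our exact setup; the IGNITE_COUNT command mirrors the one above:

#!/bin/bash
# Hypothetical wrapper around the IGNITE_COUNT command shown above.
IGNITE_HOME=/usr/local/apache-ignite-fabric-1.6.0-bin
MIN_COUNT=3                        # placeholder: the original check was "more than 2"
ALERT_EMAIL=ops@example.com        # placeholder address; assumes mailx is configured

IGNITE_COUNT=`${IGNITE_HOME}/bin/ignitevisorcmd.sh \
  -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node;c'" \
  | grep -e "Server  " | wc -l`

# Send an alert when fewer server lines than expected are reported by visor.
if [ "${IGNITE_COUNT}" -lt "${MIN_COUNT}" ]; then
  echo "Only ${IGNITE_COUNT} Ignite server node(s) found" \
    | mailx -s "Ignite server count alert" "${ALERT_EMAIL}"
fi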

many thanks 
Tom






Re: How to monitor and alert for server counts

2017-05-10 Thread tysli2016
We run the visor every 10 mins; the Ignite servers would OOME in a couple of
weeks.
Each Ignite server is allocated 1 GB of memory.

Would you mind sharing your server's config.xml? Any JVM parameter changes?

thanks 
Tom


ignite_user2016 wrote
> yes you are correct, we run visor in the batch mode.
> 
> we have 2 hosts hosting 2 clients and 2 servers for Ignite; our conf is bare
> minimal: 4 cores, 8 GB RAM.
> 
> Now with the OOME, how often do you run the visor command? We run it every 5
> mins. Should you try that?
> 
> On Mon, May 8, 2017 at 9:01 PM, tysli2016 [via Apache Ignite Users] wrote:
> 
>> thanks Rishi, can you share more about that?
>> what's the version of Ignite? how many Ignite servers? how many
>> CPU/memory?
>> are you using the Visor in batch mode (https://apacheignite-tools.
>> readme.io/v1.9/docs/batch-mode)?
>> or Visor alert?
>>
>> I have tried Visor batch mode, but it lead to OOME eventually (
>> http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-
>> node-cluster-with-visor-running-repeatedly-Ignite-1-9-td12409.html).
>>
> 
> 
> 
> -- 
> Rishi Yagnik







Re: How to monitor and alert for server counts

2017-05-08 Thread tysli2016
Thanks Rishi, can you share more about that?
What's the version of Ignite? How many Ignite servers? How much CPU/memory?
Are you using Visor in batch mode
(https://apacheignite-tools.readme.io/v1.9/docs/batch-mode)?
Or the Visor alert?

I have tried Visor batch mode, but it led to an OOME eventually
(http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-td12409.html).





Re: How to monitor and alert for server counts

2017-05-08 Thread tysli2016
Thanks Denis, can you share which Ignite version you are using? Can the Visor
GUI be started with the alert setting automatically, as a background job?

I have tried using visor in batch mode
(https://apacheignite-tools.readme.io/v1.9/docs/batch-mode), so that a visor
process was created each time a cron job ran. However, this led to an OOME
eventually
(http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-td12409.html)






Re: How to monitor and alert for server counts

2017-05-08 Thread tysli2016
Thanks Andrew, can you share more about how to use Nagios/Icinga or Zabbix for
monitoring? Which kinds of metrics, and how do you connect/set it up?





How to monitor and alert for server counts

2017-05-08 Thread tysli2016
We have a couple of Ignite servers serving as a key-value store and want to get
an email notification when any server goes down.

Is anyone else having the same need? What is your solution?





Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2017-05-08 Thread tysli2016
Reproduced the OOME; the heap dump is here:
https://drive.google.com/drive/folders/0BwY2dxDlRYhBMEhmckpWeHg1bjg?usp=sharing





Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

2017-05-04 Thread tysli2016
Thanks Andrey, is there an option to monitor the number of server nodes in
the grid?

I found "nc - Total number of nodes in the grid.", which seems to count server +
client nodes, correct?





Re: OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

2017-05-04 Thread tysli2016
Thanks, Evgenii!

When running `${IGNITE_HOME}/bin/ignitevisorcmd.sh -e="'open
-cpath=${IGNITE_HOME}/config/default-config.xml;node'"`, it shows "Ignite
node stopped OK" at the end. Is that an indicator that visor stopped properly?

We use the visor output to check the number of Ignite servers running; this
check is triggered by a cron job + shell script, so it starts a new visor
each time.

How could a shell script use an already started visor?





OOME on 2-node cluster with visor running repeatedly, Ignite 1.9

2017-05-04 Thread tysli2016
Got "OutOfMemoryError: Java heap space" with 2-node cluster with a `visor`
running repeatedly.

The server nodes are running on CentOS 7 inside Oracle VirtualBox VM with
the same config:
- 2 vCPUs
- 3.5GB memory
- Oracle JDK 2.8.0_121

`default-config.xml` was modified to use a non-default multicast group and 1
backup:
[The XML was stripped by the mailing-list archive; only the Spring beans schema
declaration (http://www.springframework.org/schema/beans, spring-beans.xsd)
survives, the bean definitions themselves are lost.]
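
A minimal sketch of what such a default-config.xml typically looks like. The
multicast group address and the cache name below are placeholders, not the
original values (which were lost); only the "1 backup" part is known from the
text above:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Cache with 1 backup copy (placeholder cache name). -->
        <property name="cacheConfiguration">
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <property name="name" value="myCache"/>
                <property name="backups" value="1"/>
            </bean>
        </property>

        <!-- Non-default multicast group for discovery (placeholder address). -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                        <property name="multicastGroup" value="228.10.10.157"/>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>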

The `visor` was run repeatedly on one of the nodes by a shell script:
#!/bin/bash
IGNITE_HOME=/root/apache-ignite-fabric-1.9.0-bin
while true
do
  ${IGNITE_HOME}/bin/ignitevisorcmd.sh \
    -e="'open -cpath=${IGNITE_HOME}/config/default-config.xml;node'"
done


The OOME was thrown after the above setup had been running for 1 day.
I have put the ignite log, gc log and heap dump in `dee657c8.tgz`, which can be
downloaded from
https://drive.google.com/drive/folders/0BwY2dxDlRYhBSFJhS0ZWOVBiNk0?usp=sharing.
`507f0201.tgz` contains the ignite log and gc log from another node in the
cluster, for reference just in case.

Running `visor` repeatedly is just to reproduce the OOME more quickly; in
production we run `visor` once every 10 minutes to monitor the health
of the cluster.

Questions:
1. Is anything wrong with the configuration? Can anything be tuned to avoid the
OOME?
2. Are there any other built-in tools that allow one to monitor the cluster?
Showing the number of server nodes is good enough.





Re: Ignite Shutdown Hook

2016-12-20 Thread tysli2016
Hemanta, did you call close() on the Ignite instance?

Ignite ignite = Ignition.start();
// do something with ignite
ignite.close();

I had a similar problem before and found that Ignite has some threads
running which prevent the JVM from stopping even after the main thread has ended.
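
A minimal sketch, assuming you can restructure the startup code: since
org.apache.ignite.Ignite extends AutoCloseable, try-with-resources guarantees
close() runs and the node's threads are stopped even if an exception is thrown.
The class name and the println are only for illustration.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class IgniteShutdownExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // do something with ignite
            System.out.println("Topology size: " + ignite.cluster().nodes().size());
        } // ignite.close() is invoked here automatically
    }
}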







Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-20 Thread tysli2016
Thanks for your patient and prompt replies; we are trying to
reproduce the issue and will keep you posted.

Tom





Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-19 Thread tysli2016
How about this OOME?
http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-GridDhtPartitionMap2-td9504.html

We found it on another client node in the same cluster; however, it seems to
exhibit a different memory-leak pattern.

- Tom





Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-18 Thread tysli2016
> Are you sure that all of them are in this map? What is the size of rmtCfgs
map?

This map contains about 1/6 of the total CacheConfigurations; the retained
size of the rmtCfgs map is 227,306,792 bytes:

Class Name                                                                                    | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------------------------------------------
java.util.HashMap$Node[65536] @ 0xcd657f20                                                    |      262,160 |   227,306,744
'- table java.util.HashMap @ 0x8423c790                                                       |           48 |   227,306,792
   '- rmtCfgs org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8 |           56 |   227,306,848


The Leak Suspects report shows there are 6 DynamicCacheDescriptors, each
about 216 MB in size:
https://drive.google.com/open?id=0BwY2dxDlRYhBbllWZ3pEMW1Tc00

All 6 DynamicCacheDescriptors exhibit the same pattern: they all contain a
whole lot of CacheConfigurations in a map, and the numbers do add up:

Class Name                                                           |    Objects | Shallow Heap |    Retained Heap
---------------------------------------------------------------------------------------------------------------------
java.util.HashMap$Node[]                                             |     33,252 |   10,886,584 | >= 1,446,877,296
java.util.HashMap                                                    |    205,140 |    9,846,720 | >= 1,446,860,856
java.util.HashMap$Node                                               |  1,116,555 |   35,729,760 | >= 1,441,725,448
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor   |          6 |          336 |   >= 681,000,984
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor   |          6 |          336 |   >= 680,977,904
org.apache.ignite.configuration.CacheConfiguration                   |     88,421 |   25,465,248 |   >= 677,386,632
org.apache.ignite.configuration.CacheConfiguration                   |     88,418 |   25,464,384 |   >= 677,363,648



> Actually this map can be non-empty only on a node which is not fully
> started yet. Basically, when a new node joins a topology, it collects the
> configuration from all nodes for validation check, does the check and
> cleans the map. All this is part of the join process. Is this the case?
> How many nodes do you have? 

There are 2 client nodes associated with the heap dump.
But they should have already joined the cluster, because these 2 client nodes
could be found in the topology with the visor CLI. Would these CacheConfiguration
objects be created for the disconnect process as well? Because I found they lead
to a GridCacheProcessor.cachesOnDisconnect:

Class Name                                                                                    | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------------------------------------------
java.util.HashMap$Node[65536] @ 0xcd657f20                                                    |      262,160 |   227,306,744
'- table java.util.HashMap @ 0x8423c790                                                       |           48 |   227,306,792
   '- rmtCfgs org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8 |           56 |   227,306,848
      '- value java.util.HashMap$Node @ 0xe066c018                                            |           32 |            32
         '- [11] java.util.HashMap$Node[16] @ 0xe066bf48                                      |           80 |           272
            '- table java.util.HashMap @ 0xe066bf08                                           |           48 |           320

Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-15 Thread tysli2016
I see, so the embedded instance is the IgniteKernal object, which implements the
Ignite interface.

> So these are instances are not really used by Ignite, but are saved
> somewhere, most likely in your code. Can you use heap dump to trace it? 

Yes, there are a whole lot of CacheConfiguration objects created, but they are
not from my code.
Frankly, I think Ignite has a very well designed API with which I don't
have to code much to use it, and I don't have a good reason to create
CacheConfiguration objects. What my code does is call Ignition.start() with an
IgniteConfiguration object like this:

IgniteConfiguration configuration = new IgniteConfiguration();
configuration.setGridLogger(new Slf4jLogger());
configuration.setDiscoverySpi(discoverySpi);
configuration.setCommunicationSpi(communicationSpi);

Ignition.setClientMode(true);
Ignite ignite = Ignition.start(configuration);


And what I found from the heap dump is that those CacheConfiguration objects are
not saved "somewhere", but in a java.util.HashMap @ 0x8423c790, which is
ultimately held by a GridKernalContextImpl:

Class Name                                                                                                                          | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
java.util.HashMap$Node[65536] @ 0xcd657f20                                                                                          |      262,160 |   227,306,744
'- table java.util.HashMap @ 0x8423c790                                                                                             |           48 |   227,306,792
   '- rmtCfgs org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8                                       |           56 |   227,306,848
      '- value java.util.HashMap$Node @ 0xe066c018                                                                                  |           32 |            32
         '- [11] java.util.HashMap$Node[16] @ 0xe066bf48                                                                            |           80 |           272
            '- table java.util.HashMap @ 0xe066bf08                                                                                 |           48 |           320
               '- cachesOnDisconnect org.apache.ignite.internal.processors.cache.GridCacheProcessor @ 0x81bacb08                    |           80 |         2,640
                  '- cacheProc org.apache.ignite.internal.GridKernalContextImpl @ 0x818f6900                                        |          248 |     1,596,328
                     |- ctx org.apache.ignite.internal.IgniteKernal @ 0x818f68a0                                                    |           96 |           200
                     |  |- ignite org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi @ 0x819c19c0                          |          240 |         2,000
                     |  |  |- this$0 org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker @ 0x81b97128  tcp-comm-worker-#1%null% Thread | 144 | 2,640
                     |  |  |- this$0 org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$3 @ 0x81b97098                     |           16 |            16
                     |  |  '- Total: 2 entries                                                                                      |              |
                     |  |- ignite org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi @ 0x81968500                                  |          256 |       104,176
                     |  |- grid org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance @ 0x81967d28                              |           88 |           344
                     |  '- Total: 3 entries                                                                                         |              |
                     |- ctx org.apache.ignite.internal.processors.clock.GridCl

Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-14 Thread tysli2016
What do you mean by "embedded instances"? If it's the org.apache.ignite.Ignite
object, then we have two, one for each .war application deployed.

And what do you mean by "not properly stopped and / or disconnected"? The
application calls org.apache.ignite.Ignite.close() only when Tomcat shuts down.
What if there are intermittent network failures?

There are 2 GridKernalContextImpl instances found in the heap:

Class Name                                          |   Objects | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------------
.*GridKernalContextImpl.*                           |           |              |
org.apache.ignite.internal.GridKernalContextImpl    |         1 |          248 |  >= 1,596,360
org.apache.ignite.internal.GridKernalContextImpl    |         1 |          248 |  >= 1,599,712
Total: 2 entries (23,992 filtered)                  |         2 |          496 |
------------------------------------------------------------------------------------------------

You can find the fully expanded histogram here:
https://drive.google.com/open?id=0BwY2dxDlRYhBR3pnYUM1cjZiNDg

Thanks, Tom






Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-13 Thread tysli2016
and we have the iptables open on ports 3-5, wondering if it's related
to the symtom.
because it shows `cachesOnDisconnect
org.apache.ignite.internal.processors.cache.GridCacheProcessor` holds the
objects.





Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-13 Thread tysli2016
> How many caches do you have? 
There are only 2 caches.

> Any idea why you have so many CacheConfiguration objects? Who holds
> references to them? 

The shortest paths to the accumulation point for java.util.HashMap @
0x8423c790 show who holds the references. As you can see below, it's an
org.apache.ignite.internal.GridKernalContextImpl @ 0x818f6900, which is held
by some other Ignite SPI / internal objects:

Class Name                                                                                                                          | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
java.util.HashMap$Node[65536] @ 0xcd657f20                                                                                          |      262,160 |   227,306,744
'- table java.util.HashMap @ 0x8423c790                                                                                             |           48 |   227,306,792
   '- rmtCfgs org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8                                       |           56 |   227,306,848
      '- value java.util.HashMap$Node @ 0xe066c018                                                                                  |           32 |            32
         '- [11] java.util.HashMap$Node[16] @ 0xe066bf48                                                                            |           80 |           272
            '- table java.util.HashMap @ 0xe066bf08                                                                                 |           48 |           320
               '- cachesOnDisconnect org.apache.ignite.internal.processors.cache.GridCacheProcessor @ 0x81bacb08                    |           80 |         2,640
                  '- cacheProc org.apache.ignite.internal.GridKernalContextImpl @ 0x818f6900                                        |          248 |     1,596,328
                     |- ctx org.apache.ignite.internal.IgniteKernal @ 0x818f68a0                                                    |           96 |           200
                     |  |- ignite org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi @ 0x819c19c0                          |          240 |         2,000
                     |  |  |- this$0 org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker @ 0x81b97128  tcp-comm-worker-#1%null% Thread | 144 | 2,640
                     |  |  |- this$0 org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$3 @ 0x81b97098                     |           16 |            16
                     |  |  '- Total: 2 entries                                                                                      |              |
                     |  |- ignite org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi @ 0x81968500                                  |          256 |       104,176
                     |  |- grid org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance @ 0x81967d28                              |           88 |           344
                     |  '- Total: 3 entries                                                                                         |              |
                     |- ctx org.apache.ignite.internal.processors.clock.GridClockServer @ 0x819680e0                                |           32 |            32
                     |- ctx org.apache.ignite.internal.managers.discovery.GridDiscoveryManager @ 0x81968468                         |          128 |        45,944
                     |- ctx org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor @ 0x81b97c98                         |           32 |           512
                     |- ctx org.apache.ignite.internal.processors.rest.GridRestProcessor @ 0x81bad1a0                               |           64 |         1,056
 

Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-13 Thread tysli2016
Hi Val, sorry, I'm afraid I cannot provide the heap dump because it might
contain some sensitive data.





Re: Ignite 1.6.0 suspected memory leak from GridDhtPartitionMap2

2016-12-12 Thread tysli2016
Shortest paths to the accumulation point:

Class Name                                                                                                          | Shallow Heap | Retained Heap
---------------------------------------------------------------------------------------------------------------------------------------------------
java.util.concurrent.LinkedBlockingDeque @ 0x86cfb670                                                               |           40 |   893,476,328
'- futQ org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker @ 0x86cf6e80   |           56 |   893,476,400
   |- , target org.apache.ignite.thread.IgniteThread @ 0x86cf6eb8  exchange-worker-#67%null% Thread                 |          128 |        34,856
   |- exchWorker org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager @ 0x86cf6d28         |           96 |        25,800
   |  '- exchMgr org.apache.ignite.internal.processors.cache.GridCacheSharedContext @ 0x818a3fd0                    |           72 |           872
   '- Total: 2 entries                                                                                              |              |







Ignite 1.6.0 suspected memory leak from GridDhtPartitionMap2

2016-12-12 Thread tysli2016
An OOME was found on the same cluster as mentioned in
http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-td9443.html

To recap the setup here:
We have 2 machines (M1, M2).
M1 runs an Ignite server (I1) and a Tomcat server (T1), which hosts 2
Ignite clients (C1a, C1b);
similarly, M2 runs an Ignite server (I2) and a Tomcat server (T2), which
hosts 2 Ignite clients (C2a, C2b).

After restarting I1, I2, T1, T2 on 2016-12-07, we got an OOME from T2 on
2016-12-10.
However, the heap dump seems to show a different pattern.

The histogram shows the memory was held by some
java.util.concurrent.LinkedBlockingDeque objects:

Class Name                                                                                                      |    Objects |  Shallow Heap |    Retained Heap
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
java.util.concurrent.LinkedBlockingDeque                                                                        |         17 |           680 | >= 1,787,746,904
java.util.concurrent.LinkedBlockingDeque$Node                                                                   |     50,954 |     1,222,896 | >= 1,426,229,816
java.util.ArrayList                                                                                             |    254,999 |     6,119,976 | >= 1,339,620,328
java.lang.Object[]                                                                                              |    257,817 |    69,919,176 | >= 1,339,571,872
java.util.HashMap                                                                                               |    127,003 |     6,096,144 | >= 1,300,721,992
java.util.HashMap$Node[]                                                                                        |     83,427 |   163,804,808 | >= 1,297,201,592
java.util.HashMap$Node                                                                                          | 19,930,286 |   637,769,152 | >= 1,249,981,888
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker                    |          2 |           112 |   >= 893,476,648
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker                    |          2 |           112 |   >= 893,322,888
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture           |     21,656 |     3,464,960 |   >= 743,337,608
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture           |     21,656 |     3,464,960 |   >= 742,465,888
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$5         |        768 |        18,432 |   >= 450,735,512
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$5         |        768 |        18,432 |   >= 450,735,512
org.apache.ignite.internal.processors.cache.CacheAffinityChangeMessage                                          |        768 |        30,720 |   >= 450,717,088
org.apache.ignite.internal.processors.cache.CacheAffinityChangeMessage                                          |        768 |        30,720 |   >= 450,717,088
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsFullMessage              |        768 |        49,152 |   >= 450,594,256
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsFullMessage              |        768 |        49,152 |   >= 450,594,256
java.util.Collections$UnmodifiableMap                                                                           |     15,084 |       482,688 |   >= 371,291,912
java.lang.String                                                                                                |  3,517,909 |    84,429,816 |   >= 353,946,616
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap                   |      4,613 |       295,232 |   >= 307,249,264
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap                   |      4,613 |       295,232 |   >= 307,249,264
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap2                      |      9,226 |       369,040 |   >= 306,363,640
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap2                      |      9,226 |       369,040 |   >= 306,363,640
char[]                                                                                                          |  3,506,511 |   276,762,784 |   >= 276,762,784
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode                                                   |            |               |

Re: Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-12 Thread tysli2016
Thanks Val for your reply; let me check if I can give you the .hprof file later.

This is the dominator tree showing all 23,850 items from
java.util.HashMap$Node[65536] @ 0xcd657f20. There are too many of them, so I just
expanded some, and they are all holding CacheConfiguration objects:
https://drive.google.com/file/d/0BwY2dxDlRYhBWXR1NHdWelMtY0E/view

We actually found that there are 88,421 + 88,418 CacheConfiguration objects
holding 677,386,632 + 677,363,648 bytes of retained heap:


Class Name                                                                       |    Objects |  Shallow Heap |    Retained Heap
------------------------------------------------------------------------------------------------------------------------------------
java.util.HashMap$Node[]                                                         |     33,252 |    10,886,584 | >= 1,446,877,296
java.util.HashMap                                                                |    205,140 |     9,846,720 | >= 1,446,860,856
java.util.HashMap$Node                                                           |  1,116,555 |    35,729,760 | >= 1,441,725,448
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor               |          6 |           336 |   >= 681,000,984
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor               |          6 |           336 |   >= 680,977,904
org.apache.ignite.configuration.CacheConfiguration                               |     88,421 |    25,465,248 |   >= 677,386,632
org.apache.ignite.configuration.CacheConfiguration                               |     88,418 |    25,464,384 |   >= 677,363,648
org.apache.ignite.internal.processors.cache.GridCacheDefaultAffinityKeyMapper    |     88,419 |     2,122,056 |   >= 627,421,920
org.apache.ignite.internal.processors.cache.GridCacheDefaultAffinityKeyMapper    |     88,416 |     2,121,984 |   >= 627,400,632
org.apache.ignite.internal.util.GridReflectionCache                              |     88,426 |     2,829,632 |   >= 625,349,432
org.apache.ignite.internal.util.GridReflectionCache                              |     88,423 |     2,829,536 |   >= 625,328,216
org.apache.ignite.internal.util.GridBoundedConcurrentLinkedHashMap               |    176,854 |    14,148,320 |   >= 620,153,264
org.apache.ignite.internal.util.GridBoundedConcurrentLinkedHashMap               |    176,860 |    14,148,800 |   >= 620,098,736
org.jsr166.ConcurrentLinkedHashMap$Segment[]                                     |    176,862 |    14,151,184 |   >= 581,610,816
org.jsr166.ConcurrentLinkedHashMap$Segment[]                                     |    176,868 |    14,151,664 |   >= 581,554,984
org.jsr166.ConcurrentLinkedHashMap$Segment                                       |  2,830,348 |   158,499,488 |   >= 567,459,640
org.jsr166.ConcurrentLinkedHashMap$Segment                                       |  2,830,444 |   158,504,864 |   >= 567,403,328
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync                    |  5,661,690 |   271,761,120 |   >= 362,348,560
org.jsr166.ConcurrentLinkedHashMap$HashEntry[]                                   |  2,830,444 |   137,029,536 |   >= 137,037,776
org.jsr166.ConcurrentLinkedHashMap$HashEntry[]                                   |  2,830,348 |   137,024,840 |   >= 137,033,280
java.lang.String                                                                 |    979,752 |    23,514,048 |    >= 92,539,136
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock                       |  5,661,691 |    90,587,056 |    >= 90,587,064
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock                      |  5,661,691 |    90,587,056 |    >= 90,587,064
java.util.concurrent.locks.ReentrantReadWriteLock$Sync$ThreadLocalHoldCounter    |  5,661,691 |    90,587,056 |    >= 90,587,056
char[]                                                                           |    984,330 |    80,884,824 |    >= 80,884,824
org.apache.catalina.loader.WebappClassLoader                                     |          8 |         1,088 |    >= 58,022,056
java.util.concurrent.ConcurrentSkipListMap$HeadIndex                             |        171 |         5,472 |    >= 52,983,616
java.util.concurrent.ConcurrentSkipListMap                                       |         58 |         2,784 |    >= 52,871,176
java.lang.Class                                                                  |     24,003 |       213,864 |    >= 48,210,712
java.util.HashSet                                                                |    187,879 |     3,006,064 |    >= 46,999,120
java.util.concurrent.ConcurrentSkipListMap$Node                                  |     10,534 |       252,816 |    >= 37,540,176
java.util.Collections$UnmodifiableSet                                            |      2,857 |        45,712 |    >= 33,773,632

Ignite 1.6.0 suspected memory leak from DynamicCacheDescriptor

2016-12-07 Thread tysli2016
We have 2 machines (M1, M2).
M1 runs an Ignite server (I1) and a Tomcat server (T1), which hosts 2
Ignite clients (C1a, C1b);
similarly, M2 runs an Ignite server (I2) and a Tomcat server (T2), which
hosts 2 Ignite clients (C2a, C2b).

OutOfMemoryErrors were found in both T1 and T2 yesterday; we have a heap dump
from T1, but failed to get a heap dump from T2.

After the error, using visor we found that only T1, C1a, and C1b were still in
the cluster; all Ignite nodes on M2 were disconnected.

We've lost the Tomcat server log for some reason.

From the heap dump of T1 we found 6
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor objects holding
1.2 GB of memory in total:

Class Name                                                                             | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------------------------------------------------
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81a78dc8        |           56 |   227,306,848 |     14.05%
'- java.util.HashMap @ 0x8423c790                                                      |           48 |   227,306,792 |     14.05%
   '- java.util.HashMap$Node[65536] @ 0xcd657f20                                       |      262,160 |   227,306,744 |     14.05%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x8449bed0        |           56 |   227,299,144 |     14.05%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81ae5c10        |           56 |   227,071,080 |     14.04%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x8449f158        |           56 |   227,063,384 |     14.04%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x81adfa38        |           56 |   226,599,544 |     14.01%
org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor @ 0x8449f090        |           56 |   226,591,864 |     14.01%
org.apache.catalina.loader.WebappClassLoader @ 0x80167cd8                              |          136 |    29,521,360 |      1.83%
org.apache.catalina.loader.WebappClassLoader @ 0x8363a078                              |          136 |    28,471,928 |      1.76%
com.cityline.cps.admin.api.controller.LoginController @ 0x819c54d0                     |           48 |    21,042,952 |      1.30%
org.apache.ignite.spi.discovery.tcp.ClientImpl @ 0x84477b20                            |           96 |    14,456,360 |      0.89%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0xade21488              |          152 |    10,462,696 |      0.65%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0xae8ec420              |          152 |     9,800,264 |      0.61%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0x86aa66a0              |          152 |     9,677,848 |      0.60%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0x86ac6b58              |          152 |     9,634,288 |      0.60%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0x83013720              |          152 |     6,075,032 |      0.38%
org.apache.ignite.internal.processors.cache.GridCacheContext @ 0x86008f08              |          152 |     5,967,304 |      0.37%
class java.beans.ThreadGroupContext @ 0x81070c30 System Class                          |            8 |     2,634,888 |      0.16%
org.hibernate.internal.SessionFactoryImpl @ 0x811ec9f0                                 |          136 |     2,406,176 |      0.15%
org.hibernate.internal.SessionFactoryImpl @ 0x85de4c98                                 |          136 |     2,082,128 |      0.13%
org.hibernate.internal.SessionFactoryImpl @ 0x83b30890                                 |          136 |     2,036,976 |      0.13%
org.apache.ignite.internal.GridKernalContextImpl @ 0x841d4090                          |          248 |     1,599,680 |      0.10%
org.apache.ignite.internal.GridKernalContextImpl @ 0x818f6900                          |          248 |     1,596,328 |      0.10%
java.net.URLClassLoader @ 0x800fe890                                                   |           80 |     1,451,640 |      0.09%
org.apache.catalina.webresources.JarResourceSet @ 0x8019d4d0                           |           72 |     1,302,400 |      0.08%
org.apache.ignite.internal.processors.cache.GridCacheSharedContext @ 0x8427bbc0        |           72 |     1,242,936 |      0.08%
org.apache.ignite.internal.