[ https://issues.apache.org/jira/browse/PHOENIX-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138621#comment-16138621 ]
Samarth Jain commented on PHOENIX-4110:
---------------------------------------

Looks like this change didn't help. I ran the suite locally and monitored the Java heap of the forked processes. Even though we are now shutting down the mini cluster more often, heap usage keeps growing as the tests progress. So I took a heap dump of one of the JVMs and ran it through a profiler. Instances of three classes (MetricsSystemImpl, HRegion and Configuration) occupy most of the heap (about 93%):

{code}
One instance of "org.apache.hadoop.metrics2.impl.MetricsSystemImpl" loaded by "sun.misc.Launcher$AppClassLoader @ 0x75a025670" occupies 201,306,192 (10.95%) bytes.

717 instances of "org.apache.hadoop.hbase.regionserver.HRegion", loaded by "sun.misc.Launcher$AppClassLoader @ 0x75a025670" occupy 1,218,750,256 (66.30%) bytes.

2,040 instances of "org.apache.hadoop.conf.Configuration", loaded by "sun.misc.Launcher$AppClassLoader @ 0x75a025670" occupy 287,352,096 (15.63%) bytes.
{code}

- MetricsSystemImpl is a singleton, i.e. it is supposed to be created once. It doesn't get shut down when the mini cluster is shut down. One option would be for us to shut it down ourselves when we shut down the mini cluster.
- The bulk of the heap is occupied by HRegion objects. It looks like in certain cases, when a region server is being stopped, not all of its regions get closed. On inspecting the path of strong references to HRegion, it seems to come from thread objects of the class JVMClusterUtil$RegionServerThread. Looking at the HBase code, I see that when a region server starts, it registers its thread with the JVM's shutdown hook mechanism. This reference sticks around even after the thread itself has terminated. So when the regions are not closed, this thread object keeps the HRegions in memory, resulting in a memory leak.

I will file an HBase JIRA for this. Note that this was on 0.98; I still need to try it out with 1.3.
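
To make the shutdown-hook retention described above concrete, here is a minimal plain-Java sketch. It is not HBase code: WorkerThread, JoinOnExitHook and the helper methods are hypothetical stand-ins, and the only real API used is Runtime.addShutdownHook / Runtime.removeShutdownHook. The assumption is that the hook object holds a reference to the server thread, so the JVM's hook registry keeps the terminated thread (and whatever it references) reachable until the hook is removed.

```java
public class ShutdownHookLeakSketch {

    /** Hypothetical stand-in for a region server thread that owns heavy state. */
    static final class WorkerThread extends Thread {
        // Stand-in for the HRegion instances reachable from the server thread.
        final byte[] heavyState = new byte[8 * 1024 * 1024];

        @Override
        public void run() {
            // Pretend to serve and then stop, as on mini cluster shutdown.
        }
    }

    /** Hypothetical hook that joins the worker on JVM exit, so it references it strongly. */
    static final class JoinOnExitHook extends Thread {
        final WorkerThread worker;

        JoinOnExitHook(WorkerThread worker) {
            this.worker = worker;
        }

        @Override
        public void run() {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Registers a hook for a new worker, runs the worker, and waits for it to terminate. */
    static JoinOnExitHook startAndStopWorker() {
        WorkerThread worker = new WorkerThread();
        JoinOnExitHook hook = new JoinOnExitHook(worker);
        // From here on the JVM's hook registry holds a strong reference to the hook,
        // and through it to the worker and its heavyState.
        Runtime.getRuntime().addShutdownHook(hook);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return hook;
    }

    /** Deregisters the hook; returns true if it was still registered at this point. */
    static boolean releaseHook(Thread hook) {
        return Runtime.getRuntime().removeShutdownHook(hook);
    }

    public static void main(String[] args) {
        JoinOnExitHook hook = startAndStopWorker();
        System.out.println("worker alive: " + hook.worker.isAlive());
        System.out.println("hook still registered after worker exit: " + releaseHook(hook));
        System.out.println("hook registered after removal: " + releaseHook(hook));
    }
}
```

In this sketch the first releaseHook call returns true even though the worker has already terminated, which is the retention in question; the second call returns false because the registry no longer holds the hook. Whether removing the hook on region server stop is the right fix on the HBase side is a separate question for the HBase JIRA.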

Worst case, I think we may have to resort to halting the JVM after every test. Or maybe come up with a mechanism (with some help from the surefire plugin) to halt the JVM after every few runs. Or maybe just call System.gc() and hope for the best :) Will keep digging.

> ParallelRunListener should monitor number of tables and not number of tests
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-4110
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4110
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-4110.patch
>
>
> ParallelRunListener today monitors the number of tests that have been run to
> determine when the mini cluster should be shut down. This helps prevent our test
> JVM forks from running into OOM. A better heuristic would be to instead check the
> number of tables that were created by tests. This way, when a particular test
> class has created lots of tables, we can shut down the mini cluster sooner.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)