[ 
https://issues.apache.org/jira/browse/HBASE-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705530#comment-14705530
 ] 

stack commented on HBASE-14262:
-------------------------------

bq. So I suggest we verify the jstack result first to see if we really create 
too many threads? Maybe we are reaching the precipice, so sometimes it fails 
and sometimes not...

Looking. In TestDistributedLogReplay (I am looking at this test first because 
INFRA-10150 flags it as a zombie), we start 6 servers, so the unit test is 
running with 450+ threads. Let me dig in more (there seems to be a shutdown 
issue here too).
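
For a quick cross-check of the jstack numbers from inside the test JVM, something like the following could be dropped into a test. This is a minimal sketch, not code from HBase: the class name ThreadCensus and its methods are made up for illustration, and grouping threads by stripping trailing digits from their names is only an assumption about how the thread names look. The only JDK call it relies on is Thread.getAllStackTraces().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of HBase): counts live threads grouped by name prefix.
final class ThreadCensus {

    // Strip trailing digits, whitespace and separators so that names such as
    // "pool-1-thread-12" and "pool-1-thread-13" fall into the same bucket.
    static String prefixOf(String threadName) {
        return threadName.replaceAll("[-_#:\\s\\d]+$", "");
    }

    // Count how many of the given thread names share each prefix.
    static Map<String, Integer> countByPrefix(Iterable<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : names) {
            counts.merge(prefixOf(name), 1, Integer::sum);
        }
        return counts;
    }

    // Snapshot the names of all live threads in this JVM and group them.
    static Map<String, Integer> liveThreadCounts() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        return countByPrefix(names);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = liveThreadCounts();
        int total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
            total += e.getValue();
        }
        System.out.println("total live threads: " + total);
    }
}
```

Calling ThreadCensus.liveThreadCounts() once the mini cluster is up, and again after shutdown, would show which thread groups dominate the count and which ones linger.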

> Big Trunk unit tests failing with "OutOfMemoryError: unable to create new 
> native thread"
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-14262
>                 URL: https://issues.apache.org/jira/browse/HBASE-14262
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: stack
>            Assignee: stack
>
> The big unit tests are coming in with OOME, can't create native threads.
> I was also getting the OOME locally running on an MBP. git bisect got me to 
> HBASE-13065, where we upped the test heap for TestDistributedLogSplitting 
> back in Feb. Around the time that this went in, we had similar OOME issues, 
> but then it was because we were running 32-bit JVMs. That does not seem to 
> be the case here.
> A recent run failed all of the tests below, and most are OOME:
> {code}
>      {color:red}-1 core tests{color}.  The patch failed these unit tests:
>                   org.apache.hadoop.hbase.replication.TestReplicationEndpoint
>                   org.apache.hadoop.hbase.replication.TestPerTableCFReplication
>                   org.apache.hadoop.hbase.wal.TestBoundedRegionGroupingProvider
>                   org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
>                   org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpointNoMaster
>                   org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
>                   org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint
>                   org.apache.hadoop.hbase.replication.TestMasterReplication
>                   org.apache.hadoop.hbase.mapred.TestTableMapReduce
>                   org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster
>                   org.apache.hadoop.hbase.regionserver.TestRegionFavoredNodes
>                   org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
>                   org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager
>                   org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
>                   org.apache.hadoop.hbase.TestGlobalMemStoreSize
>                   org.apache.hadoop.hbase.wal.TestWALFiltering
>                   org.apache.hadoop.hbase.replication.TestReplicationSmallTests
>                   org.apache.hadoop.hbase.replication.TestReplicationSyncUpTool
>                   org.apache.hadoop.hbase.replication.TestReplicationWithTags
>                   org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure
>                   org.apache.hadoop.hbase.replication.TestReplicationChangingPeerRegionservers
>                   org.apache.hadoop.hbase.wal.TestDefaultWALProviderWithHLogKey
>                   org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush
>                   org.apache.hadoop.hbase.snapshot.TestMobRestoreFlushSnapshotFromClient
>                   org.apache.hadoop.hbase.master.procedure.TestCreateTableProcedure
>                   org.apache.hadoop.hbase.wal.TestWALFactory
>                   org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure
>                   org.apache.hadoop.hbase.replication.TestReplicationDisableInactivePeer
>                   org.apache.hadoop.hbase.master.procedure.TestAddColumnFamilyProcedure
>                   org.apache.hadoop.hbase.mapred.TestMultiTableSnapshotInputFormat
>                   org.apache.hadoop.hbase.master.procedure.TestEnableTableProcedure
>                   org.apache.hadoop.hbase.master.TestMasterFailoverBalancerPersistence
>                   org.apache.hadoop.hbase.TestStochasticBalancerJmxMetrics
>      {color:red}-1 core zombie tests{color}.  There are 16 zombie test(s):
>         at org.apache.hadoop.hbase.security.visibility.TestVisibilityLabels.testVisibilityLabelsWithComplexLabels(TestVisibilityLabels.java:216)
>         at org.apache.hadoop.hbase.mapred.TestTableInputFormat.testTableRecordReaderScannerFail(TestTableInputFormat.java:281)
>         at org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.testMultiSlaveReplication(TestMultiSlaveReplication.java:129)
>         at org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileScanning(TestHRegion.java:3799)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
