[ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985983#comment-14985983 ]
stack edited comment on HBASE-14420 at 11/2/15 8:45 PM:
--------------------------------------------------------

Here are the longest running tests:

{code}
$ grep -h "<testcase" `find . -iname "TEST-*.xml"` | sed 's/<testcase name="\(.*\)" classname="\(.*\)" time="\(.*\)".*/\3\t\2 \1/' |sort -rn |head -100
177.358 org.apache.hadoop.hbase.client.TestReplicasClient testSmallScanWithReplicas
158.826 org.apache.hadoop.hbase.regionserver.TestRemoveRegionMetrics testMoveRegion
146.995 org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd testEndToEnd
106.28 org.apache.hadoop.hbase.regionserver.wal.TestLogRolling testLogRolling
103.126 org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat testWithMapReduceMultiRegion
100.889 org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat testMRIncrementalLoadWithSplit
97.4 org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat testExcludeMinorCompaction
...
{code}

Tests with most threads:

{code}
[stack@c2021 hbase.git]$ grep -h "after: " `find . -iname "*-output.txt"` | sed 's/.*: after: \(.*\) Thread=\([0-9]*\).*/\2\t\1/'|sort -rn|head -100
1010 replication.TestReplicationKillSlaveRS#killOneSlaveRS
964 replication.multiwal.TestReplicationKillMasterRSCompressedWithMultipleWAL#killOneMasterRS
942 replication.TestReplicationKillMasterRS#killOneMasterRS
930 replication.TestReplicationKillMasterRSCompressed#killOneMasterRS
834 snapshot.TestSecureExportSnapshot#testExportRetry
834 snapshot.TestSecureExportSnapshot#testExportFailure
832 snapshot.TestSecureExportSnapshot#testExportFileSystemStateWithSkipTmp
830 snapshot.TestSecureExportSnapshot#testSnapshotWithRefsExportFileSystemState
830 snapshot.TestExportSnapshot#testExportRetry
826 snapshot.TestSecureExportSnapshot#testEmptyExportFileSystemState
826 snapshot.TestExportSnapshot#testExportFileSystemStateWithSkipTmp
826 snapshot.TestExportSnapshot#testExportFailure
823 snapshot.TestExportSnapshot#testSnapshotWithRefsExportFileSystemState
820 snapshot.TestSecureExportSnapshot#testConsecutiveExports
818 snapshot.TestExportSnapshot#testEmptyExportFileSystemState
818 replication.TestReplicationSmallTests#testSmallBatch
815 snapshot.TestExportSnapshot#testConsecutiveExports
811 snapshot.TestSecureExportSnapshot#testExportFileSystemState
800 snapshot.TestExportSnapshot#testExportFileSystemState
800 mapreduce.TestCopyTable#testCopyTableWithBulkload
798 snapshot.TestSecureExportSnapshot#testExportWithTargetName
791 snapshot.TestExportSnapshot#testExportWithTargetName
788 mapreduce.TestCopyTable#testMainMethod
787 snapshot.TestSecureExportSnapshot#testBalanceSplit
787 mapreduce.TestCopyTable#testStartStopRow
785 mapred.TestMultiTableSnapshotInputFormat#testScanEmptyToAPP
784 mapreduce.TestMultiTableSnapshotInputFormat#testScanEmptyToAPP
783 mapreduce.TestCopyTable#testRenameFamily
781 snapshot.TestExportSnapshot#testBalanceSplit
779 mapreduce.TestMultiTableSnapshotInputFormat#testScanEmptyToEmpty
778 replication.TestReplicationSmallTests#testReplicationStatus
776 mapreduce.TestTableInputFormatScan1#testGetSplits
774 replication.TestReplicationSmallTests#testSimplePutDelete
774 mapreduce.TestMultiTableSnapshotInputFormat#testScanYZYToEmpty
773 mapreduce.TestTableInputFormatScan2#testScanYZYToEmpty
773 mapreduce.TestMultiTableInputFormat#testScanEmptyToAPP
773 mapred.TestMultiTableSnapshotInputFormat#testScanYZYToEmpty
772 mapreduce.TestTableInputFormatScan2#testScanYYXToEmpty
772 mapreduce.TestMultiTableSnapshotInputFormat#testScanOBBToOPP
770 mapreduce.TestMultiTableInputFormat#testScanEmptyToEmpty
770 mapreduce.TestCopyTable#testCopyTable
769 mapreduce.TestTableInputFormatScan1#testScanEmptyToAPP
768 mapreduce.TestTableInputFormatScan2#testScanYYYToEmpty
767 mapreduce.TestTableInputFormatScan2#testScanOPPToEmpty
767 mapreduce.TestMultiTableInputFormat#testScanYZYToEmpty
767 mapred.TestMultiTableSnapshotInputFormat#testScanOBBToOPP
767 mapred.TestMultiTableSnapshotInputFormat#testScanEmptyToEmpty
765 mapreduce.TestTableInputFormatScan1#testScanEmptyToEmpty
765 mapreduce.TestTableInputFormatScan1#testGetSplitsPoint
764 mapreduce.TestTableInputFormatScan2#testScanOBBToOPP
764 mapreduce.TestTableInputFormatScan2#testScanFromConfiguration
763 mapreduce.TestTableInputFormatScan2#testScanOBBToQPP
762 mapreduce.TestTableInputFormatScan1#testScanEmptyToOPP
762 mapreduce.TestMultiTableInputFormat#testScanOBBToOPP
761 mapreduce.TestTableInputFormatScan1#testScanEmptyToBBB
759 replication.TestReplicationSmallTests#testLoading
757 mapreduce.TestTableInputFormatScan1#testScanEmptyToBBA
720 regionserver.TestRegionFavoredNodes#testFavoredNodes
717 replication.TestReplicationSmallTests#testCompactionWALEdits
...
{code}

Tests using lots of file descriptors:

{code}
[stack@c2021 hbase.git]$ grep -h "after: " `find . -iname "*-output.txt"` | sed 's/.*: after: \([^ ]*\).*OpenFileDescriptor=\([0-9]*\).*/\2\t\1/'|grep -v 'after: '|sort -rn|head -100
1010 ipc.TestAsyncIPC#testRTEDuringAsyncConnectionSetup[3]
1010 ipc.TestAsyncIPC#testRpcScheduler[2]
978 ipc.TestAsyncIPC#testCompressCellBlock[2]
947 mapred.TestMultiTableSnapshotInputFormat#testScanYZYToEmpty
946 ipc.TestAsyncIPC#testNoCodec[2]
943 mapreduce.TestMultiTableSnapshotInputFormat#testScanEmptyToAPP
943 mapred.TestMultiTableSnapshotInputFormat#testScanEmptyToAPP
942 mapreduce.TestMultiTableSnapshotInputFormat#testScanYZYToEmpty
935 snapshot.TestSecureExportSnapshot#testExportFailure
933 mapreduce.TestCopyTable#testRenameFamily
931 client.replication.TestReplicationAdminWithClusters#testEnableReplicationWhenSlaveClusterDoesntHaveTable
926 mapreduce.TestTableInputFormatScan2#testScanOPPToEmpty
926 mapreduce.TestMultiTableInputFormat#testScanYZYToEmpty
923 snapshot.TestExportSnapshot#testExportFailure
921 mapreduce.TestCopyTable#testMainMethod
921 ipc.TestAsyncIPC#testAsyncConnectionSetup[3]
920 mapreduce.TestTableInputFormatScan1#testScanEmptyToBBA
919 mapred.TestMultiTableSnapshotInputFormat#testScanEmptyToEmpty
917 mapreduce.TestMultiTableSnapshotInputFormat#testScanOBBToOPP
916 client.TestMultiParallel#testBatchWithMixedActions
914 mapred.TestMultiTableSnapshotInputFormat#testScanOBBToOPP
912 client.TestMultiParallel#testNonceCollision
909 client.TestMultiParallel#testBatchWithDelete
904 mapreduce.TestMultiTableInputFormat#testScanOBBToOPP
904 client.TestMultiParallel#testHTableDeleteWithList
904 client.TestMultiParallel#testBadFam
903 mapreduce.TestTableInputFormatScan2#testScanOBBToQPP
903 mapreduce.TestTableInputFormatScan1#testGetSplitsPoint
902 client.TestMultiParallel#testFlushCommitsNoAbort
900 mapreduce.TestTableInputFormatScan2#testScanOBBToOPP
893 client.TestMultiParallel#testBatchWithManyColsInOneRowGetAndPut
891 master.TestRegionPlacement2#testFavoredNodesPresentForRoundRobinAssignment
891 master.TestRegionPlacement2#testFavoredNodesPresentForRandomAssignment
880 client.TestFromClientSide#testJiraTest861
878 client.TestFromClientSideWithCoprocessor#testJiraTest861
861 client.replication.TestReplicationAdminWithClusters#testEnableReplicationForNonExistingTable
860 replication.TestReplicationSmallTests#testSmallBatch
860 replication.TestReplicationSmallTests#testSimplePutDelete
860 replication.TestReplicationSmallTests#testDisableEnable
860 replication.TestReplicationSmallTests#testCompactionWALEdits
855 replication.TestReplicationSmallTests#testVerifyRepJob
855 client.TestFromClientSide#testNullWithReverseScan
854 client.TestFromClientSideWithCoprocessor#testSimpleMissing
852 client.TestFromClientSide#testJiraTest867
851 client.TestFromClientSide#testMultiRowMutation
850 client.TestFromClientSideWithCoprocessor#testNullWithReverseScan
850 client.TestFromClientSideWithCoprocessor#testJiraTest867
849 client.TestFromClientSideWithCoprocessor#testMultiRowMutation
849 client.TestFromClientSide#testSimpleMissing
835 client.TestMultiParallel#testBatchWithIncrementAndAppend
835 client.TestMultiParallel#testBatchWithGet
834 replication.TestReplicationSmallTests#testReplicationStatus
834 client.TestFromClientSideWithCoprocessor#testGetStartEndKeysWithRegionReplicas
832 client.TestFromClientSide#testGetStartEndKeysWithRegionReplicas
830 client.TestFromClientSideWithCoprocessor#testFilterAllRecords
828 client.TestFromClientSideWithCoprocessor#testScan_NullQualifier
....
{code}

> Zombie Stomping Session
> -----------------------
>
>                 Key: HBASE-14420
>                 URL: https://issues.apache.org/jira/browse/HBASE-14420
>             Project: HBase
>          Issue Type: Umbrella
>          Components: test
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>         Attachments: hangers.txt, none_fix (1).txt, none_fix.txt,
> none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt,
> none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt,
> none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt,
> none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt,
> none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt,
> none_fix.txt, none_fix.txt
>
>
> Patch builds are now failing most of the time because we are dropping zombies.
> I confirm we are doing this on non-apache build boxes too.
> Left-over zombies consume resources on build boxes (OOME cannot create native
> threads). Having to do multiple test runs in the hope that we can get a
> non-zombie-making build or making (arbitrary) rulings that the zombies are
> 'not related' is a productivity sink. And so on...
> This is an umbrella issue for a zombie stomping session that started earlier
> this week. Will hang sub-issues of this one. Am running builds back-to-back
> on little cluster to turn out the monsters.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)