[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015330#comment-15015330 ]
Anoop Sam John edited comment on HBASE-14735 at 11/20/15 7:04 AM:
------------------------------------------------------------------

{code}
public synchronized boolean requestSplit(final Region r) {
  // don't split regions that are blocking
  if (shouldSplitRegion() && ((HRegion)r).getCompactPriority() >= Store.PRIORITY_USER) {
{code}

What the comment says here looks reasonable, no? And the patch changes that. Also, what was the blocking file count in your cluster? See this code:

{code}
public int getStoreCompactionPriority() {
  int priority = blockingFileCount - storefiles.size();
  return (priority == HStore.PRIORITY_USER) ? priority + 1 : priority;
}
{code}

When the current store file count is equal to blockingFileCount, we get a priority of exactly 0, which looks wrong. See this:

{code}
// degenerate case: blocked regions require recursive enqueues
if (store.getCompactPriority() <= 0) {
  requestSystemCompaction(region, store, "Recursive enqueue");
} else {
  // see if the compaction has caused us to exceed max region size
  requestSplit(region);
}
{code}

The comment says that blocked regions need a recursive enqueue, but the check is doing priority <= 0! What is wrong overall?


> Region may grow too big and can not be split
> --------------------------------------------
>
>                 Key: HBASE-14735
>                 URL: https://issues.apache.org/jira/browse/HBASE-14735
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, regionserver
>    Affects Versions: 1.1.2, 0.98.15
>            Reporter: Shuaifeng Zhou
>            Assignee: Shuaifeng Zhou
>         Attachments: 14735-0.98.patch, 14735-branch-1.1.patch, 14735-branch-1.2.patch, 14735-branch-1.2.patch, 14735-master (2).patch, 14735-master.patch, 14735-master.patch
>
> When a compaction completes, there may still be many storefiles in the
> store with CompactPriority < 0, in which case compactSplitThread issues a
> "Recursive enqueue" compaction request instead of requesting a split:
> {code:title=CompactSplitThread.java|borderStyle=solid}
> if (completed) {
>   // degenerate case: blocked regions require recursive enqueues
>   if (store.getCompactPriority() <= 0) {
>     requestSystemCompaction(region, store, "Recursive enqueue");
>   } else {
>     // see if the compaction has caused us to exceed max region size
>     requestSplit(region);
>   }
> {code}
> But in some situations the "recursive enqueue" request may return null and
> not build up a new compaction runner.
> For example, when another compaction of the same region is running,
> compaction selection will exclude all files older than the newest file
> currently compacting, which may leave too few files for the "recursive
> enqueue" request to select. When this happens, a split will not be
> triggered. If the input load is high enough, compactions are always
> running on the region, and a split will never be triggered.
> In our cluster this situation happened, and a huge region of more than
> 400GB with 100+ storefiles appeared. The version is 0.98.10, and trunk
> also has the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
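To make the priority arithmetic discussed in the comment concrete, here is a standalone sketch (not HBase code; `PRIORITY_USER` is assumed to be 1 here, and the blocking file count of 10 is a hypothetical stand-in for `hbase.hstore.blockingStoreFiles`). It shows how a store sitting exactly at the blocking threshold gets priority 0, failing the `>= PRIORITY_USER` split check while satisfying the `<= 0` recursive-enqueue check:

```java
public class CompactPriorityDemo {
    // Assumed value of Store.PRIORITY_USER in the versions discussed.
    static final int PRIORITY_USER = 1;

    // Same arithmetic as the quoted getStoreCompactionPriority().
    static int compactPriority(int blockingFileCount, int storefileCount) {
        int priority = blockingFileCount - storefileCount;
        return (priority == PRIORITY_USER) ? priority + 1 : priority;
    }

    public static void main(String[] args) {
        int blockingFileCount = 10; // hypothetical blocking file count

        // Exactly at the blocking threshold: priority is 0, so the split
        // check (priority >= PRIORITY_USER) fails and the recursive-enqueue
        // branch (priority <= 0) is taken instead.
        System.out.println(compactPriority(blockingFileCount, 10));  // 0

        // One file below the threshold would collide with PRIORITY_USER,
        // so it is bumped to 2 and a split can still be requested.
        System.out.println(compactPriority(blockingFileCount, 9));   // 2

        // Far past the threshold (e.g. the 100+ storefiles reported):
        // strongly negative, so the region keeps re-enqueueing compactions.
        System.out.println(compactPriority(blockingFileCount, 110)); // -100
    }
}
```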
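The starvation scenario in the issue description can also be sketched as a toy simulation (again not HBase code; the method names below only stand in for `requestSystemCompaction` and `requestSplit`). While another compaction of the region is in flight, every recursive enqueue selects no files, so no runner is built and the split branch is never reached:

```java
public class SplitStarvationDemo {
    static boolean splitRequested = false;
    static int compactionsEnqueued = 0;

    // Stand-in for requestSystemCompaction: selection excludes all files
    // older than the newest file currently compacting, so while another
    // compaction runs, nothing is selected and no runner is built.
    static boolean tryEnqueueCompaction(boolean otherCompactionRunning) {
        if (otherCompactionRunning) {
            return false; // "recursive enqueue" effectively returns null
        }
        compactionsEnqueued++;
        return true;
    }

    // Stand-in for the branch run when a compaction completes.
    static void onCompactionCompleted(int compactPriority,
                                      boolean otherCompactionRunning) {
        if (compactPriority <= 0) {
            tryEnqueueCompaction(otherCompactionRunning); // recursive enqueue
        } else {
            splitRequested = true; // requestSplit(region)
        }
    }

    public static void main(String[] args) {
        // Heavy write load: the store never drops below the blocking
        // threshold (priority stays <= 0) and another compaction is always
        // running, so the region grows without ever being offered a split.
        for (int cycle = 0; cycle < 100; cycle++) {
            onCompactionCompleted(0, true);
        }
        System.out.println("split requested: " + splitRequested);        // false
        System.out.println("compactions enqueued: " + compactionsEnqueued); // 0
    }
}
```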