[ https://issues.apache.org/jira/browse/HBASE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-2646.
--------------------------

    Resolution: Fixed
    Fix Version/s: 0.90.0
    Assignee: Jeff Whiting
    Hadoop Flags: [Reviewed]

This was applied a while back. Resolving. Thanks for the patch, Jeff (assigned it to you).

> Compaction requests should be prioritized to prevent blocking
> -------------------------------------------------------------
>
>                 Key: HBASE-2646
>                 URL: https://issues.apache.org/jira/browse/HBASE-2646
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.20.4
>         Environment: Ubuntu Server 10; HBase 0.20.4; 4-machine cluster (each machine is an 8-core Xeon with 16 GB of RAM and 6 TB of storage); ~250 million rows
>            Reporter: Jeff Whiting
>            Assignee: Jeff Whiting
>            Priority: Critical
>              Labels: compaction, split
>             Fix For: 0.90.0
>
>         Attachments: 2646-fix-race-condition-r1004349.txt, 2646-v2.txt, 2646-v3.txt, PriorityQueue-r996664.patch, prioritycompactionqueue-0.20.4.patch
>
>
> While testing the write capacity of a 4-machine HBase cluster, we were getting long and frequent client pauses as we attempted to load the data. Looking into the problem, we'd see a relatively large compaction queue, and when a region hit the "hbase.hstore.blockingStoreFiles" limit it would block the client while its compaction request was put at the back of the queue, waiting behind many less important compactions. The client is basically stuck at that point until the compaction is done. Prioritizing compaction requests, so that a request that is blocking other actions goes first, would help solve the problem (a sketch of one such approach follows at the end of this message).
> You can see the problem in our log files.
> First you'll see an event such as "too many hlogs", which forces flushes and puts a lot of requests on the compaction queue:
> {noformat}
> 2010-05-25 10:53:26,570 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 22 regions(s):
> responseCounts,RS_6eZzLtdwhGiTwHy,1274232223324,
> responses,RS_0qhkL5rUmPCbx3K-1274213057242,1274513189592,
> responses,RS_1ANYnTegjzVIsHW-1274217741921,1274511001873,
> responses,RS_1HQ4UG5BdOlAyuE-1274216757425,1274726323747,
> responses,RS_1Y7SbqSTsZrYe7a-1274328697838,1274478031930,
> responses,RS_1ZH5TB5OdW4BVLm-1274216239894,1274538267659,
> responses,RS_3BHc4KyoM3q72Yc-1274290546987,1274502062319,
> responses,RS_3ra9BaBMAXFAvbK-1274214579958,1274381552543,
> responses,RS_6SDrGNuyyLd3oR6-1274219941155,1274385453586,
> responses,RS_8AGCEMWbI6mZuoQ-1274306857429,1274319602718,
> responses,RS_8C8T9DN47uwTG1S-1274215381765,1274289112817,
> responses,RS_8J5wmdmKmJXzK6g-1274299593861,1274494738952,
> responses,RS_8e5Sz0HeFPAdb6c-1274288641459,1274495868557,
> responses,RS_8rjcnmBXPKzI896-1274306981684,1274403047940,
> responses,RS_9FS3VedcyrF0KX2-1274245971331,1274754745013,
> responses,RS_9oZgPtxO31npv3C-1274214027769,1274396489756,
> responses,RS_a3FdO2jhqWuy37C-1274209228660,1274399508186,
> responses,RS_a3LJVxwTj29MHVa-12742
> {noformat}
> Then the region hits the store-file limit and its flush is repeatedly deferred:
> {noformat}
> 2010-05-25 10:53:31,364 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862/783020138 because: regionserver/192.168.0.81:60020.cacheFlusher
> 2010-05-25 10:53:32,364 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862 has too many store files, putting it back at the end of the flush queue.
> {noformat}
> Which leads to this:
> {noformat}
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 60 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 84 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> {noformat}
> Once the compaction / split is done, a flush can happen, which unblocks the IPC handlers and allows writes to continue. Unfortunately this process can take upwards of 15 minutes (the specific case shown here from our logs took about 4 minutes).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
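To make the proposed fix concrete, here is a minimal sketch of a prioritized compaction queue built on java.util.concurrent.PriorityBlockingQueue. It is only an illustration of the idea, not the attached patch (see PriorityQueue-r996664.patch and prioritycompactionqueue-0.20.4.patch for the real implementation); the CompactionRequest class and the priority formula below are assumptions made for the example, not HBase's actual API.

{code:java}
import java.util.concurrent.PriorityBlockingQueue;

public class PriorityCompactionQueueSketch {

  /** A compaction request ordered by urgency; lower value = more urgent. */
  static class CompactionRequest implements Comparable<CompactionRequest> {
    final String regionName;
    final int priority;      // hypothetical formula: blockingStoreFiles - storeFileCount
    final long enqueueTime;  // tie-breaker: FIFO among equal priorities

    CompactionRequest(String regionName, int storeFileCount, int blockingStoreFiles) {
      this.regionName = regionName;
      // A region at or over the blocking limit gets priority <= 0,
      // so it jumps ahead of routine (positive-priority) requests.
      this.priority = blockingStoreFiles - storeFileCount;
      this.enqueueTime = System.nanoTime();
    }

    @Override
    public int compareTo(CompactionRequest other) {
      if (priority != other.priority) {
        return Integer.compare(priority, other.priority);
      }
      return Long.compare(enqueueTime, other.enqueueTime);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    PriorityBlockingQueue<CompactionRequest> queue = new PriorityBlockingQueue<>();

    // Many routine requests are already queued, e.g. from a "too many hlogs" flush storm.
    for (int i = 0; i < 20; i++) {
      queue.put(new CompactionRequest("responses,region-" + i, 4, 7));
    }
    // Then one region reaches hbase.hstore.blockingStoreFiles and starts blocking clients.
    queue.put(new CompactionRequest("responses-index,blocked-region", 9, 7));

    // The compaction thread services the blocking region first, not last.
    System.out.println("Next to compact: " + queue.take().regionName);
  }
}
{code}

With this ordering, a region that has reached the blocking store-file limit is taken ahead of the routine requests already queued, while the enqueue-time tie-breaker preserves FIFO order among requests of equal priority, so ordinary compactions behave as before.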