[ https://issues.apache.org/jira/browse/HADOOP-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-1646:
--------------------------
Attachment: oome.patch
Here's a patch
HADOOP-1646 RegionServer OOME's under sustained, substantial loading by 10
concurrent clients
Added a gate that closes when the region server is overwhelmed by load. Tuned
the default configuration to better suit sustained loading. Compactions and
splits are taking too long: so long that it's not hard to put a region server
into a state where it mostly has clients on hold while it splits and compacts
(to be addressed next).
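The load gate can be pictured as bounded admission: writers wait while too many unflushed bytes are buffered and resume once a flush drains them, similar in spirit to the checkResources method this patch adds to HRegion. A minimal Java sketch, assuming the gate tracks unflushed bytes and that the flusher reports how many bytes it wrote out; UpdateGate, admit, tryAdmit and flushed are hypothetical names, not taken from the patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical load gate: writers block while the buffered (unflushed)
 * byte count is at or above a bound, and proceed once a flush brings
 * it back below the bound.
 */
class UpdateGate {
    private final long blockingSize;   // upper bound on unflushed bytes
    private long bufferedSize = 0L;    // bytes admitted but not yet flushed
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition belowBound = lock.newCondition();

    UpdateGate(long blockingSize) {
        this.blockingSize = blockingSize;
    }

    /** Waits while the gate is closed, then records the update's bytes. */
    void admit(long updateBytes) throws InterruptedException {
        lock.lock();
        try {
            while (bufferedSize >= blockingSize) {
                belowBound.await();    // gate closed: wait for a flush
            }
            bufferedSize += updateBytes;
        } finally {
            lock.unlock();
        }
    }

    /** Like admit, but gives up after the timeout; returns true if admitted. */
    boolean tryAdmit(long updateBytes, long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (bufferedSize >= blockingSize) {
                if (nanos <= 0L) {
                    return false;      // still closed when the timeout ran out
                }
                nanos = belowBound.awaitNanos(nanos);
            }
            bufferedSize += updateBytes;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Called after a flush writes bytes out; wakes blocked writers to recheck. */
    void flushed(long flushedBytes) {
        lock.lock();
        try {
            bufferedSize = Math.max(0L, bufferedSize - flushedBytes);
            belowBound.signalAll();
        } finally {
            lock.unlock();
        }
    }

    long bufferedSize() {
        lock.lock();
        try {
            return bufferedSize;
        } finally {
            lock.unlock();
        }
    }
}
```

A writer calls admit before applying an update and the flusher calls flushed afterwards. signalAll is used so that every blocked writer rechecks the bound, since one flush may free room for several of them.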
M src/contrib/hbase/conf/hbase-default.xml
Edited property descriptions. HMemcache thresholds are now expressed in
byte sizes rather than number of commits.
(hbase.regionserver.msginterval) changed from 15 to 10 seconds.
(hbase.hregion.maxunflushed) Removed. Replaced by
hbase.hregion.memcache.flush.size.
(hbase.hregion.compactionThreshold,
hbase.hregion.memcache.block.multiplier,
hbase.regionserver.thread.splitcompactcheckfrequency): Added.
(hbase.hregion.max.filesize): Changed from 128M to 64M.
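The byte-based properties combine into the thresholds a region checks on each update. The sketch below assumes that updates block once the memcache reaches hbase.hregion.memcache.block.multiplier times hbase.hregion.memcache.flush.size bytes; java.util.Properties stands in for Hadoop's configuration class, and the class name and caller-supplied fallback values are hypothetical. Only the 64M max.filesize default comes from this patch.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: derives byte thresholds from the property names
 * listed above. Fallback flush size and multiplier are supplied by the
 * caller and are illustrative, not HBase's shipped defaults.
 */
class RegionThresholds {
    static final long DEFAULT_MAX_FILE_SIZE = 64L * 1024 * 1024; // 64M, per this patch

    final long flushSize;    // hbase.hregion.memcache.flush.size, in bytes
    final long blockingSize; // assumed: flushSize * hbase.hregion.memcache.block.multiplier
    final long maxFileSize;  // hbase.hregion.max.filesize, in bytes

    RegionThresholds(Properties conf, long defaultFlushSize, int defaultMultiplier) {
        this.flushSize = Long.parseLong(conf.getProperty(
            "hbase.hregion.memcache.flush.size", Long.toString(defaultFlushSize)));
        int multiplier = Integer.parseInt(conf.getProperty(
            "hbase.hregion.memcache.block.multiplier", Integer.toString(defaultMultiplier)));
        this.blockingSize = flushSize * multiplier;
        this.maxFileSize = Long.parseLong(conf.getProperty(
            "hbase.hregion.max.filesize", Long.toString(DEFAULT_MAX_FILE_SIZE)));
    }

    /** A size-triggered flush is due once the memcache reaches the flush size. */
    boolean shouldFlush(long memcacheBytes) {
        return memcacheBytes >= flushSize;
    }

    /** Updating clients are held once the memcache reaches the blocking size. */
    boolean shouldBlock(long memcacheBytes) {
        return memcacheBytes >= blockingSize;
    }
}
```

Keeping the blocking bound a multiple of the flush size leaves headroom for updates to keep arriving while a flush is in progress before clients are held.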
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHMemcache.java
Removed setting of fs.file.impl. No longer needed.
Added assertion that history is being cleaned up.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java
(LOG): Added for debug level logging.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
LOGging edit adding size, count and names of store files.
(storeName): Added.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreKey.java
Made all constructors go via the constructor that takes all args.
(getSize): Added.
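Routing every HStoreKey constructor through the all-arguments constructor is the usual telescoping-constructor pattern: defaults are set in one place. The sketch uses a hypothetical StoreKeySketch with String fields, an assumed "latest" timestamp default, and an assumed size formula (UTF-8 row and column bytes plus 8 bytes for the long timestamp); the real HStoreKey's field types and getSize accounting may differ.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical stand-in for HStoreKey: each convenience constructor
 * delegates to the constructor that takes all arguments.
 */
class StoreKeySketch {
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE; // assumed default timestamp

    final String row;
    final String column;
    final long timestamp;

    StoreKeySketch(String row) {
        this(row, "", LATEST_TIMESTAMP);
    }

    StoreKeySketch(String row, long timestamp) {
        this(row, "", timestamp);
    }

    StoreKeySketch(String row, String column) {
        this(row, column, LATEST_TIMESTAMP);
    }

    /** The single constructor all others funnel through. */
    StoreKeySketch(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    /** Approximate size in bytes: UTF-8 row and column, plus 8 for the timestamp. */
    long getSize() {
        return row.getBytes(StandardCharsets.UTF_8).length
            + column.getBytes(StandardCharsets.UTF_8).length
            + Long.BYTES;
    }
}
```

For example, a key with row "row1" and column "info:a" reports 4 + 6 + 8 = 18 bytes under this formula.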
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java
LOGging edits adding sizes, time-to-complete, etc. Made it so we can
run a split even when there is no compaction, if files on disk are big enough.
We were running the adding/deleting of regions from META numRetries times
every time. Halved the default for the split/compact checker thread run time.
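Letting a split run without a preceding compaction amounts to evaluating the two conditions independently. A hedged sketch, assuming a compaction is due when a store holds more files than hbase.hregion.compactionThreshold and a split is due when any single store file exceeds hbase.hregion.max.filesize; SplitCompactCheck and its methods are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical split/compact checker decision: the split check does not
 * depend on whether a compaction is due.
 */
class SplitCompactCheck {
    private final int compactionThreshold; // assumed meaning of hbase.hregion.compactionThreshold
    private final long maxFileSize;        // hbase.hregion.max.filesize (64M after this patch)

    SplitCompactCheck(int compactionThreshold, long maxFileSize) {
        this.compactionThreshold = compactionThreshold;
        this.maxFileSize = maxFileSize;
    }

    /** Compaction is due when a store has accumulated too many files. */
    boolean needsCompaction(List<Long> storeFileSizes) {
        return storeFileSizes.size() > compactionThreshold;
    }

    /** Split is checked on its own: one oversized file is enough. */
    boolean needsSplit(List<Long> storeFileSizes) {
        long largest = 0L;
        for (long size : storeFileSizes) {
            largest = Math.max(largest, size);
        }
        return largest > maxFileSize;
    }
}
```

With a threshold of 3 files and a 100-byte limit, a store holding one 150-byte file splits without compacting, while four 10-byte files compact without splitting.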
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java
(DEFAULT_MAX_FILE_SIZE): Changed from 128M to 64M.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMemcache.java
Removed logging (Moved to hosting HRegionServer).
(getSize): Added.
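Tracking the memcache size in bytes is what lets the thresholds above be byte sizes instead of commit counts. A minimal sketch, assuming String keys, byte[] values, and that the size counts key and value bytes only (no per-object overhead); SizedMemcache and snapshot are hypothetical names, not the real HMemcache API.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical memcache that keeps a running byte count so flush and
 * blocking decisions can compare against byte thresholds.
 */
class SizedMemcache {
    private final SortedMap<String, byte[]> entries = new TreeMap<>();
    // Atomic so getSize() can be read without taking the monitor.
    private final AtomicLong size = new AtomicLong();

    synchronized void add(String key, byte[] value) {
        byte[] previous = entries.put(key, value);
        if (previous == null) {
            // New entry: count both key and value bytes.
            size.addAndGet(key.getBytes(StandardCharsets.UTF_8).length + value.length);
        } else {
            // Overwrite: key already counted, adjust for the value change only.
            size.addAndGet(value.length - previous.length);
        }
    }

    /** Approximate bytes held (keys plus values). */
    long getSize() {
        return size.get();
    }

    /** Hands back the current contents for flushing and resets the count. */
    synchronized SortedMap<String, byte[]> snapshot() {
        SortedMap<String, byte[]> out = new TreeMap<>(entries);
        entries.clear();
        size.set(0L);
        return out;
    }
}
```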
M src/hbase/src/java/org/apache/hadoop/hbase/HRegion.java
Removed 'closed' from WriteState and moved it out to HRegion.
Added waiting on outstanding row locks before splitting.
Added logging of how long splits and compactions take, as well as the sizes
of store files and the region. Added a forced flush once more than 10
optional flush requests have passed without us flushing, so that ROOT and
META data, usually too small to earn a size-triggered flush, gets written
out. Added a checkResources that blocks updating clients if we've exceeded
the memcache upper-size bound.
(closed, noFlushCount, blockingMemcacheSize): Added.
(maxUnflushedEntries): Removed. Replaced by memcacheFlushSize.
(splitStoreFile): Added (Refactored duplicated code here).
(getAllStoreFiles): Added.
(startUpdate): Added read lock around get of row lock. Added
check to see if we should block.
(checkResources): Added.
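The forced-flush rule can be sketched as a counter of skipped optional flushes. This assumes "more than 10" means the flush is forced on the first optional request after the eleventh skip, and that an empty memcache is never flushed; OptionalFlushPolicy and onOptionalFlush are hypothetical names, while noFlushCount mirrors the field the patch adds.

```java
/**
 * Hypothetical forced-flush policy: each skipped optional flush bumps a
 * counter; once it exceeds the limit, the next optional request flushes
 * even though the memcache is below the flush size.
 */
class OptionalFlushPolicy {
    static final int MAX_SKIPPED_OPTIONAL_FLUSHES = 10;

    private final long flushSize;
    private int noFlushCount = 0;

    OptionalFlushPolicy(long flushSize) {
        this.flushSize = flushSize;
    }

    /** Returns true if this optional flush request should actually flush. */
    synchronized boolean onOptionalFlush(long memcacheBytes) {
        boolean sizeTriggered = memcacheBytes >= flushSize;
        // Only force when there is something to write out.
        boolean forced = memcacheBytes > 0L && noFlushCount > MAX_SKIPPED_OPTIONAL_FLUSHES;
        if (sizeTriggered || forced) {
            noFlushCount = 0;
            return true;
        }
        noFlushCount++;
        return false;
    }
}
```

Under this reading, a small region such as META skips eleven optional requests and flushes on the twelfth, then the count starts over.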
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HLocking.java
Formatting.
> [hbase] RegionServer OOME's under sustained, substantial loading by 10
> concurrent clients
> -----------------------------------------------------------------------------------------
>
> Key: HADOOP-1646
> URL: https://issues.apache.org/jira/browse/HADOOP-1646
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Reporter: stack
> Assignee: stack
> Priority: Minor
> Attachments: oome.patch
>
>
> Have been running ten concurrent clients uploading Wikipedia to HBase. Each
> update includes some metadata -- URL, mimetype -- and the page content.
> Because updates are cached across compactions and splits, we OOME (default
> heap size of 1G). 10 concurrent clients are doing over 10k rows a minute.
> HBase should be able to carry this common loading scenario.
--
This message is automatically generated by JIRA.