Created: https://issues.apache.org/jira/browse/HBASE-12657
On Mon, Dec 8, 2014 at 11:33 AM, Vladimir Rodionov <[email protected]> wrote:

Forgot to mention: a low (default) value for hbase.hstore.compaction.max combined with prolonged write activity without throttling will get you into this "region is out of bounds" situation, because when compaction can't keep up with writes, the number of store files eventually exceeds hbase.hstore.compaction.max. Here is what happens in 0.94:

0.94 does not enforce the major compaction flag on selections with reference files; it applies the selection algorithms and then applies the limit by removing the first K = N - max files from the candidate list, where N is the number of files in the selection list and max is the hbase.hstore.compaction.max value.

0.98: it marks the selection as major, but after that it applies the limit and removes the first K = N - max files from the candidate list.

In both cases, if the number of reference files exceeds max, some reference files will be excluded.

What happens later depends on what we have in the compaction file list for this Store (HStore). If the pending file list has at least one non-reference file, all reference files will be excluded from the above selection.

What I have to say here: it seems that the only way to compact all reference files is to enforce a major compaction immediately after the region split. If we fail to do this, then with very high probability the reference files will keep being pushed out of compaction until the write load decreases substantially and the number of store files drops below hbase.hstore.compaction.max.

-Vlad

On Mon, Dec 8, 2014 at 11:14 AM, Vladimir Rodionov <[email protected]> wrote:

Yes, we have a patch in house; we just need time to verify it. As for sequence numbers: the lowest sequence number is the reason that reference files constantly get removed from compaction selections when there are newer files in the compaction queue. Just check the code. This is what happens under high load when there are too many minor compaction requests in the queue: the reference files never get a chance to be compacted.

Interestingly, the current 0.94 and 0.98 code have different issues here and require different patches.

0.94 does not treat a compaction request with reference files as a major one, but it ignores hbase.hstore.compaction.max for major compactions.
0.98 considers a compaction of reference files a major one, but it consults hbase.hstore.compaction.max and downgrades the request when the number of files exceeds this limit.

-Vlad

On Mon, Dec 8, 2014 at 10:41 AM, lars hofhansl <[email protected]> wrote:

Did you get anywhere? Happy to collaborate, this is important to fix.

Thinking about my comment again: the half-store files would have an earlier sequence number than any new files, so they would naturally sort first. This needs a bit more investigation.

-- Lars
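To make the truncation Vladimir describes concrete, here is a small, self-contained Java sketch. The classes are toys invented for illustration, not HBase code: it only models how applying the hbase.hstore.compaction.max limit by dropping the first K = N - max files pushes the reference files, which carry the lowest sequence ids, out of the selection.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class CompactionTruncationSketch {

    static final class StoreFile {
        final long seqId;
        final boolean isReference;
        StoreFile(long seqId, boolean isReference) {
            this.seqId = seqId;
            this.isReference = isReference;
        }
    }

    // Candidates are ordered oldest-first (lowest sequence id first); when the
    // list is longer than "max" (hbase.hstore.compaction.max), the first
    // K = N - max entries are dropped.
    static List<StoreFile> applyMaxLimit(List<StoreFile> candidates, int max) {
        List<StoreFile> sorted = new ArrayList<StoreFile>(candidates);
        Collections.sort(sorted, new Comparator<StoreFile>() {
            public int compare(StoreFile a, StoreFile b) {
                return Long.compare(a.seqId, b.seqId); // oldest first
            }
        });
        int excess = sorted.size() - max;
        if (excess <= 0) {
            return sorted;
        }
        // The dropped files are exactly the oldest ones, which right after a
        // split are the reference (half-store) files.
        return new ArrayList<StoreFile>(sorted.subList(excess, sorted.size()));
    }

    public static void main(String[] args) {
        List<StoreFile> candidates = new ArrayList<StoreFile>();
        candidates.add(new StoreFile(1, true));  // reference file left by the split
        candidates.add(new StoreFile(2, true));  // reference file left by the split
        for (long seq = 3; seq <= 12; seq++) {
            candidates.add(new StoreFile(seq, false)); // freshly flushed files
        }
        List<StoreFile> selected = applyMaxLimit(candidates, 10);
        boolean referencesSurvived = false;
        for (StoreFile f : selected) {
            referencesSurvived |= f.isReference;
        }
        System.out.println("References still selected: " + referencesSurvived); // prints false
    }
}

With 12 candidates and max = 10, the two oldest files (the references) are exactly the ones removed, which is the starvation pattern described in the message above.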
From: Vladimir Rodionov <[email protected]>
To: "[email protected]" <[email protected]>
Cc: lars hofhansl <[email protected]>
Sent: Friday, December 5, 2014 10:40 PM
Subject: Re: Region is out of bounds

Under heavy load, the only window for compaction of reference files is the first compaction request after a split (when the Store's filesInCompactions is empty and a major compaction is possible). If this major compaction request fails or is downgraded to a minor one (i.e. the number of store files exceeds the maximum number of files per compaction), there is a very high probability that the region will never be split until the load decreases substantially. Only then will the compaction queue (the list of files being compacted per Store) be empty most of the time, and a major compaction (including all reference files) will get a chance.

If a region has reference files, it is not splittable.

I have a very simple patch I am going to submit this weekend.

-Vladimir Rodionov

On Fri, Dec 5, 2014 at 7:12 PM, <[email protected]> wrote:

Good points, Lars. I thought a bit about how to debug/find more clues... sorting them first is a good idea (currently we sort by sequence ID, size, etc.).
Thanks

Sent from my iPad

On 2014-12-06 at 9:07, Andrew Purtell <[email protected]> wrote:

> Seems to me we should sort reference files first _always_, to compact them away first and allow the region to be split further. Thoughts? File a jira?

Sounds reasonable as an enhancement issue.

- Andy

On Fri, Dec 5, 2014 at 5:02 PM, lars hofhansl <[email protected]> wrote:

Digging in the (0.98) code a bit, I find this:

HRegionServer.postOpenDeployTasks(): requests a compaction either when we're past the minimum number of files or when there is any reference file. Good, that will trigger
RatioBasedCompactionPolicy.selectCompaction(): turns any compaction into a major one if there are reference files involved in the set of files already selected. Also cool, after a split all files of a daughter will be reference files.

But... I do not see any code that would make sure at least one reference file is selected. So in theory the initial compaction started by postOpenDeployTasks could have failed for some reason. Now more data is written, and the following compaction selections won't pick up any reference files, as there are many small, new files being written.
So the reference files could in theory just linger until a selection happens to come across one, all the while the daughters are (a) unsplittable and (b) cannot migrate to another region server.
That is unless I missed something... Maybe somebody could have a look too?
Seems to me we should sort reference files first _always_, to compact them away first and allow the region to be split further. Thoughts? File a jira?

-- Lars

From: lars hofhansl <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Friday, December 5, 2014 1:59 PM
Subject: Re: Region is out of bounds

We've run into something like this as well (probably).
I will be looking at this over the next days/weeks. Under heavy load, HBase seems to be simply unable to get the necessary compactions in, and until that happens it cannot further split a region.

I wonder whether HBASE-12411 would help here (it optionally allows compactions to use private readers); I doubt it, though.
The details are probably tricky. I thought HBase would compact split regions with higher priority (placing those first in the compaction queue)... Need to actually check the code.

-- Lars
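For illustration, the "sort reference files first" idea Lars and Andrew discuss could look roughly like the comparator below. This is only a sketch of the proposed ordering, not a patch; the CandidateFile interface is a placeholder, not the actual HBase StoreFile API.

import java.util.Comparator;

final class ReferenceFirstOrder {

    // Placeholder for whatever file abstraction the selection works on.
    interface CandidateFile {
        boolean isReference();
        long getMaxSequenceId();
    }

    // References first, then oldest-to-newest by sequence id. Note this only
    // helps if the subsequent "apply hbase.hstore.compaction.max" step keeps
    // the head of the list (i.e. trims from the tail); otherwise the
    // references would still be the first files dropped.
    static final Comparator<CandidateFile> REFERENCE_FIRST = new Comparator<CandidateFile>() {
        public int compare(CandidateFile a, CandidateFile b) {
            if (a.isReference() != b.isReference()) {
                return a.isReference() ? -1 : 1; // reference files sort first
            }
            return Long.compare(a.getMaxSequenceId(), b.getMaxSequenceId());
        }
    };

    private ReferenceFirstOrder() { }
}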
From: Qiang Tian <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Thursday, December 4, 2014 7:26 PM
Subject: Re: Region is out of bounds

> My attempt to add reference files forcefully to the compaction list in Store.requestCompaction() when the region exceeds the recommended maximum size did not work out well - some weird results in our test cases (but the HBase tests are OK: small, medium and large).

Interesting... perhaps it was filtered out in RatioBasedCompactionPolicy#selectCompaction?

On Fri, Dec 5, 2014 at 5:20 AM, Andrew Purtell <[email protected]> wrote:

Most versions of 0.98 since 0.98.1, but I haven't run a punishing high-scale bulk ingest for its own sake; high-ish rate ingest and a setting of blockingStoreFiles to 200 have been in service of getting data in for subsequent testing.

- Andy

On Thu, Dec 4, 2014 at 12:43 PM, Vladimir Rodionov <[email protected]> wrote:

Andrew,

What HBase version have you run your test on?

This issue probably does not exist anymore in the latest Apache releases, but it still exists in not-so-latest, but still actively used, versions of CDH, HDP, etc. We discovered it during large data set loading (100s of GB) on our cluster (4 nodes).

-Vladimir

On Thu, Dec 4, 2014 at 10:23 AM, Andrew Purtell <[email protected]> wrote:

Actually, I have set hbase.hstore.blockingStoreFiles to 200 in testing exactly :-), but I must not have generated sufficient load to encounter the issue you are seeing. Maybe it would be possible to adapt one of the ingest integration tests to trigger this problem? Set blockingStoreFiles to 200 or more. Tune down the region size to 128K or similar. If it's reproducible like that, please open a JIRA.

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

On Wed, Dec 3, 2014 at 9:07 AM, Vladimir Rodionov <[email protected]> wrote:

Kevin,

Thank you for your response. This is not a question about how to correctly configure an HBase cluster for write-heavy workloads. This is an internal HBase issue - something is wrong in the default logic of the compaction selection algorithm in 0.94-0.98. It seems that nobody has ever tested importing data with a very high hbase.hstore.blockingStoreFiles value (200 in our case).

-Vladimir Rodionov
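A minimal sketch of the reproduction settings Andrew suggests above (blockingStoreFiles raised to 200, region size tuned down to 128K), set programmatically. The property keys are the standard HBase configuration names; the values simply mirror the thread and are not recommended production settings. Assumes hbase-client/hbase-common on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ReproConfigSketch {
    public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        // Raise the blocking threshold so flushes keep outrunning compactions.
        conf.setInt("hbase.hstore.blockingStoreFiles", 200);
        // Tiny region size (128 KB) so splits, and therefore reference files,
        // appear quickly during the test.
        conf.setLong("hbase.hregion.max.filesize", 128 * 1024L);
        // Default compaction batch limit, left explicit for clarity.
        conf.setInt("hbase.hstore.compaction.max", 10);
        return conf;
    }
}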
If >>> >> so, >>> >>>> this >>> >>>>>>> should solve almost all of your problems >>> >>>>>>> >>> >>>>>>> * 1GB region size is WAY too small, and if you are pushing the >>> >>> volume >>> >>>>> of >>> >>>>>>> data you are talking about I would recommend 10 - 20GB region >>> >> sizes >>> >>>>> this >>> >>>>>>> should help keep your region count smaller as well which will >>> >>> result >>> >>>> in >>> >>>>>>> more optimal writes >>> >>>>>>> >>> >>>>>>> * Your cluster may be undersized, if you are setting the blocking >>> >>> to >>> >>>> be >>> >>>>>>> that high, you may be pushing too much data for your cluster >>> >>> overall. >>> >>>>>>> >>> >>>>>>> Would you be so kind as to pass me a few pieces of information? >>> >>>>>>> >>> >>>>>>> 1.) Cluster size >>> >>>>>>> 2.) Average region count per RS >>> >>>>>>> 3.) Heap size, Memstore global settings, and block cache settings >>> >>>>>>> 4.) a RS log to pastebin and a time frame of "high writes" >>> >>>>>>> >>> >>>>>>> I can probably make some solid suggestions for you based on the >>> >>> above >>> >>>>>> data. >>> >>>>>>> >>> >>>>>>> On Wed, Dec 3, 2014 at 1:04 AM, Vladimir Rodionov < >>> >>>>>> [email protected]> >>> >>>>>>> wrote: >>> >>>>>>> >>> >>>>>>>> This is what we observed in our environment(s) >>> >>>>>>>> >>> >>>>>>>> The issue exists in CDH4.5, 5.1, HDP2.1, Mapr4 >>> >>>>>>>> >>> >>>>>>>> If some one sets # of blocking stores way above default value, >>> >>> say >>> >>>> - >>> >>>>>> 200 >>> >>>>>>> to >>> >>>>>>>> avoid write stalls during intensive data loading (do not ask >>> >> me , >>> >>>> why >>> >>>>>> we >>> >>>>>>> do >>> >>>>>>>> this), then >>> >>>>>>>> one of the regions grows indefinitely and takes more 99% of >>> >>> overall >>> >>>>>>> table. >>> >>>>>>>> >>> >>>>>>>> It can't be split because it still has orphaned reference >>> >> files. >>> >>>> Some >>> >>>>>> of >>> >>>>>>> a >>> >>>>>>>> reference files are able to avoid compactions for a long time, >>> >>>>>> obviously. >>> >>>>>>>> >>> >>>>>>>> The split policy is IncreasingToUpperBound, max region size is >>> >>> 1G. >>> >>>> I >>> >>>>> do >>> >>>>>>> my >>> >>>>>>>> tests on CDH4.5 mostly but all other distros seem have the same >>> >>>>> issue. >>> >>>>>>>> >>> >>>>>>>> My attempt to add reference files forcefully to compaction list >>> >>> in >>> >>>>>>>> Store.requetsCompaction() when region exceeds recommended >>> >> maximum >>> >>>>> size >>> >>>>>>> did >>> >>>>>>>> not work out well - some weird results in our test cases (but >>> >>> HBase >>> >>>>>> tests >>> >>>>>>>> are OK: small, medium and large). >>> >>>>>>>> >>> >>>>>>>> What is so special with these reference files? Any ideas, what >>> >>> can >>> >>>> be >>> >>>>>>> done >>> >>>>>>>> here to fix the issue? >>> >>>>>>>> >>> >>>>>>>> -Vladimir Rodionov >>> >>>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> -- >>> >>>>>>> Kevin O'Dell >>> >>>>>>> Systems Engineer, Cloudera >>> >>>>>>> >>> >>>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> -- >>> >>>>> Best regards, >>> >>>>> >>> >>>>> - Andy >>> >>>>> >>> >>>>> Problems worthy of attack prove their worth by hitting back. - Piet >>> >>> Hein >>> >>>>> (via Tom White) >>> >>>>> >>> >>>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> >>> Best regards, >>> >>> >>> >>> - Andy >>> >>> >>> >>> Problems worthy of attack prove their worth by hitting back. 
