Created:
https://issues.apache.org/jira/browse/HBASE-12657

On Mon, Dec 8, 2014 at 11:33 AM, Vladimir Rodionov <[email protected]>
wrote:

> Forgot to mention:
>
> A low (default) value for *hbase.hstore.compaction.max* combined with
> prolonged write activity without throttling will get you into this
> *region is out of bounds* situation, because when compaction can't keep up
> with writes, the number of store files eventually exceeds
> *hbase.hstore.compaction.max*. Here is what happens in 0.94:
>
> 0.94 does not enforce the major compaction flag on selections with
> reference files; it applies the selection algorithms and then applies the
> limit by removing the first *K = N - max* files from the candidate list,
> where *N* is the number of files in the selection list and *max* is the
> *hbase.hstore.compaction.max* value.
>
>
> 0.98:
>
> It marks the selection as major, but after that it applies the limit and
> removes the first *K = N - max* files from the candidate list.
>
> In both cases, if the number of reference files exceeds *max*, some
> reference files will be excluded.
>
> What happens later depends on what we have in the compaction file list for
> this Store (HStore). If the pending file list has at least one
> non-reference file, all reference files will be excluded from the above
> selection.
>
> What I have to say here: it seems that the only way to compact all
> reference files is to enforce a major compaction immediately after a
> region split. If we fail to do this, then with very high probability the
> reference files will keep being pushed out of compactions until the write
> load decreases substantially and the number of store files drops below
> *hbase.hstore.compaction.max*.
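The trimming described above can be sketched as follows (illustrative Python, not HBase code; the file names and counts are made up). Reference files carry the lowest sequence ids, so they sort to the head of the candidate list and are the first to be cut by *K = N - max*:

```python
# Illustrative sketch (not HBase code): trimming a compaction candidate
# list to hbase.hstore.compaction.max. Candidates are ordered oldest
# first (lowest sequence id first); reference files created at split
# time are the oldest, so they sit at the head of the list.

def trim_to_max(candidates, max_files):
    """Remove the first K = N - max files from the candidate list."""
    excess = len(candidates) - max_files
    return candidates[excess:] if excess > 0 else candidates

# Hypothetical store: 3 reference files followed by 12 newer flush files.
files = ["ref-a", "ref-b", "ref-c"] + ["flush-%d" % i for i in range(12)]
selected = trim_to_max(files, max_files=10)
# K = 15 - 10 = 5: all three reference files (plus two flush files)
# fall outside the trimmed selection.
assert all(not f.startswith("ref-") for f in selected)
assert len(selected) == 10
```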
>
> -Vlad
>
> On Mon, Dec 8, 2014 at 11:14 AM, Vladimir Rodionov <[email protected]
> > wrote:
>
>> Yes, we have a patch in house; we just need time to verify it. As for
>> sequence numbers: having the lowest sequence numbers is the reason that
>> reference files are constantly getting removed from compaction selections
>> when there are newer files in the compaction queue. Just check the code.
>> This is what happens under high load: when there are too many minor
>> compaction requests in the queue, reference files do not get a chance to
>> be compacted.
>>
>> Interestingly, the current 0.94 and 0.98 code have different issues here
>> and require different patches.
>>
>> 0.94 does not treat a compaction request with reference files as a major
>> one, but it ignores *hbase.hstore.compaction.max* for major compactions;
>> 0.98 considers a compaction of reference files to be a major one, but it
>> consults *hbase.hstore.compaction.max* and downgrades the request when
>> the number of files exceeds this limit.
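A rough sketch of the two code paths just described (illustrative Python, not the actual HBase implementations; both functions are deliberate simplifications):

```python
# Illustrative sketch (not HBase code) of the 0.94 vs 0.98 behavior
# described above. Candidates are ordered oldest first, so trimming
# drops the oldest files, which are the reference files.

def apply_limit(files, max_files):
    excess = len(files) - max_files
    return files[excess:] if excess > 0 else files

def select_094(files, max_files, has_references):
    # 0.94: a selection with reference files is NOT marked major, so the
    # limit applies (only true major compactions ignore the limit).
    is_major = False
    return apply_limit(files, max_files), is_major

def select_098(files, max_files, has_references):
    # 0.98: reference files mark the selection as major, but the limit is
    # still consulted and the request is downgraded when exceeded.
    is_major = has_references
    if len(files) > max_files:
        is_major = False  # downgraded to minor
        files = apply_limit(files, max_files)
    return files, is_major

refs_and_flushes = ["ref-a", "ref-b"] + ["flush-%d" % i for i in range(10)]
sel94, major94 = select_094(refs_and_flushes, 10, has_references=True)
sel98, major98 = select_098(refs_and_flushes, 10, has_references=True)
# In both versions the reference files are cut, and in 0.98 the
# request also loses its "major" status.
assert "ref-a" not in sel94 and "ref-a" not in sel98
assert not major94 and not major98
```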
>>
>> -Vlad
>>
>>
>> On Mon, Dec 8, 2014 at 10:41 AM, lars hofhansl <[email protected]> wrote:
>>
>>> Did you get anywhere?
>>> Happy to collaborate, this is important to fix.
>>>
>>> Thinking about my comment again. The half-store files would have an
>>> earlier sequence number than any new files, so they would naturally sort
>>> first. This needs a bit more investigation.
>>>
>>> -- Lars
>>>
>>>   ------------------------------
>>>  *From:* Vladimir Rodionov <[email protected]>
>>> *To:* "[email protected]" <[email protected]>
>>> *Cc:* lars hofhansl <[email protected]>
>>> *Sent:* Friday, December 5, 2014 10:40 PM
>>>
>>> *Subject:* Re: Region is out of bounds
>>>
>>> Under heavy load, the only window for compacting reference files is the
>>> first compaction request after a split (when the Store's
>>> filesInCompactions list is empty and a major compaction is possible). If
>>> this major compaction request fails, or is downgraded to minor (i.e. the
>>> number of store files exceeds the maximum number of files per
>>> compaction), there is a very high probability that the region will never
>>> be split until the load decreases substantially. Only then will the
>>> compaction queue (the list of files being compacted per Store) be empty
>>> most of the time, so that a major compaction (including all reference
>>> files) gets a chance.
>>>
>>> If a region has reference files, it's not splittable.
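That last invariant can be sketched as follows (illustrative Python; the store layout and field names are hypothetical, not HBase's actual data structures):

```python
# Illustrative sketch (not HBase code): a region cannot be split while
# any of its stores still holds reference (half) files pointing into
# the parent region's files.

def has_references(stores):
    return any(f["is_reference"] for files in stores.values() for f in files)

def is_splittable(stores):
    return not has_references(stores)

# Hypothetical daughter region right after a split: every file is a
# reference until a compaction rewrites it.
daughter = {"cf1": [{"name": "ref-a", "is_reference": True}]}
assert not is_splittable(daughter)

# After a (major) compaction rewrites the references into real files:
compacted = {"cf1": [{"name": "compacted-1", "is_reference": False}]}
assert is_splittable(compacted)
```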
>>>
>>> I have a very simple patch I am going to submit this weekend.
>>>
>>> -Vladimir Rodionov
>>>
>>>
>>>
>>> On Fri, Dec 5, 2014 at 7:12 PM, <[email protected]> wrote:
>>>
>>> Good points, Lars. I thought a bit about how to debug/find more
>>> clues... sorting them first is a good idea (currently we sort by
>>> sequence ID, size, etc.)
>>> Thanks
>>>
>>> Sent from my iPad
>>>
>>> On 2014-12-6, at 9:07, Andrew Purtell <[email protected]> wrote:
>>>
>>> >> Seems to me we should sort reference files first _always_, to compact
>>> >> them away first and allow further splitting. Thoughts? File a jira?
>>> >
>>> > Sounds reasonable as an enhancement issue
>>> >
>>> >
>>> > On Fri, Dec 5, 2014 at 5:02 PM, lars hofhansl <[email protected]>
>>> wrote:
>>> >
>>> >> Digging in the (0.98) code a bit I find this:
>>> >>
>>> >> HRegionServer.postOpenDeployTasks(): requests a compaction either when
>>> >> we're past the minimum number of files or when there is any reference
>>> >> file. Good, that will trigger
>>> >> RatioBasedCompactionPolicy.selectCompaction(), which turns any
>>> >> compaction into a major one if there are reference files involved in
>>> >> the set of files already selected. Also cool, after a split all files
>>> >> of a daughter will be reference files.
>>> >>
>>> >> But... I do not see any code where it would make sure at least one
>>> >> reference file is selected. So in theory the initial compaction
>>> >> started by postOpenDeployTasks could have failed for some reason. Now
>>> >> more data is written, and the following compaction selections won't
>>> >> pick up any reference files, as there are many small, new files
>>> >> written.
>>> >> So the reference files could in theory just linger until a selection
>>> >> happens to come across one, all the while the daughters are (a)
>>> >> unsplittable and (b) cannot migrate to another region server.
>>> >> That is unless I missed something... Maybe somebody could have a look
>>> >> too?
>>> >> Seems to me we should sort reference files first _always_, to compact
>>> >> them away first and allow further splitting. Thoughts? File a jira?
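The proposed ordering change can be sketched like this (illustrative Python; the file records and field names are hypothetical, not HBase's actual StoreFile API):

```python
# Illustrative sketch (not HBase code): sort reference (half-store)
# files to the front of a selection so they are compacted away first,
# falling back to the usual oldest-first (lowest sequence id) order.

def sort_for_selection(files):
    # False sorts before True, so reference files come first; within
    # each group, lower sequence ids (older files) come first.
    return sorted(files, key=lambda f: (not f["is_reference"], f["seq_id"]))

files = [
    {"name": "flush-1", "is_reference": False, "seq_id": 10},
    {"name": "ref-a",   "is_reference": True,  "seq_id": 3},
    {"name": "flush-2", "is_reference": False, "seq_id": 12},
    {"name": "ref-b",   "is_reference": True,  "seq_id": 4},
]
ordered = sort_for_selection(files)
assert [f["name"] for f in ordered] == ["ref-a", "ref-b", "flush-1", "flush-2"]
```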
>>> >>
>>> >> -- Lars
>>> >>
>>> >>      From: lars hofhansl <[email protected]>
>>> >> To: "[email protected]" <[email protected]>
>>> >> Sent: Friday, December 5, 2014 1:59 PM
>>> >> Subject: Re: Region is out of bounds
>>> >>
>>> >> We've run into something like this as well (probably).
>>> >> I will be looking at this over the next days/weeks. Under heavy load
>>> >> HBase seems just not able to get the necessary compactions in, and
>>> >> until that happens it cannot further split a region.
>>> >>
>>> >> I wonder whether HBASE-12411 would help here (it optionally allows
>>> >> compactions to use private readers); I doubt it, though.
>>> >> The details are probably tricky. I thought HBase would compact split
>>> >> regions with higher priority (placing those first in the compaction
>>> >> queue)... Need to actually check the code.
>>> >>
>>> >> -- Lars
>>> >>      From: Qiang Tian <[email protected]>
>>> >>
>>> >>
>>> >> To: "[email protected]" <[email protected]>
>>> >> Sent: Thursday, December 4, 2014 7:26 PM
>>> >> Subject: Re: Region is out of bounds
>>> >>
>>> >> ---- My attempt to add reference files forcefully to the compaction
>>> >> list in Store.requestCompaction() when the region exceeds the
>>> >> recommended maximum size did not work out well - some weird results in
>>> >> our test cases (but the HBase tests are OK: small, medium and large).
>>> >>
>>> >> interesting... perhaps it was filtered out in
>>> >> RatioBasedCompactionPolicy#selectCompaction?
>>> >>
>>> >>
>>> >> On Fri, Dec 5, 2014 at 5:20 AM, Andrew Purtell <[email protected]>
>>> >> wrote:
>>> >>
>>> >>> Most versions of 0.98 since 0.98.1, but I haven't run a punishing
>>> >>> high-scale bulk ingest for its own sake; high-ish-rate ingest and a
>>> >>> blockingStoreFiles setting of 200 have been in service of getting
>>> >>> data in for subsequent testing.
>>> >>>
>>> >>>
>>> >>> On Thu, Dec 4, 2014 at 12:43 PM, Vladimir Rodionov <
>>> >> [email protected]
>>> >>>>
>>> >>> wrote:
>>> >>>
>>> >>>> Andrew,
>>> >>>>
>>> >>>> What HBase version have you run your test on?
>>> >>>>
>>> >>>> This issue probably does not exist anymore in the latest Apache
>>> >>>> releases, but it still exists in not-so-recent, yet still actively
>>> >>>> used, versions of CDH, HDP, etc. We discovered it during large
>>> >>>> data-set loading (100s of GB) in our cluster (4 nodes).
>>> >>>>
>>> >>>> -Vladimir
>>> >>>>
>>> >>>> On Thu, Dec 4, 2014 at 10:23 AM, Andrew Purtell <
>>> [email protected]>
>>> >>>> wrote:
>>> >>>>
>>> >>>>> Actually I have set hbase.hstore.blockingStoreFiles to 200 in
>>> >>>>> testing exactly :-), but I must not have generated sufficient load
>>> >>>>> to encounter the issue you are seeing. Maybe it would be possible
>>> >>>>> to adapt one of the ingest integration tests to trigger this
>>> >>>>> problem? Set blockingStoreFiles to 200 or more, and tune down the
>>> >>>>> region size to 128K or similar. If it's reproducible like that,
>>> >>>>> please open a JIRA.
>>> >>>>>
>>> >>>>> On Wed, Dec 3, 2014 at 9:07 AM, Vladimir Rodionov <
>>> >>>> [email protected]>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>>> Kevin,
>>> >>>>>>
>>> >>>>>> Thank you for your response. This is not a question about how to
>>> >>>>>> correctly configure an HBase cluster for write-heavy workloads.
>>> >>>>>> This is an internal HBase issue - something is wrong in the
>>> >>>>>> default compaction-selection logic in 0.94-0.98. It seems that
>>> >>>>>> nobody has ever tested importing data with a very high
>>> >>>>>> hbase.hstore.blockingStoreFiles value (200 in our case).
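For reference, the scenario being described would look something like this in hbase-site.xml (values taken from the thread; shown for illustration, not as a recommendation):

```xml
<!-- Settings discussed in this thread (illustrative, not advice). -->
<property>
  <!-- Raised far above the default to avoid write stalls during heavy
       loading; the trigger for the behavior discussed here. -->
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>200</value>
</property>
<property>
  <!-- Upper bound on files per compaction; selections are trimmed to
       this, which is how reference files get excluded. -->
  <name>hbase.hstore.compaction.max</name>
  <value>10</value>
</property>
```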
>>> >>>>>>
>>> >>>>>> -Vladimir Rodionov
>>> >>>>>>
>>> >>>>>> On Wed, Dec 3, 2014 at 6:38 AM, Kevin O'dell <
>>> >>> [email protected]
>>> >>>>>
>>> >>>>>> wrote:
>>> >>>>>>
>>> >>>>>>> Vladimir,
>>> >>>>>>>
>>> >>>>>>> I know you said, "do not ask me why", but I am going to have to
>>> >>>>>>> ask you why. The fact that you are doing this (this being
>>> >>>>>>> blocking store files > 200) tells me there is something, or
>>> >>>>>>> multiple somethings, wrong with your cluster setup. A couple of
>>> >>>>>>> things come to mind:
>>> >>>>>>>
>>> >>>>>>> * During this heavy write period, could we use bulk loads? If
>>> >>>>>>> so, this should solve almost all of your problems.
>>> >>>>>>>
>>> >>>>>>> * A 1GB region size is WAY too small. If you are pushing the
>>> >>>>>>> volume of data you are talking about, I would recommend 10-20GB
>>> >>>>>>> region sizes. This should also help keep your region count
>>> >>>>>>> smaller, which will result in more optimal writes.
>>> >>>>>>>
>>> >>>>>>> * Your cluster may be undersized; if you are setting the
>>> >>>>>>> blocking limit that high, you may be pushing too much data for
>>> >>>>>>> your cluster overall.
>>> >>>>>>>
>>> >>>>>>> Would you be so kind as to pass me a few pieces of information?
>>> >>>>>>>
>>> >>>>>>> 1.) Cluster size
>>> >>>>>>> 2.) Average region count per RS
>>> >>>>>>> 3.) Heap size, Memstore global settings, and block cache settings
>>> >>>>>>> 4.) a RS log to pastebin and a time frame of "high writes"
>>> >>>>>>>
>>> >>>>>>> I can probably make some solid suggestions for you based on the
>>> >>> above
>>> >>>>>> data.
>>> >>>>>>>
>>> >>>>>>> On Wed, Dec 3, 2014 at 1:04 AM, Vladimir Rodionov <
>>> >>>>>> [email protected]>
>>> >>>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>>> This is what we observed in our environment(s)
>>> >>>>>>>>
>>> >>>>>>>> The issue exists in CDH4.5, 5.1, HDP2.1, Mapr4
>>> >>>>>>>>
>>> >>>>>>>> If someone sets the number of blocking store files way above
>>> >>>>>>>> the default value, say 200, to avoid write stalls during
>>> >>>>>>>> intensive data loading (do not ask me why we do this), then one
>>> >>>>>>>> of the regions grows indefinitely and takes up more than 99% of
>>> >>>>>>>> the overall table.
>>> >>>>>>>>
>>> >>>>>>>> It can't be split because it still has orphaned reference
>>> >>>>>>>> files. Some of the reference files are able to avoid compactions
>>> >>>>>>>> for a long time, obviously.
>>> >>>>>>>>
>>> >>>>>>>> The split policy is IncreasingToUpperBound, and the max region
>>> >>>>>>>> size is 1G. I do my tests on CDH4.5 mostly, but all of the other
>>> >>>>>>>> distros seem to have the same issue.
>>> >>>>>>>>
>>> >>>>>>>> My attempt to add reference files forcefully to the compaction
>>> >>>>>>>> list in Store.requestCompaction() when the region exceeds the
>>> >>>>>>>> recommended maximum size did not work out well - some weird
>>> >>>>>>>> results in our test cases (but the HBase tests are OK: small,
>>> >>>>>>>> medium and large).
>>> >>>>>>>>
>>> >>>>>>>> What is so special about these reference files? Any ideas what
>>> >>>>>>>> can be done here to fix the issue?
>>> >>>>>>>>
>>> >>>>>>>> -Vladimir Rodionov
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Kevin O'Dell
>>> >>>>>>> Systems Engineer, Cloudera
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> Best regards,
>>> >>>>>
>>> >>>>>   - Andy
>>> >>>>>
>>> >>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> >>> Hein
>>> >>>>> (via Tom White)
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Best regards,
>>> >>>
>>> >>>   - Andy
>>> >>>
>>> >>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein
>>> >>> (via Tom White)
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Best regards,
>>> >
>>> >   - Andy
>>> >
>>> > Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein
>>> > (via Tom White)
>>>
>>>
>>>
>>>
>>>
>>
>
