[jira] [Created] (HBASE-24646) Set the log level for ScheduledChore to INFO in HBTU
Duo Zhang created HBASE-24646:

Summary: Set the log level for ScheduledChore to INFO in HBTU
Key: HBASE-24646
URL: https://issues.apache.org/jira/browse/HBASE-24646
Project: HBase
Issue Type: Task
Components: test
Reporter: Duo Zhang

We have changed the thread wake interval to 100ms in tests, which causes the test log to be flooded with

{noformat}
2020-06-27 10:28:00,680 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-06-27 10:28:00,680 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-06-27 10:28:00,684 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-06-27 10:28:00,684 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-06-27 10:28:00,696 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-06-27 10:28:00,696 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-06-27 10:28:00,782 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-06-27 10:28:00,782 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-06-27 10:28:00,786 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
{noformat}

We could set the log level to INFO when starting a mini cluster in HBTU.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[DISCUSS] Normalizer and pre-split tables
Heya,

I've seen a lot of use-cases where the normalizer would be a nice solution for operators and application developers. I've been trying to beef it up a bit to handle these cases. However, some of these considerations are at odds, so I want to vet the ideas here.

The normalizer is a background chore in the HMaster that attempts to converge region sizes within a table toward the average region size. It has a pretty wide error bar, but that's the overall goal. Early on, it was observed that an operator needs to pre-split a table, so special considerations were included, by way of `hbase.normalizer.min.region.count`, `hbase.normalizer.merge.min_region_age.days`, and `hbase.normalizer.merge.min_region_size.mb`. All these knobs are designed to give an operator a means of controlling this behavior.

We have (what I see as) a competing objective: doing away with empty, or nearly-empty, regions. The use-case is pretty common when there's a TTL applied to a table, especially if there's also a timestamp component in the rowkey. In this case, we want the normalizer to "merge away" these empty regions.

The trouble is we ship defaults for all of the `*min*` configs, and right now there's no way to "unset" them to disable the functionality. This means there still isn't a way to support the empty regions use-case without awkward special-case checks. This is where I'm looking for suggestions from the community.

There's some discussion under way over on the PR for HBASE-24583. Please take a look.

Thanks in advance,
Nick
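To make the tension between the two objectives concrete, here is a minimal Java sketch of a size-based merge planner. It is not the HBase normalizer implementation: the class `NormalizerSketch`, the method `planMerges`, and the merge rule (merge two adjacent regions when their combined size is below the table average) are assumptions for illustration only. The parameters `minRegionCount` and `minRegionSizeMb` loosely mirror `hbase.normalizer.min.region.count` and `hbase.normalizer.merge.min_region_size.mb`.

```java
import java.util.ArrayList;
import java.util.List;

public class NormalizerSketch {

  /**
   * Hypothetical merge planner. Takes region sizes (MB) in rowkey order and
   * returns index pairs {i, i+1} of adjacent regions proposed for merging.
   */
  public static List<int[]> planMerges(long[] regionSizesMb, int minRegionCount,
      long minRegionSizeMb) {
    List<int[]> plans = new ArrayList<>();
    int n = regionSizesMb.length;
    if (n < 2 || n < minRegionCount) {
      return plans; // too few regions to consider merging
    }
    long total = 0;
    for (long size : regionSizesMb) {
      total += size;
    }
    double average = (double) total / n;

    int remaining = n; // region count if all plans so far were applied
    int i = 0;
    while (i + 1 < n) {
      long a = regionSizesMb[i];
      long b = regionSizesMb[i + 1];
      // The "min size" guard protects small regions in a pre-split table.
      boolean bothLargeEnough = a >= minRegionSizeMb && b >= minRegionSizeMb;
      boolean belowAverage = a + b < average;
      boolean keepsMinCount = remaining - 1 >= minRegionCount;
      if (bothLargeEnough && belowAverage && keepsMinCount) {
        plans.add(new int[] { i, i + 1 });
        remaining--;
        i += 2; // a region takes part in at most one merge per pass
      } else {
        i++;
      }
    }
    return plans;
  }

  public static void main(String[] args) {
    long[] sizes = { 0, 0, 0, 0, 100, 100 }; // four regions emptied by TTL
    System.out.println(planMerges(sizes, 3, 1).size()); // guard blocks empty regions
    System.out.println(planMerges(sizes, 3, 0).size()); // guard disabled
  }
}
```

In this sketch, a minimum size of 1 MB keeps every empty region out of the plan, which is what a pre-split table wants, while a minimum of 0 lets the empty regions be merged away. Whether the real configs should accept such a "disabled" value is exactly the open question.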
[jira] [Resolved] (HBASE-20819) Use TableDescriptor to replace HTableDescriptor in hbase-shell module
[ https://issues.apache.org/jira/browse/HBASE-20819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-20819.
Resolution: Fixed

Merged. Thanks for nice contrib [~bitoffdev] (thanks for reviews [~zhangduo] and [~busbey])

> Use TableDescriptor to replace HTableDescriptor in hbase-shell module
>
> Key: HBASE-20819
> URL: https://issues.apache.org/jira/browse/HBASE-20819
> Project: HBase
> Issue Type: Improvement
> Components: shell
> Affects Versions: 2.0.0
> Reporter: Xiaolin Ha
> Assignee: Elliot Miller
> Priority: Minor
> Fix For: 3.0.0-alpha-1
> Attachments: HBASE-20819.branch-2.001.patch, HBASE-20819.branch-2.002.patch, HBaseConstants-b5563432922268c7a16deacbb51bfba89c0a2aba.txt, HBaseConstants-cf2aa593e590133b0c76d3723b4074b28b55dcc9.txt, HBaseConstants-diff.txt
>
> HTableDescriptor is deprecated as of release 2.0.0, and will be removed in 3.0.0. This patch replaces all usages of HTableDescriptor and HColumnDescriptor in the hbase-shell module so that HTableDescriptor can be removed.
> There are a few other consequences of this change:
> * Ruby methods relating to HTableDescriptor and HColumnDescriptor have been removed. This is noted in "Release Note" on this issue.
> * We no longer import constants from HTableDescriptor and HColumnDescriptor into the ruby HBaseConstants module. Instead, we import them from ColumnFamilyDescriptorBuilder and TableDescriptorBuilder.
[jira] [Created] (HBASE-24645) Include dump files in the jenkins archive
Nick Dimiduk created HBASE-24645:

Summary: Include dump files in the jenkins archive
Key: HBASE-24645
URL: https://issues.apache.org/jira/browse/HBASE-24645
Project: HBase
Issue Type: Task
Components: build
Reporter: Nick Dimiduk

Tracking down jenkins test failures in the face of {{org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?}} is difficult right now. Let's include the "dump" files in our archives so that there's something to look at, maybe find a usable stack trace.

The failures I want to track look like

{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project hbase-common: There are test failures.
[ERROR]
[ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3@2/component/hbase-common/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3@2/component/hbase-common && /usr/lib/jvm/jdk8u232-b09/jre/bin/java -enableassertions -Dhbase.build.id=2020-06-24T15:53:27Z -Xmx2200m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Djdk.net.URLClassPath.disableClassPathURLCheck=true -Dorg.apache.hbase.thirdparty.io.netty.leakDetection.level=advanced -Dio.netty.eventLoopThreads=3 -jar /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3@2/component/hbase-common/target/surefire/surefirebooter9093790277421313959.jar /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3@2/component/hbase-common/target/surefire 2020-06-24T15-53-31_857-jvmRun1 surefire8040542579565444758tmp surefire_783341445535876751188tmp
[ERROR] Process Exit Code: 0
{noformat}
Re: [DISCUSS] Removing problematic terms from our project
Circling back after more inputs, if we use this as a description of the proposals:

1. Replace "master"/"hmaster" with ???. This one has by far the most significant impact, and both opinion and interpretation on it are mixed.
2. Replace "slave" with "follower". This seems to impact the cross-cluster replication subsystem only.
3. Replace "black list" with "deny list".
4. Replace "white list" with "accept list".

Then by my read of the responses we have consensus to do #2, #3, and #4. They were not controversial. JIRAs and patches will be welcome. It seems pretty clear committers and PMC will approve and do what is needed to complete any necessary deprecation cycle.

Regarding #1, opinion is mixed. By my read I also think committers and PMC will approve patches and do what is needed to complete any necessary deprecation cycle for this one too. Enough PMC members expressed support to successfully vote on a release (although not if there were to be opposing votes). If a contributor were to open a JIRA and provide patches for this, there would be more discussion. There is no consensus, yet, on what replacement term is best. Personally, I can accept Zheng's recent suggestion of "controller". I can see how syllable count matters.

I don't mean this summary to close the conversation. It is only a checkpoint.

If anyone reading this has an opinion they do not wish to express publicly, you are welcome to write to priv...@hbase.apache.org to state your opinion and the PMC will of course respectfully listen to it.

On Thu, Jun 25, 2020 at 7:47 PM zheng wang <18031...@qq.com> wrote:

> I like the controller.
>
> Coordinator is a bit long for me to write and speak.
> Manager and Admin is used somewhere yet in HBase.
>
> -- Original message --
> From: "Andrew Purtell"
> Sent: Friday, June 26, 2020, 9:08 AM
> To: "Hbase-User"
> Cc: "dev"
> Subject: Re: [DISCUSS] Removing problematic terms from our project
>
> > - AdminServer (as you already have AdminClient to talk to it).
>
> Oh... I like AdminServer.
AdminServer (serving admin functions) and > RegionServer (serving region data). > > On Thu, Jun 25, 2020 at 4:46 PM Andrey Elenskiy > > > > Is there a word that's not "master" and not "coordinator" that > is clear > > and > > suitable for (diverse, polyglot) community? > > > > There are also: > > - captain (sounds pretty close to "master" without the negative side > and it > > should be relatable around the world) > > - conductor (as in orchestra) > > - controller (in kafka controller assigns partitions) > > - RegionDriver (more relevant to what it's actually doing in hbase and > > borrowed from PlacementDrive of TiKV) > > - AdminServer (as you already have AdminClient to talk to it). > > > > On Thu, Jun 25, 2020 at 3:49 PM Sean Busbey wrote: > > > > > How about "manager"? > > > > > > (It would help me if folks could explain what is lacking in > > "coordinator".) > > > > > > On Thu, Jun 25, 2020, 13:32 Nick Dimiduk wrote: > > > > > > > On Wed, Jun 24, 2020 at 10:14 PM 张铎(Duo Zhang) < > palomino...@gmail.com> > > > > wrote: > > > > > > > > > -0/+1/+1/+1 > > > > > > > > > > I’m the one who asked whether ‘master’ is safe to use > without ‘slave’ > > > in > > > > > the private list. > > > > > > > > > > I’m still not convinced that it is really necessary > and I do not > > think > > > > > other words like ‘coordinator’ can fully describe the > role of HMaster > > > in > > > > > HBase. HBase is more than 10 years old. In the context > of HBase, the > > > word > > > > > ‘HMaster’ has its own meaning. Changing the name will > hurt our users > > > and > > > > > make them confusing, especially for us non native > English speakers... > > > > > > > > > > > > > Is there a word that's not "master" and not "coordinator" > that is clear > > > and > > > > suitable for (diverse, polyglot) community? > > > > > > > > Stack > > > > > > > > > > +1/+1/+1/+1 where hbase3 adds the deprecation and > hbase4 follows > > > hbase3 > > > > > > soon after sounds good to me. 
I'm up for working > on this. > > > > > > S > > > > > > > > > > > > On Wed, Jun 24, 2020 at 2:26 PM Xu Cang < > xuc...@apache.org> wrote: > > > > > > > > > > > > > Strongly agree with what Nick said here: > > > > > > > > > > > > > > " From my perspective, we gain nothing > as a project or as a > > > > community > > > > > be > > > > > > > willfully retaining use of language that is > well understood to be > > > > > > > problematic or hurtful, On the contrary, > we have much to gain > > > by > > > > > > > encouraging > > > > > > > contributions from as many people as > possible." > > > > > > > > > > > > > > +1 to Andrew's proposal. > > > > > > > > > > > > > > It might be good to have a source of truth > web page or README > > file > > > > for > > > > > > > developers and users to refer to regarding > all naming > > transitions. > > > > It's > > > > > > > going to help both developers changing the > code and users looking > > > for > > > > > > some > > > > >
Re: HBase 2 slower than HBase 1?
Hey Anoop, I opened https://issues.apache.org/jira/browse/HBASE-24637 and attached the patches and script used to make the comparison. On Fri, Jun 26, 2020 at 2:33 AM Anoop John wrote: > Great investigation Andy. Do you know any Jiras which made changes in SQM? > Would be great if you can attach your patch which tracks the scan flow. If > we have a Jira for this issue, can you pls attach? > > Anoop > > On Fri, Jun 26, 2020 at 1:56 AM Andrew Purtell > wrote: > > > Related, I think I found a bug in branch-1 where we don’t heartbeat in > the > > filter all case until we switch store files, so scanning a very large > store > > file might time out with client defaults. Remarking on this here so I > don’t > > forget to follow up. > > > > > On Jun 25, 2020, at 12:27 PM, Andrew Purtell > > wrote: > > > > > > > > > I repeated this test with pe --filterAll and the results were > revealing, > > at least for this case. I also patched in thread local hash map for > atomic > > counters that I could update from code paths in SQM, StoreScanner, > > HFileReader*, and HFileBlock. Because a RPC is processed by a single > > handler thread I could update counters and accumulate micro-timings via > > System#nanoTime() per RPC and dump them out of CallRunner in some new > trace > > logging. I spent a couple of days making sure the instrumentation was > > placed equivalently in both 1.6 and 2.2 code bases and was producing > > consistent results. I can provide these patches upon request. > > > > > > Again, test tables with one family and 1, 5, 10, 20, 50, and 100 > > distinct column-qualifiers per row. After loading the table I made a > > snapshot and cloned the snapshot for testing, for both 1.6 and 2.2, so > both > > versions were tested using the exact same data files on HDFS. I also used > > the 1.6 version of PE for both, so the only change is on the server (1.6 > vs > > 2.2 masters and regionservers). 
> > > > > > It appears a refactor to ScanQueryMatcher and friends has disabled the > > ability of filters to provide SKIP hints, which prevents us from > bypassing > > version checking (so some extra cost in SQM), and appears to disable an > > optimization that avoids reseeking, leading to a serious and proportional > > regression in reseek activity and time spent in that code path. So for > > queries that use filters, there can be a substantial regression. > > > > > > Other test cases that did not use filters did not show a regression. > > > > > > A test case where I used ROW_INDEX_V1 encoding showed an expected > modest > > proportional regression in seeking time, due to the fact it is optimized > > for point queries and not optimized for the full table scan case. > > > > > > I will come back here when I understand this better. > > > > > > Here are the results for the pe --filterAll case: > > > > > > > > > 1.6.0 c1 2.2.5 c1 > > > 1.6.0 c5 2.2.5 c5 > > > 1.6.0 c10 2.2.5 c10 > > > 1.6.0 c20 2.2.5 c20 > > > 1.6.0 c50 2.2.5 c50 > > > 1.6.0 c1002.2.5 c100 > > > Counts > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (better heartbeating) > > > (better heartbeating) > > > (better heartbeating) > > > (better heartbeating) > > > (better heartbeating) > > > rpcs 1 2 200%2 6 300%2 10 > > 500%3 17 567%4 37 925%8 72 > 900% > > > block_reads 11507 11508 100%57255 57257 100%114471 > > 114474 100%230372 230377 100%578292 578298 100%1157955 > > 1157963 100% > > > block_unpacks 11507 11508 100%57255 57257 100%114471 > > 114474 100%230372 230377 100%578292 578298 100%1157955 > > 1157963 100% > > > seeker_next 10001000100%5000 > > 5000100%1 1 100%2 > > 2 100%5 5 100% > > 10 10 100% > > > store_next10009988268 100%500049940082 > > 100%1 99879401100%2 > > 199766539 100%5 499414653 100% > > 10 998836518 100% > > > store_reseek 1 11733 > ! 2 59924 > ! 8 > > 120607 > ! 6 233467 > ! 10 585357 > ! 
8 > > 1163490 > ! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > cells_matched 20002000100%6000 > > 6000100%11000 11000 100%21000 > > 21000 100%51000 51000 100% > > 101000 101000 100% > > > column_hint_include 10001000100%5000 > > 5000100%1 1 100%
[jira] [Created] (HBASE-24644) Add a clause to the book noting that sometimes we short-circuit the deprecation cycle
Nick Dimiduk created HBASE-24644:

Summary: Add a clause to the book noting that sometimes we short-circuit the deprecation cycle
Key: HBASE-24644
URL: https://issues.apache.org/jira/browse/HBASE-24644
Project: HBase
Issue Type: Task
Components: community, documentation
Reporter: Nick Dimiduk

Let's add a note to the book that describes the circumstances around HBASE-21782 and how that can result in code not following our stated deprecation cycle guidelines.
[jira] [Created] (HBASE-24643) Replace Cluster#primariesOfRegionsPerServer from int array to treemap
Huaxiang Sun created HBASE-24643:

Summary: Replace Cluster#primariesOfRegionsPerServer from int array to treemap
Key: HBASE-24643
URL: https://issues.apache.org/jira/browse/HBASE-24643
Project: HBase
Issue Type: Improvement
Components: Balancer
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

Currently, primariesOfRegionsPerServer is an int array. moveRegion does heavy work by searching the array linearly, and inserting or removing an element requires allocating and copying the whole array.
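The cost difference described in the issue can be sketched as follows. This is an illustrative comparison only, not the balancer's code: the class `PrimariesPerServerSketch` and its methods are hypothetical, and representing the per-server primaries as a `TreeMap` from primary index to occurrence count is an assumption about how a tree map could stand in for the sorted int array.

```java
import java.util.Arrays;
import java.util.TreeMap;

public class PrimariesPerServerSketch {

  /** Sorted-array insert: linear scan plus a full copy into a new array. */
  public static int[] arrayAdd(int[] sorted, int primary) {
    int pos = 0;
    while (pos < sorted.length && sorted[pos] < primary) {
      pos++;
    }
    int[] out = new int[sorted.length + 1];
    System.arraycopy(sorted, 0, out, 0, pos);
    out[pos] = primary;
    System.arraycopy(sorted, pos, out, pos + 1, sorted.length - pos);
    return out;
  }

  /** Sorted-array remove: linear scan plus a full copy into a new array. */
  public static int[] arrayRemove(int[] sorted, int primary) {
    int pos = 0;
    while (pos < sorted.length && sorted[pos] != primary) {
      pos++;
    }
    if (pos == sorted.length) {
      return sorted; // not present, nothing to do
    }
    int[] out = new int[sorted.length - 1];
    System.arraycopy(sorted, 0, out, 0, pos);
    System.arraycopy(sorted, pos + 1, out, pos, sorted.length - pos - 1);
    return out;
  }

  /** Tree-map insert: O(log n), no array allocation. */
  public static void mapAdd(TreeMap<Integer, Integer> counts, int primary) {
    counts.merge(primary, 1, Integer::sum);
  }

  /** Tree-map remove: O(log n); the entry is dropped when its count reaches zero. */
  public static void mapRemove(TreeMap<Integer, Integer> counts, int primary) {
    counts.computeIfPresent(primary, (key, count) -> count == 1 ? null : count - 1);
  }

  public static void main(String[] args) {
    int[] array = { 1, 3, 3, 7 };
    array = arrayAdd(array, 5);
    array = arrayRemove(array, 3);
    System.out.println(Arrays.toString(array));

    TreeMap<Integer, Integer> counts = new TreeMap<>();
    for (int primary : new int[] { 1, 3, 3, 7 }) {
      mapAdd(counts, primary);
    }
    mapAdd(counts, 5);
    mapRemove(counts, 3);
    System.out.println(counts);
  }
}
```

Keeping a count per primary preserves the information a sorted array with duplicates would carry (more than one replica of the same primary on a server) while keeping iteration in sorted order.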
Re: [DISCUSS] Removing problematic terms from our project
> Is there a word that's not "master" and not "coordinator" that is clear and
> suitable for (diverse, polyglot) community?

There are also:
- captain (sounds pretty close to "master" without the negative side and it should be relatable around the world)
- conductor (as in orchestra)
- controller (in kafka controller assigns partitions)
- RegionDriver (more relevant to what it's actually doing in hbase and borrowed from Placement Driver of TiKV)
- AdminServer (as you already have AdminClient to talk to it).

On Thu, Jun 25, 2020 at 3:49 PM Sean Busbey wrote:

> How about "manager"?
>
> (It would help me if folks could explain what is lacking in "coordinator".)
>
> On Thu, Jun 25, 2020, 13:32 Nick Dimiduk wrote:
>
> > On Wed, Jun 24, 2020 at 10:14 PM 张铎(Duo Zhang) wrote:
> >
> > > -0/+1/+1/+1
> > >
> > > I’m the one who asked whether ‘master’ is safe to use without ‘slave’ in
> > > the private list.
> > >
> > > I’m still not convinced that it is really necessary and I do not think
> > > other words like ‘coordinator’ can fully describe the role of HMaster in
> > > HBase. HBase is more than 10 years old. In the context of HBase, the word
> > > ‘HMaster’ has its own meaning. Changing the name will hurt our users and
> > > make them confusing, especially for us non native English speakers...
> > >
> >
> > Is there a word that's not "master" and not "coordinator" that is clear and
> > suitable for (diverse, polyglot) community?
> >
> > Stack wrote on Thu, Jun 25, 2020 at 06:34:
> > >
> > > > +1/+1/+1/+1 where hbase3 adds the deprecation and hbase4 follows hbase3
> > > > soon after sounds good to me. I'm up for working on this.
> > > > S > > > > > > > > On Wed, Jun 24, 2020 at 2:26 PM Xu Cang wrote: > > > > > > > > > Strongly agree with what Nick said here: > > > > > > > > > > " From my perspective, we gain nothing as a project or as a > > community > > > be > > > > > willfully retaining use of language that is well understood to be > > > > > problematic or hurtful, On the contrary, we have much to gain > by > > > > > encouraging > > > > > contributions from as many people as possible." > > > > > > > > > > +1 to Andrew's proposal. > > > > > > > > > > It might be good to have a source of truth web page or README file > > for > > > > > developers and users to refer to regarding all naming transitions. > > It's > > > > > going to help both developers changing the code and users looking > for > > > > some > > > > > answers online that use old namings. > > > > > > > > > > Xu > > > > > > > > > > On Wed, Jun 24, 2020 at 2:21 PM Nick Dimiduk > > > > wrote: > > > > > > > > > > > On Tue, Jun 23, 2020 at 13:11 Sean Busbey > > wrote: > > > > > > > > > > > > > I would like to make sure I am emphatically clear that "master" > > by > > > > > itself > > > > > > > is not okay if the context is the same as what would normally > be > > a > > > > > > > master/slave context. Furthermore our use of master is clearly > > > such a > > > > > > > context. > > > > > > > > > > > > > > > > > > I agree: to me “Master”, as in “HMaster” caries with it the > > > > master/slave > > > > > > baggage. As an alternative, I prefer the term “coordinator” over > > > > > “leader”. > > > > > > Thus we would have daemons called “coordinator” and “region > > server”. > > > > > > > > > > > > To me, “master” as in “master branch” does not carry the same > > > baggage, > > > > > but > > > > > > I’m also in favor changing the name of our default branch to a > word > > > > that > > > > > is > > > > > > less conflicted. I see nothing that we gain as a community by > > > > continuing > > > > > to > > > > > > use this word. 
> > > > > > > > > > > > It seems to me we have, broadly speaking, consensus around making > > > > *some* > > > > > > > changes. I haven't seen a strong push for "break everything in > > the > > > > name > > > > > > of > > > > > > > expediency" (I would personally be fine with this). So barring > > > > > additional > > > > > > > discussion that favors breaking changes, current approaches > > should > > > > > > comport > > > > > > > with our existing project compatibility goals. > > > > > > > > > > > > > > Maybe we could stop talking about what-ifs and look at actual > > > > practical > > > > > > > examples? If anyone is currently up for doing the work of a PR > we > > > can > > > > > > look > > > > > > > at for one of these? > > > > > > > > > > > > > > If folks would prefer we e.g. just say "we should break > whatever > > we > > > > > need > > > > > > to > > > > > > > in 3.0.0 to make this happen" then it would be good to speak > up. > > > > > > Otherwise > > > > > > > likely we would be done with needed changes circa hbase 4, > > probably > > > > > late > > > > > > > 2021 or 2022. > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 23, 2020, 03:03 zheng wang <18031...@qq.com> > wrote: > > > > > > > > > > > > > > > IMO, master is ok if not used with slave together. > > > > > > > > > > > > > > > > > > > > > > > > -1/+1/+1/+1 > > > > >
[jira] [Created] (HBASE-24642) Apache Yetus integration
Bharath Vissapragada created HBASE-24642:

Summary: Apache Yetus integration
Key: HBASE-24642
URL: https://issues.apache.org/jira/browse/HBASE-24642
Project: HBase
Issue Type: Sub-task
Reporter: Bharath Vissapragada
Assignee: Bharath Vissapragada

Now that we have a clean test run with all the tests passing on trunk (with HBase trunk), let's get the Yetus integration with the test-patch utility working. That makes it much easier to implement a precommit with GitHub.
[jira] [Created] (HBASE-24641) Add linter checks for patches
Bharath Vissapragada created HBASE-24641:

Summary: Add linter checks for patches
Key: HBASE-24641
URL: https://issues.apache.org/jira/browse/HBASE-24641
Project: HBase
Issue Type: Sub-task
Components: Client, native-client
Reporter: Bharath Vissapragada

I've worked with clang-tidy before and I think it works well. There is also a [helper script|https://clang.llvm.org/extra/doxygen/clang-tidy-diff_8py_source.html] that runs clang-tidy on a patch rather than the entire source tree.

- Pull in clang as a dependency
- Define .clang-format
- Run clang-tidy-diff on each check-in

There are also other tools like cpplint from Google. I don't have any preference, so either works for me.
Re: [DISCUSS] VisibleForTesting annotation as it pertains to our API compatibility guidelines
I've restored the previous method signature to LoadIncrementalHFiles, so that piece is complete for the next RC.

Based on the latest comments, let me update my proposed course of action.

1. Restore any VisibleForTesting method signatures for 2.3.0, and treat these as public API going forward.
2. Purge the VisibleForTesting annotation from our codebase for 2.4+, involving:
  2a. replace VisibleForTesting with IA.Private anywhere method visibility cannot be limited
  2b. perhaps add a new Yetus check that would ban new use of VisibleForTesting

I have filed https://issues.apache.org/jira/browse/HBASE-24640 as a tracking task for this work. If the above process is agreeable, let's add these steps as a comment for future reference.

Thanks,
Nick

On Fri, Jun 26, 2020 at 8:38 AM 张铎(Duo Zhang) wrote:

> For the LoadIncrementalHFiles class, the IA.Public annotation itself is a
> problem. It should be IA.LimitPrivate(TOOLS). So I'm fine with either
> adding the method back or not, the opinion of the release manager is
> most important I think.
>
> Thanks.
>
> Viraj Jasani wrote on Fri, Jun 26, 2020 at 9:50 PM:
>
> > Agree on replacing VFT in next 2.4.0 or 3.0.0 release and restoring the
> > required method for now to unblock 2.3.0 RC1.
> >
> > On 2020/06/26 01:11:31, Andrew Purtell wrote:
> > > Sounds fine to me.
> > >
> > > My earlier objection was to talk of an HBase 3 followed by an HBase 4. We
> > > don't need to do a full deprecation cycle across two major versions to
> > > remove an annotation that never promised public access. (By definition,
> > > tagged fields and members were VisibleForTesting (only). The 'only' was
> > > implied, but I think a reasonable assumption and common knowledge.)
> > >
> > > On Thu, Jun 25, 2020 at 3:48 PM Sean Busbey wrote:
> > >
> > > > Agree on restoring the member and then getting this done for 2.4.0.
> > > >
> > > > On Thu, Jun 25, 2020, 15:02 Nick Dimiduk wrote:
> > > >
> > > > > And now by module,
> > > > >
> > > > > $ find .
-iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | > > cut > > > > -d/ > > > > > -f2 | sort | uniq -c > > > > >6 hbase-backup > > > > > 87 hbase-client > > > > > 40 hbase-common > > > > >1 hbase-endpoint > > > > >7 hbase-hadoop-compat > > > > >3 hbase-http > > > > > 18 hbase-mapreduce > > > > >1 hbase-metrics-api > > > > > 24 hbase-procedure > > > > > 10 hbase-replication > > > > > 456 hbase-server > > > > >2 hbase-thrift > > > > >1 hbase-zookeeper > > > > > > > > > > I prefer we not make this change a prerequisite to 2.3. I would > > rather we > > > > > restore the one method modified by HBASE-24221 and do the work for > > > > > VisibleForTesting for 2.4.0. > > > > > > > > > > On Thu, Jun 25, 2020 at 12:57 PM Nick Dimiduk > > > > > wrote: > > > > > > > > > > > On Thu, Jun 25, 2020 at 12:36 PM Andrew Purtell < > > apurt...@apache.org> > > > > > > wrote: > > > > > > > > > > > >> I think we are in agreement except for a need to have a > > deprecation > > > > > cycle. > > > > > >> Just remove VisibleForTesting and replace with whatever > > alternative > > > > you > > > > > >> like. Certainly in the next minors. No strong opinion either way > > about > > > > > >> patch releases, leave as is? > > > > > >> > > > > > > > > > > > > Thanks Andrew and Bharath, I now better understand your > positions. > > > > > > > > > > > > The annotation is fairly common in our codebase, from branch-2.3, > > > > > > > > > > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ > > | wc > > > > -l > > > > > > 668 > > > > > > > > > > > > I don't have an easy way to cross-reference this with our AI > > > > annotations, > > > > > > but my concern is that any change we make here without a > > deprecation > > > > > cycle > > > > > > will be disruptive to users. 
> > > > > > > > > > > > On Thu, Jun 25, 2020 at 11:30 AM Nick Dimiduk < > ndimi...@apache.org > > > > > > > > wrote: > > > > > >> > > > > > >> > On Wed, Jun 24, 2020 at 3:19 PM Andrew Purtell < > > apurt...@apache.org > > > > > > > > > > >> > wrote: > > > > > >> > > > > > > >> > > It is possible some users may not understand what Guava's > > > > > >> > VisibleForTesting > > > > > >> > > implies, but those users are much more likely to be Java > > > > developers > > > > > or > > > > > >> > Java > > > > > >> > > developer adjacent, and familiar with what this fad > entailed. > > Such > > > > > >> > tagging > > > > > >> > > was/is done specifically to indicate the exposed field or > > method > > > > was > > > > > >> only > > > > > >> > > made to allow test access to internals, as something less > than > > > > > public. > > > > > >> > > > > > > > >> > > For us to treat such annotated fields and methods as public > > after > > > > > all > > > > > >> is > > > > > >> > > unnecessary, possibly surprising, and not semantically sound > > > > (IMHO). > > > > > >> > > > > > > > >> > > > > > > >> > I don't want to preserve use of VisibleForTesting as an > > indicato
[jira] [Created] (HBASE-24640) Purge use of VisibleForTesting
Nick Dimiduk created HBASE-24640:

Summary: Purge use of VisibleForTesting
Key: HBASE-24640
URL: https://issues.apache.org/jira/browse/HBASE-24640
Project: HBase
Issue Type: Task
Components: community
Reporter: Nick Dimiduk

From the dev-list thread ["[DISCUSS] VisibleForTesting annotation as it pertains to our API compatibility guidelines"|https://lists.apache.org/thread.html/rc7c7c66f134fe135d0a4454a883215e26ff3d20e5a31ecd6a2d1db77%40%3Cdev.hbase.apache.org%3E], when used in classes annotated with interface audience other than IA.Private, the VisibleForTesting annotation is confusing and considered harmful. The consensus is that we do not want to use this annotation as part of the definition of our public APIs, and we need to remove the point of confusion.
[jira] [Resolved] (HBASE-24221) Support bulkLoadHFile by family
[ https://issues.apache.org/jira/browse/HBASE-24221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk resolved HBASE-24221. -- Fix Version/s: (was: 2.4.0) Resolution: Fixed And branch-2.2. Thanks for the reviews. > Support bulkLoadHFile by family > --- > > Key: HBASE-24221 > URL: https://issues.apache.org/jira/browse/HBASE-24221 > Project: HBase > Issue Type: Improvement > Components: HFile >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.4 >Reporter: niuyulin >Assignee: niuyulin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.5 > > > Support bulkLoadHFile by family to avoid long time waiting of bulkLoadHFile > because of compacting at server side -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-24221) Support bulkLoadHFile by family
[ https://issues.apache.org/jira/browse/HBASE-24221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk reopened HBASE-24221: -- Reopening, pardon. need branch-2.2 as well. > Support bulkLoadHFile by family > --- > > Key: HBASE-24221 > URL: https://issues.apache.org/jira/browse/HBASE-24221 > Project: HBase > Issue Type: Improvement > Components: HFile >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.4 >Reporter: niuyulin >Assignee: niuyulin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5 > > > Support bulkLoadHFile by family to avoid long time waiting of bulkLoadHFile > because of compacting at server side -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24221) Support bulkLoadHFile by family
[ https://issues.apache.org/jira/browse/HBASE-24221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk resolved HBASE-24221. -- Resolution: Fixed Addendum pushed to branch-2 and branch-2.3. > Support bulkLoadHFile by family > --- > > Key: HBASE-24221 > URL: https://issues.apache.org/jira/browse/HBASE-24221 > Project: HBase > Issue Type: Improvement > Components: HFile >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.4 >Reporter: niuyulin >Assignee: niuyulin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5 > > > Support bulkLoadHFile by family to avoid long time waiting of bulkLoadHFile > because of compacting at server side -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24638) Edit doc on (offheap) memory management
[ https://issues.apache.org/jira/browse/HBASE-24638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24638. --- Fix Version/s: 3.0.0-alpha-1 Assignee: Michael Stack Resolution: Fixed Merged doc change. > Edit doc on (offheap) memory management > --- > > Key: HBASE-24638 > URL: https://issues.apache.org/jira/browse/HBASE-24638 > Project: HBase > Issue Type: Sub-task > Components: documentation >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Gave it a read-over to try to figure out the current state of memory management in > hbase-2.3.0. Updated it to better reflect the current state.
Re: [DISCUSS] VisibleForTesting annotation as it pertains to our API compatibility guidelines
For the LoadIncrementalHFiles class, the IA.Public annotation itself is a problem. It should be IA.LimitedPrivate(TOOLS).

So I'm fine with either adding the method back or not; I think the release manager's opinion matters most here.

Thanks.

On Fri, Jun 26, 2020 at 9:50 PM, Viraj Jasani wrote:

> Agree on replacing VFT in next 2.4.0 or 3.0.0 release and restoring the
> required method for now to unblock 2.3.0 RC1.
>
> On 2020/06/26 01:11:31, Andrew Purtell wrote:
> > Sounds fine to me.
> >
> > My earlier objection was to talk of an HBase 3 followed by an HBase 4. We
> > don't need to do a full deprecation cycle across two major versions to
> > remove an annotation that never promised public access. (By definition,
> > tagged fields and members were VisibleForTesting (only). The 'only' was
> > implied, but I think a reasonable assumption and common knowledge.)
> >
> > On Thu, Jun 25, 2020 at 3:48 PM Sean Busbey wrote:
> >
> > > Agree on restoring the member and then getting this done for 2.4.0.
> > >
> > > On Thu, Jun 25, 2020, 15:02 Nick Dimiduk wrote:
> > >
> > > > And now by module,
> > > >
> > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | cut -d/ -f2 | sort | uniq -c
> > > >    6 hbase-backup
> > > >   87 hbase-client
> > > >   40 hbase-common
> > > >    1 hbase-endpoint
> > > >    7 hbase-hadoop-compat
> > > >    3 hbase-http
> > > >   18 hbase-mapreduce
> > > >    1 hbase-metrics-api
> > > >   24 hbase-procedure
> > > >   10 hbase-replication
> > > >  456 hbase-server
> > > >    2 hbase-thrift
> > > >    1 hbase-zookeeper
> > > >
> > > > I prefer we not make this change a prerequisite to 2.3. I would rather we
> > > > restore the one method modified by HBASE-24221 and do the work for
> > > > VisibleForTesting for 2.4.0.
> > > >
> > > > On Thu, Jun 25, 2020 at 12:57 PM Nick Dimiduk wrote:
> > > >
> > > > > On Thu, Jun 25, 2020 at 12:36 PM Andrew Purtell <apurt...@apache.org> wrote:
> > > > >
> > > > >> I think we are in agreement except for a need to have a deprecation cycle.
> > > > >> Just remove VisibleForTesting and replace with whatever alternative you
> > > > >> like. Certainly in the next minors. No strong opinion either way about
> > > > >> patch releases, leave as is?
> > > > >>
> > > > >
> > > > > Thanks Andrew and Bharath, I now better understand your positions.
> > > > >
> > > > > The annotation is fairly common in our codebase, from branch-2.3,
> > > > >
> > > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | wc -l
> > > > > 668
> > > > >
> > > > > I don't have an easy way to cross-reference this with our IA annotations,
> > > > > but my concern is that any change we make here without a deprecation cycle
> > > > > will be disruptive to users.
> > > > >
> > > > > On Thu, Jun 25, 2020 at 11:30 AM Nick Dimiduk wrote:
> > > > >>
> > > > >> > On Wed, Jun 24, 2020 at 3:19 PM Andrew Purtell <apurt...@apache.org> wrote:
> > > > >> >
> > > > >> > > It is possible some users may not understand what Guava's VisibleForTesting
> > > > >> > > implies, but those users are much more likely to be Java developers or Java
> > > > >> > > developer adjacent, and familiar with what this fad entailed. Such tagging
> > > > >> > > was/is done specifically to indicate the exposed field or method was only
> > > > >> > > made to allow test access to internals, as something less than public.
> > > > >> > >
> > > > >> > > For us to treat such annotated fields and methods as public after all is
> > > > >> > > unnecessary, possibly surprising, and not semantically sound (IMHO).
> > > > >> > >
> > > > >> >
> > > > >> > I don't want to preserve use of VisibleForTesting as an indicator of public
> > > > >> > API. I want to ensure that we're clear to our downstream users
> > > > >> > that its presence is not a factor in determining public API. For example, I
> > > > >> > don't want to update our book to give any meaning to this annotation, and I
> > > > >> > don't want to update our javadoc filters to take it into account when
> > > > >> > generating the various versions of javadoc that we publish. I want to purge
> > > > >> > it from the discussion by annotating the methods it decorates with the
> > > > >> > symbols we do use to define our public API. The steps I propose above are
> > > > >> > my suggestion of how we work toward that goal.
> > > > >> >
> > > > >> > Does anyone have a counter-proposal to the steps I've outlined above? A
> > > > >> > resolution to this discussion is now the final blocker on 2.3.0rc1.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Nick
> > > > >> >
> > > > >> > On Wed, Jun 24, 2020 at 2:53 PM Sean Busbey
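For readers who want to reproduce the per-module tally quoted in this thread without a POSIX shell, here is a minimal Python sketch of the same idea. It is an approximation, not the command that was actually run: the function name `count_by_module` is made up for illustration, and it assumes the "module" is simply the first directory under the checkout root, which is what `cut -d/ -f2` extracts from `find .` output.

```python
from collections import Counter
from pathlib import Path


def count_by_module(root, needle="@VisibleForTesting"):
    """Count lines containing `needle` in *.java files, keyed by top-level directory.

    Hypothetical helper: roughly mirrors
    find . -iname '*.java' -exec grep -n NEEDLE {} + | cut -d/ -f2 | sort | uniq -c
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        # -iname is case-insensitive, so compare the lower-cased suffix.
        if not path.is_file() or path.suffix.lower() != ".java":
            continue
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # file sits directly under root, so there is no module directory
        with path.open(encoding="utf-8", errors="replace") as handle:
            hits = sum(1 for line in handle if needle in line)
        if hits:
            counts[rel.parts[0]] += hits
    return counts


if __name__ == "__main__":
    # Print in the same "count module" shape as `uniq -c`.
    for module, hits in sorted(count_by_module(".").items()):
        print(f"{hits:5d} {module}")
```

Like `grep -n`, this counts matching lines rather than individual occurrences, so a line with two annotations counts once.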
Re: [DISCUSS] VisibleForTesting annotation as it pertains to our API compatibility guidelines
Agree on replacing VFT in next 2.4.0 or 3.0.0 release and restoring the required method for now to unblock 2.3.0 RC1.

On 2020/06/26 01:11:31, Andrew Purtell wrote:
> Sounds fine to me.
>
> My earlier objection was to talk of an HBase 3 followed by an HBase 4. We
> don't need to do a full deprecation cycle across two major versions to
> remove an annotation that never promised public access. (By definition,
> tagged fields and members were VisibleForTesting (only). The 'only' was
> implied, but I think a reasonable assumption and common knowledge.)
>
> On Thu, Jun 25, 2020 at 3:48 PM Sean Busbey wrote:
>
> > Agree on restoring the member and then getting this done for 2.4.0.
> >
> > On Thu, Jun 25, 2020, 15:02 Nick Dimiduk wrote:
> >
> > > And now by module,
> > >
> > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | cut -d/ -f2 | sort | uniq -c
> > >    6 hbase-backup
> > >   87 hbase-client
> > >   40 hbase-common
> > >    1 hbase-endpoint
> > >    7 hbase-hadoop-compat
> > >    3 hbase-http
> > >   18 hbase-mapreduce
> > >    1 hbase-metrics-api
> > >   24 hbase-procedure
> > >   10 hbase-replication
> > >  456 hbase-server
> > >    2 hbase-thrift
> > >    1 hbase-zookeeper
> > >
> > > I prefer we not make this change a prerequisite to 2.3. I would rather we
> > > restore the one method modified by HBASE-24221 and do the work for
> > > VisibleForTesting for 2.4.0.
> > >
> > > On Thu, Jun 25, 2020 at 12:57 PM Nick Dimiduk wrote:
> > >
> > > > On Thu, Jun 25, 2020 at 12:36 PM Andrew Purtell wrote:
> > > >
> > > >> I think we are in agreement except for a need to have a deprecation cycle.
> > > >> Just remove VisibleForTesting and replace with whatever alternative you
> > > >> like. Certainly in the next minors. No strong opinion either way about
> > > >> patch releases, leave as is?
> > > >>
> > > >
> > > > Thanks Andrew and Bharath, I now better understand your positions.
> > > >
> > > > The annotation is fairly common in our codebase, from branch-2.3,
> > > >
> > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | wc -l
> > > > 668
> > > >
> > > > I don't have an easy way to cross-reference this with our IA annotations,
> > > > but my concern is that any change we make here without a deprecation cycle
> > > > will be disruptive to users.
> > > >
> > > > On Thu, Jun 25, 2020 at 11:30 AM Nick Dimiduk wrote:
> > > >>
> > > >> > On Wed, Jun 24, 2020 at 3:19 PM Andrew Purtell wrote:
> > > >> >
> > > >> > > It is possible some users may not understand what Guava's VisibleForTesting
> > > >> > > implies, but those users are much more likely to be Java developers or Java
> > > >> > > developer adjacent, and familiar with what this fad entailed. Such tagging
> > > >> > > was/is done specifically to indicate the exposed field or method was only
> > > >> > > made to allow test access to internals, as something less than public.
> > > >> > >
> > > >> > > For us to treat such annotated fields and methods as public after all is
> > > >> > > unnecessary, possibly surprising, and not semantically sound (IMHO).
> > > >> > >
> > > >> >
> > > >> > I don't want to preserve use of VisibleForTesting as an indicator of public
> > > >> > API. I want to ensure that we're clear to our downstream users
> > > >> > that its presence is not a factor in determining public API. For example, I
> > > >> > don't want to update our book to give any meaning to this annotation, and I
> > > >> > don't want to update our javadoc filters to take it into account when
> > > >> > generating the various versions of javadoc that we publish. I want to purge
> > > >> > it from the discussion by annotating the methods it decorates with the
> > > >> > symbols we do use to define our public API. The steps I propose above are
> > > >> > my suggestion of how we work toward that goal.
> > > >> >
> > > >> > Does anyone have a counter-proposal to the steps I've outlined above? A
> > > >> > resolution to this discussion is now the final blocker on 2.3.0rc1.
> > > >> >
> > > >> > Thanks,
> > > >> > Nick
> > > >> >
> > > >> > On Wed, Jun 24, 2020 at 2:53 PM Sean Busbey wrote:
> > > >> > >
> > > >> > > > Andrew are you specifically opposed to using a deprecation cycle to
> > > >> > > > formally label as private anything that currently has a VisibleForTesting
> > > >> > > > annotation?
> > > >> > > >
> > > >> > > > On Wed, Jun 24, 2020, 16:07 Andrew Purtell wrote:
> > > >> > > >
> > > >> > > > > I am -1 on treating VisibleForTesting as public API. Semantically it makes
> > > >> > > > > no sense.
> > > >> > > > >
> > > >> > > > > On Wed, Jun 24, 2020 at 12:22 PM Nick Dimiduk <ndimi...@apache.org>
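The thread notes that there was no easy way to cross-reference `@VisibleForTesting` usage with the InterfaceAudience annotations. One rough way to approximate that is sketched below. It is a text-level heuristic under stated assumptions, not a tool anyone in the thread used: it attributes every `@VisibleForTesting` occurrence in a file to the first `@InterfaceAudience.*` annotation found in that file, so nested classes with their own audience are misattributed. The names `audience_of` and `cross_reference` are invented for this example.

```python
import re
from collections import Counter

# Matches the three audience levels used by HBase's InterfaceAudience annotation.
AUDIENCE_RE = re.compile(r"@InterfaceAudience\.(Public|LimitedPrivate|Private)\b")
VFT = "@VisibleForTesting"


def audience_of(source):
    """Return the first InterfaceAudience level named in `source`, or 'Unannotated'."""
    match = AUDIENCE_RE.search(source)
    return match.group(1) if match else "Unannotated"


def cross_reference(sources):
    """Tally @VisibleForTesting occurrences per audience level.

    `sources` maps a file name to its Java source text (hypothetical input shape).
    """
    totals = Counter()
    for text in sources.values():
        hits = text.count(VFT)
        if hits:
            totals[audience_of(text)] += hits
    return totals


if __name__ == "__main__":
    sample = {
        "A.java": "@InterfaceAudience.Public\nclass A { @VisibleForTesting void f() {} }",
        "B.java": "@InterfaceAudience.Private\nclass B { @VisibleForTesting int x; }",
    }
    for level, hits in sorted(cross_reference(sample).items()):
        print(level, hits)
```

The `Public` bucket is the interesting one for this discussion, since those are the members a downstream user could plausibly have treated as supported API.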
[jira] [Created] (HBASE-24639) RequestId Tracing feature for HBase
Pranshu Khandelwal created HBASE-24639: -- Summary: RequestId Tracing feature for HBase Key: HBASE-24639 URL: https://issues.apache.org/jira/browse/HBASE-24639 Project: HBase Issue Type: New Feature Reporter: Pranshu Khandelwal
[jira] [Resolved] (HBASE-24626) [HBCK2] Remove reference to hbase I.A. private class CommonFSUtils from FsRegionsMetaRecoverer
[ https://issues.apache.org/jira/browse/HBASE-24626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-24626. -- Resolution: Fixed Merged to master. > [HBCK2] Remove reference to hbase I.A. private class CommonFSUtils from > FsRegionsMetaRecoverer > - > > Key: HBASE-24626 > URL: https://issues.apache.org/jira/browse/HBASE-24626 > Project: HBase > Issue Type: Improvement > Components: hbase-operator-tools, hbck2 >Affects Versions: hbase-operator-tools-1.0.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: hbase-operator-tools-1.1.0 > > > FsRegionsMetaRecoverer used to reference the I.A. private interface > FSUtils, which changed in hbase 2.3, causing hbck2 to fail to compile. > HBASE-24482 fixed it by pointing to the CommonFSUtils interface, where the > methods it was relying upon were actually defined. Since this is also an IA > private interface, there are no compatibility guarantees. This PR removes the > reference to CommonFSUtils from FsRegionsMetaRecoverer. Other classes in hbck2 > still require similar work.
Re: HBase 2 slower than HBase 1?
Great investigation, Andy. Do you know of any Jiras that made changes in SQM? It would be great if you could attach your patch that tracks the scan flow. If we have a Jira for this issue, can you please attach it there?

Anoop

On Fri, Jun 26, 2020 at 1:56 AM Andrew Purtell wrote:

> Related, I think I found a bug in branch-1 where we don’t heartbeat in the
> filter all case until we switch store files, so scanning a very large store
> file might time out with client defaults. Remarking on this here so I don’t
> forget to follow up.
>
> On Jun 25, 2020, at 12:27 PM, Andrew Purtell wrote:
> >
> > I repeated this test with pe --filterAll and the results were revealing,
> > at least for this case. I also patched in a thread-local hash map for atomic
> > counters that I could update from code paths in SQM, StoreScanner,
> > HFileReader*, and HFileBlock. Because an RPC is processed by a single
> > handler thread I could update counters and accumulate micro-timings via
> > System#nanoTime() per RPC and dump them out of CallRunner in some new trace
> > logging. I spent a couple of days making sure the instrumentation was
> > placed equivalently in both 1.6 and 2.2 code bases and was producing
> > consistent results. I can provide these patches upon request.
> >
> > Again, test tables with one family and 1, 5, 10, 20, 50, and 100
> > distinct column-qualifiers per row. After loading the table I made a
> > snapshot and cloned the snapshot for testing, for both 1.6 and 2.2, so both
> > versions were tested using the exact same data files on HDFS. I also used
> > the 1.6 version of PE for both, so the only change is on the server (1.6 vs
> > 2.2 masters and regionservers).
> >
> > It appears a refactor to ScanQueryMatcher and friends has disabled the
> > ability of filters to provide SKIP hints, which prevents us from bypassing
> > version checking (so some extra cost in SQM), and appears to disable an
> > optimization that avoids reseeking, leading to a serious and proportional
> > regression in reseek activity and time spent in that code path. So for
> > queries that use filters, there can be a substantial regression.
> >
> > Other test cases that did not use filters did not show a regression.
> >
> > A test case where I used ROW_INDEX_V1 encoding showed an expected modest
> > proportional regression in seeking time, due to the fact it is optimized
> > for point queries and not optimized for the full table scan case.
> >
> > I will come back here when I understand this better.
> >
> > Here are the results for the pe --filterAll case:
> >
> > Counts (1.6.0 / 2.2.5)   c1                              c5                              c10
> > (better heartbeating)
> > rpcs                     1 / 2 (200%)                    2 / 6 (300%)                    2 / 10 (500%)
> > block_reads              11507 / 11508 (100%)            57255 / 57257 (100%)            114471 / 114474 (100%)
> > block_unpacks            11507 / 11508 (100%)            57255 / 57257 (100%)            114471 / 114474 (100%)
> > seeker_next              10000000 / 10000000 (100%)      50000000 / 50000000 (100%)      100000000 / 100000000 (100%)
> > store_next               10000000 / 9988268 (100%)       50000000 / 49940082 (100%)      100000000 / 99879401 (100%)
> > store_reseek             1 / 11733 (!)                   2 / 59924 (!)                   8 / 120607 (!)
> > cells_matched            20000000 / 20000000 (100%)      60000000 / 60000000 (100%)      110000000 / 110000000 (100%)
> > column_hint_include      10000000 / 10000000 (100%)      50000000 / 50000000 (100%)      100000000 / 100000000 (100%)
> > filter_hint_skip         10000000 / 10000000 (100%)      50000000 / 50000000 (100%)      100000000 / 100000000 (100%)
> > sqm_hint_done            999 / 999 (100%)                998 / 998 (100%)                998 / 998 (100%)
> >
> > Counts (1.6.0 / 2.2.5)   c20                             c50                             c100
> > rpcs                     3 / 17 (567%)                   4 / 37 (925%)                   8 / 72 (900%)
> > block_reads              230372 / 230377 (100%)          578292 / 578298 (100%)          1157955 / 1157963 (100%)
> > block_unpacks            230372 / 230377 (100%)          578292 / 578298 (100%)          1157955 / 1157963 (100%)
> > seeker_next              200000000 / 200000000 (100%)    500000000 / 500000000 (100%)    1000000000 / 1000000000 (100%)
> > store_next               200000000 / 199766539 (100%)    500000000 / 499414653 (100%)    1000000000 / 998836518 (100%)
> > store_reseek             6 / 233467 (!)                  10 / 585357 (!)                 8 / 1163490 (!)
> > cells_matched            210000000 / 210000000 (100%)    510000000 / 510000000 (100%)    1010000000 / 1010000000 (100%)
> > column_hint_include      200000000 / 200000000 (100%)    500000000 / 500000000 (100%)    1000000000 / 1000000000 (100%)
> > filter_hint_skip         200000000 / 200000000 (100%)    500000000 / 500000000 (100%)    1000000000 / 1000000000 (100%)
Re: [DISCUSS] IncreasingToUpperBoundRegionSplitPolicy can lead to unpredictable large regions size
> It's supposed to be controlling how big the region is?

Precisely. It may not make a big difference for compaction itself, but it might have further implications for overall RS resource usage, with larger than expected regions.

Given the feedback provided here, I guess we can proceed with the current proposal from HBASE-24530 all the way to maintenance branches (it doesn't change IncreasingToUpperBoundRegionSplitPolicy behaviour, but adds a new policy that in fact respects the region max size for the whole region). We can then fix IncreasingToUpperBoundRegionSplitPolicy on the minor-version branches, as suggested by Busbey.

On Wed, Jun 24, 2020 at 18:00, Andrew Purtell wrote:

> It's supposed to be controlling how big the region is?
>
> On Wed, Jun 24, 2020 at 8:42 AM 张铎(Duo Zhang) wrote:
>
> > I think one of the goals of limiting the store file size is for compaction.
> > As long as we just do compactions per family, what is the actual problem if
> > the whole region is too big?
> >
> > On Wed, Jun 24, 2020 at 10:56 PM, Wellington Chevreuil wrote:
> >
> > > The expected behaviour for the property is well documented, so renaming and
> > > deprecation would rather be a separate task. HBASE-24530 should concern
> > > with making IncreasingToUpperBoundRegionSplitPolicy respect what
> > > hbase.hregion.max.filesize and MAX_FILESIZE table level descriptor
> > > documentation mandate, as well as being consistent with other split
> > > policies behaviour in relation to these properties.
> > >
> > > On Wed, Jun 24, 2020 at 08:00, Anoop John <anoop.hb...@gmail.com> wrote:
> > >
> > > > If we are going to change (correct) hbase.hregion.max.filesize to
> > > > hbase.hregion.max.size (Via proper deprecation cycle) also along with this
> > > > change, am good.
> > > >
> > > > Anoop
> > > >
> > > > On Wed, Jun 24, 2020 at 1:29 AM Sean Busbey wrote:
> > > >
> > > > > Let's fix via approach #3. Get it done for next minor versions and then if
> > > > > folks aren't sure about principle of least surprise we can talk about
> > > > > whether it goes into maintenance releases.
> > > > >
> > > > > On Tue, Jun 23, 2020, 13:07 Andrew Purtell wrote:
> > > > >
> > > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > > > > > > violating those configs.
> > > > > >
> > > > > > Thank you for pointing this out. I feel even more strongly now this is a
> > > > > > bug.
> > > > > > I vote for #3.
> > > > > >
> > > > > > On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <wellington.chevre...@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > The config name was/is hbase.hregion.max.*filesize* and never *hbase.hregion.max.size*.
> > > > > > > >
> > > > > > >
> > > > > > > Description for hbase.hregion.max.filesize is very clear stating that it's
> > > > > > > the sum of all hfiles in the region that should not exceed this property
> > > > > > > value. And we do not always use *hbase.hregion.max.filesize* to determine the
> > > > > > > limit, but a MAX_FILESIZE table level descriptor whose description reads as
> > > > > > > below, on TableDescriptorBuilder javadoc:
> > > > > > >
> > > > > > > /**
> > > > > > >  * Returns the maximum size upto which a region can grow to after which a
> > > > > > >  * region split is triggered. The region size is represented by the size of
> > > > > > >  * the biggest store file in that region.
> > > > > > >  *
> > > > > > >  * @return max hregion size for table, -1 if not set.
> > > > > > >  */
> > > > > > >
> > > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is violating
> > > > > > > those configs.
> > > > > > >
> > > > > > > Do we have a consensus on applying #3 for all active branches? If so, I
> > > > > > > would instruct HBASE-24530 to proceed as such.
> > > > > > >
> > > > > > > On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > > > > >
> > > > > > > > ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation and I
> > > > > > > > don’t see one as more clear than the other, other than to imply something
> > > > > > > > about file level measures being the determining factor. It doesn’t convey
> > > > > > > > more semantics beyond that, ie one file trips the limit or the combined
> > > > > > > > sizes of all files trips the limit. We can fix that with clarifying
> > > > > > > > documentation. While doing so we also have an opportunity to fix something
> > > > > > > > if our consensus is the current policy is not the usual user expectation.
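The disagreement in this thread comes down to two readings of the same limit: does a region split when its largest store exceeds the configured maximum, or when the sum of all its stores does? The Python sketch below only illustrates that difference; it is not HBase's split-policy code. The function names and the per-store size list are assumptions for the example, and sizes are given per store (column family).

```python
GB = 1024 ** 3


def should_split_by_largest_store(store_sizes, max_filesize):
    """Reading 1: split once any single store grows past the limit."""
    return max(store_sizes, default=0) > max_filesize


def should_split_by_region_total(store_sizes, max_filesize):
    """Reading 2: split once the whole region (all stores combined) passes the limit."""
    return sum(store_sizes) > max_filesize


if __name__ == "__main__":
    # Hypothetical region with three column families of 6 GB each and a 10 GB limit.
    stores = [6 * GB, 6 * GB, 6 * GB]
    limit = 10 * GB
    print(should_split_by_largest_store(stores, limit))  # no store is over 10 GB
    print(should_split_by_region_total(stores, limit))   # the region holds 18 GB
```

With several column families the two readings diverge quickly: under the first, the region in this example can keep growing to nearly 30 GB before splitting, which is the "unpredictable large regions" concern in the subject line.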