[jira] [Created] (HBASE-24646) Set the log level for ScheduledChore to INFO in HBTU

2020-06-26 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-24646:
-

 Summary: Set the log level for ScheduledChore to INFO in HBTU
 Key: HBASE-24646
 URL: https://issues.apache.org/jira/browse/HBASE-24646
 Project: HBase
  Issue Type: Task
  Components: test
Reporter: Duo Zhang


Now we changed the thread wake-up interval to 100ms in tests, which causes the 
test log to be flooded with

{noformat}
2020-06-27 10:28:00,680 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-06-27 10:28:00,680 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-06-27 10:28:00,684 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-06-27 10:28:00,684 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-06-27 10:28:00,696 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-06-27 10:28:00,696 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-06-27 10:28:00,782 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-06-27 10:28:00,782 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-06-27 10:28:00,786 DEBUG [regionserver/zhangduo-ubuntu:0.Chore.1] 
hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
{noformat}

We could set the log level to INFO when starting a mini cluster in HBTU.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] Normalizer and pre-split tables

2020-06-26 Thread Nick Dimiduk
Heya,

I've seen a lot of use-cases where the normalizer would be a nice solution
for operators and application developers. I've been trying to beef it up a
bit to handle these cases. However, some of these considerations are at
odds, so I want to vet the ideas here.

The normalizer is a background chore in the HMaster that attempts to
converge region sizes within a table toward the average region size. It has
a pretty wide error bar, but that's the overall goal.

Early on, it was observed that an operator needs to pre-split a table, so
special considerations were included, by way of
`hbase.normalizer.min.region.count`,
`hbase.normalizer.merge.min_region_age.days`, and
`hbase.normalizer.merge.min_region_size.mb`. All these knobs are designed to
give an operator a means of controlling this behavior.

We have (what I see as) a competing objective: doing away with empty, or
nearly-empty regions. The use-case is pretty common when there's a TTL
applied to a table, especially if there's also a timestamp component in the
rowkey. In this case, we want the normalizer to "merge away" these empty
regions.
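
To make the tension concrete, here is a minimal sketch, not the actual
normalizer code. It assumes a simplified rule (merge two adjacent regions
when their combined size is below the table's average region size) and a
single minimum-size guard standing in for
`hbase.normalizer.merge.min_region_size.mb`. The class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the merge side of the normalizer.
final class NormalizerSketch {

  /**
   * Returns index pairs (i, i + 1) of adjacent regions that this simplified
   * rule would merge. A region smaller than minRegionSizeMb is never a merge
   * candidate; that guard is an assumption made for illustration.
   */
  static List<int[]> mergePlans(long[] sizesMb, long minRegionSizeMb) {
    long total = 0;
    for (long s : sizesMb) {
      total += s;
    }
    double avg = sizesMb.length == 0 ? 0 : (double) total / sizesMb.length;

    List<int[]> plans = new ArrayList<>();
    int i = 0;
    while (i + 1 < sizesMb.length) {
      boolean bigEnough =
          sizesMb[i] >= minRegionSizeMb && sizesMb[i + 1] >= minRegionSizeMb;
      if (bigEnough && sizesMb[i] + sizesMb[i + 1] < avg) {
        plans.add(new int[] {i, i + 1});
        i += 2; // both regions are consumed by this plan
      } else {
        i++;
      }
    }
    return plans;
  }

  public static void main(String[] args) {
    // Two empty leading regions, as a TTL'd table with time-ordered keys might have.
    long[] sizes = {0, 0, 400, 420, 0, 380};
    System.out.println("guard=1MB plans: " + mergePlans(sizes, 1).size());
    System.out.println("guard=0MB plans: " + mergePlans(sizes, 0).size());
  }
}
```

With the guard at 1 MB the empty regions are never considered, so nothing
is merged; with the guard at 0 the two empty leading regions are merged.
That is the pre-split protection and the empty-region cleanup pulling in
opposite directions.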

The trouble is we ship defaults for all of the `*min*` configs, and right
now there's no way to "unset" them and so disable the functionality, which
means there still isn't a way to support the empty regions use-case without
awkward special-case checks. This is where I'm looking for suggestions from
the community. There's some discussion under way over on the PR for
HBASE-24583. Please take a look.

Thanks in advance,
Nick


[jira] [Resolved] (HBASE-20819) Use TableDescriptor to replace HTableDescriptor in hbase-shell module

2020-06-26 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-20819.
---
Resolution: Fixed

Merged. Thanks for nice contrib [~bitoffdev] (thanks for reviews [~zhangduo] 
and [~busbey])

> Use TableDescriptor to replace HTableDescriptor in hbase-shell module
> -
>
> Key: HBASE-20819
> URL: https://issues.apache.org/jira/browse/HBASE-20819
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Affects Versions: 2.0.0
>Reporter: Xiaolin Ha
>Assignee: Elliot Miller
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
> Attachments: HBASE-20819.branch-2.001.patch, 
> HBASE-20819.branch-2.002.patch, 
> HBaseConstants-b5563432922268c7a16deacbb51bfba89c0a2aba.txt, 
> HBaseConstants-cf2aa593e590133b0c76d3723b4074b28b55dcc9.txt, 
> HBaseConstants-diff.txt
>
>
> HTableDescriptor is deprecated as of release 2.0.0, and will be removed in 
> 3.0.0. This patch replaces all usages of HTableDescriptor and 
> HColumnDescriptor in the hbase-shell module so that HTableDescriptor can be 
> removed.
> There are a few other consequences of this change:
>  * Ruby methods relating to HTableDescriptor and HColumnDescriptor have been 
> removed. This is noted in "Release Note" on this issue.
>  * We no longer import constants from HTableDescriptor and HColumnDescriptor 
> into the ruby HBaseConstants module. Instead, we import them from 
> ColumnFamilyDescriptorBuilder and TableDescriptorBuilder.





[jira] [Created] (HBASE-24645) Include dump files in the jenkins archive

2020-06-26 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24645:


 Summary: Include dump files in the jenkins archive
 Key: HBASE-24645
 URL: https://issues.apache.org/jira/browse/HBASE-24645
 Project: HBase
  Issue Type: Task
  Components: build
Reporter: Nick Dimiduk


Tracking down jenkins test failures in the face of 
{{org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?}} is difficult right now. Let's include the "dump" 
files in our archives so that there's something to look at, maybe find a usable 
stack trace.

The failures I want to track look like

{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on 
project hbase-common: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3@2/component/hbase-common/target/surefire-reports
 for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, 
[date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying 
goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd 
/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3@2/component/hbase-common
 && /usr/lib/jvm/jdk8u232-b09/jre/bin/java -enableassertions 
-Dhbase.build.id=2020-06-24T15:53:27Z -Xmx2200m 
-Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true 
-Djava.awt.headless=true -Djdk.net.URLClassPath.disableClassPathURLCheck=true 
-Dorg.apache.hbase.thirdparty.io.netty.leakDetection.level=advanced 
-Dio.netty.eventLoopThreads=3 -jar 
/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3@2/component/hbase-common/target/surefire/surefirebooter9093790277421313959.jar
 
/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.3@2/component/hbase-common/target/surefire
 2020-06-24T15-53-31_857-jvmRun1 surefire8040542579565444758tmp 
surefire_783341445535876751188tmp
[ERROR] Process Exit Code: 0
{noformat}





Re: [DISCUSS] Removing problematic terms from our project

2020-06-26 Thread Andrew Purtell
Circling back after more inputs, if we use this as a description of the
proposals:

1. Replace "master"/"hmaster" with ???. This one has by far the most
significant impact, and both opinion and interpretation on it are mixed.

2. Replace "slave" with "follower", seems to impact the cross cluster
replication subsystem only.

3. Replace "black list" with "deny list".

4. Replace "white list" with "accept list".

Then by my read of the responses we have consensus to do #2, #3, and #4.
They were not controversial. JIRAs and patches will be welcome. Seems
pretty clear committers and PMC will approve and do what is needed to
complete any necessary deprecation cycle.

Regarding #1, opinion is mixed. By my read I also think committers and PMC
will approve patches and do what is needed to complete any necessary
deprecation cycle for this one too. Enough PMC members expressed support to
successfully vote on a release (although not if there were to be opposing
votes). If a contributor were to open a JIRA and provide patches for this,
there would be more discussion. There is no consensus, yet, on what
replacement term is best. Personally, I can accept Zheng's recent
suggestion of "controller". I can see how syllable count matters.

I don't mean this summary to close the conversation. It is only a
checkpoint.

If anyone reading this has an opinion they do not wish to express
publicly, you are welcome to write to priv...@hbase.apache.org to state
your opinion and the PMC will of course respectfully listen to it.



On Thu, Jun 25, 2020 at 7:47 PM zheng wang <18031...@qq.com> wrote:

> I like the controller.
>
>
> Coordinator is a bit long for me to write and speak.
> Manager and Admin are already used elsewhere in HBase.
>
>
>
>
> -- Original Message --
> From: "Andrew Purtell" Sent: Friday, June 26, 2020, 9:08 AM
> To: "Hbase-User" Cc: "dev" Subject: Re: [DISCUSS] Removing problematic terms from our project
>
>
>
> > - AdminServer (as you already have AdminClient to talk to it).
>
> Oh... I like AdminServer. AdminServer (serving admin functions) and
> RegionServer (serving region data).
>
> On Thu, Jun 25, 2020 at 4:46 PM Andrey Elenskiy
> 
> > > Is there a word that's not "master" and not "coordinator" that
> is clear
> > and
> > suitable for (diverse, polyglot) community?
> >
> > There are also:
> > - captain (sounds pretty close to "master" without the negative side
> and it
> > should be relatable around the world)
> > - conductor (as in orchestra)
> > - controller (in kafka controller assigns partitions)
> > - RegionDriver (more relevant to what it's actually doing in hbase and
> > borrowed from PlacementDrive of TiKV)
> > - AdminServer (as you already have AdminClient to talk to it).
> >
> > On Thu, Jun 25, 2020 at 3:49 PM Sean Busbey  wrote:
> >
> > > How about "manager"?
> > >
> > > (It would help me if folks could explain what is lacking in
> > "coordinator".)
> > >
> > > On Thu, Jun 25, 2020, 13:32 Nick Dimiduk  wrote:
> > >
> > > > On Wed, Jun 24, 2020 at 10:14 PM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > > > wrote:
> > > >
> > > > > -0/+1/+1/+1
> > > > >
> > > > > I’m the one who asked whether ‘master’ is safe to use
> without ‘slave’
> > > in
> > > > > the private list.
> > > > >
> > > > > I’m still not convinced that it is really necessary
> and I do not
> > think
> > > > > other words like ‘coordinator’ can fully describe the
> role of HMaster
> > > in
> > > > > HBase. HBase is more than 10 years old. In the context
> of HBase, the
> > > word
> > > > > ‘HMaster’ has its own meaning. Changing the name will
> hurt our users
> > > and
> > > > > make them confusing, especially for us non native
> English speakers...
> > > > >
> > > >
> > > > Is there a word that's not "master" and not "coordinator"
> that is clear
> > > and
> > > > suitable for (diverse, polyglot) community?
> > > >
> > > > Stack  > > > >
> > > > > > +1/+1/+1/+1 where hbase3 adds the deprecation and
> hbase4 follows
> > > hbase3
> > > > > > soon after sounds good to me. I'm up for working
> on this.
> > > > > > S
> > > > > >
> > > > > > On Wed, Jun 24, 2020 at 2:26 PM Xu Cang <
> xuc...@apache.org> wrote:
> > > > > >
> > > > > > > Strongly agree with what Nick said here:
> > > > > > >
> > > > > > >  " From my perspective, we gain nothing
> as a project or as a
> > > > community
> > > > > be
> > > > > > > willfully retaining use of language that is
> well understood to be
> > > > > > > problematic or hurtful, On the contrary,
> we have much to gain
> > > by
> > > > > > > encouraging
> > > > > > > contributions from as many people as
> possible."
> > > > > > >
> > > > > > > +1 to Andrew's proposal.
> > > > > > >
> > > > > > > It might be good to have a source of truth
> web page or README
> > file
> > > > for
> > > > > > > developers and users to refer to regarding
> all naming
> > transitions.
> > > > It's
> > > > > > > going to help both developers changing the
> code and users looking
> > > for
> > > > > > some
> > > > > 

Re: HBase 2 slower than HBase 1?

2020-06-26 Thread Andrew Purtell
Hey Anoop, I opened https://issues.apache.org/jira/browse/HBASE-24637 and
attached the patches and script used to make the comparison.

On Fri, Jun 26, 2020 at 2:33 AM Anoop John  wrote:

> Great investigation Andy.  Do you know any Jiras which made changes in SQM?
> Would be great if you can attach your patch which tracks the scan flow.  If
> we have a Jira for this issue, can you pls attach?
>
> Anoop
>
> On Fri, Jun 26, 2020 at 1:56 AM Andrew Purtell 
> wrote:
>
> > Related, I think I found a bug in branch-1 where we don’t heartbeat in
> the
> > filter all case until we switch store files, so scanning a very large
> store
> > file might time out with client defaults. Remarking on this here so I
> don’t
> > forget to follow up.
> >
> > > On Jun 25, 2020, at 12:27 PM, Andrew Purtell 
> > wrote:
> > >
> > > 
> > > I repeated this test with pe --filterAll and the results were
> revealing,
> > at least for this case. I also patched in thread local hash map for
> atomic
> > counters that I could update from code paths in SQM, StoreScanner,
> > HFileReader*, and HFileBlock. Because a RPC is processed by a single
> > handler thread I could update counters and accumulate micro-timings via
> > System#nanoTime() per RPC and dump them out of CallRunner in some new
> trace
> > logging. I spent a couple of days making sure the instrumentation was
> > placed equivalently in both 1.6 and 2.2 code bases and was producing
> > consistent results. I can provide these patches upon request.
> > >
> > > Again, test tables with one family and 1, 5, 10, 20, 50, and 100
> > distinct column-qualifiers per row. After loading the table I made a
> > snapshot and cloned the snapshot for testing, for both 1.6 and 2.2, so
> both
> > versions were tested using the exact same data files on HDFS. I also used
> > the 1.6 version of PE for both, so the only change is on the server (1.6
> vs
> > 2.2 masters and regionservers).
> > >
> > > It appears a refactor to ScanQueryMatcher and friends has disabled the
> > ability of filters to provide SKIP hints, which prevents us from
> bypassing
> > version checking (so some extra cost in SQM), and appears to disable an
> > optimization that avoids reseeking, leading to a serious and proportional
> > regression in reseek activity and time spent in that code path. So for
> > queries that use filters, there can be a substantial regression.
> > >
> > > Other test cases that did not use filters did not show a regression.
> > >
> > > A test case where I used ROW_INDEX_V1 encoding showed an expected
> modest
> > proportional regression in seeking time, due to the fact it is optimized
> > for point queries and not optimized for the full table scan case.
> > >
> > > I will come back here when I understand this better.
> > >
> > > Here are the results for the pe --filterAll case:
> > >
> > >
> > > Counts (1.6.0 / 2.2.5 / 2.2.5 as a percentage of 1.6.0; cN = N
> > > column-qualifiers per row)
> > >
> > > rpcs (better heartbeating)
> > >   c1            1        2   200%
> > >   c5            2        6   300%
> > >   c10           2       10   500%
> > >   c20           3       17   567%
> > >   c50           4       37   925%
> > >   c100          8       72   900%
> > > block_reads
> > >   c1        11507    11508   100%
> > >   c5        57255    57257   100%
> > >   c10      114471   114474   100%
> > >   c20      230372   230377   100%
> > >   c50      578292   578298   100%
> > >   c100    1157955  1157963   100%
> > > block_unpacks
> > >   c1        11507    11508   100%
> > >   c5        57255    57257   100%
> > >   c10      114471   114474   100%
> > >   c20      230372   230377   100%
> > >   c50      578292   578298   100%
> > >   c100    1157955  1157963   100%
> > > seeker_next   10001000100%5000
> > 5000100%1   1   100%2
> >  2   100%5   5   100%
> > 10  10  100%
> > > store_next10009988268 100%500049940082
> >   100%1   99879401100%2
> >  199766539   100%5   499414653   100%
> > 10  998836518   100%
> > > store_reseek
> > >   c1            1    11733   > !
> > >   c5            2    59924   > !
> > >   c10           8   120607   > !
> > >   c20           6   233467   > !
> > >   c50          10   585357   > !
> > >   c100          8  1163490   > !
> > >
> > > cells_matched 20002000100%6000
> > 6000100%11000   11000   100%21000
> >  21000   100%51000   51000   100%
> > 101000  101000  100%
> > > column_hint_include   10001000100%5000
> >   5000100%1   1   100%

[jira] [Created] (HBASE-24644) Add a clause to the book noting that sometimes we short-circuit the deprecation cycle

2020-06-26 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24644:


 Summary: Add a clause to the book noting that sometimes we 
short-circuit the deprecation cycle
 Key: HBASE-24644
 URL: https://issues.apache.org/jira/browse/HBASE-24644
 Project: HBase
  Issue Type: Task
  Components: community, documentation
Reporter: Nick Dimiduk


Let's add a note to the book that describes the circumstances around 
HBASE-21782 and how that can result in code not following our stated 
deprecation cycle guidelines.





[jira] [Created] (HBASE-24643) Replace Cluster#primariesOfRegionsPerServer from int array to treemap

2020-06-26 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24643:


 Summary: Replace Cluster#primariesOfRegionsPerServer from int 
array to treemap
 Key: HBASE-24643
 URL: https://issues.apache.org/jira/browse/HBASE-24643
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


Currently, primariesOfRegionsPerServer is an int array. moveRegion does heavy 
work searching the array linearly, and inserting or removing an element 
requires allocating and copying the whole array.
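
A minimal sketch of the difference, assuming the map would hold a count per 
primary-region index (the issue does not say what the TreeMap stores). The 
class and method names are hypothetical; this is not the balancer's code.

```java
import java.util.TreeMap;

// Hypothetical illustration of array-backed vs. TreeMap-backed bookkeeping.
final class PrimariesSketch {

  /** Array approach: linear search, then allocate and copy a shorter array. */
  static int[] removeFromArray(int[] primaries, int primary) {
    for (int i = 0; i < primaries.length; i++) {
      if (primaries[i] == primary) {
        int[] out = new int[primaries.length - 1];
        System.arraycopy(primaries, 0, out, 0, i);
        System.arraycopy(primaries, i + 1, out, i, primaries.length - i - 1);
        return out;
      }
    }
    return primaries; // not present, nothing to do
  }

  /** Map approach: bump the count for this primary, O(log n), no copying. */
  static void addToMap(TreeMap<Integer, Integer> counts, int primary) {
    counts.merge(primary, 1, Integer::sum);
  }

  /** Map approach: decrement the count and drop the key when it reaches zero. */
  static void removeFromMap(TreeMap<Integer, Integer> counts, int primary) {
    counts.computeIfPresent(primary, (k, v) -> v == 1 ? null : v - 1);
  }

  public static void main(String[] args) {
    int[] arr = removeFromArray(new int[] {3, 5, 5, 9}, 5);
    System.out.println("array length after remove: " + arr.length);

    TreeMap<Integer, Integer> counts = new TreeMap<>();
    addToMap(counts, 5);
    addToMap(counts, 5);
    addToMap(counts, 3);
    removeFromMap(counts, 5);
    System.out.println("map after remove: " + counts);
  }
}
```

The array version pays for a scan plus a full copy on every move, while the 
map version touches only one entry and keeps keys ordered.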





Re: [DISCUSS] Removing problematic terms from our project

2020-06-26 Thread Andrey Elenskiy
> Is there a word that's not "master" and not "coordinator" that is clear and
> suitable for (diverse, polyglot) community?

There are also:
- captain (sounds pretty close to "master" without the negative side and it
should be relatable around the world)
- conductor (as in orchestra)
- controller (in kafka controller assigns partitions)
- RegionDriver (more relevant to what it's actually doing in hbase and
borrowed from PlacementDrive of TiKV)
- AdminServer (as you already have AdminClient to talk to it).

On Thu, Jun 25, 2020 at 3:49 PM Sean Busbey  wrote:

> How about "manager"?
>
> (It would help me if folks could explain what is lacking in "coordinator".)
>
> On Thu, Jun 25, 2020, 13:32 Nick Dimiduk  wrote:
>
> > On Wed, Jun 24, 2020 at 10:14 PM 张铎(Duo Zhang) 
> > wrote:
> >
> > > -0/+1/+1/+1
> > >
> > > I’m the one who asked whether ‘master’ is safe to use without ‘slave’
> in
> > > the private list.
> > >
> > > I’m still not convinced that it is really necessary and I do not think
> > > other words like ‘coordinator’ can fully describe the role of HMaster
> in
> > > HBase. HBase is more than 10 years old. In the context of HBase, the
> word
> > > ‘HMaster’ has its own meaning. Changing the name will hurt our users
> and
> > > make them confusing, especially for us non native English speakers...
> > >
> >
> > Is there a word that's not "master" and not "coordinator" that is clear
> and
> > suitable for (diverse, polyglot) community?
> >
> > Stack wrote on Thu, Jun 25, 2020 at 06:34:
> > >
> > > > +1/+1/+1/+1 where hbase3 adds the deprecation and hbase4 follows
> hbase3
> > > > soon after sounds good to me. I'm up for working on this.
> > > > S
> > > >
> > > > On Wed, Jun 24, 2020 at 2:26 PM Xu Cang  wrote:
> > > >
> > > > > Strongly agree with what Nick said here:
> > > > >
> > > > >  " From my perspective, we gain nothing as a project or as a
> > community
> > > be
> > > > > willfully retaining use of language that is well understood to be
> > > > > problematic or hurtful, On the contrary, we have much to gain
> by
> > > > > encouraging
> > > > > contributions from as many people as possible."
> > > > >
> > > > > +1 to Andrew's proposal.
> > > > >
> > > > > It might be good to have a source of truth web page or README file
> > for
> > > > > developers and users to refer to regarding all naming transitions.
> > It's
> > > > > going to help both developers changing the code and users looking
> for
> > > > some
> > > > > answers online that use old namings.
> > > > >
> > > > > Xu
> > > > >
> > > > > On Wed, Jun 24, 2020 at 2:21 PM Nick Dimiduk 
> > > > wrote:
> > > > >
> > > > > > On Tue, Jun 23, 2020 at 13:11 Sean Busbey 
> > wrote:
> > > > > >
> > > > > > > I would like to make sure I am emphatically clear that "master"
> > by
> > > > > itself
> > > > > > > is not okay if the context is the same as what would normally
> be
> > a
> > > > > > > master/slave context. Furthermore our use of master is clearly
> > > such a
> > > > > > > context.
> > > > > >
> > > > > >
> > > > > > > I agree: to me “Master”, as in “HMaster” carries with it the
> > > > master/slave
> > > > > > baggage. As an alternative, I prefer the term “coordinator” over
> > > > > “leader”.
> > > > > > Thus we would have daemons called “coordinator” and “region
> > server”.
> > > > > >
> > > > > > To me, “master” as in “master branch” does not carry the same
> > > baggage,
> > > > > but
> > > > > > I’m also in favor changing the name of our default branch to a
> word
> > > > that
> > > > > is
> > > > > > less conflicted. I see nothing that we gain as a community by
> > > > continuing
> > > > > to
> > > > > > use this word.
> > > > > >
> > > > > > It seems to me we have, broadly speaking, consensus around making
> > > > *some*
> > > > > > > changes. I haven't seen a strong push for "break everything in
> > the
> > > > name
> > > > > > of
> > > > > > > expediency" (I would personally be fine with this). So barring
> > > > > additional
> > > > > > > discussion that favors breaking changes, current approaches
> > should
> > > > > > comport
> > > > > > > with our existing project compatibility goals.
> > > > > > >
> > > > > > > Maybe we could stop talking about what-ifs and look at actual
> > > > practical
> > > > > > > examples? If anyone is currently up for doing the work of a PR
> we
> > > can
> > > > > > look
> > > > > > > at for one of these?
> > > > > > >
> > > > > > > If folks would prefer we e.g. just say "we should break
> whatever
> > we
> > > > > need
> > > > > > to
> > > > > > > in 3.0.0 to make this happen" then it would be good to speak
> up.
> > > > > > Otherwise
> > > > > > > likely we would be done with needed changes circa hbase 4,
> > probably
> > > > > late
> > > > > > > 2021 or 2022.
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jun 23, 2020, 03:03 zheng wang <18031...@qq.com>
> wrote:
> > > > > > >
> > > > > > > > IMO, master is ok if not used with slave together.
> > > > > > > >
> > > > > > > >
> > > > > > > > -1/+1/+1/+1
> > > > > 

[jira] [Created] (HBASE-24642) Apache Yetus integration

2020-06-26 Thread Bharath Vissapragada (Jira)
Bharath Vissapragada created HBASE-24642:


 Summary: Apache Yetus integration
 Key: HBASE-24642
 URL: https://issues.apache.org/jira/browse/HBASE-24642
 Project: HBase
  Issue Type: Sub-task
Reporter: Bharath Vissapragada
Assignee: Bharath Vissapragada


Now that we have a clean test run with all the tests passing on trunk (with 
HBase trunk), let's get the Yetus integration with the test-patch utility working. 
That makes it much easier to implement a precommit with GitHub.





[jira] [Created] (HBASE-24641) Add linter checks for patches

2020-06-26 Thread Bharath Vissapragada (Jira)
Bharath Vissapragada created HBASE-24641:


 Summary: Add linter checks for patches
 Key: HBASE-24641
 URL: https://issues.apache.org/jira/browse/HBASE-24641
 Project: HBase
  Issue Type: Sub-task
  Components: Client, native-client
Reporter: Bharath Vissapragada


I've worked with clang-tidy before and I think it works well. There is also a 
[helper 
script|https://clang.llvm.org/extra/doxygen/clang-tidy-diff_8py_source.html] 
that runs clang-tidy on a patch rather than the entire source tree.

- pull in clang as a dependency
- Define .clang-format
- Run clang-tidy-diff on each checkin

There are also other tools like cpplint from Google. I don't have any 
preference, so either works for me.





Re: [DISCUSS] VisibleForTesting annotation as it pertains to our API compatibility guidelines

2020-06-26 Thread Nick Dimiduk
I've restored the previous method signature to LoadIncrementalHFiles, so
that piece is complete for the next RC.

Based on the latest comments, let me update my proposed course of action.

1. restore any VisibleForTesting method signatures for 2.3.0, treat this as
public API going forward.
2. purge the VisibleForTesting annotation from our codebase for 2.4+,
involving:
2a. replace VisibleForTesting with IA.Private anywhere method visibility
cannot be limited
2b. perhaps add a new Yetus check that would ban new use of
VisibleForTesting
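
As a rough sketch of what step 2b could check (this is not a Yetus API, and
the class and method names are hypothetical), the idea is to flag only lines
that a patch adds, so existing uses do not fail the check while the purge is
in progress. It assumes the input is the text of a unified diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical precommit-style check; not part of Yetus or HBase.
final class VisibleForTestingCheck {

  /** Returns the added lines of a unified diff that mention @VisibleForTesting. */
  static List<String> newUses(List<String> unifiedDiffLines) {
    List<String> hits = new ArrayList<>();
    for (String line : unifiedDiffLines) {
      // "+" marks an added line; "+++" is the file header, not content.
      boolean added = line.startsWith("+") && !line.startsWith("+++");
      if (added && line.contains("@VisibleForTesting")) {
        hits.add(line.substring(1).trim());
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<String> diff = new ArrayList<>();
    diff.add("+++ b/src/main/java/Foo.java");
    diff.add("+  @VisibleForTesting");
    diff.add("-  @VisibleForTesting");
    diff.add("   @VisibleForTesting");
    System.out.println("new uses: " + newUses(diff).size());
  }
}
```

Removed and unchanged lines are ignored on purpose, so the check would only
stop the count from growing.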

I have filed https://issues.apache.org/jira/browse/HBASE-24640 as a
tracking task for this work. If the above process is agreeable, let's add
these steps as a comment for future reference.

Thanks,
Nick

On Fri, Jun 26, 2020 at 8:38 AM 张铎(Duo Zhang)  wrote:

> For the LoadIncrementalHFiles class, the IA.Public annotation itself is a
> problem. It should be IA.LimitPrivate(TOOLS). So I'm fine with either
> adding the method back or not, the opinion of the release manager is
> most important I think.
>
> Thanks.
>
> Viraj Jasani wrote on Fri, Jun 26, 2020 at 9:50 PM:
>
> > Agree on replacing VFT in next 2.4.0 or 3.0.0 release and restoring the
> > required
> > method for now to unblock 2.3.0 RC1.
> >
> >
> > On 2020/06/26 01:11:31, Andrew Purtell  wrote:
> > > Sounds fine to me.
> > >
> > > My earlier objection was to talk of an HBase 3 followed by an HBase 4.
> We
> > > don't need to do a full deprecation cycle across two major versions to
> > > remove an annotation that never promised public access. (By definition,
> > > tagged fields and members were VisibleForTesting (only). The 'only' was
> > > implied, but I think a reasonable assumption and common knowledge.)
> > >
> > > On Thu, Jun 25, 2020 at 3:48 PM Sean Busbey  wrote:
> > >
> > > > Agree on restoring the member and then getting this done for 2.4.0.
> > > >
> > > >
> > > > On Thu, Jun 25, 2020, 15:02 Nick Dimiduk 
> wrote:
> > > >
> > > > > And now by module,
> > > > >
> > > > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | cut -d/ -f2 | sort | uniq -c
> > > > >6 hbase-backup
> > > > >   87 hbase-client
> > > > >   40 hbase-common
> > > > >1 hbase-endpoint
> > > > >7 hbase-hadoop-compat
> > > > >3 hbase-http
> > > > >   18 hbase-mapreduce
> > > > >1 hbase-metrics-api
> > > > >   24 hbase-procedure
> > > > >   10 hbase-replication
> > > > >  456 hbase-server
> > > > >2 hbase-thrift
> > > > >1 hbase-zookeeper
> > > > >
> > > > > I prefer we not make this change a prerequisite to 2.3. I would
> > rather we
> > > > > restore the one method modified by HBASE-24221 and do the work for
> > > > > VisibleForTesting for 2.4.0.
> > > > >
> > > > > On Thu, Jun 25, 2020 at 12:57 PM Nick Dimiduk  >
> > > > wrote:
> > > > >
> > > > > > On Thu, Jun 25, 2020 at 12:36 PM Andrew Purtell <
> > apurt...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> I think we are in agreement except for a need to have a
> > deprecation
> > > > > cycle.
> > > > > >> Just remove VisibleForTesting and replace with whatever
> > alternative
> > > > you
> > > > > >> like. Certainly in the next minors. No strong opinion either way
> > about
> > > > > >> patch releases, leave as is?
> > > > > >>
> > > > > >
> > > > > > Thanks Andrew and Bharath, I now better understand your
> positions.
> > > > > >
> > > > > > The annotation is fairly common in our codebase, from branch-2.3,
> > > > > >
> > > > > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | wc -l
> > > > > >  668
> > > > > >
> > > > > > > I don't have an easy way to cross-reference this with our IA
> > > > annotations,
> > > > > > but my concern is that any change we make here without a
> > deprecation
> > > > > cycle
> > > > > > will be disruptive to users.
> > > > > >
> > > > > > On Thu, Jun 25, 2020 at 11:30 AM Nick Dimiduk <
> ndimi...@apache.org
> > >
> > > > > wrote:
> > > > > >>
> > > > > >> > On Wed, Jun 24, 2020 at 3:19 PM Andrew Purtell <
> > apurt...@apache.org
> > > > >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > It is possible some users may not understand what Guava's
> > > > > >> > VisibleForTesting
> > > > > >> > > implies, but those users are much more likely to be Java
> > > > developers
> > > > > or
> > > > > >> > Java
> > > > > >> > > developer adjacent, and familiar with what this fad
> entailed.
> > Such
> > > > > >> > tagging
> > > > > >> > > was/is done specifically to indicate the exposed field or
> > method
> > > > was
> > > > > >> only
> > > > > >> > > made to allow test access to internals, as something less
> than
> > > > > public.
> > > > > >> > >
> > > > > >> > > For us to treat such annotated fields and methods as public
> > after
> > > > > all
> > > > > >> is
> > > > > >> > > unnecessary, possibly surprising, and not semantically sound
> > > > (IMHO).
> > > > > >> > >
> > > > > >> >
> > > > > >> > I don't want to preserve use of VisibleForTesting as an
> > indicato

[jira] [Created] (HBASE-24640) Purge use of VisibleForTesting

2020-06-26 Thread Nick Dimiduk (Jira)
Nick Dimiduk created HBASE-24640:


 Summary: Purge use of VisibleForTesting
 Key: HBASE-24640
 URL: https://issues.apache.org/jira/browse/HBASE-24640
 Project: HBase
  Issue Type: Task
  Components: community
Reporter: Nick Dimiduk


From the dev-list thread ["[DISCUSS] VisibleForTesting annotation as it 
pertains to our API compatibility 
guidelines"|https://lists.apache.org/thread.html/rc7c7c66f134fe135d0a4454a883215e26ff3d20e5a31ecd6a2d1db77%40%3Cdev.hbase.apache.org%3E],
when used in classes annotated with interface audience other than IA.Private, 
the VisibleForTesting annotation is confusing and considered harmful. The 
consensus is that we do not want to use this annotation as part of the 
definition of our public APIs, and we need to remove the point of confusion.





[jira] [Resolved] (HBASE-24221) Support bulkLoadHFile by family

2020-06-26 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24221.
--
Fix Version/s: (was: 2.4.0)
   Resolution: Fixed

And branch-2.2. Thanks for the reviews.

> Support bulkLoadHFile by family
> ---
>
> Key: HBASE-24221
> URL: https://issues.apache.org/jira/browse/HBASE-24221
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.4
>Reporter: niuyulin
>Assignee: niuyulin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.5
>
>
> Support bulkLoadHFile by family to avoid long time waiting of bulkLoadHFile 
> because of compacting at server side





[jira] [Reopened] (HBASE-24221) Support bulkLoadHFile by family

2020-06-26 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-24221:
--

Reopening, pardon. This needs to go to branch-2.2 as well.

> Support bulkLoadHFile by family
> ---
>
> Key: HBASE-24221
> URL: https://issues.apache.org/jira/browse/HBASE-24221
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.4
>Reporter: niuyulin
>Assignee: niuyulin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5
>
>
> Support bulkLoadHFile by family to avoid long time waiting of bulkLoadHFile 
> because of compacting at server side





[jira] [Resolved] (HBASE-24221) Support bulkLoadHFile by family

2020-06-26 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-24221.
--
Resolution: Fixed

Addendum pushed to branch-2 and branch-2.3.

> Support bulkLoadHFile by family
> ---
>
> Key: HBASE-24221
> URL: https://issues.apache.org/jira/browse/HBASE-24221
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.4
>Reporter: niuyulin
>Assignee: niuyulin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5
>
>
> Support bulkLoadHFile by family to avoid long time waiting of bulkLoadHFile 
> because of compacting at server side





[jira] [Resolved] (HBASE-24638) Edit doc on (offheap) memory management

2020-06-26 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24638.
---
Fix Version/s: 3.0.0-alpha-1
 Assignee: Michael Stack
   Resolution: Fixed

Merged doc change.

> Edit doc on (offheap) memory management
> ---
>
> Key: HBASE-24638
> URL: https://issues.apache.org/jira/browse/HBASE-24638
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Gave it a read over to try and figure current state of memory management in 
> hbase-2.3.0. Updated it to reflect more of what the current state is.





Re: [DISCUSS] VisibleForTesting annotation as it pertains to our API compatibility guidelines

2020-06-26 Thread Duo Zhang
For the LoadIncrementalHFiles class, the IA.Public annotation itself is a
problem. It should be IA.LimitPrivate(TOOLS). So I'm fine with either
adding the method back or not, the opinion of the release manager is
most important I think.

Thanks.

On Fri, Jun 26, 2020 at 9:50 PM Viraj Jasani wrote:

> Agree on replacing VFT in next 2.4.0 or 3.0.0 release and restoring the
> required
> method for now to unblock 2.3.0 RC1.
>
>
> On 2020/06/26 01:11:31, Andrew Purtell  wrote:
> > Sounds fine to me.
> >
> > My earlier objection was to talk of an HBase 3 followed by an HBase 4. We
> > don't need to do a full deprecation cycle across two major versions to
> > remove an annotation that never promised public access. (By definition,
> > tagged fields and members were VisibleForTesting (only). The 'only' was
> > implied, but I think a reasonable assumption and common knowledge.)
> >
> > On Thu, Jun 25, 2020 at 3:48 PM Sean Busbey  wrote:
> >
> > > Agree on restoring the member and then getting this done for 2.4.0.
> > >
> > >
> > > On Thu, Jun 25, 2020, 15:02 Nick Dimiduk  wrote:
> > >
> > > > And now by module,
> > > >
> > > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | cut -d/ -f2 | sort | uniq -c
> > > >6 hbase-backup
> > > >   87 hbase-client
> > > >   40 hbase-common
> > > >1 hbase-endpoint
> > > >7 hbase-hadoop-compat
> > > >3 hbase-http
> > > >   18 hbase-mapreduce
> > > >1 hbase-metrics-api
> > > >   24 hbase-procedure
> > > >   10 hbase-replication
> > > >  456 hbase-server
> > > >2 hbase-thrift
> > > >1 hbase-zookeeper
> > > >
> > > > I prefer we not make this change a prerequisite to 2.3. I would
> rather we
> > > > restore the one method modified by HBASE-24221 and do the work for
> > > > VisibleForTesting for 2.4.0.
> > > >
> > > > On Thu, Jun 25, 2020 at 12:57 PM Nick Dimiduk 
> > > wrote:
> > > >
> > > > > On Thu, Jun 25, 2020 at 12:36 PM Andrew Purtell <
> apurt...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> I think we are in agreement except for a need to have a
> deprecation
> > > > cycle.
> > > > >> Just remove VisibleForTesting and replace with whatever
> alternative
> > > you
> > > > >> like. Certainly in the next minors. No strong opinion either way
> about
> > > > >> patch releases, leave as is?
> > > > >>
> > > > >
> > > > > Thanks Andrew and Bharath, I now better understand your positions.
> > > > >
> > > > > The annotation is fairly common in our codebase, from branch-2.3,
> > > > >
> > > > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | wc -l
> > > > >  668
> > > > >
> > > > > > I don't have an easy way to cross-reference this with our IA
> > > annotations,
> > > > > but my concern is that any change we make here without a
> deprecation
> > > > cycle
> > > > > will be disruptive to users.
> > > > >
> > > > > On Thu, Jun 25, 2020 at 11:30 AM Nick Dimiduk  >
> > > > wrote:
> > > > >>
> > > > >> > On Wed, Jun 24, 2020 at 3:19 PM Andrew Purtell <
> apurt...@apache.org
> > > >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > It is possible some users may not understand what Guava's
> > > > >> > VisibleForTesting
> > > > >> > > implies, but those users are much more likely to be Java
> > > developers
> > > > or
> > > > >> > Java
> > > > >> > > developer adjacent, and familiar with what this fad entailed.
> Such
> > > > >> > tagging
> > > > >> > > was/is done specifically to indicate the exposed field or
> method
> > > was
> > > > >> only
> > > > >> > > made to allow test access to internals, as something less than
> > > > public.
> > > > >> > >
> > > > >> > > For us to treat such annotated fields and methods as public
> after
> > > > all
> > > > >> is
> > > > >> > > unnecessary, possibly surprising, and not semantically sound
> > > (IMHO).
> > > > >> > >
> > > > >> >
> > > > >> > I don't want to preserve use of VisibleForTesting as an
> indicator of
> > > > >> public
> > > > >> > API. I want to ensure that we're clear to our downstream users
> > > > >> > that its presence is not a factor in determining public API. For
> > > > >> example, I
> > > > >> > don't want to update our book to give any meaning to this
> > > annotation,
> > > > >> and I
> > > > >> > don't want to update our javadoc filters to take it into account
> > > when
> > > > >> > generating the various versions of javadoc that we publish. I
> want
> > > to
> > > > >> purge
> > > > >> > it from the discussion by annotating the methods it decorates
> with
> > > the
> > > > >> > symbols we do use to define our public API. The steps I propose
> > > above
> > > > >> are
> > > > >> > my suggestion of how we work toward that goal.
> > > > >> >
> > > > >> > Does anyone have a counter-proposal to the steps I've outlined
> > > above?
> > > > A
> > > > >> > resolution to this discussion is now the final blocker on
> 2.3.0rc1.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Nick
> > > > >> >
> > > > >> > On Wed, Jun 24, 2020 at 2:53 PM Sean Busbey 
> > > >

Re: [DISCUSS] VisibleForTesting annotation as it pertains to our API compatibility guidelines

2020-06-26 Thread Viraj Jasani
Agree on replacing VFT in next 2.4.0 or 3.0.0 release and restoring the required
method for now to unblock 2.3.0 RC1.


On 2020/06/26 01:11:31, Andrew Purtell  wrote: 
> Sounds fine to me.
> 
> My earlier objection was to talk of an HBase 3 followed by an HBase 4. We
> don't need to do a full deprecation cycle across two major versions to
> remove an annotation that never promised public access. (By definition,
> tagged fields and members were VisibleForTesting (only). The 'only' was
> implied, but I think a reasonable assumption and common knowledge.)
> 
> On Thu, Jun 25, 2020 at 3:48 PM Sean Busbey  wrote:
> 
> > Agree on restoring the member and then getting this done for 2.4.0.
> >
> >
> > On Thu, Jun 25, 2020, 15:02 Nick Dimiduk  wrote:
> >
> > > And now by module,
> > >
> > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | cut -d/ -f2 | sort | uniq -c
> > >6 hbase-backup
> > >   87 hbase-client
> > >   40 hbase-common
> > >1 hbase-endpoint
> > >7 hbase-hadoop-compat
> > >3 hbase-http
> > >   18 hbase-mapreduce
> > >1 hbase-metrics-api
> > >   24 hbase-procedure
> > >   10 hbase-replication
> > >  456 hbase-server
> > >2 hbase-thrift
> > >1 hbase-zookeeper
> > >
> > > I prefer we not make this change a prerequisite to 2.3. I would rather we
> > > restore the one method modified by HBASE-24221 and do the work for
> > > VisibleForTesting for 2.4.0.
> > >
> > > On Thu, Jun 25, 2020 at 12:57 PM Nick Dimiduk 
> > wrote:
> > >
> > > > On Thu, Jun 25, 2020 at 12:36 PM Andrew Purtell 
> > > > wrote:
> > > >
> > > >> I think we are in agreement except for a need to have a deprecation
> > > cycle.
> > > >> Just remove VisibleForTesting and replace with whatever alternative
> > you
> > > >> like. Certainly in the next minors. No strong opinion either way about
> > > >> patch releases, leave as is?
> > > >>
> > > >
> > > > Thanks Andrew and Bharath, I now better understand your positions.
> > > >
> > > > The annotation is fairly common in our codebase, from branch-2.3,
> > > >
> > > > > $ find . -iname '*.java' -exec grep -n '@VisibleForTesting' {} \+ | wc -l
> > > >  668
> > > >
> > > > > I don't have an easy way to cross-reference this with our IA
> > annotations,
> > > > but my concern is that any change we make here without a deprecation
> > > cycle
> > > > will be disruptive to users.
> > > >
> > > > On Thu, Jun 25, 2020 at 11:30 AM Nick Dimiduk 
> > > wrote:
> > > >>
> > > >> > On Wed, Jun 24, 2020 at 3:19 PM Andrew Purtell  > >
> > > >> > wrote:
> > > >> >
> > > >> > > It is possible some users may not understand what Guava's
> > > >> > VisibleForTesting
> > > >> > > implies, but those users are much more likely to be Java
> > developers
> > > or
> > > >> > Java
> > > >> > > developer adjacent, and familiar with what this fad entailed. Such
> > > >> > tagging
> > > >> > > was/is done specifically to indicate the exposed field or method
> > was
> > > >> only
> > > >> > > made to allow test access to internals, as something less than
> > > public.
> > > >> > >
> > > >> > > For us to treat such annotated fields and methods as public after
> > > all
> > > >> is
> > > >> > > unnecessary, possibly surprising, and not semantically sound
> > (IMHO).
> > > >> > >
> > > >> >
> > > >> > I don't want to preserve use of VisibleForTesting as an indicator of
> > > >> public
> > > >> > API. I want to ensure that we're clear to our downstream users
> > > >> > that its presence is not a factor in determining public API. For
> > > >> example, I
> > > >> > don't want to update our book to give any meaning to this
> > annotation,
> > > >> and I
> > > >> > don't want to update our javadoc filters to take it into account
> > when
> > > >> > generating the various versions of javadoc that we publish. I want
> > to
> > > >> purge
> > > >> > it from the discussion by annotating the methods it decorates with
> > the
> > > >> > symbols we do use to define our public API. The steps I propose
> > above
> > > >> are
> > > >> > my suggestion of how we work toward that goal.
> > > >> >
> > > >> > Does anyone have a counter-proposal to the steps I've outlined
> > above?
> > > A
> > > >> > resolution to this discussion is now the final blocker on 2.3.0rc1.
> > > >> >
> > > >> > Thanks,
> > > >> > Nick
> > > >> >
> > > >> > On Wed, Jun 24, 2020 at 2:53 PM Sean Busbey 
> > > wrote:
> > > >> > >
> > > >> > > > Andrew are you specifically opposed to using a deprecation cycle
> > > to
> > > >> > > > formally label as private anything that currently has a
> > > >> > VisibleForTesting
> > > >> > > > annotation?
> > > >> > > >
> > > >> > > > On Wed, Jun 24, 2020, 16:07 Andrew Purtell  > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > I am -1 on treating VisibleForTesting as public API.
> > > Semantically
> > > >> it
> > > >> > > > makes
> > > >> > > > > no sense.
> > > >> > > > >
> > > >> > > > > On Wed, Jun 24, 2020 at 12:22 PM Nick Dimiduk <
> > > >> ndimi...@apache.org>
> > > >> >

[jira] [Created] (HBASE-24639) RequestId Tracing feature for HBase

2020-06-26 Thread Pranshu Khandelwal (Jira)
Pranshu Khandelwal created HBASE-24639:
--

 Summary: RequestId Tracing feature for HBase 
 Key: HBASE-24639
 URL: https://issues.apache.org/jira/browse/HBASE-24639
 Project: HBase
  Issue Type: New Feature
Reporter: Pranshu Khandelwal








[jira] [Resolved] (HBASE-24626) [HBCK2] Remove reference to hase I.A. private class CommonFsUtils from FsRegionsMetaRecoverer

2020-06-26 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-24626.
--
Resolution: Fixed

Merged to master.

> [HBCK2] Remove reference to hase I.A. private class CommonFsUtils from 
> FsRegionsMetaRecoverer
> -
>
> Key: HBASE-24626
> URL: https://issues.apache.org/jira/browse/HBASE-24626
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase-operator-tools, hbck2
>Affects Versions: hbase-operator-tools-1.0.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: hbase-operator-tools-1.1.0
>
>
> FsRegionsMetaRecoverer used to reference the IA.Private interface FSUtils, 
> which changed in hbase 2.3, causing hbck2 to fail to compile. HBASE-24482 
> fixed it by pointing to the CommonFSUtils interface, where the methods it 
> relied upon were actually defined. Since this is also an IA.Private 
> interface, there are no compatibility guarantees. This PR removes the 
> reference to CommonFSUtils from FsRegionsMetaRecoverer. Other classes in 
> hbck2 still require similar work.





Re: HBase 2 slower than HBase 1?

2020-06-26 Thread Anoop John
Great investigation Andy.  Do you know any Jiras which made changes in SQM?
Would be great if you can attach your patch which tracks the scan flow.  If
we have a Jira for this issue, can you pls attach?

Anoop
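
For readers following along, the per-RPC instrumentation Andrew describes in
the quoted message could be sketched roughly as below. This is not his patch
(which is only available on request); the class and method names are
hypothetical, and it uses plain long cells in a thread-local map rather than
atomic counters, on the stated assumption that one handler thread processes
one RPC at a time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-RPC counters; all names here are illustrative only. */
class RpcCounters {
  // One map per handler thread, so updates need no synchronization.
  private static final ThreadLocal<Map<String, long[]>> COUNTERS =
      ThreadLocal.withInitial(HashMap::new);

  /** Adds delta to the named counter for the RPC running on this thread. */
  static void add(String name, long delta) {
    COUNTERS.get().computeIfAbsent(name, k -> new long[1])[0] += delta;
  }

  /** Runs the action and accumulates its elapsed nanoseconds under name. */
  static void time(String name, Runnable action) {
    long start = System.nanoTime();
    try {
      action.run();
    } finally {
      add(name, System.nanoTime() - start);
    }
  }

  /** Returns a sorted snapshot and clears the map, ready for the next RPC. */
  static Map<String, Long> drain() {
    Map<String, Long> snapshot = new TreeMap<>();
    COUNTERS.get().forEach((k, v) -> snapshot.put(k, v[0]));
    COUNTERS.get().clear();
    return snapshot;
  }

  public static void main(String[] args) {
    add("store_reseek", 1);
    add("store_reseek", 1);
    time("seek_ns", () -> add("block_reads", 3));
    // In a server this dump would happen once per call, e.g. in trace logging.
    System.out.println(drain());
  }
}
```

Draining at the end of each call keeps one RPC's numbers from leaking into
the next call handled by the same thread.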

On Fri, Jun 26, 2020 at 1:56 AM Andrew Purtell 
wrote:

> Related, I think I found a bug in branch-1 where we don’t heartbeat in the
> filter all case until we switch store files, so scanning a very large store
> file might time out with client defaults. Remarking on this here so I don’t
> forget to follow up.
>
> > On Jun 25, 2020, at 12:27 PM, Andrew Purtell 
> wrote:
> >
> > 
> > I repeated this test with pe --filterAll and the results were revealing,
> at least for this case. I also patched in thread local hash map for atomic
> counters that I could update from code paths in SQM, StoreScanner,
> HFileReader*, and HFileBlock. Because a RPC is processed by a single
> handler thread I could update counters and accumulate micro-timings via
> System#nanoTime() per RPC and dump them out of CallRunner in some new trace
> logging. I spent a couple of days making sure the instrumentation was
> placed equivalently in both 1.6 and 2.2 code bases and was producing
> consistent results. I can provide these patches upon request.
> >
> > Again, test tables with one family and 1, 5, 10, 20, 50, and 100
> distinct column-qualifiers per row. After loading the table I made a
> snapshot and cloned the snapshot for testing, for both 1.6 and 2.2, so both
> versions were tested using the exact same data files on HDFS. I also used
> the 1.6 version of PE for both, so the only change is on the server (1.6 vs
> 2.2 masters and regionservers).
> >
> > It appears a refactor to ScanQueryMatcher and friends has disabled the
> ability of filters to provide SKIP hints, which prevents us from bypassing
> version checking (so some extra cost in SQM), and appears to disable an
> optimization that avoids reseeking, leading to a serious and proportional
> regression in reseek activity and time spent in that code path. So for
> queries that use filters, there can be a substantial regression.
> >
> > Other test cases that did not use filters did not show a regression.
> >
> > A test case where I used ROW_INDEX_V1 encoding showed an expected modest
> proportional regression in seeking time, due to the fact it is optimized
> for point queries and not optimized for the full table scan case.
> >
> > I will come back here when I understand this better.
> >
> > Here are the results for the pe --filterAll case:
> >
> >
> > Counts (cN = N column qualifiers per row; ratio = 2.2.5 relative to 1.6.0)
> > Note on the rpcs row: (better heartbeating)
> >
> > metric              ver             c1          c5         c10         c20         c50        c100
> > rpcs                1.6.0            1           2           2           3           4           8
> >                     2.2.5            2           6          10          17          37          72
> >                     ratio         200%        300%        500%        567%        925%        900%
> > block_reads         1.6.0        11507       57255      114471      230372      578292     1157955
> >                     2.2.5        11508       57257      114474      230377      578298     1157963
> >                     ratio         100%        100%        100%        100%        100%        100%
> > block_unpacks       1.6.0        11507       57255      114471      230372      578292     1157955
> >                     2.2.5        11508       57257      114474      230377      578298     1157963
> >                     ratio         100%        100%        100%        100%        100%        100%
> > seeker_next         1.6.0     10000000    50000000   100000000   200000000   500000000  1000000000
> >                     2.2.5     10000000    50000000   100000000   200000000   500000000  1000000000
> >                     ratio         100%        100%        100%        100%        100%        100%
> > store_next          1.6.0     10000000    50000000   100000000   200000000   500000000  1000000000
> >                     2.2.5      9988268    49940082    99879401   199766539   499414653   998836518
> >                     ratio         100%        100%        100%        100%        100%        100%
> > store_reseek        1.6.0            1           2           8           6          10           8
> >                     2.2.5        11733       59924      120607      233467      585357     1163490
> >                     ratio          > !         > !         > !         > !         > !         > !
> >
> > cells_matched       1.6.0     20000000    60000000   110000000   210000000   510000000  1010000000
> >                     2.2.5     20000000    60000000   110000000   210000000   510000000  1010000000
> >                     ratio         100%        100%        100%        100%        100%        100%
> > column_hint_include 1.6.0     10000000    50000000   100000000   200000000   500000000  1000000000
> >                     2.2.5     10000000    50000000   100000000   200000000   500000000  1000000000
> >                     ratio         100%        100%        100%        100%        100%        100%
> > filter_hint_skip    1.6.0     10000000    50000000   100000000   200000000   500000000  1000000000
> >                     2.2.5     10000000    50000000   100000000   200000000   500000000  1000000000
> >                     ratio         100%        100%        100%        100%        100%        100%
> > sqm_hint_done 999 999 100%998 998 100%998
> 998 100%

Re: [DISCUSS] IncreasingToUpperBoundRegionSplitPolicy can lead to unpredictable large regions size

2020-06-26 Thread Wellington Chevreuil
>
> It's supposed to be controlling how big the region is?
>
Precisely. It may not make a big difference for compaction itself, but
larger than expected regions might have further implications for overall RS
resource usage. Given the feedback provided here, I guess we can proceed with
the current proposal from HBASE-24530 all the way to the maintenance branches
(it doesn't change IncreasingToUpperBoundRegionSplitPolicy behaviour, but adds
a new policy that does respect the region max size for the whole region). We
can then fix IncreasingToUpperBoundRegionSplitPolicy on the minor version
branches, as suggested by Busbey.
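
To make the difference concrete, here is a simplified Java sketch of the two
interpretations of the limit being discussed. It is not the HBase split
policy code: the class, method names and the list-of-sizes model are
assumptions for illustration, and it ignores the region-count-based growth
of the limit that IncreasingToUpperBoundRegionSplitPolicy also applies.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model: one entry per column family, total store size in bytes. */
class SplitCheckSketch {
  /** Per-store check: split only when a single store exceeds the limit. */
  static boolean shouldSplitByLargestStore(List<Long> storeSizes, long maxFileSize) {
    for (long size : storeSizes) {
      if (size > maxFileSize) {
        return true;
      }
    }
    return false;
  }

  /** Whole-region check: split when the sum over all stores exceeds the limit. */
  static boolean shouldSplitByRegionSize(List<Long> storeSizes, long maxFileSize) {
    long total = 0;
    for (long size : storeSizes) {
      total += size;
    }
    return total > maxFileSize;
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024L * 1024L;
    long limit = 10 * gb; // stand-in for hbase.hregion.max.filesize
    List<Long> stores = Arrays.asList(8 * gb, 8 * gb, 8 * gb); // three families
    System.out.println("per-store check splits:    " + shouldSplitByLargestStore(stores, limit));
    System.out.println("whole-region check splits: " + shouldSplitByRegionSize(stores, limit));
  }
}
```

With three families of 8 GB each and a 10 GB limit, the per-store check never
fires even though the region holds 24 GB, which is the "larger than expected
regions" case above.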

On Wed, Jun 24, 2020 at 18:00, Andrew Purtell wrote:

> It's supposed to be controlling how big the region is?
>
> On Wed, Jun 24, 2020 at 8:42 AM 张铎(Duo Zhang) 
> wrote:
>
> > I think one of the goals of limiting the store file size is for
> compaction.
> > As long as we just do compactions per family, what is the actual problem
> if
> > the whole region is too big?
> >
> > On Wed, Jun 24, 2020 at 10:56 PM Wellington Chevreuil wrote:
> >
> > > The expected behaviour for the property is well documented, so renaming
> > and
> > > deprecation would rather be a separate task. HBASE-24530 should concern
> > > with making IncreasingToUpperBoundRegionSplitPolicy respect what
> > > hbase.hregion.max.filesize and MAX_FILESIZE table level descriptor
> > > documentation mandate, as well as being consistent with other split
> > > policies behaviour in relation to these properties.
> > >
> > > On Wed, Jun 24, 2020 at 08:00, Anoop John <anoop.hb...@gmail.com> wrote:
> > >
> > > > If we are going to change (correct)   hbase.hregion.max.filesize to
> > > > hbase.hregion.max.size  (Via proper deprecation cycle) also along
> with
> > > this
> > > > change, am good.
> > > >
> > > > Anoop
> > > >
> > > > On Wed, Jun 24, 2020 at 1:29 AM Sean Busbey 
> wrote:
> > > >
> > > > > Let's fix via approach #3. Get it done for next minor versions and then if
> > > > > folks aren't sure about principle of least surprise we can talk about
> > > > > whether it goes into maintenance releases.
> > > > >
> > > > > On Tue, Jun 23, 2020, 13:07 Andrew Purtell 
> > > wrote:
> > > > >
> > > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation
> is
> > > > > > violating those configs.
> > > > > >
> > > > > > Thank you for pointing this out. I feel even more strongly now
> this
> > > is
> > > > a
> > > > > > bug.
> > > > > > I vote for #3.
> > > > > >
> > > > > > On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <
> > > > > > wellington.chevre...@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > The config name was/is   hbase.hregion.max.*filesize* and
> > never *
> > > > > > > > hbase.hregion.max.size*.
> > > > > > > >
> > > > > > >
> > > > > > > Description for hbase.hregion.max.filesize is very clear
> stating
> > > that
> > > > > > it's
> > > > > > > the sum of all hfiles in the region that should not exceed this
> > > > > property
> > > > > > > value. And we not always use  *hbase.hregion.max.filesize* to
> > > > determine
> > > > > > the
> > > > > > > limit, but a MAX_FILESIZE table level descriptor whose
> > description
> > > > > reads
> > > > > > as
> > > > > > > below, on TableDescriptorBuilder javadoc:
> > > > > > >
> > > > > > >   /**
> > > > > > >* Returns the maximum size upto which a region can grow to
> > after
> > > > > > which a
> > > > > > >* region split is triggered. The region size is represented
> by
> > > the
> > > > > > size
> > > > > > > of
> > > > > > >* the biggest store file in that region.
> > > > > > >*
> > > > > > >* @return max hregion size for table, -1 if not set.
> > > > > > >*/
> > > > > > >
> > > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation
> is
> > > > > > violating
> > > > > > > those configs.
> > > > > > >
> > > > > > > Do we have a consensus on applying #3 for all active branches?
> If
> > > > so, I
> > > > > > > would instruct HBASE-24530 to proceed as such.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > > > > >
> > > > > > > > ‘Filesize’ and ‘size’ are ambiguous. They are open to
> > > > interpretation
> > > > > > and
> > > > > > > I
> > > > > > > > don’t see one as more clear than the other, other than to
> imply
> > > > > > something
> > > > > > > > about file level measures being the determining factor. It
> > > doesn’t
> > > > > > convey
> > > > > > > > more semantics beyond that, ie one file trips the limit or
> the
> > > > > combined
> > > > > > > > sizes of all files trips the limit. We can fix that with
> > > clarifying
> > > > > > > > documentation. While doing so we also have an opportunity to
> > fix
> > > > > > > something
> > > > > > > > if our consensus is the current policy is not the usual user
> > > > > > expectation.
> > > > > > > >
> > > > > > >