[jira] [Reopened] (HBASE-25284) Check-in "Enable memstore replication..." design

2020-11-19 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reopened HBASE-25284:
---

Reopening to update the design doc so that it explicitly states that HedgeRead
mode cannot increase the throughput (QPS) of meta, because it only requests
secondary replicas when the primary replica cannot return in time. If the
primary replica is overloaded then the whole cluster will be in trouble, so
requesting secondary replicas at that point is useless. Increasing meta
throughput is one of the most important goals of this issue.
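
The difference between the two read modes can be sketched with a toy model.
This is only an illustration under stated assumptions, not HBase client code:
the class ReplicaSelectionSketch, its method names, the fixed latency and
timeout values, and the round-robin choice are all hypothetical. It counts how
many requests each replica receives (replica 0 is the primary, and at least
two replicas are assumed).

```java
// Hypothetical model of replica selection; names are illustrative only.
class ReplicaSelectionSketch {

  // Hedged read: every request goes to the primary (replica 0) first.
  // A secondary is asked only when the primary exceeds the timeout.
  // Assumes replicas >= 2.
  static int[] hedgedRead(int replicas, int requests,
                          long primaryLatencyMs, long timeoutMs) {
    int[] hits = new int[replicas];
    for (int i = 0; i < requests; i++) {
      hits[0]++; // the primary always sees the request
      if (primaryLatencyMs > timeoutMs) {
        // The fallback adds load on a secondary; it does not
        // take any load away from the primary.
        hits[1 + i % (replicas - 1)]++;
      }
    }
    return hits;
  }

  // Load-balanced read: each request goes to exactly one replica,
  // chosen round-robin here (an assumption for illustration).
  static int[] loadBalanced(int replicas, int requests) {
    int[] hits = new int[replicas];
    for (int i = 0; i < requests; i++) {
      hits[i % replicas]++;
    }
    return hits;
  }
}
```

In this model the primary receives every request in hedged mode whether or not
it is healthy, while the load-balanced mode spreads the same requests evenly,
which is the throughput benefit the comment above asks the design doc to state.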

My modification permission on the design doc has been removed to avoid 
spamming, so please update the design doc on Google Docs and commit it again 
to the code base [~huaxiangsun]. I'm not going to @ Stack again since he 
refused to do anything to solve the problem here; I do not want to waste any 
time.

Thanks.

> Check-in "Enable memstore replication..." design
> 
>
> Key: HBASE-25284
> URL: https://issues.apache.org/jira/browse/HBASE-25284
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Add the design doc under dev-support/design-docs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: VOTE: Merge HBASE-18070 "Enable memstore replication for meta replica" to master and then back to branch-2" (Was "HEAD-UP: Merging HBASE-18070 "Enable memstore replication for meta replica" to mas

2020-11-19 Thread Duo Zhang
Oh, just noticed that the design doc has been committed to master and
branch-2 directly. I'm not sure if this is the correct way but since it is
already like this, let's just fix it on master and branch-2.

Then there is no blocker on the merge of the feature branch any more.
Change my vote to +1.

I've already reopened HBASE-25284 to mention what to change in the design
doc. I do not have permission to modify the design doc; it was messed up
by others, so modification permission has been removed for most people to
avoid spamming.

Thanks.

On Thu, Nov 19, 2020 at 2:24 PM, 张铎(Duo Zhang) wrote:

> It is not only about satisfying me, as a community we need to make sure
> that we are all on the same page before actually moving forward, or at
> least we should know what is the actual pivot point.
>
> I did not pose a quiz for you; there are just 4 technical questions. You
> strongly disagree that the test proposed by me is for HBASE-18070 and keep
> saying that the problem can be solved by 'HedgeRead', so I think it is
> valid for me to ask which problems you think can be solved by 'HedgeRead'
> and which can be solved by HBASE-18070. If this is not well understood by
> all, later someone may remove this benefit of HBASE-18070, and you will
> approve it and make HBASE-18070 useless.
>
> That's why I proposed we add this explicitly to the design doc, to at
> least let all the developers know this.
>
> Thanks.
>
> On Thu, Nov 19, 2020 at 1:43 PM, Stack wrote:
>
>> On Wed, Nov 18, 2020 at 7:03 PM 张铎(Duo Zhang) 
>> wrote:
>>
>> > OK, let me explain the technical part.
>> >
>> > What I proposed in the test is to verify that we could distribute the
>> > load across all the meta replicas so we could benefit if the main
>> > replica is f**ked up. But then Stack said this has already been solved
>> > by the old read replicas feature. Maybe in the first place I did not
>> > speak clearly enough, but later I spoke clearly that I was talking
>> > about the distribution of the load for the meta table, but Stack still
>> > does not agree and insists that I was talking about hedge read.
>> >
>> > For me, I do not think hedge read can fully solve the 'primary region
>> > f**ked up' problem. Of course we will go to secondary replicas if the
>> > primary can not respond, but it usually means the primary replica is
>> > not in a good state. The region server in a cluster will not go to the
>> > secondary replicas to read, right? If the primary replica is
>> > unavailable, a failure of meta read could crash a region server. And it
>> > could also affect write requests to meta, which could cause serious
>> > problems on master too. I've implemented a lot of procedures on 2.x;
>> > usually we will just abort master if there is a failure when accessing
>> > meta. This means, in the old hedge read mode, if the primary replica
>> > has been f**ked up, the cluster will not be in a good state, and
>> > finally the test will fail.
>> >
>> > And I think HBASE-18070 can solve the problem. But the main developer
>> > seems to have a different opinion on this. So I asked him what his
>> > opinions are on the 4 questions on jira, but so far I have not got a
>> > response from him.
>> >
>> > Why I did not want to write the above explanation before is that, if I
>> > throw this out, the main developer could easily say 'yes I agree with
>> > you, this is my point', to simply let the vote process pass. But the
>> > actual issue will be covered up as he never speaks out his own
>> > opinion, and may cause trouble in the future.
>> >
>> >
>> The veto seems to pivot on whether I, a co-author, know what the feature
>> I co-designed and co-wrote does. He has posed a quiz for me to fill out
>> that I am to answer to his satisfaction even though my co-author has
>> already answered his questionnaire.
>>
>> I suggest that the vote be on the feature rather than my responses to a
>> questionnaire of Duo's making.
>>
>> S
>>
>>
>>
>> > Thanks.
>> >
>> > On Thu, Nov 19, 2020 at 10:23 AM, Andrew Purtell wrote:
>> >
>> > > That's not how a technical veto works. The burden to explain how the
>> > > contributors can fix the reason for the veto is on you. You need to
>> > > give a list of action items. "Fundamental of the issue" is just your
>> > > opinion. Nobody here is a Boss. Contributors don't have to satisfy
>> > > your (nebulous) requirements, you have to successfully argue your
>> > > point.
>> > >
>> > > On Wed, Nov 18, 2020 at 6:10 PM 张铎(Duo Zhang)
>> > > wrote:
>> > >
>> > > > Thank you Andrew. I think my last comment clearly describes the two
>> > > > questions given by you.
>> > > >
>> > > > > A clear and compelling reason why the proposed change is harmful
>> > > > > or undesirable
>> > > >
>> > > >
>> > > > It is about the fundamental of this issue. Due to the back and
>> > > > forth on how a test could be used to verify the feature, I'm
>> > > > concerned whether the main developer has the same opinion on the
>> > > > proble

[jira] [Created] (HBASE-25306) The log in SimpleLoadBalancer#onConfigurationChange is wrong

2020-11-19 Thread Baiqiang Zhao (Jira)
Baiqiang Zhao created HBASE-25306:
-

 Summary: The log in SimpleLoadBalancer#onConfigurationChange is 
wrong
 Key: HBASE-25306
 URL: https://issues.apache.org/jira/browse/HBASE-25306
 Project: HBase
  Issue Type: Bug
Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.7
Reporter: Baiqiang Zhao
Assignee: Baiqiang Zhao


[https://github.com/apache/hbase/blob/8c1e4763b3e11d4553e5a59e620ab30e3b2047e9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java#L139]

The value logged as "current overallSlop" should be the variable "overallSlop".





[jira] [Resolved] (HBASE-25300) 'Unknown table hbase:quota' happens when desc table in shell if quota disabled

2020-11-19 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-25300.

Fix Version/s: 2.3.4
   2.2.7
   2.4.0
   Resolution: Fixed

Pushed to branch-2.2+. Thanks [~Ddupg] for contributing.

> 'Unknown table hbase:quota' happens when desc table in shell if quota disabled
> --
>
> Key: HBASE-25300
> URL: https://issues.apache.org/jira/browse/HBASE-25300
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>






[jira] [Created] (HBASE-25307) ThreadLocal pooling leads to NullPointerException

2020-11-19 Thread Balazs Meszaros (Jira)
Balazs Meszaros created HBASE-25307:
---

 Summary: ThreadLocal pooling leads to NullPointerException
 Key: HBASE-25307
 URL: https://issues.apache.org/jira/browse/HBASE-25307
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 3.0.0-alpha-1
Reporter: Balazs Meszaros
Assignee: Balazs Meszaros


We got an NPE after setting {{hbase.client.ipc.pool.type}} to {{thread-local}}:
{noformat}
20/11/18 01:53:04 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.close(AbstractRpcClient.java:496)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.close(ConnectionImplementation.java:1944)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.close(TableInputFormatBase.java:660)
{noformat}
The root cause of the issue is probably at {{PoolMap.ThreadLocalPool.values()}}:
{code:java}
public Collection<R> values() {
  List<R> values = new ArrayList<>();
  values.add(get());
  return values;
}
{code}
It adds {{null}} to the collection if the current thread does not have any 
resource, which leads to an NPE later.

I traced the usages of values(), and it should return every resource, not just 
the one attached to the calling thread.
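
A minimal sketch of the failure mode described above, using a hypothetical
stand-in class (ThreadLocalPoolSketch is not the real PoolMap.ThreadLocalPool,
and both method names are made up for illustration). It shows how a thread
that holds no resource ends up with a collection containing null, and one
possible guard against that. The guard is an assumption, not the fix that was
committed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a thread-local resource pool; not HBase code.
class ThreadLocalPoolSketch<R> extends ThreadLocal<R> {

  // Mirrors the reported behavior: get() returns null when the calling
  // thread has no resource, and that null is added to the result.
  Collection<R> valuesAsReported() {
    List<R> values = new ArrayList<>();
    values.add(get());
    return values;
  }

  // Guarded variant: skips the missing resource so callers never
  // iterate over a null element.
  Collection<R> valuesSkippingNull() {
    List<R> values = new ArrayList<>();
    R resource = get();
    if (resource != null) {
      values.add(resource);
    }
    return values;
  }
}
```

The guarded variant only avoids the null element. It still sees just the
calling thread's resource, so it does not address the second point in the
report, that values() should return every resource.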





Re: VOTE: Merge HBASE-18070 "Enable memstore replication for meta replica" to master and then back to branch-2" (Was "HEAD-UP: Merging HBASE-18070 "Enable memstore replication for meta replica" to mas

2020-11-19 Thread Stack
On Thu, Nov 19, 2020 at 12:53 AM 张铎(Duo Zhang) 
wrote:

> Oh, just noticed that the design doc has been committed to master and
> branch-2 directly. I'm not sure if this is the correct way but since it is
> already like this, let's just fix it on master and branch-2.
>
> Then there is no blocker on the merge of the feature branch any more.
> Change my vote to +1.
>
> I've already reopened HBASE-25284 to mention what to change in the design
> doc. I do not have the permission to modify the design doc, as it has been
> messed up by others so the modification permission for most people have
> been removed to avoid spamming.
>
>
I granted you edit rights.
S



> Thanks.

Re: VOTE: Merge HBASE-18070 "Enable memstore replication for meta replica" to master and then back to branch-2" (Was "HEAD-UP: Merging HBASE-18070 "Enable memstore replication for meta replica" to mas

2020-11-19 Thread Andrew Purtell
Thank you for providing actionable feedback Duo.

I also thank you personally for adjusting your vote, as it unblocks
everyone here.



On Thu, Nov 19, 2020 at 12:53 AM 张铎(Duo Zhang) 
wrote:

> Oh, just noticed that the design doc has been committed to master and
> branch-2 directly. I'm not sure if this is the correct way but since it is
> already like this, let's just fix it on master and branch-2.
>
> Then there is no blocker on the merge of the feature branch any more.
> Change my vote to +1.
>
> I've already reopened HBASE-25284 to mention what to change in the design
> doc. I do not have the permission to modify the design doc, as it has been
> messed up by others so the modification permission for most people have
> been removed to avoid spamming.
>
> Thanks.

[jira] [Resolved] (HBASE-25306) The log in SimpleLoadBalancer#onConfigurationChange is wrong

2020-11-19 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani resolved HBASE-25306.
--
Fix Version/s: 2.3.4
   2.2.7
   2.4.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the contribution [~DeanZ].

> The log in SimpleLoadBalancer#onConfigurationChange is wrong
> 
>
> Key: HBASE-25306
> URL: https://issues.apache.org/jira/browse/HBASE-25306
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.7
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>
> [https://github.com/apache/hbase/blob/8c1e4763b3e11d4553e5a59e620ab30e3b2047e9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java#L139]
> The value logged as "current overallSlop" should be the variable "overallSlop".





Re: VOTE: Merge HBASE-18070 "Enable memstore replication for meta replica" to master and then back to branch-2" (Was "HEAD-UP: Merging HBASE-18070 "Enable memstore replication for meta replica" to mas

2020-11-19 Thread Huaxiang Sun
Thank you all for the vote! Let me follow up on HBASE-25284.

Best Regards,
Huaxiang

On Thu, Nov 19, 2020 at 9:05 AM Andrew Purtell  wrote:

> Thank you for providing actionable feedback Duo.
>
> I also thank you personally for adjusting your vote, as it unblocks
> everyone here.
>
>
>
> On Thu, Nov 19, 2020 at 12:53 AM 张铎(Duo Zhang) 
> wrote:
>
> > Oh, just noticed that the design doc has been committed to master and
> > branch-2 directly. I'm not sure if this is the correct way but since it
> is
> > already like this, let's just fix it on master and branch-2.
> >
> > Then there is no blocker on the merge of the feature branch any more.
> > Change my vote to +1.
> >
> > I've already reopened HBASE-25284 to mention what to change in the design
> > doc. I do not have the permission to modify the design doc, as it has
> been
> > messed up by others so the modification permission for most people have
> > been removed to avoid spamming.
> >
> > Thanks.

[jira] [Created] (HBASE-25308) [branch-1] Consume Guava from hbase-thirdparty hbase-shaded-miscellaneous

2020-11-19 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25308:
---

 Summary: [branch-1] Consume Guava from hbase-thirdparty 
hbase-shaded-miscellaneous
 Key: HBASE-25308
 URL: https://issues.apache.org/jira/browse/HBASE-25308
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 1.7.0


We are again having classpath versioning issues related to Guava in our 
branch-1 based application.

Hadoop 3, HBase 2, Phoenix 5, and other projects deal with Guava cross-version 
incompatibilities, as they manifest on a combined classpath with other 
components, via shading.

I propose to do a global search and replace of all direct uses of Guava in our 
branch-1 code base and refer to Guava as provided in hbase-thirdparty's 
hbase-shaded-miscellaneous. This will protect HBase branch-1 from Guava 
cross-version vagaries just like the same technique protects branch-2 and 
branch-2 based releases. 

There are a couple of Public interfaces that incorporate Guava's HostAndPort 
and Service that will be indirectly impacted. We are about to release a new 
minor branch-1 version, 1.7.0, and this would be a great opportunity to 
introduce this kind of change in a manner consistent with semantic versioning 
and our compatibility policies.





[RESULT] VOTE: Merge HBASE-18070 "Enable memstore replication for meta replica" to master and then back to branch-2" (Was "HEAD-UP: Merging HBASE-18070 "Enable memstore replication for meta replica" t

2020-11-19 Thread Stack
With 5 binding and 1 non-binding votes in favor and no votes against, this
vote to merge passes.
Thanks to all who participated.
S

On Tue, Nov 17, 2020 at 8:43 AM Stack  wrote:

> Please VOTE on whether to merge HBASE-18070 feature branch to master (and
> HBASE-18070.branch-2 to branch-2). The VOTE runs for 24 hours. The majority
> prevails (+ or -).
>
> Quoting the design lead-in:
>
> Read Replicas on the hbase:meta Table currently only does primitive read
> of the primary’s hfiles refreshing every (configurable) N seconds. This
> issue is about making it so we can do the Async WAL Replication [1]
> ability, currently only available for user-space Tables, against the
> hbase:meta system Tables too; i.e. the primary replica pushes edits to its
> Replicas so they run much closer to the primaries’ state. If clients could
> be satisfied reading from Replicas, then we could have improved hbase:meta
> uptimes but also, we can distribute load off of the primary and alleviate
> hbase:meta Table (read) hotspotting.
>
> Each PR that comprises the feature branch has been reviewed before commit.
>
>  * For the design, see [2].
>  * For an amalgamated PR of the 5 or 6 reviewed PRs that comprise this
> feature, see [3].
>  * For a PE report that compared performance before and after, see
> HBASE-25127 (no regression).
>  * A report on ITBLL runs is pending to be attached to HBASE-18070 but
> runs so far show no regression with the feature enabled (ITBLL runs were
> done against a backport of this feature to branch-2 as the ITBLL state of
> master is currently an unknown).
>
> Testing continues mainly looking for further improvement and to better
> understand this feature in operation. Documentation is included. There are
> some follow-ons that have been identified but these can land later.
>
> Thanks and thanks to all who contributed to this feature; the reviewers
> and the testers in particular.
>
> S
>
> 1. http://hbase.apache.org/book.html#_asnyc_wal_replication
> 2.
> https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit#
> This patch is currently missing HBASE-25280, a bug found in testing.
> 3. https://github.com/apache/hbase/pull/2643
>


Re: HEAD-UP: Merging HBASE-18070 "Enable memstore replication for meta replica" to master and then back to branch-2

2020-11-19 Thread Stack
The VOTE on the adjacent thread has passed.

Will rebase and rerun the hadoopqa check to be sure all is good, and then
merge the current unaltered state of branch HBASE-18070 to the master
branch (this afternoon if all goes well).

I will then work on doing the same for branch-2 merging
HBASE-18070.branch-2 (contingent on how the master merge goes).

Will work on the matter of landing outstanding design doc edits
concurrently (Andrew, if ok, please hold on the RC until this is ironed out
-- thanks).

S

On Tue, Nov 17, 2020 at 9:13 AM Stack  wrote:

> I've started an adjacent VOTE thread in an attempt at clarity of
> how-to-go-forward here.
> Thanks,
> S
>
> On Tue, Nov 17, 2020 at 7:56 AM Andrew Purtell 
> wrote:
>
>> Hi Duo,
>>
>> Just to be clear: You are saying go ahead with the merge, but then also
>> go back and start this discussion fresh, to see if anything was missed and
>> more can be done?
>>
>> > On Nov 16, 2020, at 11:25 PM, 张铎  wrote:
>> >
>> > Oh, this is my fault. I mean the old behavior IS to go to primary
>> replica
>> > first, which is what we want to change here.
>> >
>> > And what I commented on Jira was that we do not need to get a
>> > performance improvement before merging; it is not the goal of this issue.
>> > And I suggested that if we want to show our advantage, we need to get the
>> > primary replica fucked up. I do not know why the discussion then went to
>> > HedgeRead, and I could not pull it back. I do not think this should
>> > block the merge, but even so it was still very hard to communicate, so
>> > I assumed this means we still have a big gap on what we want to solve here,
>> > and thus I voted -1.
>> >
>> > I think we need to go back to the beginning to reach an agreement on the
>> > goal here. Let's review the design doc again to see if we missed something
>> > which led us to this situation.
>> >
>> > And I need to say that I do not want to block the issue from being merged. I
>> > tried my best to speed up the process. I suggested landing the client-side
>> > changes on master directly, but that was refused. I quickly helped add the
>> > scan-on-specific-replica feature to branch-2 so that the port to branch-2
>> > could land cleanly.
>> >
>> > I am on a mobile device, so I cannot review the code or PR. Very busy these
>> > days. And the health examination this morning told me that I have high blood
>> > pressure. Not a good birthday present. Will get back to the issue when
>> > possible.
>> >
>> > Thanks.
>> >
>> > On Tue, Nov 17, 2020 at 06:34, Stack wrote:
>> >
>> >>> On Sun, Nov 15, 2020 at 11:20 PM 张铎(Duo Zhang) wrote:
>> >>>
>> >>> So what is your purpose in distributing the requests for region location
>> >>> lookup? Is it just because you want to 'distribute the request of region
>> >>> location lookup'?
>> >>>
>> >>> Then I'm -1 on merging. We should at least reach an agreement on what we
>> >>> want to solve before merging.
>> >>>
>> >>>
>> >> HERE.1
>> >>
>> >>
>> >>> I've helped with this issue since the design doc step. For me, the purpose
>> >>> of this issue is clear. We want to prevent the hotspot on meta, so the
>> >>> solution is simple: enable meta replicas, and then modify the client to
>> >>> not always go to the primary replica first (this is the old behavior even
>> >>> with the meta replica feature on).
>> >>> This introduces another problem: there is no region replication
>> >>> implementation for meta read replicas, which means the latency will be
>> >>> large, as we can only sync the data between replicas through region
>> >>> flush. So we implement meta region replication.
>> >>>
>> >>> So I think it is very important to verify that we have truly distributed
>> >>> the region location lookup requests, and also to make sure that we can
>> >>> support more region location lookup requests. Otherwise this feature is
>> >>> useless.
>> >>>
>> >>> And I agree with Andrew that, since the feature is off by default on
>> >>> branch-2 and has no regression, it is OK to merge for now. Theoretically our
>> >>> approach here should work, so even if it does not work for now, I think we
>> >>> could fix the problems to make it work.
>> >>>
>> >>>
>> >> HERE.2
>> >>
>> >> I agree with all of the above between HERE.1 and HERE.2 (except the
>> >> suggestion that the old behavior of read replicas is that they went to the
>> >> replica first; they go to the primary first -- see [1], [2]).
>> >>
>> >> Let's work through any misalignment of understanding/communication offline
>> >> and not in the way of the merge.
>> >>
>> >> Thanks,
>> >> S
>> >>
>> >> 1. http://hbase.apache.org/book.html#_timeline_consistency "In case a read
>> >> is performed with Consistency.TIMELINE, then the read RPC will be sent to
>> >> the primary region server first."
>> >> 2. https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/cl

DISCUSS: Remove hbase-backup from master?

2020-11-19 Thread Stack
It strikes me as work that has been abandoned with no supporting developer.
It has had no improvement and few commits other than adjustment because a
backing dependency has changed since original contribution. It has not been
included in a release so has no users as yet. Does anyone use it or want
it? If not, I suggest we remove it.  I could file an issue for it to be
added to hbase-operator-tools for some gallant dev to pick up if they
wanted to use this backup work? (I could help w/ the migration).

What do others think?

S


[jira] [Reopened] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25127:
---

[~clarax98007] I'm reopening this. I made a mistake when I committed it. I
added [~zhangduo] as a 'Signed-off-by' when he had a 'requested changes' mark
still in place. I pushed the issue thinking the 'requested changes' had been
addressed, but apparently that was a mistake (my mistake). So, let me put up a
new PR. Do you know what the 'requested changes' are? If not, let's figure them
out. Once they are addressed, we can ask [~zhangduo] to take a look. I'm around
to help on this one. Sorry for the inconvenience.

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Created] (HBASE-25309) Support start/stop replication server by scripts

2020-11-19 Thread Sun Xin (Jira)
Sun Xin created HBASE-25309:
---

 Summary: Support start/stop replication server by scripts
 Key: HBASE-25309
 URL: https://issues.apache.org/jira/browse/HBASE-25309
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin





