[jira] [Resolved] (HBASE-28756) RegionSizeCalculator ignored the size of memstore, which leads Spark miss data

2024-07-26 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-28756.
-
Fix Version/s: 3.0.0-beta-2
   2.6.1
   2.5.11
   Resolution: Fixed

> RegionSizeCalculator ignored the size of memstore, which leads Spark miss data
> --
>
> Key: HBASE-28756
> URL: https://issues.apache.org/jira/browse/HBASE-28756
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 2.6.0, 3.0.0-beta-1, 2.5.10
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0-beta-2, 2.6.1, 2.5.11
>
>
> RegionSizeCalculator only considers the size of the StoreFiles and ignores the 
> size of the MemStore. For a new region whose data has only been written to the 
> MemStore and has not yet been flushed, it considers the region size to be 0.
> The problem shows up when we use TableInputFormat to read HBase table data in Spark:
> {code:scala}
> spark.sparkContext.newAPIHadoopRDD(
>   conf,
>   classOf[TableInputFormat],
>   classOf[ImmutableBytesWritable],
>   classOf[Result])
> {code}
> By default, Spark ignores empty InputSplits; this behavior is controlled by the 
> configuration {{spark.hadoopRDD.ignoreEmptySplits}}:
> {code:scala}
> private[spark] val HADOOP_RDD_IGNORE_EMPTY_SPLITS =
>   ConfigBuilder("spark.hadoopRDD.ignoreEmptySplits")
>     .internal()
>     .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for empty input splits.")
>     .version("2.3.0")
>     .booleanConf
>     .createWithDefault(true) {code}
> Together, these cause Spark to miss data: the unflushed region is reported as an 
> empty split and gets no partition. RegionSizeCalculator should therefore account 
> for both the StoreFile size and the MemStore size.
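The failure mode can be modeled in a few lines. This is a minimal sketch, not HBase's RegionSizeCalculator or Spark's NewHadoopRDD: it assumes that per-region StoreFile and MemStore sizes arrive in whole megabytes and that Spark drops every split whose length is 0 when ignoreEmptySplits is true. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the split-length estimate and of Spark's
// spark.hadoopRDD.ignoreEmptySplits filter. Not HBase or Spark code.
final class RegionSizeSketch {
  static final long MEGABYTE = 1024L * 1024L;

  // Before the fix: only StoreFiles are counted, so a region whose data
  // is still only in the MemStore is estimated at 0 bytes.
  static long sizeStoreFilesOnly(long storeFileSizeMb, long memStoreSizeMb) {
    return storeFileSizeMb * MEGABYTE; // memStoreSizeMb is ignored
  }

  // After the fix: StoreFile and MemStore sizes are both counted.
  static long sizeWithMemStore(long storeFileSizeMb, long memStoreSizeMb) {
    return (storeFileSizeMb + memStoreSizeMb) * MEGABYTE;
  }

  // With ignoreEmptySplits=true, splits of length 0 get no partition at all.
  static List<Long> splitsKept(List<Long> splitLengths, boolean ignoreEmptySplits) {
    if (!ignoreEmptySplits) {
      return splitLengths;
    }
    return splitLengths.stream().filter(len -> len > 0).collect(Collectors.toList());
  }
}
```

For a new region with 0 MB of StoreFiles and 4 MB in the MemStore, `splitsKept(List.of(sizeStoreFilesOnly(0, 4)), true)` is empty, so its rows are never read, while `splitsKept(List.of(sizeWithMemStore(0, 4)), true)` keeps the split.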



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28756) RegionSizeCalculator ignored the size of memstore, which leads Spark miss data

2024-07-26 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868881#comment-17868881
 ] 

Sun Xin commented on HBASE-28756:
-

Thanks [~stoty] for the reminder. Based on what you mentioned in SPARK-37660:
{quote}I have encountered this.

There are several issues:
 - Hbase returns the HBase Region size, instead of the split size, which may 
not be the same.
 - HBase rounds the size to Megabytes.
 - Even if it didn't round to Megabytes, I suspect that it only tallies HFiles, 
so for new tables the size may still be zero until the first HFile is 
written.{quote}
This issue does not solve the problem completely. When fetching data from HBase 
in Spark, the only options are to use a scan directly instead of newAPIHadoopRDD, 
or to set {{spark.hadoopRDD.ignoreEmptySplits}} to false.
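The point about megabyte granularity can be illustrated with a small sketch. It assumes, purely for illustration, that byte counts are converted to whole megabytes by truncating division before being summed; whether a given HBase version truncates, or instead reports small non-zero sizes as 1 MB, is not stated in this thread. The names are hypothetical.

```java
// Hypothetical illustration of why counting the MemStore may still leave empty
// splits when sizes are reported in whole megabytes. Not HBase code.
final class MegabyteGranularitySketch {
  static final long MEGABYTE = 1024L * 1024L;

  // Assumption: bytes are reported as whole megabytes via truncating division.
  static long reportedMb(long bytes) {
    return bytes / MEGABYTE;
  }

  // Split length rebuilt from the reported StoreFile and MemStore megabytes.
  static long estimatedSplitLength(long storeFileBytes, long memStoreBytes) {
    return (reportedMb(storeFileBytes) + reportedMb(memStoreBytes)) * MEGABYTE;
  }
}
```

Under this assumption, a region holding 512 KB of unflushed data still gets an estimated length of 0 from `estimatedSplitLength(0, 512 * 1024)`, so Spark would skip it unless `spark.hadoopRDD.ignoreEmptySplits` is set to false.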



[jira] [Commented] (HBASE-28756) RegionSizeCalculator ignored the size of memstore, which leads Spark miss data

2024-07-26 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868879#comment-17868879
 ] 

Sun Xin commented on HBASE-28756:
-

Pushed to all active branches. Thanks [~zhangduo] and [~ashwinpankaj] for reviewing.



[jira] [Created] (HBASE-28756) RegionSizeCalculator ignored the size of memstore, which leads Spark miss data

2024-07-24 Thread Sun Xin (Jira)
Sun Xin created HBASE-28756:
---

 Summary: RegionSizeCalculator ignored the size of memstore, which 
leads Spark miss data
 Key: HBASE-28756
 URL: https://issues.apache.org/jira/browse/HBASE-28756
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 2.5.10, 3.0.0-beta-1, 2.6.0
Reporter: Sun Xin
Assignee: Sun Xin


RegionSizeCalculator only considers the size of the StoreFiles and ignores the size 
of the MemStore. For a new region whose data has only been written to the MemStore 
and has not yet been flushed, it considers the region size to be 0.

The problem shows up when we use TableInputFormat to read HBase table data in Spark:
{code:scala}
spark.sparkContext.newAPIHadoopRDD(
  conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
{code}
By default, Spark ignores empty InputSplits; this behavior is controlled by the 
configuration {{spark.hadoopRDD.ignoreEmptySplits}}:
{code:scala}
private[spark] val HADOOP_RDD_IGNORE_EMPTY_SPLITS =
  ConfigBuilder("spark.hadoopRDD.ignoreEmptySplits")
    .internal()
    .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for empty input splits.")
    .version("2.3.0")
    .booleanConf
    .createWithDefault(true) {code}
Together, these cause Spark to miss data: the unflushed region is reported as an 
empty split and gets no partition. RegionSizeCalculator should therefore account 
for both the StoreFile size and the MemStore size.





[jira] [Resolved] (HBASE-28749) Remove the duplicate configurations named hbase.wal.batch.size

2024-07-22 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-28749.
-
Resolution: Fixed

> Remove the duplicate configurations named hbase.wal.batch.size
> --
>
> Key: HBASE-28749
> URL: https://issues.apache.org/jira/browse/HBASE-28749
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-beta-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0-beta-2
>
>
> The following constants are defined in two places, AsyncFSWAL and AbstractFSWAL:
> {code:java}
> public static final String WAL_BATCH_SIZE = "hbase.wal.batch.size";
> public static final long DEFAULT_WAL_BATCH_SIZE = 64L * 1024; {code}
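One way to remove the duplication is to declare the constants once in the abstract base class and let the subclass inherit them. The sketch below uses hypothetical, heavily simplified stand-ins (AbstractWalSketch, AsyncWalSketch) and a plain Map in place of Hadoop's Configuration; it is not the actual AbstractFSWAL/AsyncFSWAL code.

```java
import java.util.Map;

// Hypothetical stand-in for AbstractFSWAL: the only place the key and default live.
abstract class AbstractWalSketch {
  public static final String WAL_BATCH_SIZE = "hbase.wal.batch.size";
  public static final long DEFAULT_WAL_BATCH_SIZE = 64L * 1024;

  protected final long batchSize;

  protected AbstractWalSketch(Map<String, String> conf) {
    // Fall back to the shared default (64 KB) when the key is not set.
    String value = conf.get(WAL_BATCH_SIZE);
    this.batchSize = value == null ? DEFAULT_WAL_BATCH_SIZE : Long.parseLong(value);
  }
}

// Hypothetical stand-in for AsyncFSWAL: no duplicate constants, it reuses the base ones.
final class AsyncWalSketch extends AbstractWalSketch {
  AsyncWalSketch(Map<String, String> conf) {
    super(conf);
  }

  long batchSize() {
    return batchSize;
  }
}
```

With a single definition, `AsyncWalSketch.WAL_BATCH_SIZE` and `AbstractWalSketch.WAL_BATCH_SIZE` resolve to the same field, so the key and its default cannot drift apart.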





[jira] [Commented] (HBASE-28749) Remove the duplicate configurations named hbase.wal.batch.size

2024-07-22 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867921#comment-17867921
 ] 

Sun Xin commented on HBASE-28749:
-

Thanks [~zhangduo] and [~pankajkumar] for reviewing.



[jira] [Commented] (HBASE-28749) Remove the duplicate configurations named hbase.wal.batch.size

2024-07-22 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867920#comment-17867920
 ] 

Sun Xin commented on HBASE-28749:
-

Pushed to master and branch-3.



[jira] [Created] (HBASE-28749) Remove the duplicate configurations named hbase.wal.batch.size

2024-07-22 Thread Sun Xin (Jira)
Sun Xin created HBASE-28749:
---

 Summary: Remove the duplicate configurations named 
hbase.wal.batch.size
 Key: HBASE-28749
 URL: https://issues.apache.org/jira/browse/HBASE-28749
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 3.0.0-beta-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-beta-2


The following constants are defined in two places, AsyncFSWAL and AbstractFSWAL:
{code:java}
public static final String WAL_BATCH_SIZE = "hbase.wal.batch.size";
public static final long DEFAULT_WAL_BATCH_SIZE = 64L * 1024; {code}





[jira] [Resolved] (HBASE-28330) TestUnknownServers.testListUnknownServers is flaky in branch-2

2024-01-25 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-28330.
-
Fix Version/s: 2.6.0
   2.5.8
   Resolution: Fixed

Pushed to branch-2, branch-2.5 and branch-2.6. Thanks [~zhangduo] for the review.

> TestUnknownServers.testListUnknownServers is flaky in branch-2
> --
>
> Key: HBASE-28330
> URL: https://issues.apache.org/jira/browse/HBASE-28330
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.7
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 2.6.0, 2.5.8
>
>
> {code:java}
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.913 
> s <<< FAILURE! - in org.apache.hadoop.hbase.master.TestUnknownServers
> [ERROR] 
> org.apache.hadoop.hbase.master.TestUnknownServers.testListUnknownServers  
> Time elapsed: 0.204 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<2> {code}
> The value of TestUnknownServers.SLAVES is different between 
> [branch-2|https://github.com/apache/hbase/blob/68bc533f7116cedc681704b82319e5793b827621/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestUnknownServers.java#L44]
>  and 
> [master|https://github.com/apache/hbase/blob/b87b05c847f00c292664d894c21f83c73d48460d/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestUnknownServers.java#L43].
> It is 1 in master but 2 in branch-2.
> The RegionServer marked UNKNOWN_SERVER is the one that *holds regions* but is 
> not tracked by the ServerManager.
> Please see HMaster.getUnknownServers:
> {code:java}
> private List<ServerName> getUnknownServers() {
>   if (serverManager != null) {
>     final Set<ServerName> serverNames = getAssignmentManager().getRegionStates()
>       .getRegionStates().stream().map(RegionState::getServerName)
>       .collect(Collectors.toSet());
>     final List<ServerName> unknownServerNames = serverNames.stream()
>       .filter(sn -> sn != null && serverManager.isServerUnknown(sn))
>       .collect(Collectors.toList());
>     return unknownServerNames;
>   }
>   return null;
> } {code}
> In the unit test TestUnknownServers.testListUnknownServers, we start an HBase 
> cluster with 2 RegionServers. If all regions are assigned to ONE server, only 
> that server is reported as UNKNOWN_SERVER, and the test fails.
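The logic of getUnknownServers can be modeled without a cluster to show why the result depends on region placement: only servers that both host at least one region and are unknown to the ServerManager are reported. This is a simplified, hypothetical model that uses plain strings for region and server names; it is not the HMaster code.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of HMaster.getUnknownServers(): candidate servers come from
// the region-to-server assignments, so a server hosting no region is never reported.
final class UnknownServersSketch {
  static List<String> unknownServers(Map<String, String> regionToServer,
      Set<String> unknownToServerManager) {
    return regionToServer.values().stream()
        .filter(Objects::nonNull)                  // mirrors the sn != null check
        .distinct()                                // mirrors collecting into a Set
        .filter(unknownToServerManager::contains)  // mirrors isServerUnknown(sn)
        .sorted()
        .collect(Collectors.toList());
  }
}
```

With 2 RegionServers that are both unknown to the ServerManager, `unknownServers(Map.of("r1", "rs1", "r2", "rs1"), Set.of("rs1", "rs2"))` returns only `[rs1]` because rs2 hosts no region; spreading the regions over both servers returns both.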





[jira] [Updated] (HBASE-28330) TestUnknownServers.testListUnknownServers is flaky in branch-2

2024-01-25 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-28330:

Description: 
{code:java}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.913 s 
<<< FAILURE! - in org.apache.hadoop.hbase.master.TestUnknownServers
[ERROR] 
org.apache.hadoop.hbase.master.TestUnknownServers.testListUnknownServers  Time 
elapsed: 0.204 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<2> {code}
The value of TestUnknownServers.SLAVES is different between 
[branch-2|https://github.com/apache/hbase/blob/68bc533f7116cedc681704b82319e5793b827621/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestUnknownServers.java#L44]
 and 
[master|https://github.com/apache/hbase/blob/b87b05c847f00c292664d894c21f83c73d48460d/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestUnknownServers.java#L43].

It is 1 in master but 2 in branch-2.

The RegionServer marked UNKNOWN_SERVER is the one that *holds regions* but is 
not tracked by the ServerManager.

Please see HMaster.getUnknownServers:
{code:java}
private List<ServerName> getUnknownServers() {
  if (serverManager != null) {
    final Set<ServerName> serverNames = getAssignmentManager().getRegionStates()
      .getRegionStates().stream().map(RegionState::getServerName)
      .collect(Collectors.toSet());
    final List<ServerName> unknownServerNames = serverNames.stream()
      .filter(sn -> sn != null && serverManager.isServerUnknown(sn))
      .collect(Collectors.toList());
    return unknownServerNames;
  }
  return null;
} {code}
In the unit test TestUnknownServers.testListUnknownServers, we start an HBase cluster 
with 2 RegionServers. If all regions are assigned to ONE server, only that server is 
reported as UNKNOWN_SERVER, and the test fails.

  was:
{code:java}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.913 s 
<<< FAILURE! - in org.apache.hadoop.hbase.master.TestUnknownServers
[ERROR] 
org.apache.hadoop.hbase.master.TestUnknownServers.testListUnknownServers  Time 
elapsed: 0.204 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<2> {code}
The value of TestUnknownServers.SLAVES is different between branch-2 and master.

It is 1 in master but 2 in branch-2.

The RegionServer marked UNKNOWN_SERVER is the one that *holds regions* but is 
not tracked by the ServerManager.

Please see HMaster.getUnknownServers:
{code:java}
private List<ServerName> getUnknownServers() {
  if (serverManager != null) {
    final Set<ServerName> serverNames = getAssignmentManager().getRegionStates()
      .getRegionStates().stream().map(RegionState::getServerName)
      .collect(Collectors.toSet());
    final List<ServerName> unknownServerNames = serverNames.stream()
      .filter(sn -> sn != null && serverManager.isServerUnknown(sn))
      .collect(Collectors.toList());
    return unknownServerNames;
  }
  return null;
} {code}
In the unit test TestUnknownServers.testListUnknownServers, we start an HBase cluster 
with 2 RegionServers. If all regions are assigned to ONE server, only that server is 
reported as UNKNOWN_SERVER, and the test fails.



[jira] [Created] (HBASE-28330) TestUnknownServers.testListUnknownServers is flaky in branch-2

2024-01-25 Thread Sun Xin (Jira)
Sun Xin created HBASE-28330:
---

 Summary: TestUnknownServers.testListUnknownServers is flaky in 
branch-2
 Key: HBASE-28330
 URL: https://issues.apache.org/jira/browse/HBASE-28330
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.5.7
Reporter: Sun Xin
Assignee: Sun Xin


{code:java}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.913 s 
<<< FAILURE! - in org.apache.hadoop.hbase.master.TestUnknownServers
[ERROR] 
org.apache.hadoop.hbase.master.TestUnknownServers.testListUnknownServers  Time 
elapsed: 0.204 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<2> {code}
The value of TestUnknownServers.SLAVES is different between branch-2 and master.

It is 1 in master but 2 in branch-2.

The RegionServer marked UNKNOWN_SERVER is the one that *holds regions* but is 
not tracked by the ServerManager.

Please see HMaster.getUnknownServers:
{code:java}
private List<ServerName> getUnknownServers() {
  if (serverManager != null) {
    final Set<ServerName> serverNames = getAssignmentManager().getRegionStates()
      .getRegionStates().stream().map(RegionState::getServerName)
      .collect(Collectors.toSet());
    final List<ServerName> unknownServerNames = serverNames.stream()
      .filter(sn -> sn != null && serverManager.isServerUnknown(sn))
      .collect(Collectors.toList());
    return unknownServerNames;
  }
  return null;
} {code}
In the unit test TestUnknownServers.testListUnknownServers, we start an HBase cluster 
with 2 RegionServers. If all regions are assigned to ONE server, only that server is 
reported as UNKNOWN_SERVER, and the test fails.





[jira] [Resolved] (HBASE-28324) TestRegionNormalizerWorkQueue#testTake is flaky

2024-01-21 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-28324.
-
Fix Version/s: 2.6.0
   2.4.18
   2.5.8
   3.0.0-beta-2
   Resolution: Fixed

Pushed to all active branches. Thanks [~zhangduo] for the review.

> TestRegionNormalizerWorkQueue#testTake is flaky
> ---
>
> Key: HBASE-28324
> URL: https://issues.apache.org/jira/browse/HBASE-28324
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0-beta-1, 2.5.7
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 2.5.8, 3.0.0-beta-2
>
>






[jira] [Created] (HBASE-28324) TestRegionNormalizerWorkQueue#testTake is flaky

2024-01-19 Thread Sun Xin (Jira)
Sun Xin created HBASE-28324:
---

 Summary: TestRegionNormalizerWorkQueue#testTake is flaky
 Key: HBASE-28324
 URL: https://issues.apache.org/jira/browse/HBASE-28324
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.5.7, 3.0.0-beta-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Resolved] (HBASE-27469) IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table

2022-11-14 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-27469.
-
Fix Version/s: 2.5.2
   2.4.16
   (was: 2.6.0)
   Resolution: Fixed

> IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when 
> dropping a table
> 
>
> Key: HBASE-27469
> URL: https://issues.apache.org/jira/browse/HBASE-27469
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-3, 2.5.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-4, 2.5.2, 2.4.16
>
>
> If the snapshot scan feature is enabled and the same user is granted 
> permissions on both a table and a namespace, an IllegalArgumentException is 
> thrown when dropping tables.





[jira] [Commented] (HBASE-27469) IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table

2022-11-14 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17634166#comment-17634166
 ] 

Sun Xin commented on HBASE-27469:
-

Thanks [~zhangduo] for reviewing.

Pushed to branch-2, branch-2.4, branch-2.5.



[jira] [Created] (HBASE-27476) Recovered replication may be blocked if enabled hbase.separate.oldlogdir.by.regionserver

2022-11-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-27476:
---

 Summary: Recovered replication may be blocked if enabled 
hbase.separate.oldlogdir.by.regionserver
 Key: HBASE-27476
 URL: https://issues.apache.org/jira/browse/HBASE-27476
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.15, 3.0.0-alpha-3
Reporter: Sun Xin
Assignee: Sun Xin


In another PR, I got a failing unit test:
{code:java}
[ERROR] Failures: 
[ERROR] 
org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSWithSeparateOldWALs.killOneMasterRS
[ERROR]   Run 1: 
TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84
 Waited too much time for queueFailover replication. Waited 61065ms.
[ERROR]   Run 2: 
TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84
 Waited too much time for queueFailover replication. Waited 58864ms.
[ERROR]   Run 3: 
TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84
 Waited too much time for queueFailover replication. Waited 57103ms. {code}
This seems to be caused by a bug.

If {_}hbase.separate.oldlogdir.by.regionserver{_} is enabled, old WALs are moved into 
separate directories named after each RegionServer, such as root/oldWALs/server1/wal1. 
Recovered replication cannot convert a WAL path (such as root/oldWALs/wal1) into such 
a path, so it throws FileNotFoundException.
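The mismatch can be shown with a small path sketch. The helper name and signature are hypothetical, and java.nio.file.Path stands in for Hadoop's Path: when the flag is on, the archived WAL lives under a per-server directory, so code that only knows the flat root/oldWALs/wal1 form looks in the wrong place.

```java
import java.nio.file.Path;

// Hypothetical helper showing where an archived WAL ends up depending on
// hbase.separate.oldlogdir.by.regionserver. Not HBase code.
final class OldWalPathSketch {
  static Path archivedWalPath(Path rootDir, String serverName, String walName,
      boolean separateOldLogDirByRegionServer) {
    Path oldWals = rootDir.resolve("oldWALs");
    return separateOldLogDirByRegionServer
        ? oldWals.resolve(serverName).resolve(walName) // root/oldWALs/server1/wal1
        : oldWals.resolve(walName);                    // root/oldWALs/wal1
  }
}
```

Looking up the flat path while the WAL was archived to the per-server path is the kind of miss that surfaces as FileNotFoundException.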





[jira] [Work started] (HBASE-27469) IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table

2022-11-07 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-27469 started by Sun Xin.
---


[jira] [Created] (HBASE-27469) IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table

2022-11-07 Thread Sun Xin (Jira)
Sun Xin created HBASE-27469:
---

 Summary: IllegalArgumentException is thrown by 
SnapshotScannerHDFSAclController when dropping a table
 Key: HBASE-27469
 URL: https://issues.apache.org/jira/browse/HBASE-27469
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.5.1, 3.0.0-alpha-3
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 2.6.0, 3.0.0-alpha-4


If the snapshot scan feature is enabled and the same user is granted permissions on 
both a table and a namespace, an IllegalArgumentException is thrown when dropping 
tables.





[jira] [Updated] (HBASE-27354) EOF thrown by WALEntryStream causes replication blocking

2022-09-01 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-27354:

Description: 
In 
[WALEntryStream#readNextEntryAndRecordReaderPosition|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L257],
 it is possible to read uncommitted data. If we read beyond the committed 
file length, we reopen the inputStream and seek back.

In our environment, we found that the position we seek back to may be exactly the 
length of the file being written, which can cause an EOFException.

The EOFException is eventually caught in 
[ReplicationSourceWALReader.run|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L158],
 but 
[totalBufferUsed|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78]
 is not cleaned up.

After running for a long time, all peers slow down and eventually block completely.

  was:
In 
[WALEntryStream#readNextEntryAndRecordReaderPosition|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L257],
 it is possible to read uncommitted data. If we read beyond the committed 
file length, we reopen the inputStream and seek back.

In our environment, we found that the position we seek back to may be exactly the 
length of the file being written, which can cause an EOFException.

The EOFException is eventually caught in 
[ReplicationSourceWALReader.run|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L158],
 but 
[totalBufferUsed|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78]
 is not cleaned up.

After running for a long time, all peers slow down and eventually block completely.




[jira] [Created] (HBASE-27354) EOF thrown by WALEntryStream causes replication blocking

2022-09-01 Thread Sun Xin (Jira)
Sun Xin created HBASE-27354:
---

 Summary: EOF thrown by WALEntryStream causes replication blocking
 Key: HBASE-27354
 URL: https://issues.apache.org/jira/browse/HBASE-27354
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.14, 3.0.0-alpha-3, 2.5.0, 2.6.0
Reporter: Sun Xin
Assignee: Sun Xin


In 
[WALEntryStream#readNextEntryAndRecordReaderPosition|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L257],
 it is possible to read uncommitted data. If we read beyond the committed 
file length, we reopen the inputStream and seek back.

In our use, we found that the position where seek back may be exactly the 
length of the file  being written, which may cause EOF.

The thrown EOF is finally caught 
[ReplicationSourceWALReader.run|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L158],
 but 
[totalBufferUsed|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78]
 is not cleaned up.

Over a long run, all peers slow down and eventually block completely.
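To illustrate the leak, here is a minimal Java sketch, assuming a simplified reader that reserves a share of a server-wide buffer quota for each entry it reads. The names `BufferAccountingReader`, `EntrySource` and `readBatch` are hypothetical and do not correspond to HBase's ReplicationSourceWALReader API; the point is only that a partial reservation must be released when the read fails with an EOF, otherwise the shared counter keeps growing.

{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not HBase code. totalBufferUsed mimics a quota
// shared by every replication peer on the server.
class BufferAccountingReader {
  private final AtomicLong totalBufferUsed;

  BufferAccountingReader(AtomicLong totalBufferUsed) {
    this.totalBufferUsed = totalBufferUsed;
  }

  /** Supplies the size of the next WAL entry, or throws (e.g. EOFException). */
  interface EntrySource {
    long next() throws IOException;
  }

  /**
   * Reads up to maxEntries entries, reserving shared quota for each one.
   * On success the caller releases the returned amount after shipping;
   * on failure the partial reservation is released here.
   */
  long readBatch(EntrySource source, int maxEntries) throws IOException {
    long reserved = 0;
    boolean success = false;
    try {
      for (int i = 0; i < maxEntries; i++) {
        long size = source.next();
        totalBufferUsed.addAndGet(size);
        reserved += size;
      }
      success = true;
      return reserved;
    } finally {
      if (!success) {
        // Skipping this step is the leak described above: every EOF strands
        // some quota, until all peers are throttled and eventually block.
        totalBufferUsed.addAndGet(-reserved);
      }
    }
  }
}
{code}

Releasing in a `finally` guarded by a success flag leaves the normal path unchanged (the caller still releases after shipping) while making every failure path return its quota.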





[jira] [Resolved] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-06-20 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26956.
-
Fix Version/s: 2.5.0
   2.6.0
   Resolution: Done

Pushed to branch-2 and branch-2.5

> ExportSnapshot tool supports removing TTL
> -
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage such as 
> S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
> immediately because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.





[jira] [Reopened] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-06-15 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin reopened HBASE-26956:
-

Will close this issue after porting to branch-2.x.

> ExportSnapshot tool supports removing TTL
> -
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
>  Issue Type: New Feature
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-3
>
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage such as 
> S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
> immediately because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.





[jira] [Commented] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-06-15 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554518#comment-17554518
 ] 

Sun Xin commented on HBASE-26956:
-

Using this issue is OK; I submitted a PR to port it to branch-2.

> ExportSnapshot tool supports removing TTL
> -
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
>  Issue Type: New Feature
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-3
>
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage such as 
> S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
> immediately because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.





[jira] [Commented] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-06-15 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554439#comment-17554439
 ] 

Sun Xin commented on HBASE-26956:
-

[~zhangduo] We need it; I'll port it to branch-2.x later.

> ExportSnapshot tool supports removing TTL
> -
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
>  Issue Type: New Feature
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-3
>
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage such as 
> S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
> immediately because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.





[jira] [Resolved] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-06-15 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26956.
-
Fix Version/s: 3.0.0-alpha-4
 Release Note: The ExportSnapshot tool supports removing the TTL of a snapshot. If we 
use the ExportSnapshot tool to recover a snapshot with a TTL from cold storage to an 
HBase cluster, we can set `-reset-ttl` to prevent the snapshot from being deleted 
immediately.
   Resolution: Done

Thanks [~zhangduo] for the review.

> ExportSnapshot tool supports removing TTL
> -
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
>  Issue Type: New Feature
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-4
>
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage such as 
> S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
> immediately because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.





[jira] [Created] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-04-15 Thread Sun Xin (Jira)
Sun Xin created HBASE-26956:
---

 Summary: ExportSnapshot tool supports removing TTL
 Key: HBASE-26956
 URL: https://issues.apache.org/jira/browse/HBASE-26956
 Project: HBase
  Issue Type: New Feature
Reporter: Sun Xin
Assignee: Sun Xin


In our scenario, we use ExportSnapshot to copy snapshots to cold storage such as 
S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
immediately because its TTL is set.

So we need the ExportSnapshot tool to support removing the TTL.
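A simplified model may help explain why an exported snapshot can vanish right after being restored. The sketch below assumes, without confirming HBase internals, that a snapshot's TTL is stored in seconds, counted from its original creation time in milliseconds, and that a TTL of zero or less means it never expires; `SnapshotMeta` and `withoutTtl()` are hypothetical names. Under those assumptions, a snapshot that sat in cold storage longer than its TTL is already expired the moment it lands back on the cluster, and dropping the TTL (roughly what the `-reset-ttl` option named in this issue's release note is meant to achieve) avoids that.

{code:java}
import java.util.concurrent.TimeUnit;

// Simplified, hypothetical model of snapshot TTL metadata (not HBase's
// SnapshotDescription). Assumptions: TTL in seconds, counted from the
// creation time in epoch milliseconds; TTL <= 0 means "never expires".
final class SnapshotMeta {
  final String name;
  final long creationTimeMs;
  final long ttlSeconds;

  SnapshotMeta(String name, long creationTimeMs, long ttlSeconds) {
    this.name = name;
    this.creationTimeMs = creationTimeMs;
    this.ttlSeconds = ttlSeconds;
  }

  /** Expired once creation time + TTL lies in the past. */
  boolean isExpired(long nowMs) {
    if (ttlSeconds <= 0) {
      return false;
    }
    return creationTimeMs + TimeUnit.SECONDS.toMillis(ttlSeconds) < nowMs;
  }

  /** What a "reset TTL" copy amounts to in this model: same snapshot, no TTL. */
  SnapshotMeta withoutTtl() {
    return new SnapshotMeta(name, creationTimeMs, 0L);
  }
}
{code}

Note that the creation time is kept as-is in this model; only the TTL is cleared, so the restored snapshot no longer qualifies for cleanup.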





[jira] [Resolved] (HBASE-26406) Can not add peer replicating to non-HBase

2021-11-02 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26406.
-
Fix Version/s: 2.4.9
   3.0.0-alpha-2
   Resolution: Fixed

Pushed to master and the 2.x branches. Thanks all for reviewing.
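The stack trace in the description quoted below shows the failure coming from `ReplicationPeerManager.checkClusterId`, reached via `checkPeerConfig`, even though a peer that ships to a non-HBase sink has no cluster key to check. A minimal sketch of one possible guard skips cluster-key validation unless the peer uses the default HBase-to-HBase endpoint. `PeerConfigCheck`, `PeerConfig`, `validate`, the endpoint class-name constant and the rule that a missing endpoint class means the default endpoint are all assumptions for illustration, not the committed fix.

{code:java}
// Hypothetical sketch, not ReplicationPeerManager's actual code.
final class PeerConfigCheck {
  // Assumed fully qualified name of the default HBase-to-HBase endpoint.
  static final String HBASE_ENDPOINT =
      "org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint";

  static final class PeerConfig {
    final String clusterKey;   // may be empty for non-HBase sinks
    final String endpointImpl; // null or empty: assume the default HBase endpoint

    PeerConfig(String clusterKey, String endpointImpl) {
      this.clusterKey = clusterKey;
      this.endpointImpl = endpointImpl;
    }
  }

  static boolean replicatesToHBase(PeerConfig c) {
    return c.endpointImpl == null || c.endpointImpl.isEmpty()
        || c.endpointImpl.equals(HBASE_ENDPOINT);
  }

  /** Throws if an HBase-bound peer has a missing or self-referencing cluster key. */
  static void validate(PeerConfig c, String localClusterKey) {
    if (!replicatesToHBase(c)) {
      return; // custom endpoint (e.g. to an MQ): no cluster key to check
    }
    if (c.clusterKey == null || c.clusterKey.isEmpty()
        || c.clusterKey.equals(localClusterKey)) {
      throw new IllegalArgumentException("Invalid cluster key: " + c.clusterKey
          + ", should not replicate to itself");
    }
  }
}
{code}

Comparing class names as strings is a simplification; a real check would more likely inspect the configured endpoint class itself.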

> Can not add peer replicating to non-HBase
> -
>
> Key: HBASE-26406
> URL: https://issues.apache.org/jira/browse/HBASE-26406
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-2, 2.4.9
>
>
> Failed to add a peer replicating to a non-HBase system (such as an MQ) by implementing a custom 
> ReplicationEndpoint; got an exception like this in my UT: 
> {code:java}
> 2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
> 2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
> org.apache.hadoop.hbase.DoNotRetryIOException: Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
>     at java.lang.Thread.getStackTrace(Thread.java:1559)
>     at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
>     at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
>     at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:186)
>     at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1948)
>     at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1936)
>     at org.apache.hadoop.hbase.replication.TestNonHBaseReplicationEndpoint.test(TestNonHBaseReplicationEndpoint.java:97)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:748)
>     at Future.get(Unknown Source)
>     at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkClusterId(ReplicationPeerManager.java:527)
>     at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkPeerConfig(ReplicationPeerManager.java:367)
>     at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.preAddPeer(ReplicationPeerManager.java:123)
>     at org.apache.hadoop.hbase.master.replication.AddPeerProcedure.prePeerModification(AddPeerProcedure.java:101)
>     at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:162)
>     at
> 

[jira] [Updated] (HBASE-26406) Can not add peer replicating to non-HBase

2021-10-29 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-26406:

Description: 
Failed to add a peer replicating to a non-HBase system (such as an MQ) by implementing a custom 
ReplicationEndpoint; got an exception like this in my UT: 
{code:java}
2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
org.apache.hadoop.hbase.DoNotRetryIOException: Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
    at java.lang.Thread.getStackTrace(Thread.java:1559)
    at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
    at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
    at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:186)
    at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1948)
    at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1936)
    at org.apache.hadoop.hbase.replication.TestNonHBaseReplicationEndpoint.test(TestNonHBaseReplicationEndpoint.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
    at Future.get(Unknown Source)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkClusterId(ReplicationPeerManager.java:527)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkPeerConfig(ReplicationPeerManager.java:367)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.preAddPeer(ReplicationPeerManager.java:123)
    at org.apache.hadoop.hbase.master.replication.AddPeerProcedure.prePeerModification(AddPeerProcedure.java:101)
    at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:162)
    at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:43)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:190)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
{code}
HBASE-24743 ignored this situation and 

[jira] [Created] (HBASE-26406) Can not add peer replicating to non-HBase

2021-10-29 Thread Sun Xin (Jira)
Sun Xin created HBASE-26406:
---

 Summary: Can not add peer replicating to non-HBase
 Key: HBASE-26406
 URL: https://issues.apache.org/jira/browse/HBASE-26406
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.0, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


Failed to add a peer replicating to a non-HBase system (such as an MQ) by implementing a custom 
ReplicationEndpoint; got an exception like this in my UT: 
{code:java}
2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
org.apache.hadoop.hbase.DoNotRetryIOException: Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
    at java.lang.Thread.getStackTrace(Thread.java:1559)
    at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
    at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
    at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:186)
    at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1948)
    at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1936)
    at org.apache.hadoop.hbase.replication.TestNonHBaseReplicationEndpoint.test(TestNonHBaseReplicationEndpoint.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
    at Future.get(Unknown Source)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkClusterId(ReplicationPeerManager.java:527)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkPeerConfig(ReplicationPeerManager.java:367)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.preAddPeer(ReplicationPeerManager.java:123)
    at org.apache.hadoop.hbase.master.replication.AddPeerProcedure.prePeerModification(AddPeerProcedure.java:101)
    at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:162)
    at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:43)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:190)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
    at

[jira] [Resolved] (HBASE-25773) TestSnapshotScannerHDFSAclController.setupBeforeClass is flaky

2021-09-02 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25773.
-
Resolution: Fixed

Pushed to branch-2 and master, thanks [~zhangduo] for reviewing.

> TestSnapshotScannerHDFSAclController.setupBeforeClass is flaky
> --
>
> Key: HBASE-25773
> URL: https://issues.apache.org/jira/browse/HBASE-25773
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xiaolin Ha
>Assignee: Sun Xin
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3140/2/testReport/org.apache.hadoop.hbase.security.access/TestSnapshotScannerHDFSAclController/precommit_checks___yetus_jdk8_Hadoop3_checks__/]
> SnapshotScannerHDFSAclController.postStartMaster alters hbase:acl to add a 
> new column family "m", but 
> `TestSnapshotScannerHDFSAclController.setupBeforeClass(TestSnapshotScannerHDFSAclController.java:101)`
>  fails before the disable and enable of hbase:acl complete.
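The race described above can be avoided by making the test wait for the master-side alter to finish instead of assuming it already has. A minimal, generic Java polling helper is sketched below; `Wait.waitFor` is a hypothetical name (HBase's test utilities ship their own wait helpers), and the condition passed in is up to the caller.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Generic, hypothetical polling helper: re-checks a condition until it holds
// or the timeout elapses, sleeping between attempts.
final class Wait {
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
  }
}
{code}

In `setupBeforeClass`, the condition could check that hbase:acl is enabled again (`Admin#isTableEnabled`) and that its descriptor contains the "m" family (`TableDescriptor#hasColumnFamily`) before any test touches the table.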





[jira] [Commented] (HBASE-25773) TestSnapshotScannerHDFSAclController.setupBeforeClass is flaky

2021-08-31 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407131#comment-17407131
 ] 

Sun Xin commented on HBASE-25773:
-

[~Xiaolin Ha] Is this issue still being worked on? Could you assign it to me 
and let me try?

> TestSnapshotScannerHDFSAclController.setupBeforeClass is flaky
> --
>
> Key: HBASE-25773
> URL: https://issues.apache.org/jira/browse/HBASE-25773
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3140/2/testReport/org.apache.hadoop.hbase.security.access/TestSnapshotScannerHDFSAclController/precommit_checks___yetus_jdk8_Hadoop3_checks__/]
> SnapshotScannerHDFSAclController.postStartMaster alters hbase:acl to add a 
> new column family "m", but 
> `TestSnapshotScannerHDFSAclController.setupBeforeClass(TestSnapshotScannerHDFSAclController.java:101)`
>  fails before the disable and enable of hbase:acl complete.





[jira] [Resolved] (HBASE-26194) Introduce a ReplicationServerSourceManager to simplify HReplicationServer

2021-08-17 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26194.
-
Resolution: Done

Merged. Thanks [~stack] for reviewing.

> Introduce a ReplicationServerSourceManager to simplify HReplicationServer
> -
>
> Key: HBASE-26194
> URL: https://issues.apache.org/jira/browse/HBASE-26194
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Affects Versions: 3.0.0-alpha-2
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>






[jira] [Created] (HBASE-26194) Introduce a ReplicationServerSourceManager to simplify HReplicationServer

2021-08-12 Thread Sun Xin (Jira)
Sun Xin created HBASE-26194:
---

 Summary: Introduce a ReplicationServerSourceManager to simplify 
HReplicationServer
 Key: HBASE-26194
 URL: https://issues.apache.org/jira/browse/HBASE-26194
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-2
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-2








[jira] [Resolved] (HBASE-26084) Add owner of replication queue for ReplicationQueueInfo

2021-08-12 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26084.
-
Fix Version/s: 3.0.0-alpha-2
   Resolution: Done

Merged.

Thanks [~stack] and [~zhangduo] for reviewing.

> Add owner of replication queue for ReplicationQueueInfo
> ---
>
> Key: HBASE-26084
> URL: https://issues.apache.org/jira/browse/HBASE-26084
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> The current ReplicationQueueInfo only has a queueId, which is not enough to 
> distinguish queues in ReplicationServer, so we need to add the RS holding 
> the queue to ReplicationQueueInfo.





[jira] [Created] (HBASE-26084) Add owner of replication queue for ReplicationQueueInfo

2021-07-13 Thread Sun Xin (Jira)
Sun Xin created HBASE-26084:
---

 Summary: Add owner of replication queue for ReplicationQueueInfo
 Key: HBASE-26084
 URL: https://issues.apache.org/jira/browse/HBASE-26084
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


The current ReplicationQueueInfo only has a queueId, which is not enough to 
distinguish queues in ReplicationServer, so we need to add the RS holding the 
queue to ReplicationQueueInfo.
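A minimal sketch of the idea, assuming the owner is represented by a region server name string: key each queue by the pair (owner, queueId) rather than by queueId alone. For example, each region server typically has its own queue for the same peer id, so on a ReplicationServer that hosts sources from several region servers, queueId by itself collides. `QueueKey` is a hypothetical class, not the actual ReplicationQueueInfo change.

{code:java}
import java.util.Objects;

// Hypothetical composite identifier for a replication queue:
// (owning region server, queueId).
final class QueueKey {
  final String ownerServerName; // e.g. "host,16020,1625800000000"
  final String queueId;

  QueueKey(String ownerServerName, String queueId) {
    this.ownerServerName = Objects.requireNonNull(ownerServerName);
    this.queueId = Objects.requireNonNull(queueId);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof QueueKey)) {
      return false;
    }
    QueueKey other = (QueueKey) o;
    return ownerServerName.equals(other.ownerServerName) && queueId.equals(other.queueId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(ownerServerName, queueId);
  }

  @Override
  public String toString() {
    return ownerServerName + "/" + queueId;
  }
}
{code}

Because both fields take part in `equals` and `hashCode`, two queues with the same queueId but different owners land in different map or set slots.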





[jira] [Commented] (HBASE-25110) Add heartbeat for ReplicationServer and dispatch replication sources to ReplicationServer

2021-07-09 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377923#comment-17377923
 ] 

Sun Xin commented on HBASE-25110:
-

Split this issue into two: HBASE-26077 and HBASE-26078.

> Add heartbeat for ReplicationServer and dispatch replication sources to 
> ReplicationServer
> -
>
> Key: HBASE-25110
> URL: https://issues.apache.org/jira/browse/HBASE-25110
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Resolved] (HBASE-25110) Add heartbeat for ReplicationServer and dispatch replication sources to ReplicationServer

2021-07-09 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25110.
-
Release Note: Split this issue into two: HBASE-26077 and 
HBASE-26078.
  Resolution: Incomplete

> Add heartbeat for ReplicationServer and dispatch replication sources to 
> ReplicationServer
> -
>
> Key: HBASE-25110
> URL: https://issues.apache.org/jira/browse/HBASE-25110
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Created] (HBASE-26078) Dispatch replication sources to ReplicationServer

2021-07-09 Thread Sun Xin (Jira)
Sun Xin created HBASE-26078:
---

 Summary: Dispatch replication sources to ReplicationServer
 Key: HBASE-26078
 URL: https://issues.apache.org/jira/browse/HBASE-26078
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1








[jira] [Created] (HBASE-26077) Add heartbeat for ReplicationServer

2021-07-09 Thread Sun Xin (Jira)
Sun Xin created HBASE-26077:
---

 Summary: Add heartbeat for ReplicationServer
 Key: HBASE-26077
 URL: https://issues.apache.org/jira/browse/HBASE-26077
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1








[jira] [Resolved] (HBASE-25807) Move method reportProcedureDone from RegionServerStatus.proto to Master.proto

2021-05-23 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25807.
-
Fix Version/s: 3.0.0-alpha-1
   Resolution: Done

Merged. Thanks [~zhangduo] for reviewing.

> Move method reportProcedureDone from RegionServerStatus.proto to Master.proto
> -
>
> Key: HBASE-25807
> URL: https://issues.apache.org/jira/browse/HBASE-25807
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Next, we need to use the procedure mechanism to implement enable/disable/refresh 
> peer, and ReplicationServer also needs to call reportProcedureDone on the 
> master, so I propose moving the reportProcedureDone method from 
> RegionServerStatus.proto to Master.proto.





[jira] [Created] (HBASE-25820) Find a way to know whether logQueue goes empty when ReplicationSource is running on ReplicationServer

2021-04-28 Thread Sun Xin (Jira)
Sun Xin created HBASE-25820:
---

 Summary: Find a way to know whether logQueue goes empty when 
ReplicationSource is running on ReplicationServer
 Key: HBASE-25820
 URL: https://issues.apache.org/jira/browse/HBASE-25820
 Project: HBase
  Issue Type: Sub-task
Reporter: Sun Xin


In HBASE-25110 we chose to use ZK to notify the ReplicationServer that a new WAL 
was generated, which is asynchronous. This leads to a problem: the shipper thread 
and the WAL reader thread may terminate when logQueue goes empty, before the 
notification of the new WAL is received.

So we now need to find a way to know whether logQueue is really empty after the 
last WAL in logQueue is consumed.
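One possible direction, sketched in Java under stated assumptions: rather than terminating as soon as logQueue is momentarily empty, the reader keeps polling and stops only when it is explicitly told that no more WALs will arrive. `WalQueueReader`, `markNoMoreWals` and the string sentinel are hypothetical; this is not necessarily the approach adopted for HBASE-25820.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an empty queue means "nothing yet", not "done".
// Only an explicit end-of-queue marker ends the read loop.
final class WalQueueReader {
  private static final String END_OF_QUEUE = "\u0000END"; // illustrative sentinel
  private final BlockingQueue<String> logQueue = new LinkedBlockingQueue<>();

  /** Called when a new WAL is announced, possibly late (e.g. via an async ZK watch). */
  void enqueue(String walName) {
    logQueue.add(walName);
  }

  /** Called once it is certain no further WALs will be produced for this queue. */
  void markNoMoreWals() {
    logQueue.add(END_OF_QUEUE);
  }

  /** Consumes WALs in order until the end marker is seen. */
  List<String> drain(long pollMs) throws InterruptedException {
    List<String> consumed = new ArrayList<>();
    while (true) {
      String wal = logQueue.poll(pollMs, TimeUnit.MILLISECONDS);
      if (wal == null) {
        continue; // queue empty for now: keep waiting instead of terminating
      }
      if (wal.equals(END_OF_QUEUE)) {
        return consumed;
      }
      consumed.add(wal);
    }
  }
}
{code}

The trade-off is that something must reliably enqueue the end marker (for example when the source is closed); otherwise the reader waits forever, which is safer than silently skipping WALs but still needs a shutdown path.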





[jira] [Resolved] (HBASE-24737) Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten problem

2021-04-26 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-24737.
-
Resolution: Done

> Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten 
> problem
> 
>
> Key: HBASE-24737
> URL: https://issues.apache.org/jira/browse/HBASE-24737
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>
> Currently we use WALFileLengthProvider#getLogFileSizeIfBeingWritten to get the 
> synced WAL length, which prevents replicating unacked log entries. But after 
> offloading ReplicationSource to the new ReplicationServer, we need a new way to 
> solve this problem.
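The constraint that `getLogFileSizeIfBeingWritten` enforces can be sketched as follows, assuming a source that reports the synced (acked) length of a WAL that is still being written and reports nothing once the WAL is closed. `SyncedReadLimit` and `SyncedLengthSource` are hypothetical stand-ins, keyed here by a plain string path to stay self-contained; a ReplicationServer would need some other way to obtain the same number, which is the open question of this issue.

{code:java}
import java.util.OptionalLong;

// Hypothetical sketch: never ship WAL entries that extend past the synced
// length of a file that is still being written.
final class SyncedReadLimit {
  interface SyncedLengthSource {
    /** Empty if the WAL is closed; otherwise its currently synced length. */
    OptionalLong syncedLengthIfBeingWritten(String walPath);
  }

  /** Upper bound on bytes that are safe to replicate from this WAL. */
  static long readLimit(String walPath, long fileLength, SyncedLengthSource src) {
    OptionalLong synced = src.syncedLengthIfBeingWritten(walPath);
    return synced.isPresent() ? Math.min(synced.getAsLong(), fileLength) : fileLength;
  }

  /** Number of leading entries (given their end offsets) that fit within the limit. */
  static int shippableEntries(long[] entryEndOffsets, long limit) {
    int n = 0;
    for (long end : entryEndOffsets) {
      if (end > limit) {
        break; // this entry includes unacked bytes; stop here
      }
      n++;
    }
    return n;
  }
}
{code}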





[jira] [Commented] (HBASE-24737) Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten problem

2021-04-26 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332899#comment-17332899
 ] 

Sun Xin commented on HBASE-24737:
-

Thanks [~zhangduo] for reviewing.

The failed UTs are unrelated; I will fix them in HBASE-25110.

> Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten 
> problem
> 
>
> Key: HBASE-24737
> URL: https://issues.apache.org/jira/browse/HBASE-24737
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>
> Now we use WALFileLengthProvider#getLogFileSizeIfBeingWritten to get the 
> synced WAL length and to avoid replicating unacked log entries. But after 
> offloading ReplicationSource to the new ReplicationServer, we need a new way 
> to resolve this problem.





[jira] [Assigned] (HBASE-25807) Move method reportProcedureDone from RegionServerStatus.proto to Master.proto

2021-04-23 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin reassigned HBASE-25807:
---

Assignee: Sun Xin

> Move method reportProcedureDone from RegionServerStatus.proto to Master.proto
> -
>
> Key: HBASE-25807
> URL: https://issues.apache.org/jira/browse/HBASE-25807
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>
> Next, we need to use the procedure mechanism to implement enable/disable/refresh 
> peer, and ReplicationServer also needs to call reportProcedureDone on the master, 
> so I propose moving the method reportProcedureDone from RegionServerStatus.proto 
> to Master.proto.





[jira] [Created] (HBASE-25807) Move method reportProcedureDone from RegionServerStatus.proto to Master.proto

2021-04-23 Thread Sun Xin (Jira)
Sun Xin created HBASE-25807:
---

 Summary: Move method reportProcedureDone from 
RegionServerStatus.proto to Master.proto
 Key: HBASE-25807
 URL: https://issues.apache.org/jira/browse/HBASE-25807
 Project: HBase
  Issue Type: Sub-task
Reporter: Sun Xin


Next, we need to use the procedure mechanism to implement enable/disable/refresh 
peer, and ReplicationServer also needs to call reportProcedureDone on the master, 
so I propose moving the method reportProcedureDone from RegionServerStatus.proto 
to Master.proto.





[jira] [Resolved] (HBASE-25562) ReplicationSourceWALReader log and handle exception immediately without retrying

2021-03-25 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25562.
-
Fix Version/s: 2.4.3
   2.3.5
   3.0.0-alpha-1
   Resolution: Fixed

> ReplicationSourceWALReader log and handle exception immediately without 
> retrying
> 
>
> Key: HBASE-25562
> URL: https://issues.apache.org/jira/browse/HBASE-25562
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.5, 2.4.3
>
>
> In [this piece of code about retrying in 
> ReplicationSourceWALReader#run|https://github.com/apache/hbase/blob/0353909bc268e3ff3def098963d021e973f1f153/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L151],
>  the sleep time increases with the number of retries, so if an exception 
> happens that cannot be recovered from automatically, the error logs only 
> appear after 12 hours (300 retries by default).
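Where the 12 hours comes from can be checked with a little arithmetic. Assuming the defaults replication.source.sleepforretries = 1000 ms and replication.source.maxretriesmultiplier = 300, and assuming the n-th retry sleeps sleepForRetries × n (a simplification of the loop linked above), the total wait is 1000 ms × (1 + 2 + ... + 300) = 1000 ms × 45,150, roughly 12.5 hours. RetryBackoffSketch is a hypothetical helper that reproduces this calculation:

```java
/** Hypothetical sketch of a linearly increasing retry backoff, as described above. */
class RetryBackoffSketch {

  /** Sleep before the given retry: base * retry, with the multiplier capped. */
  static long sleepMillis(long baseMillis, int retry, int maxMultiplier) {
    return baseMillis * Math.min(retry, maxMultiplier);
  }

  /** Total time slept across retries 1..maxMultiplier before giving up. */
  static long totalSleepMillis(long baseMillis, int maxMultiplier) {
    long total = 0;
    for (int retry = 1; retry <= maxMultiplier; retry++) {
      total += sleepMillis(baseMillis, retry, maxMultiplier);
    }
    return total;
  }

  public static void main(String[] args) {
    long total = totalSleepMillis(1000, 300);
    // 1000 ms * (300 * 301 / 2) = 45,150,000 ms
    System.out.printf("%d ms = %.2f hours%n", total, total / 3_600_000.0);
  }
}
```

Because the multiplier grows linearly, most of that time is spent in the later retries, which is why an unrecoverable exception stays silent for so long.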





[jira] [Commented] (HBASE-25590) Bulkload replication HFileRefs cannot be cleared in some cases where set exclude-namespace/exclude-table-cfs

2021-03-25 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308406#comment-17308406
 ] 

Sun Xin commented on HBASE-25590:
-

{quote}is there anything needs to be done for this jira? Can it be resolved?
{quote}
Thanks [~huaxiangsun] for noticing this. I haven't closed this jira yet because the 
PR backporting to branch-2.2 still needs review.

> Bulkload replication HFileRefs cannot be cleared in some cases where set 
> exclude-namespace/exclude-table-cfs
> 
>
> Key: HBASE-25590
> URL: https://issues.apache.org/jira/browse/HBASE-25590
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.5
>
>
> In 
> [ReplicationSource#addHFileRefs|https://github.com/apache/hbase/blob/ed90a14995acd87111d2b9849f07d84418ca43d4/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L264],
>  we may add unwanted hfiles to the _HFileRefs_ if a peer has _replicate_all_ 
> set to true together with _exclude-namespace/exclude-table-cfs_.
> These unwanted _HFileRefs_ are never replicated to the remote cluster and are 
> never cleared.
> Two problems are caused by this bug:
>  # The metric sizeOfHFileRefsQueue cannot be zeroed.
>  # Referenced HFiles cannot be deleted by _ReplicationHFileCleaner._
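The missing check can be sketched as follows: before adding an hfile to HFileRefs, ask whether a replicate_all peer would actually replicate that table and column family. ReplicateAllPeerFilterSketch and its fields are hypothetical simplifications of exclude-namespace/exclude-table-cfs, not the ReplicationPeerConfig API:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch for a peer with replicate_all = true: decide whether a
 * bulkloaded hfile for (table, family) should be added to the peer's HFileRefs.
 */
class ReplicateAllPeerFilterSketch {
  private final Set<String> excludeNamespaces;
  /** "namespace:table" -> excluded families; an empty list excludes the whole table. */
  private final Map<String, List<String>> excludeTableCfs;

  ReplicateAllPeerFilterSketch(Set<String> excludeNamespaces,
      Map<String, List<String>> excludeTableCfs) {
    this.excludeNamespaces = excludeNamespaces;
    this.excludeTableCfs = excludeTableCfs;
  }

  boolean needToReplicate(String tableName, String family) {
    // Tables without an explicit namespace live in "default".
    String qualified = tableName.contains(":") ? tableName : "default:" + tableName;
    String namespace = qualified.substring(0, qualified.indexOf(':'));
    if (excludeNamespaces.contains(namespace)) {
      return false;
    }
    List<String> excludedFamilies = excludeTableCfs.get(qualified);
    if (excludedFamilies == null) {
      return true; // table not mentioned in exclude-table-cfs
    }
    // Empty list: whole table excluded; otherwise only the listed families are.
    return !excludedFamilies.isEmpty() && !excludedFamilies.contains(family);
  }
}
```

Filtering before queueing keeps excluded hfiles out of HFileRefs in the first place, so sizeOfHFileRefsQueue can drain and _ReplicationHFileCleaner_ is not blocked by references that will never be shipped.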





[jira] [Created] (HBASE-25683) Simplify UTs using DummyServer

2021-03-19 Thread Sun Xin (Jira)
Sun Xin created HBASE-25683:
---

 Summary: Simplify UTs using DummyServer
 Key: HBASE-25683
 URL: https://issues.apache.org/jira/browse/HBASE-25683
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Resolved] (HBASE-25638) The master local region is constantly major compact

2021-03-05 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25638.
-
Resolution: Not A Problem

> The master local region is constantly major compact
> ---
>
> Key: HBASE-25638
> URL: https://issues.apache.org/jira/browse/HBASE-25638
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>
> In 
> [MasterRegionFlusherAndCompactor.compact|https://github.com/apache/hbase/blob/830d2895b27fa0cf39a28d3af9673a4126ea8258/hbase-server/src/main/java/org/apache/hadoop/hbase/master/region/MasterRegionFlusherAndCompactor.java#L164],
>  we call region.compact(true) over and over, almost like a recursion. This 
> caused a lot of logs to be flushed.





[jira] [Created] (HBASE-25638) The master local region is constantly major compact

2021-03-05 Thread Sun Xin (Jira)
Sun Xin created HBASE-25638:
---

 Summary: The master local region is constantly major compact
 Key: HBASE-25638
 URL: https://issues.apache.org/jira/browse/HBASE-25638
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.1, 2.3.4, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


In 
[MasterRegionFlusherAndCompactor.compact|https://github.com/apache/hbase/blob/830d2895b27fa0cf39a28d3af9673a4126ea8258/hbase-server/src/main/java/org/apache/hadoop/hbase/master/region/MasterRegionFlusherAndCompactor.java#L164],
 we call region.compact(true) over and over, almost like a recursion. This caused 
a lot of logs to be flushed.





[jira] [Assigned] (HBASE-24737) Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten problem

2021-03-03 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin reassigned HBASE-24737:
---

Assignee: Sun Xin

> Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten 
> problem
> 
>
> Key: HBASE-24737
> URL: https://issues.apache.org/jira/browse/HBASE-24737
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>
> Now we use WALFileLengthProvider#getLogFileSizeIfBeingWritten to get the 
> synced WAL length and to avoid replicating unacked log entries. But after 
> offloading ReplicationSource to the new ReplicationServer, we need a new way 
> to resolve this problem.





[jira] [Resolved] (HBASE-25598) TestFromClientSide5.testScanMetrics is flaky

2021-02-23 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25598.
-
Fix Version/s: 2.4.2
   2.3.5
   2.2.7
   3.0.0-alpha-1
   Resolution: Fixed

Thanks [~zhangduo] for reviewing.

Merged to master and all active branch-2.x.

> TestFromClientSide5.testScanMetrics is flaky
> 
>
> Key: HBASE-25598
> URL: https://issues.apache.org/jira/browse/HBASE-25598
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.5, 2.4.2
>
>
> In some PRs, I got the following errors in UT results.
> {code:java}
> [ERROR] Errors: 
> [ERROR] org.apache.hadoop.hbase.client.TestFromClientSide5.testScanMetrics[0]
> [ERROR]   Run 1: TestFromClientSide5.testScanMetrics:1018 Did not count the result bytes expected:<60> but was:<120>
> [ERROR]   Run 2: TestFromClientSide5.testScanMetrics:1036 Did not count the result bytes expected:<60> but was:<180>
> [ERROR]   Run 3: TestFromClientSide5.testScanMetrics:951 » MasterRegistryFetch Exception making...
> [INFO] 
> [ERROR] org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor5.testScanMetrics[1]
> [ERROR]   Run 1: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:1036 Did not count the result bytes expected:<60> but was:<120>
> [ERROR]   Run 2: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » IO
> [ERROR]   Run 3: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » IO
> [INFO] 
> {code}
> I read the code further and found that this UT is flaky.
> {code:java}
> // check byte counters
> scan2 = new Scan();
> scan2.setScanMetricsEnabled(true);
> scan2.setCaching(1);
> try (ResultScanner scanner = ht.getScanner(scan2)) {
>   int numBytes = 0;
>   for (Result result : scanner.next(1)) {
>     for (Cell cell : result.listCells()) {
>       numBytes += PrivateCellUtil.estimatedSerializedSizeOf(cell);
>     }
>   }
>   scanner.close();
>   ScanMetrics scanMetrics = scanner.getScanMetrics();
>   assertEquals("Did not count the result bytes", numBytes,
>       scanMetrics.countOfBytesInResults.get());
> }
> {code}
> The code above checks scanMetrics.countOfBytesInResults, but fetches only ONE 
> row via scanner.next(1). A total of 3 rows are inserted into the table, and the 
> scanner prefetches from the server in advance until maxCacheSize is exceeded; 
> see 
> [here|https://github.com/apache/hbase/blob/5fa15cfde3d77e77ffb1f09d60dce4db264f3831/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableResultScanner.java#L94].
> So if the scanner prefetches more than one row before it is closed, the UT 
> fails. We can reproduce this problem reliably by sleeping before 
> scanner.close().
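The race can be reproduced without a cluster in a small simulation. PrefetchingScannerSketch is hypothetical and is not the HBase client: its byte counter includes every row fetched from the "server", while the caller sums only the rows it consumed. Assuming 60 bytes per row (as expected:<60> above suggests), one, two, or three prefetched rows give 60, 120, or 180, matching the values in the failures:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical simulation of a scanner that prefetches rows and counts fetched bytes. */
class PrefetchingScannerSketch {
  private final Deque<Integer> serverRows = new ArrayDeque<>(); // row sizes in bytes
  private final Deque<Integer> cache = new ArrayDeque<>();
  private long countOfBytesInResults = 0; // plays the role of ScanMetrics.countOfBytesInResults

  PrefetchingScannerSketch(List<Integer> rowSizes) {
    serverRows.addAll(rowSizes);
  }

  /** Background prefetch: pulls up to n rows from the server into the client cache. */
  void prefetch(int n) {
    for (int i = 0; i < n && !serverRows.isEmpty(); i++) {
      int size = serverRows.poll();
      cache.add(size);
      countOfBytesInResults += size; // counted when fetched, not when consumed
    }
  }

  /** Returns the next row size, fetching one row if the cache is empty; null when done. */
  Integer next() {
    if (cache.isEmpty()) {
      prefetch(1);
    }
    return cache.poll();
  }

  long countOfBytesInResults() {
    return countOfBytesInResults;
  }
}
```

In the simulation, consuming every row before reading the metric makes both numbers agree regardless of prefetch timing; this only illustrates the mismatch and is not necessarily how the test was fixed.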





[jira] [Assigned] (HBASE-25110) Add heartbeat for ReplicationServer and dispatch replication sources to ReplicationServer

2021-02-23 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin reassigned HBASE-25110:
---

Assignee: Sun Xin  (was: Guanghao Zhang)

> Add heartbeat for ReplicationServer and dispatch replication sources to 
> ReplicationServer
> -
>
> Key: HBASE-25110
> URL: https://issues.apache.org/jira/browse/HBASE-25110
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Created] (HBASE-25598) TestFromClientSide5.testScanMetrics is flaky

2021-02-23 Thread Sun Xin (Jira)
Sun Xin created HBASE-25598:
---

 Summary: TestFromClientSide5.testScanMetrics is flaky
 Key: HBASE-25598
 URL: https://issues.apache.org/jira/browse/HBASE-25598
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.1, 2.3.4, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


In some PRs, I got the following errors in UT results.
{code:java}
[ERROR] Errors: 
[ERROR] org.apache.hadoop.hbase.client.TestFromClientSide5.testScanMetrics[0]
[ERROR]   Run 1: TestFromClientSide5.testScanMetrics:1018 Did not count the result bytes expected:<60> but was:<120>
[ERROR]   Run 2: TestFromClientSide5.testScanMetrics:1036 Did not count the result bytes expected:<60> but was:<180>
[ERROR]   Run 3: TestFromClientSide5.testScanMetrics:951 » MasterRegistryFetch Exception making...
[INFO] 
[ERROR] org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor5.testScanMetrics[1]
[ERROR]   Run 1: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:1036 Did not count the result bytes expected:<60> but was:<120>
[ERROR]   Run 2: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » IO
[ERROR]   Run 3: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » IO
[INFO] 
{code}
I read the code further and found that this UT is flaky.
{code:java}
// check byte counters
scan2 = new Scan();
scan2.setScanMetricsEnabled(true);
scan2.setCaching(1);
try (ResultScanner scanner = ht.getScanner(scan2)) {
  int numBytes = 0;
  for (Result result : scanner.next(1)) {
    for (Cell cell : result.listCells()) {
      numBytes += PrivateCellUtil.estimatedSerializedSizeOf(cell);
    }
  }
  scanner.close();
  ScanMetrics scanMetrics = scanner.getScanMetrics();
  assertEquals("Did not count the result bytes", numBytes,
      scanMetrics.countOfBytesInResults.get());
}
{code}
The code above checks scanMetrics.countOfBytesInResults, but fetches only ONE 
row via scanner.next(1). A total of 3 rows are inserted into the table, and the 
scanner prefetches from the server in advance until maxCacheSize is exceeded; 
see 
[here|https://github.com/apache/hbase/blob/5fa15cfde3d77e77ffb1f09d60dce4db264f3831/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableResultScanner.java#L94].

So if the scanner prefetches more than one row before it is closed, the UT 
fails. We can reproduce this problem reliably by sleeping before scanner.close().





[jira] [Created] (HBASE-25590) Bulkload replication HFileRefs cannot be cleared in some cases where set exclude-namespace/exclude-table-cfs

2021-02-20 Thread Sun Xin (Jira)
Sun Xin created HBASE-25590:
---

 Summary: Bulkload replication HFileRefs cannot be cleared in some 
cases where set exclude-namespace/exclude-table-cfs
 Key: HBASE-25590
 URL: https://issues.apache.org/jira/browse/HBASE-25590
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


In 
[ReplicationSource#addHFileRefs|https://github.com/apache/hbase/blob/ed90a14995acd87111d2b9849f07d84418ca43d4/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L264],
 we may add unwanted hfiles to the _HFileRefs_ if a peer has _replicate_all_ set 
to true together with _exclude-namespace/exclude-table-cfs_.

These unwanted _HFileRefs_ are never replicated to the remote cluster and are 
never cleared.

Two problems are caused by this bug:
 # The metric sizeOfHFileRefsQueue cannot be zeroed.
 # Referenced HFiles cannot be deleted by _ReplicationHFileCleaner._





[jira] [Commented] (HBASE-25562) ReplicationSourceWALReader log and handle exception immediately without retrying

2021-02-19 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287473#comment-17287473
 ] 

Sun Xin commented on HBASE-25562:
-

Thanks all for reviewing.

Merged to master and branch-2.4.

> ReplicationSourceWALReader log and handle exception immediately without 
> retrying
> 
>
> Key: HBASE-25562
> URL: https://issues.apache.org/jira/browse/HBASE-25562
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>
> In [this piece of code about retrying in 
> ReplicationSourceWALReader#run|https://github.com/apache/hbase/blob/0353909bc268e3ff3def098963d021e973f1f153/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L151],
>  the sleep time increases with the number of retries, so if an exception 
> happens that cannot be recovered from automatically, the error logs only 
> appear after 12 hours (300 retries by default).





[jira] [Comment Edited] (HBASE-25559) Terminate threads of oldsources while RS is closing

2021-02-09 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281654#comment-17281654
 ] 

Sun Xin edited comment on HBASE-25559 at 2/9/21, 9:29 AM:
--

Merged to master and all active branch-2.x.

Thanks [~wchevreuil] [~vjasani] [~stack] for reviewing.


was (Author: ddupg):
Merged to master and all active branch-2.x.

> Terminate threads of oldsources while RS is closing
> ---
>
> Key: HBASE-25559
> URL: https://issues.apache.org/jira/browse/HBASE-25559
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.5, 2.4.2
>
>






[jira] [Resolved] (HBASE-25559) Terminate threads of oldsources while RS is closing

2021-02-09 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25559.
-
Fix Version/s: 2.4.2
   2.3.5
   2.2.7
   3.0.0-alpha-1
   Resolution: Fixed

Merged to master and all active branch-2.x.

> Terminate threads of oldsources while RS is closing
> ---
>
> Key: HBASE-25559
> URL: https://issues.apache.org/jira/browse/HBASE-25559
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.5, 2.4.2
>
>






[jira] [Updated] (HBASE-25562) ReplicationSourceWALReader log and handle exception immediately without retrying

2021-02-08 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25562:

Description: In [this piece of code about retrying in 
ReplicationSourceWALReader#run|https://github.com/apache/hbase/blob/0353909bc268e3ff3def098963d021e973f1f153/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L151],
 the sleep time increases with the number of retries, so if an exception happens 
that cannot be recovered from automatically, the error logs only appear after 12 
hours (300 retries by default).  (was: In this piece of code about retrying in 
ReplicationSourceWALReader#run, sleep time increases with the number of 
retries, if an exception happens that cannot be recovered by itself, error logs 
will appear after 12 hours (300 retries by default).)

> ReplicationSourceWALReader log and handle exception immediately without 
> retrying
> 
>
> Key: HBASE-25562
> URL: https://issues.apache.org/jira/browse/HBASE-25562
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>
> In [this piece of code about retrying in 
> ReplicationSourceWALReader#run|https://github.com/apache/hbase/blob/0353909bc268e3ff3def098963d021e973f1f153/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L151],
>  the sleep time increases with the number of retries, so if an exception 
> happens that cannot be recovered from automatically, the error logs only 
> appear after 12 hours (300 retries by default).





[jira] [Created] (HBASE-25562) ReplicationSourceWALReader log and handle exception immediately without retrying

2021-02-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-25562:
---

 Summary: ReplicationSourceWALReader log and handle exception 
immediately without retrying
 Key: HBASE-25562
 URL: https://issues.apache.org/jira/browse/HBASE-25562
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


In this piece of code about retrying in ReplicationSourceWALReader#run, the sleep 
time increases with the number of retries, so if an exception happens that cannot 
be recovered from automatically, the error logs only appear after 12 hours (300 
retries by default).





[jira] [Updated] (HBASE-25560) Remove unused parameter named peerId in the constructor method of CatalogReplicationSourcePeer

2021-02-08 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25560:

Affects Version/s: 2.4.2
   3.0.0-alpha-1

> Remove unused parameter named peerId in the constructor method of 
> CatalogReplicationSourcePeer
> --
>
> Key: HBASE-25560
> URL: https://issues.apache.org/jira/browse/HBASE-25560
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.4.2
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Created] (HBASE-25560) Remove unused parameter named peerId in the constructor method of CatalogReplicationSourcePeer

2021-02-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-25560:
---

 Summary: Remove unused parameter named peerId in the constructor 
method of CatalogReplicationSourcePeer
 Key: HBASE-25560
 URL: https://issues.apache.org/jira/browse/HBASE-25560
 Project: HBase
  Issue Type: Bug
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Created] (HBASE-25559) Terminate threads of oldsources while RS is closing

2021-02-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-25559:
---

 Summary: Terminate threads of oldsources while RS is closing
 Key: HBASE-25559
 URL: https://issues.apache.org/jira/browse/HBASE-25559
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Resolved] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String

2021-02-07 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25553.
-
Resolution: Fixed

> It is better for ReplicationTracker.getListOfRegionServers to return 
> ServerName instead of String
> -
>
> Key: HBASE-25553
> URL: https://issues.apache.org/jira/browse/HBASE-25553
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.3.5, 2.4.2
>
>






[jira] [Commented] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String

2021-02-07 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280469#comment-17280469
 ] 

Sun Xin commented on HBASE-25553:
-

Merged to master and all active branch-2.x.

Thanks [~wchevreuil] [~vjasani] for reviewing.

> It is better for ReplicationTracker.getListOfRegionServers to return 
> ServerName instead of String
> -
>
> Key: HBASE-25553
> URL: https://issues.apache.org/jira/browse/HBASE-25553
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.3.5, 2.4.2
>
>






[jira] [Updated] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String

2021-02-07 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25553:

Fix Version/s: 2.4.2
   2.3.5
   2.5.0
   2.2.7
   3.0.0-alpha-1

> It is better for ReplicationTracker.getListOfRegionServers to return 
> ServerName instead of String
> -
>
> Key: HBASE-25553
> URL: https://issues.apache.org/jira/browse/HBASE-25553
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.3.5, 2.4.2
>
>






[jira] [Updated] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String

2021-02-05 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25553:

Issue Type: Umbrella  (was: Bug)

> It is better for ReplicationTracker.getListOfRegionServers to return 
> ServerName instead of String
> -
>
> Key: HBASE-25553
> URL: https://issues.apache.org/jira/browse/HBASE-25553
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Created] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String

2021-02-04 Thread Sun Xin (Jira)
Sun Xin created HBASE-25553:
---

 Summary: It is better for 
ReplicationTracker.getListOfRegionServers to return ServerName instead of String
 Key: HBASE-25553
 URL: https://issues.apache.org/jira/browse/HBASE-25553
 Project: HBase
  Issue Type: Bug
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Created] (HBASE-25309) Support start/stop replication server by scripts

2020-11-19 Thread Sun Xin (Jira)
Sun Xin created HBASE-25309:
---

 Summary: Support start/stop replication server by scripts
 Key: HBASE-25309
 URL: https://issues.apache.org/jira/browse/HBASE-25309
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Created] (HBASE-25305) Add master UI to show ReplicationServer

2020-11-18 Thread Sun Xin (Jira)
Sun Xin created HBASE-25305:
---

 Summary: Add master UI to show ReplicationServer
 Key: HBASE-25305
 URL: https://issues.apache.org/jira/browse/HBASE-25305
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Commented] (HBASE-25289) [testing] Clean up resources after tests in rsgroup_shell_test.rb

2020-11-17 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234290#comment-17234290
 ] 

Sun Xin commented on HBASE-25289:
-

{quote}But there are many conflict when cherry-pick to branch-2. [~Ddupg] Can 
you help to submit a new PR for branch-2/branch-2.3/branch-2.2? 
{quote}
Thanks [~zghao] for reviewing. I've submitted a new PR for branch-2.

> [testing] Clean up resources after tests in rsgroup_shell_test.rb
> -
>
> Key: HBASE-25289
> URL: https://issues.apache.org/jira/browse/HBASE-25289
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup, test
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> In rsgroup_shell_test.rb, some tests don't remove the rsgroups or drop the 
> tables they create, which interferes with adding new tests.





[jira] [Created] (HBASE-25300) 'Unknown table hbase:quota' happens when desc table in shell if quota disabled

2020-11-17 Thread Sun Xin (Jira)
Sun Xin created HBASE-25300:
---

 Summary: 'Unknown table hbase:quota' happens when desc table in 
shell if quota disabled
 Key: HBASE-25300
 URL: https://issues.apache.org/jira/browse/HBASE-25300
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1








[jira] [Created] (HBASE-25289) [testing] Clean up resources after tests in rsgroup_shell_test.rb

2020-11-16 Thread Sun Xin (Jira)
Sun Xin created HBASE-25289:
---

 Summary: [testing] Clean up resources after tests in 
rsgroup_shell_test.rb
 Key: HBASE-25289
 URL: https://issues.apache.org/jira/browse/HBASE-25289
 Project: HBase
  Issue Type: Improvement
  Components: rsgroup, test
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In rsgroup_shell_test.rb, some tests don't remove the rsgroups or drop the tables 
they create, which interferes with adding new tests.





[jira] [Created] (HBASE-25171) Remove ZNodePaths.namespaceZNode

2020-10-10 Thread Sun Xin (Jira)
Sun Xin created HBASE-25171:
---

 Summary: Remove ZNodePaths.namespaceZNode
 Key: HBASE-25171
 URL: https://issues.apache.org/jira/browse/HBASE-25171
 Project: HBase
  Issue Type: Improvement
  Components: Zookeeper
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


HBASE-21154 removed the dependency on ZNodePaths.namespaceZNode, so this field 
can now be removed.





[jira] [Comment Edited] (HBASE-24813) ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination

2020-10-09 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210572#comment-17210572
 ] 

Sun Xin edited comment on HBASE-24813 at 10/9/20, 8:19 AM:
---

Please take a look at HBASE-25117; it may fix this problem.

 


was (Author: ddupg):
Would using isActive() instead of isAlive() in this 
[PR|https://github.com/apache/hbase/pull/2191/files#] work?

!image-2020-10-09-10-50-00-372.png!

> ReplicationSource should clear buffer usage on ReplicationSourceManager upon 
> termination
> 
>
> Key: HBASE-24813
> URL: https://issues.apache.org/jira/browse/HBASE-24813
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.2.6
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7
>
> Attachments: TestReplicationSyncUpTool.log, 
> image-2020-10-09-10-50-00-372.png
>
>
> Following investigations on the issue described by [~elserj] on HBASE-24779, 
> we found out that once a peer is removed, thus killing the peer's related 
> *ReplicationSource* instance, it may leave 
> *ReplicationSourceManager.totalBufferUsed* inconsistent. This can happen if 
> *ReplicationSourceWALReader* had put some entries on its queue to be 
> processed by *ReplicationSourceShipper*, but the peer removal killed the 
> shipper before it could process the pending entries. When the 
> *ReplicationSourceWALReader* thread adds entries to the queue, it increments 
> *ReplicationSourceManager.totalBufferUsed* by the sum of the entries' sizes. 
> When those entries are read by *ReplicationSourceShipper*, 
> *ReplicationSourceManager.totalBufferUsed* is then decreased. We should also 
> decrease *ReplicationSourceManager.totalBufferUsed* when *ReplicationSource* 
> is terminated; otherwise the size of those unprocessed entries would keep 
> consuming *ReplicationSourceManager.totalBufferUsed* indefinitely, unless the 
> RS gets restarted. This may be a problem for deployments with multiple peers, 
> or if new peers are added.
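The accounting described above can be illustrated with a small, hypothetical sketch. It is not the actual HBase code: the class `BufferAccountingSketch` and its methods are invented for illustration, a `long` per queued batch stands in for the WAL entries, and a shared `AtomicLong` stands in for `ReplicationSourceManager.totalBufferUsed`. The point is that whatever the reader charged and the shipper never released has to be released when the source terminates.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the actual ReplicationSource/ReplicationSourceManager code.
class BufferAccountingSketch {
  // Stands in for ReplicationSourceManager.totalBufferUsed, shared by all sources.
  private final AtomicLong totalBufferUsed;
  // Stands in for the reader -> shipper queue; each element is the size of one batch.
  private final LinkedBlockingQueue<Long> pendingBatches = new LinkedBlockingQueue<>();

  BufferAccountingSketch(AtomicLong totalBufferUsed) {
    this.totalBufferUsed = totalBufferUsed;
  }

  /** Reader side: charge the batch size to the shared buffer, then queue the batch. */
  void readerEnqueue(long batchSize) {
    totalBufferUsed.addAndGet(batchSize);
    pendingBatches.add(batchSize);
  }

  /** Shipper side: take one batch (if any), "ship" it, and release its size. */
  boolean shipperShipOne() {
    Long batchSize = pendingBatches.poll();
    if (batchSize == null) {
      return false;
    }
    totalBufferUsed.addAndGet(-batchSize);
    return true;
  }

  /** Termination: release the size of every batch the shipper never processed. */
  void terminate() {
    Long batchSize;
    while ((batchSize = pendingBatches.poll()) != null) {
      totalBufferUsed.addAndGet(-batchSize);
    }
  }

  public static void main(String[] args) {
    AtomicLong total = new AtomicLong();
    BufferAccountingSketch source = new BufferAccountingSketch(total);
    source.readerEnqueue(100);
    source.readerEnqueue(250);
    source.shipperShipOne();          // ships the first batch (size 100)
    System.out.println(total.get());  // 250 is still charged
    source.terminate();               // peer removed before the rest was shipped
    System.out.println(total.get());  // 0 after releasing the leftover batch
  }
}
```

Without the drain in `terminate()`, the 250 would stay charged to the shared counter until the process restarts, which is the kind of leak described above.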





[jira] [Updated] (HBASE-25117) ReplicationSourceShipper thread can not be finished

2020-10-09 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25117:

Description: 
See [Flaky 
Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console],
 where some replication UTs failed because of timeouts.

In 
[HBaseInterClusterReplicationEndpoint.sleepForRetries|https://github.com/apache/hbase/blob/78ae1f176d4215dcc34067ed25d786a4fcd4d888/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L203],
 InterruptedException is caught but not handled any further, so the interrupted 
status of the current thread stays cleared.

Below is the Javadoc of Thread.sleep:
{code:java}
/**
 * ...
 *
 * @throws  InterruptedException
 *  if any thread has interrupted the current thread. The
 *  interrupted status of the current thread is
 *  cleared when this exception is thrown.
 */
public static native void sleep(long millis) throws InterruptedException;
{code}
So InterruptedException must be handled properly; otherwise the 
ReplicationSourceShipper thread cannot be terminated in some cases.
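A minimal sketch of the usual pattern, assuming a shipper-style loop that exits once its thread is interrupted. `sleepForRetries` here is a simplified stand-in, not the real `HBaseInterClusterReplicationEndpoint` method, and `InterruptAwareLoopSketch` is hypothetical. Because `Thread.sleep` clears the interrupted status when it throws, the catch block sets it again with `Thread.currentThread().interrupt()` so the loop condition can observe it.

```java
// Hypothetical sketch of a retry loop that honors interruption.
class InterruptAwareLoopSketch {

  /** Simplified stand-in for a sleepForRetries helper; restores the interrupt flag if interrupted. */
  static boolean sleepForRetries(long millis) {
    try {
      Thread.sleep(millis);
      return true;
    } catch (InterruptedException e) {
      // Thread.sleep cleared the interrupted status when it threw; set it again
      // so that the caller's loop condition can see the interruption.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Shipper-like loop: keeps retrying until its thread is interrupted. */
  static void runUntilInterrupted() {
    while (!Thread.currentThread().isInterrupted()) {
      // ... try to ship a batch; on failure, back off before retrying ...
      sleepForRetries(20);
    }
  }

  /** Starts the loop, interrupts it, and reports whether the thread finished within the timeout. */
  static boolean stopsAfterInterrupt(long timeoutMillis) {
    Thread worker = new Thread(InterruptAwareLoopSketch::runUntilInterrupted, "shipper-sketch");
    worker.start();
    worker.interrupt();
    try {
      worker.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("stopped: " + stopsAfterInterrupt(2000));
  }
}
```

If the catch block swallowed the exception instead, an interrupt that arrives while the thread is sleeping would be lost, the loop condition would never see it, and the thread would keep retrying.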

  was:
See [Flaky 
Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console],
 where some replication UTs failed because of timeouts.

 

 


> ReplicationSourceShipper thread can not be finished
> ---
>
> Key: HBASE-25117
> URL: https://issues.apache.org/jira/browse/HBASE-25117
> Project: HBase
>  Issue Type: Bug
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>
> See [Flaky 
> Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console],
>  where some replication UTs failed because of timeouts.
> In 
> [HBaseInterClusterReplicationEndpoint.sleepForRetries|https://github.com/apache/hbase/blob/78ae1f176d4215dcc34067ed25d786a4fcd4d888/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L203],
>  InterruptedException is caught but not handled any further, so the 
> interrupted status of the current thread stays cleared.
> Below is the Javadoc of Thread.sleep:
> {code:java}
> /**
>  * ...
>  *
>  * @throws  InterruptedException
>  *  if any thread has interrupted the current thread. The
>  *  interrupted status of the current thread is
>  *  cleared when this exception is thrown.
>  */
> public static native void sleep(long millis) throws InterruptedException;
> {code}
> So InterruptedException must be handled properly; otherwise the 
> ReplicationSourceShipper thread cannot be terminated in some cases.





[jira] [Updated] (HBASE-25117) ReplicationSourceShipper thread can not be finished

2020-10-09 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25117:

Description: 
See [Flaky 
Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console],
 where some replication UTs failed because of timeouts.

 

 

  was:See [Flaky 
Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console],
 where some replication UTs failed because of timeouts.


> ReplicationSourceShipper thread can not be finished
> ---
>
> Key: HBASE-25117
> URL: https://issues.apache.org/jira/browse/HBASE-25117
> Project: HBase
>  Issue Type: Bug
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>
> See [Flaky 
> Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console],
>  where some replication UTs failed because of timeouts.
>  
>  





[jira] [Updated] (HBASE-24813) ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination

2020-10-08 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-24813:

Attachment: image-2020-10-09-10-50-00-372.png

> ReplicationSource should clear buffer usage on ReplicationSourceManager upon 
> termination
> 
>
> Key: HBASE-24813
> URL: https://issues.apache.org/jira/browse/HBASE-24813
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.2.6
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7
>
> Attachments: TestReplicationSyncUpTool.log, 
> image-2020-10-09-10-50-00-372.png
>
>
> Following investigations on the issue described by [~elserj] on HBASE-24779, 
> we found out that once a peer is removed, thus killing the peer's related 
> *ReplicationSource* instance, it may leave 
> *ReplicationSourceManager.totalBufferUsed* inconsistent. This can happen if 
> *ReplicationSourceWALReader* had put some entries on its queue to be 
> processed by *ReplicationSourceShipper*, but the peer removal killed the 
> shipper before it could process the pending entries. When the 
> *ReplicationSourceWALReader* thread adds entries to the queue, it increments 
> *ReplicationSourceManager.totalBufferUsed* by the sum of the entries' sizes. 
> When those entries are read by *ReplicationSourceShipper*, 
> *ReplicationSourceManager.totalBufferUsed* is then decreased. We should also 
> decrease *ReplicationSourceManager.totalBufferUsed* when *ReplicationSource* 
> is terminated; otherwise the size of those unprocessed entries would keep 
> consuming *ReplicationSourceManager.totalBufferUsed* indefinitely, unless the 
> RS gets restarted. This may be a problem for deployments with multiple peers, 
> or if new peers are added.





[jira] [Commented] (HBASE-24813) ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination

2020-10-08 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210572#comment-17210572
 ] 

Sun Xin commented on HBASE-24813:
-

Would using isActive() instead of isAlive() in this 
[PR|https://github.com/apache/hbase/pull/2191/files#] work?

!image-2020-10-09-10-50-00-372.png!

> ReplicationSource should clear buffer usage on ReplicationSourceManager upon 
> termination
> 
>
> Key: HBASE-24813
> URL: https://issues.apache.org/jira/browse/HBASE-24813
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.2.6
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7
>
> Attachments: TestReplicationSyncUpTool.log, 
> image-2020-10-09-10-50-00-372.png
>
>
> Following investigations on the issue described by [~elserj] on HBASE-24779, 
> we found out that once a peer is removed, thus killing the peer's related 
> *ReplicationSource* instance, it may leave 
> *ReplicationSourceManager.totalBufferUsed* inconsistent. This can happen if 
> *ReplicationSourceWALReader* had put some entries on its queue to be 
> processed by *ReplicationSourceShipper*, but the peer removal killed the 
> shipper before it could process the pending entries. When the 
> *ReplicationSourceWALReader* thread adds entries to the queue, it increments 
> *ReplicationSourceManager.totalBufferUsed* by the sum of the entries' sizes. 
> When those entries are read by *ReplicationSourceShipper*, 
> *ReplicationSourceManager.totalBufferUsed* is then decreased. We should also 
> decrease *ReplicationSourceManager.totalBufferUsed* when *ReplicationSource* 
> is terminated; otherwise the size of those unprocessed entries would keep 
> consuming *ReplicationSourceManager.totalBufferUsed* indefinitely, unless the 
> RS gets restarted. This may be a problem for deployments with multiple peers, 
> or if new peers are added.





[jira] [Created] (HBASE-25117) ReplicationSourceShipper thread can not be finished

2020-09-29 Thread Sun Xin (Jira)
Sun Xin created HBASE-25117:
---

 Summary: ReplicationSourceShipper thread can not be finished
 Key: HBASE-25117
 URL: https://issues.apache.org/jira/browse/HBASE-25117
 Project: HBase
  Issue Type: Bug
Reporter: Sun Xin
Assignee: Sun Xin


See [Flaky 
Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console],
 where some replication UTs failed because of timeouts.





[jira] [Created] (HBASE-25113) [testing] HBaseCluster support ReplicationServer for UTs

2020-09-28 Thread Sun Xin (Jira)
Sun Xin created HBASE-25113:
---

 Summary: [testing] HBaseCluster support ReplicationServer for UTs
 Key: HBASE-25113
 URL: https://issues.apache.org/jira/browse/HBASE-25113
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Updated] (HBASE-25100) conf and conn are assigned twice in HBaseReplicationEndpoint and HBaseInterClusterReplicationEndpoint

2020-09-26 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25100:

Description: 
In 
[HBaseReplicationEndpoint.init()|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java#L109]
 and  
[HBaseInterClusterReplicationEndpoint.init|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L145]
 , conf and conn are both assigned; since the latter class is a subclass of the former, they end up assigned twice.
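A simplified, hypothetical illustration of the pattern: a subclass whose `init` calls `super.init` and then repeats the same field assignments. `BaseEndpointSketch`, `DuplicatingEndpointSketch`, and `InterClusterEndpointSketch` are invented names, and the counter exists only to make the duplicate work visible in this sketch; it says nothing about what the real `init` methods allocate. The cleanup is to let the base class own the assignments.

```java
// Hypothetical sketch; not the real HBaseReplicationEndpoint hierarchy.
class BaseEndpointSketch {
  protected String conf;
  protected Object conn;
  // Counts how many times conn is assigned, to make duplicate work visible.
  int connAssignments = 0;

  void init(String conf) {
    this.conf = conf;
    this.conn = newConnection();
  }

  protected Object newConnection() {
    connAssignments++;
    return new Object();
  }

  public static void main(String[] args) {
    DuplicatingEndpointSketch before = new DuplicatingEndpointSketch();
    before.init("peer-conf");
    InterClusterEndpointSketch after = new InterClusterEndpointSketch();
    after.init("peer-conf");
    System.out.println(before.connAssignments + " vs " + after.connAssignments); // 2 vs 1
  }
}

/** Hypothetical "before": repeats assignments the base class already did. */
class DuplicatingEndpointSketch extends BaseEndpointSketch {
  @Override
  void init(String conf) {
    super.init(conf);
    this.conf = conf;            // redundant: already set by super.init
    this.conn = newConnection(); // redundant: already set by super.init
  }
}

/** Hypothetical "after": relies on the base class for conf and conn. */
class InterClusterEndpointSketch extends BaseEndpointSketch {
  @Override
  void init(String conf) {
    super.init(conf);
    // subclass-specific initialization only
  }
}
```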

 

  was:
In 
[HBaseReplicationEndpoint.init()|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java#L109]
 and  
[HBaseInterClusterReplicationEndpoint.init|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L145]
 , conn is assigned in both; since the latter class is a subclass of the former, it is assigned twice.

 


> conf and conn are assigned twice in HBaseReplicationEndpoint and 
> HBaseInterClusterReplicationEndpoint
> -
>
> Key: HBASE-25100
> URL: https://issues.apache.org/jira/browse/HBASE-25100
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> In 
> [HBaseReplicationEndpoint.init()|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java#L109]
>  and  
> [HBaseInterClusterReplicationEndpoint.init|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L145]
>  , conf and conn are both assigned; since the latter class is a subclass of the former, they end up assigned twice.
>  





[jira] [Updated] (HBASE-25100) conf and conn are assigned twice in HBaseReplicationEndpoint and HBaseInterClusterReplicationEndpoint

2020-09-26 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25100:

Summary: conf and conn are assigned twice in HBaseReplicationEndpoint and 
HBaseInterClusterReplicationEndpoint  (was: conn is assigned twice in 
HBaseReplicationEndpoint and HBaseInterClusterReplicationEndpoint)

> conf and conn are assigned twice in HBaseReplicationEndpoint and 
> HBaseInterClusterReplicationEndpoint
> -
>
> Key: HBASE-25100
> URL: https://issues.apache.org/jira/browse/HBASE-25100
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> In 
> [HBaseReplicationEndpoint.init()|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java#L109]
>  and  
> [HBaseInterClusterReplicationEndpoint.init|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L145]
>  , conn is assigned in both; since the latter class is a subclass of the former, it is assigned twice.
>  





[jira] [Created] (HBASE-25100) conn is assigned twice in HBaseReplicationEndpoint and HBaseInterClusterReplicationEndpoint

2020-09-26 Thread Sun Xin (Jira)
Sun Xin created HBASE-25100:
---

 Summary: conn is assigned twice in HBaseReplicationEndpoint and 
HBaseInterClusterReplicationEndpoint
 Key: HBASE-25100
 URL: https://issues.apache.org/jira/browse/HBASE-25100
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In 
[HBaseReplicationEndpoint.init()|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java#L109]
 and  
[HBaseInterClusterReplicationEndpoint.init|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L145]
 , conn is assigned in both; since the latter class is a subclass of the former, it is assigned twice.

 





[jira] [Work started] (HBASE-25098) ReplicationStatisticsChore runs in wrong time unit

2020-09-25 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-25098 started by Sun Xin.
---
> ReplicationStatisticsChore runs in wrong time unit
> --
>
> Key: HBASE-25098
> URL: https://issues.apache.org/jira/browse/HBASE-25098
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>






[jira] [Created] (HBASE-25098) ReplicationStatisticsChore runs in wrong time unit

2020-09-25 Thread Sun Xin (Jira)
Sun Xin created HBASE-25098:
---

 Summary: ReplicationStatisticsChore runs in wrong time unit
 Key: HBASE-25098
 URL: https://issues.apache.org/jira/browse/HBASE-25098
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1








[jira] [Work started] (HBASE-25014) ScheduledChore is never triggered when initalDelay > 1.5*period

2020-09-11 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-25014 started by Sun Xin.
---
> ScheduledChore is never triggered when initalDelay > 1.5*period
> ---
>
> Key: HBASE-25014
> URL: https://issues.apache.org/jira/browse/HBASE-25014
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.2.4, 2.2.5
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> In our recent tests, ScheduledChore is never triggered when initialDelay > 
> 1.5*period.
> The cause of the bug is the following:
> The trigger time for a ScheduledChore must be within an acceptable time window 
> of 1.5 * period; see 
> [here|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L234].
> timeOfLastRun and timeOfThisRun are two variables that record two adjacent 
> trigger times. [The first initialization of 
> timeOfThisRun|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L273]
>  happens when the ScheduledChore is created, which is not a real trigger time.
> If we set initialDelay > 1.5 * period, then the chore's first trigger after 
> initialDelay already falls outside the allowed window. The chore is then 
> [cancelled and scheduled 
> again|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java#L176].
> So it gets stuck in a loop when initialDelay > 1.5 * period:
> 1. timeOfThisRun is initialized at the wrong time.
> 2. The chore waits for initialDelay.
> 3. The chore is triggered, but outside the allowed window.
> 4. The chore is cancelled and scheduled again.
> 5. Go back to step 1.
>  





[jira] [Created] (HBASE-25014) ScheduledChore is never triggered when initalDelay > 1.5*period

2020-09-11 Thread Sun Xin (Jira)
Sun Xin created HBASE-25014:
---

 Summary: ScheduledChore is never triggered when initalDelay > 
1.5*period
 Key: HBASE-25014
 URL: https://issues.apache.org/jira/browse/HBASE-25014
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.2.5, 2.2.4, 2.2.3, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In our recent tests, ScheduledChore is never triggered when initialDelay > 
1.5*period.

The cause of the bug is the following:

The trigger time for a ScheduledChore must be within an acceptable time window 
of 1.5 * period; see 
[here|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L234].

timeOfLastRun and timeOfThisRun are two variables that record two adjacent 
trigger times. [The first initialization of 
timeOfThisRun|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L273]
 happens when the ScheduledChore is created, which is not a real trigger time.

If we set initialDelay > 1.5 * period, then the chore's first trigger after 
initialDelay already falls outside the allowed window. The chore is then 
[cancelled and scheduled 
again|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java#L176].

So it gets stuck in a loop when initialDelay > 1.5 * period:

1. timeOfThisRun is initialized at the wrong time.

2. The chore waits for initialDelay.

3. The chore is triggered, but outside the allowed window.

4. The chore is cancelled and scheduled again.

5. Go back to step 1.
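The loop above can be reproduced with a small, hypothetical simulation. It models only the window check (a trigger counts as on time if it comes within 1.5 * period of the previously recorded time) and seeds that time at creation, as described; it does not reproduce the real `ScheduledChore`/`ChoreService` code. The `treatFirstRunAsOnTime` switch is just one illustrative way to break the loop, not necessarily the fix that was committed.

```java
// Hypothetical simulation of the trigger-window check; not the real ScheduledChore.
class ChoreWindowSketch {
  private final long period;
  private final boolean treatFirstRunAsOnTime;
  private long timeOfLastRun;
  private long timeOfThisRun;
  private boolean hasRun = false;

  ChoreWindowSketch(long period, long createdAt, boolean treatFirstRunAsOnTime) {
    this.period = period;
    this.treatFirstRunAsOnTime = treatFirstRunAsOnTime;
    // As described in the issue: timeOfThisRun is first set when the chore is created,
    // which is not a real trigger time.
    this.timeOfThisRun = createdAt;
  }

  /** Records a trigger at 'now' and returns true if it falls within the allowed window. */
  boolean onTrigger(long now) {
    timeOfLastRun = timeOfThisRun;
    timeOfThisRun = now;
    boolean firstRun = !hasRun;
    hasRun = true;
    if (firstRun && treatFirstRunAsOnTime) {
      return true;
    }
    return timeOfThisRun - timeOfLastRun <= 1.5 * period;
  }

  /**
   * Repeats: create the chore, wait initialDelay, trigger; if the trigger is outside the
   * window, cancel and reschedule (modeled as a fresh chore). Returns true once a run succeeds.
   */
  static boolean eventuallyRuns(long period, long initialDelay, boolean fix, int maxAttempts) {
    long clock = 0;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      ChoreWindowSketch chore = new ChoreWindowSketch(period, clock, fix);
      clock += initialDelay;
      if (chore.onTrigger(clock)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(eventuallyRuns(10, 10, false, 5)); // true: delay <= 1.5 * period
    System.out.println(eventuallyRuns(10, 20, false, 5)); // false: delay > 1.5 * period, never runs
    System.out.println(eventuallyRuns(10, 20, true, 5));  // true: with the illustrative fix
  }
}
```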

 





[jira] [Updated] (HBASE-25012) HBASE-24359 causes replication missed log of some RemoteException

2020-09-11 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25012:

Description: 
HBASE-24359 broke the exception-handling logic. In branch-2, it even causes 
some RemoteException logs to be missed.

[File 
changed|https://github.com/apache/hbase/pull/1855/files#diff-1e3f171b19474698601a0752b618af0eL435]
 in branch-2.

!image-2020-09-11-14-30-27-898.png!

  was:[HBASE-24359|https://issues.apache.org/jira/browse/HBASE-24359] broke 
the exception-handling logic. In branch-2, it even causes some 
RemoteException logs to be missed.


> HBASE-24359 causes replication missed log of some RemoteException
> -
>
> Key: HBASE-25012
> URL: https://issues.apache.org/jira/browse/HBASE-25012
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.3.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Attachments: image-2020-09-11-14-30-27-898.png
>
>
> HBASE-24359 broke the exception-handling logic. In branch-2, it even 
> causes some RemoteException logs to be missed.
> [File 
> changed|https://github.com/apache/hbase/pull/1855/files#diff-1e3f171b19474698601a0752b618af0eL435]
>  in branch-2.
> !image-2020-09-11-14-30-27-898.png!





[jira] [Updated] (HBASE-25012) HBASE-24359 causes replication missed log of some RemoteException

2020-09-11 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-25012:

Attachment: image-2020-09-11-14-30-27-898.png

> HBASE-24359 causes replication missed log of some RemoteException
> -
>
> Key: HBASE-25012
> URL: https://issues.apache.org/jira/browse/HBASE-25012
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.3.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Attachments: image-2020-09-11-14-30-27-898.png
>
>
> [HBASE-24359|https://issues.apache.org/jira/browse/HBASE-24359] broke the 
> exception-handling logic. In branch-2, it even causes some RemoteException 
> logs to be missed.





[jira] [Work started] (HBASE-25012) HBASE-24359 causes replication missed log of some RemoteException

2020-09-11 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-25012 started by Sun Xin.
---
> HBASE-24359 causes replication missed log of some RemoteException
> -
>
> Key: HBASE-25012
> URL: https://issues.apache.org/jira/browse/HBASE-25012
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.3.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> [HBASE-24359|https://issues.apache.org/jira/browse/HBASE-24359] broke the 
> exception-handling logic. In branch-2, it even causes some RemoteException 
> logs to be missed.





[jira] [Created] (HBASE-25012) HBASE-24359 causes replication missed log of some RemoteException

2020-09-11 Thread Sun Xin (Jira)
Sun Xin created HBASE-25012:
---

 Summary: HBASE-24359 causes replication missed log of some 
RemoteException
 Key: HBASE-25012
 URL: https://issues.apache.org/jira/browse/HBASE-25012
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.3.1, 2.3.0, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


[HBASE-24359|https://issues.apache.org/jira/browse/HBASE-24359] broke the 
exception-handling logic. In branch-2, it even causes some RemoteException 
logs to be missed.





[jira] [Work started] (HBASE-24999) Master manages ReplicationServers

2020-09-08 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24999 started by Sun Xin.
---
> Master manages ReplicationServers
> -
>
> Key: HBASE-24999
> URL: https://issues.apache.org/jira/browse/HBASE-24999
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>
> [HBASE-24683|https://issues.apache.org/jira/browse/HBASE-24683] added an 
> isolated ReplicationServer.
> This issue will: 
>  # Make ReplicationServer report to Master periodically.
>  # Add a basic ReplicationServerManager in Master to manage ReplicationServers.





[jira] [Created] (HBASE-24999) Master manages ReplicationServers

2020-09-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-24999:
---

 Summary: Master manages ReplicationServers
 Key: HBASE-24999
 URL: https://issues.apache.org/jira/browse/HBASE-24999
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


[HBASE-24683|https://issues.apache.org/jira/browse/HBASE-24683] added an 
isolated ReplicationServer.

This issue will: 
 # Make ReplicationServer report to Master periodically.
 # Add a basic ReplicationServerManager in Master to manage ReplicationServers.
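As a rough, hypothetical sketch of the second item, a master-side manager could record each ReplicationServer's latest report and treat servers that stop reporting as expired. The class name, method names, server names, and the timeout-based expiry are all assumptions for illustration; the issue only says that ReplicationServers report periodically and that the manager is basic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the actual HBase ReplicationServerManager.
class ReplicationServerManagerSketch {
  private final long expireAfterMillis;
  // Server name -> time (ms) of its most recent report.
  private final Map<String, Long> lastReportTime = new ConcurrentHashMap<>();

  ReplicationServerManagerSketch(long expireAfterMillis) {
    this.expireAfterMillis = expireAfterMillis;
  }

  /** Called when a ReplicationServer's periodic report arrives. */
  void onReport(String serverName, long nowMillis) {
    lastReportTime.put(serverName, nowMillis);
  }

  /** Removes and returns servers whose last report is older than the expiry timeout. */
  List<String> expireStaleServers(long nowMillis) {
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastReportTime.entrySet()) {
      if (nowMillis - e.getValue() > expireAfterMillis) {
        expired.add(e.getKey());
      }
    }
    expired.forEach(lastReportTime::remove);
    return expired;
  }

  /** Servers currently considered online. */
  List<String> onlineServers() {
    return new ArrayList<>(lastReportTime.keySet());
  }

  public static void main(String[] args) {
    ReplicationServerManagerSketch manager = new ReplicationServerManagerSketch(3000);
    manager.onReport("rs-a", 0);
    manager.onReport("rs-b", 0);
    manager.onReport("rs-a", 2500);                       // rs-a keeps reporting, rs-b goes silent
    System.out.println(manager.expireStaleServers(4000)); // [rs-b]
    System.out.println(manager.onlineServers());          // [rs-a]
  }
}
```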





[jira] [Updated] (HBASE-24982) Disassemble the method replicateWALEntry from AdminService to a new interface ReplicationServerService

2020-09-07 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-24982:

Summary: Disassemble the method replicateWALEntry from AdminService to a 
new interface ReplicationServerService  (was: Disassemble the method 
replicateWALEntry from AdminService to a new interface ReplicationSinkService)

> Disassemble the method replicateWALEntry from AdminService to a new interface 
> ReplicationServerService
> --
>
> Key: HBASE-24982
> URL: https://issues.apache.org/jira/browse/HBASE-24982
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Created] (HBASE-24982) Disassemble the method replicateWALEntry from AdminService to a new interface ReplicationSinkService

2020-09-04 Thread Sun Xin (Jira)
Sun Xin created HBASE-24982:
---

 Summary: Disassemble the method replicateWALEntry from 
AdminService to a new interface ReplicationSinkService
 Key: HBASE-24982
 URL: https://issues.apache.org/jira/browse/HBASE-24982
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Resolved] (HBASE-24683) Add a basic ReplicationServer which only implement ReplicationSink Service

2020-09-04 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-24683.
-
Resolution: Resolved

> Add a basic ReplicationServer which only implement ReplicationSink Service
> --
>
> Key: HBASE-24683
> URL: https://issues.apache.org/jira/browse/HBASE-24683
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Commented] (HBASE-24759) Refuse to update configuration of default group

2020-09-04 Thread Sun Xin (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190576#comment-17190576
 ] 

Sun Xin commented on HBASE-24759:
-

Thanks [~zghao] for reviewing. I've opened a PR for branch-2.

> Refuse to update configuration of default group
> ---
>
> Key: HBASE-24759
> URL: https://issues.apache.org/jira/browse/HBASE-24759
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Currently, we don't store the default rsgroup information. But after 
> HBASE-24431, we have added a config map, which needs to be persisted to 
> avoid losing the config of the default rsgroup.
> So, since the default rsgroup is not persisted, refuse to update its configuration.





[jira] [Updated] (HBASE-24759) Refuse to update configuration of default group

2020-08-29 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-24759:

Summary: Refuse to update configuration of default group  (was: Persisting 
configuration of default rsgroup)

> Refuse to update configuration of default group
> ---
>
> Key: HBASE-24759
> URL: https://issues.apache.org/jira/browse/HBASE-24759
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Currently, we don't store the default rsgroup information. But after 
> HBASE-24431, we have added a config map, which needs to be persisted to 
> avoid losing the config of the default rsgroup.
> So, since the default rsgroup is not persisted, refuse to update its configuration.





[jira] [Updated] (HBASE-24759) Persisting configuration of default rsgroup

2020-08-29 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin updated HBASE-24759:

Description: 
Currently, we don't store the default rsgroup information. But after 
HBASE-24431, we have added a config map, which needs to be persisted to 
avoid losing the config of the default rsgroup.

So, since the default rsgroup is not persisted, refuse to update its configuration.
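A minimal, hypothetical sketch of the guard: because the default group is not persisted, a configuration set on it would be silently lost, so the update is rejected up front. `RSGroupConfigSketch`, `updateGroupConfig`, `getGroupConfig`, the `example.key` setting, and the use of `IllegalArgumentException` are assumptions for illustration; the name "default" matches HBase's default rsgroup, but this is not the real rsgroup implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual rsgroup implementation.
class RSGroupConfigSketch {
  static final String DEFAULT_GROUP = "default";

  // Stands in for persisted group configuration; the default group is never stored here.
  private final Map<String, Map<String, String>> persistedConfigs = new HashMap<>();

  /** Rejects updates to the default group, whose configuration would not survive a restart. */
  void updateGroupConfig(String groupName, Map<String, String> configuration) {
    if (DEFAULT_GROUP.equals(groupName)) {
      throw new IllegalArgumentException(
          "Configuration of the default rsgroup cannot be updated because it is not persisted");
    }
    persistedConfigs.put(groupName, new HashMap<>(configuration));
  }

  /** Returns a copy of the stored configuration, or an empty map if none was stored. */
  Map<String, String> getGroupConfig(String groupName) {
    return new HashMap<>(persistedConfigs.getOrDefault(groupName, new HashMap<>()));
  }

  public static void main(String[] args) {
    RSGroupConfigSketch groups = new RSGroupConfigSketch();
    Map<String, String> conf = new HashMap<>();
    conf.put("example.key", "value"); // hypothetical setting
    groups.updateGroupConfig("group1", conf);
    System.out.println(groups.getGroupConfig("group1")); // {example.key=value}
    try {
      groups.updateGroupConfig(DEFAULT_GROUP, conf);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```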

  was:Currently, we don't store the default rsgroup information. But after 
HBASE-24431, we have added a config map, which needs to be persisted to avoid 
losing the config of the default rsgroup.


> Persisting configuration of default rsgroup
> ---
>
> Key: HBASE-24759
> URL: https://issues.apache.org/jira/browse/HBASE-24759
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Currently, we don't store the default rsgroup information. But after 
> HBASE-24431, we have added a config map, which needs to be persisted to 
> avoid losing the config of the default rsgroup.
> So, since the default rsgroup is not persisted, refuse to update its configuration.




