[jira] [Created] (HBASE-27414) Search order for locations in HFileLink
Huaxiang Sun created HBASE-27414:

Summary: Search order for locations in HFileLink
Key: HBASE-27414
URL: https://issues.apache.org/jira/browse/HBASE-27414
Project: HBase
Issue Type: Improvement
Components: Performance
Reporter: Huaxiang Sun

HFileLink searches its locations in the order in which they were added to the HFileLink object:

setLocations(originPath, tempPath, mobPath, archivePath);

archivePath is therefore the last location searched. In most cases the hfile exists in archivePath, so moving archivePath to the first parameter avoids unnecessary NameNode (NN) queries.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
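The cost of the ordering described in HBASE-27414 can be illustrated with a small stand-alone sketch. This is not HBase code: the class `LocationSearchOrder`, the method `probesUntilFound`, and the use of plain strings for paths are hypothetical, and the position of the three non-archive locations in the proposed order is an assumption (the issue only says archivePath moves to the front). Each existence check stands in for one NameNode query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an ordered location search; not the HFileLink implementation.
final class LocationSearchOrder {

    // Returns how many existence checks are issued before the file is found,
    // or -1 if none of the locations holds the file.
    static int probesUntilFound(List<String> searchOrder, Set<String> existing) {
        int probes = 0;
        for (String location : searchOrder) {
            probes++; // each check models one NameNode query
            if (existing.contains(location)) {
                return probes;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The common case from the issue: the hfile lives in the archive.
        Set<String> existing = new HashSet<>(Arrays.asList("archivePath"));

        List<String> currentOrder =
            Arrays.asList("originPath", "tempPath", "mobPath", "archivePath");
        // Assumed proposed order: archivePath first, the rest unchanged.
        List<String> proposedOrder =
            Arrays.asList("archivePath", "originPath", "tempPath", "mobPath");

        System.out.println("current order probes:  " + probesUntilFound(currentOrder, existing));
        System.out.println("proposed order probes: " + probesUntilFound(proposedOrder, existing));
    }
}
```

In this model the archived file costs four checks with the current order and one with archivePath first; a file that still sits in originPath would cost one extra check after the change.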
[jira] [Created] (HBASE-27366) split or merge removed region under snapshot
Huaxiang Sun created HBASE-27366:

Summary: split or merge removed region under snapshot
Key: HBASE-27366
URL: https://issues.apache.org/jira/browse/HBASE-27366
Project: HBase
Issue Type: Bug
Components: snapshots
Affects Versions: 2.4.10
Reporter: Huaxiang Sun

We ran into snapshot failures for a table with a large number of regions. The sequence of events is as follows:

# The snapshot process lists all regions of the table.
# The normalizer kicks in and splits some regions of the table being snapshotted.
# The split finishes and the major compaction finishes. The parent region is moved to the archive.
# When the snapshot processes the parent region, the region no longer exists and the snapshot fails.

The snapshot process acquires the table lock, but the split and merge processes do not, so the two operations collide.
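The event sequence in HBASE-27366 can be replayed in a small single-threaded sketch. Everything here is hypothetical (the class `SnapshotSplitRace`, the region names, and modelling the region set as strings); it only shows why a region list taken in step 1 goes stale once a split archives the parent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical replay of the snapshot/split interleaving; not HBase code.
final class SnapshotSplitRace {

    // Returns the regions from the snapshot's earlier listing that no longer
    // exist when the snapshot gets around to processing them.
    static List<String> missingAtProcessing(List<String> listedRegions, Set<String> liveRegions) {
        List<String> missing = new ArrayList<>();
        for (String region : listedRegions) {
            if (!liveRegions.contains(region)) {
                missing.add(region);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> live = new LinkedHashSet<>(Arrays.asList("parent", "other"));

        // Step 1: the snapshot lists all regions of the table.
        List<String> listed = new ArrayList<>(live);

        // Steps 2-3: a split runs without the table lock; after compaction
        // the parent is archived and only the daughters remain.
        live.remove("parent");
        live.add("daughterA");
        live.add("daughterB");

        // Step 4: the snapshot processes its stale list and hits the missing parent.
        System.out.println("regions missing when processed: " + missingAtProcessing(listed, live));
    }
}
```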
[jira] [Resolved] (HBASE-27345) Add 2.4.14 to the downloads page
[ https://issues.apache.org/jira/browse/HBASE-27345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-27345. -- Fix Version/s: 3.0.0-alpha-4 Assignee: Huaxiang Sun Resolution: Fixed > Add 2.4.14 to the downloads page > > > Key: HBASE-27345 > URL: https://issues.apache.org/jira/browse/HBASE-27345 > Project: HBase > Issue Type: Task > Components: documentation >Affects Versions: 2.4.14 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27345) Add 2.4.14 to the downloads page
Huaxiang Sun created HBASE-27345: Summary: Add 2.4.14 to the downloads page Key: HBASE-27345 URL: https://issues.apache.org/jira/browse/HBASE-27345 Project: HBase Issue Type: Task Components: documentation Affects Versions: 2.4.14 Reporter: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27296) Some Cell's implementation of toString() such as IndividualBytesFieldCell prints out value and tags which is too verbose
[ https://issues.apache.org/jira/browse/HBASE-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxiang Sun resolved HBASE-27296.
Fix Version/s: 2.5.0, 3.0.0-alpha-4, 2.4.14
Resolution: Fixed

> Some Cell's implementation of toString() such as IndividualBytesFieldCell prints out value and tags which is too verbose
>
> Key: HBASE-27296
> URL: https://issues.apache.org/jira/browse/HBASE-27296
> Project: HBase
> Issue Type: Improvement
> Components: logging
> Affects Versions: 2.4.12
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
> One of our users saw cells larger than 10 MB written to their client log when a limit was exceeded.
> Checking the code showed that toString() behavior is inconsistent across Cell implementations; most do not include values and tags.
> Change toString() to exclude tags and values.
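The idea behind the HBASE-27296 change can be sketched with a stand-alone class whose toString() reports only the lengths of the value and tags rather than their bytes. `CompactCell` and its output format are hypothetical and are not the format HBase's Cell implementations print; the point is only that the log line stays small regardless of payload size.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical cell-like holder; not an HBase Cell implementation.
final class CompactCell {
    private final byte[] row;
    private final byte[] value;
    private final byte[] tags;

    CompactCell(byte[] row, byte[] value, byte[] tags) {
        this.row = row;
        this.value = value;
        this.tags = tags;
    }

    // Prints the row key plus the sizes of value and tags, never their contents,
    // so a 10 MB value adds only a handful of characters to a log line.
    @Override
    public String toString() {
        return new String(row, StandardCharsets.UTF_8)
            + "/vlen=" + value.length
            + "/tagslen=" + tags.length;
    }

    public static void main(String[] args) {
        byte[] bigValue = new byte[10 * 1024 * 1024]; // 10 MB payload
        CompactCell cell =
            new CompactCell("r1".getBytes(StandardCharsets.UTF_8), bigValue, new byte[0]);
        System.out.println(cell);
    }
}
```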
[jira] [Created] (HBASE-27296) IndividualBytesFieldCell#toString() prints out value and tags which is too verbose.
Huaxiang Sun created HBASE-27296:

Summary: IndividualBytesFieldCell#toString() prints out value and tags which is too verbose.
Key: HBASE-27296
URL: https://issues.apache.org/jira/browse/HBASE-27296
Project: HBase
Issue Type: Improvement
Components: logging
Affects Versions: 2.4.12
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

One of our users saw cells larger than 10 MB logged when a limit was exceeded. Checking the code showed that toString() behavior is inconsistent across Cell implementations; most do not include values and tags. Change toString() to exclude tags and values.
[jira] [Created] (HBASE-27250) MasterRpcService#setRegionStateInMeta does not support replica region encodedNames or region names
Huaxiang Sun created HBASE-27250:

Summary: MasterRpcService#setRegionStateInMeta does not support replica region encodedNames or region names
Key: HBASE-27250
URL: https://issues.apache.org/jira/browse/HBASE-27250
Project: HBase
Issue Type: Bug
Affects Versions: 2.4.13
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

MasterRpcServices#setRegionStateInMeta does not support replica region names; it assumes the primary region only. This causes HBCK2's setRegionState to fail for replica regions.
[jira] [Created] (HBASE-27181) Replica region support in HBCK2 setRegionState option
Huaxiang Sun created HBASE-27181:

Summary: Replica region support in HBCK2 setRegionState option
Key: HBASE-27181
URL: https://issues.apache.org/jira/browse/HBASE-27181
Project: HBase
Issue Type: Improvement
Components: hbck2
Affects Versions: 2.4.13
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

hbck2's setRegionState does not recognize a replica region id because the id does not show up in meta. We ran into cases where the region state of replica regions had to be set in meta to fix an inconsistency. We ended up writing the state into the meta table manually and doing a master failover to sync the state from the meta table. hbck2's setRegionState needs to support replica region ids and handle them properly.
[jira] [Resolved] (HBASE-27025) Change Hbase book's description for "74.7.3. Load Balancing META table load"
[ https://issues.apache.org/jira/browse/HBASE-27025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-27025. -- Fix Version/s: 3.0.0-alpha-4 Resolution: Fixed Merged into the master branch. > Change Hbase book's description for "74.7.3. Load Balancing META table load" > > > Key: HBASE-27025 > URL: https://issues.apache.org/jira/browse/HBASE-27025 > Project: HBase > Issue Type: Improvement > Components: documentation >Affects Versions: 2.4.12 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Minor > Fix For: 3.0.0-alpha-4 > > > HBASE-26618 involves primary meta region in meta scan. The description in > hbase book is inaccurate. Update it accordingly. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HBASE-26649) Support meta replica LoadBalance mode for RegionLocator#getAllRegionLocations()
[ https://issues.apache.org/jira/browse/HBASE-26649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxiang Sun resolved HBASE-26649.
Fix Version/s: 2.5.0, 3.0.0-alpha-3, 2.4.13
Release Note: When 'hbase.locator.meta.replicas.mode' is set to "LoadBalance" at the HBase client, RegionLocator#getAllRegionLocations() now load balances across all meta replica regions. Please note that results from non-primary meta replica regions may contain stale data.
Resolution: Fixed

> Support meta replica LoadBalance mode for RegionLocator#getAllRegionLocations()
>
> Key: HBASE-26649
> URL: https://issues.apache.org/jira/browse/HBASE-26649
> Project: HBase
> Issue Type: Improvement
> Components: meta replicas
> Affects Versions: 2.4.9
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3, 2.4.13
>
> When an HBase application restarts, its meta cache is empty. Normally, it fills the meta cache one region at a time by scanning the meta region. This puts huge pressure on the region server hosting meta during application restart.
> The application can instead prefetch all region locations by calling RegionLocator#getAllRegionLocations(). Meta replica LoadBalance mode is supported in 2.4, so it would be nice to load balance RegionLocator#getAllRegionLocations() across all meta replica regions so that the batch scan is spread across them.
[jira] [Created] (HBASE-27087) TestQuotaThrottle times out in branch-2.5.
Huaxiang Sun created HBASE-27087:

Summary: TestQuotaThrottle times out in branch-2.5.
Key: HBASE-27087
URL: https://issues.apache.org/jira/browse/HBASE-27087
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 2.5.0
Reporter: Huaxiang Sun

With branch-2.5, TestQuotaThrottle times out. Need to investigate.

h3. Error Message

Failed after attempts=7, exceptions:
2022-06-03T11:26:33.418Z, RpcRetryingCaller\{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller\{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller\{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller\{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller\{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller\{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller\{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
[jira] [Reopened] (HBASE-26962) Add mob info in web UI
[ https://issues.apache.org/jira/browse/HBASE-26962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun reopened HBASE-26962: -- The commit caused branch-2 build failure. Can you fix the build error and resubmit a patch? Thanks. > Add mob info in web UI > -- > > Key: HBASE-26962 > URL: https://issues.apache.org/jira/browse/HBASE-26962 > Project: HBase > Issue Type: Improvement > Components: UI >Reporter: Xuesen Liang >Assignee: Xuesen Liang >Priority: Minor > Fix For: 2.6.0, 3.0.0-alpha-3 > > > Add mob store info in web UI. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-27025) Change Hbase book's description for "74.7.3. Load Balancing META table load"
Huaxiang Sun created HBASE-27025: Summary: Change Hbase book's description for "74.7.3. Load Balancing META table load" Key: HBASE-27025 URL: https://issues.apache.org/jira/browse/HBASE-27025 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 2.4.12 Reporter: Huaxiang Sun Assignee: Huaxiang Sun HBASE-26618 involves primary meta region in meta scan. The description in hbase book is inaccurate. Update it accordingly. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HBASE-26984) Chaos Monkey thread dies in ITBLL Chaos GracefulRollingRestartRsAction
[ https://issues.apache.org/jira/browse/HBASE-26984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxiang Sun resolved HBASE-26984.
Fix Version/s: 2.5.0, 3.0.0-alpha-3
Resolution: Fixed

> Chaos Monkey thread dies in ITBLL Chaos GracefulRollingRestartRsAction
>
> Key: HBASE-26984
> URL: https://issues.apache.org/jira/browse/HBASE-26984
> Project: HBase
> Issue Type: Bug
> Components: integration tests
> Affects Versions: 2.4.11
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
> While running the ITBLL chaos monkey in a k8s cluster, we found that the chaos monkey thread died in GracefulRollingRestartRsAction.
[jira] [Created] (HBASE-26984) Chaos Monkey thread dies in ITBLL Chaos GracefulRollingRestartRsAction
Huaxiang Sun created HBASE-26984:

Summary: Chaos Monkey thread dies in ITBLL Chaos GracefulRollingRestartRsAction
Key: HBASE-26984
URL: https://issues.apache.org/jira/browse/HBASE-26984
Project: HBase
Issue Type: Bug
Components: integration tests
Affects Versions: 2.4.11
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

While running the ITBLL chaos monkey in a k8s cluster, we found that the chaos monkey thread died in GracefulRollingRestartRsAction.
[jira] [Resolved] (HBASE-26618) Involving primary meta region in meta scan with CatalogReplicaLoadBalanceSimpleSelector
[ https://issues.apache.org/jira/browse/HBASE-26618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxiang Sun resolved HBASE-26618.
Fix Version/s: 2.5.0, 3.0.0-alpha-3, 2.4.12
Release Note: When META replica LoadBalance mode is enabled at the client side, clients will try to read from one META region first. If that META location belongs to a non-primary META region and the read hits an error, the client falls back to the primary META region.
Resolution: Fixed

> Involving primary meta region in meta scan with CatalogReplicaLoadBalanceSimpleSelector
>
> Key: HBASE-26618
> URL: https://issues.apache.org/jira/browse/HBASE-26618
> Project: HBase
> Issue Type: Improvement
> Components: meta replicas
> Affects Versions: 2.4.9
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-3, 2.4.12
>
> In the current release with Meta replica LoadBalance mode, the primary meta region does not serve the meta scan (only meta replica regions serve the read). When the result from a meta replica region is stale, the client goes to the primary meta region for the up-to-date location.
> In our experience, the primary meta region serves very little read traffic, so it would be better to load balance read traffic across the primary meta region as well.
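The behavior in the HBASE-26618 release note (pick one meta region among all replicas, primary included, and fall back to the primary if a non-primary read fails) can be sketched as follows. The class `MetaReplicaSelectorSketch` and both methods are hypothetical, uniform random selection is an assumption rather than a description of CatalogReplicaLoadBalanceSimpleSelector, and the read itself is modelled as a predicate on the replica id. Replica id 0 denotes the primary.

```java
import java.util.Random;
import java.util.function.IntPredicate;

// Hypothetical selector sketch; not the CatalogReplicaLoadBalanceSimpleSelector code.
final class MetaReplicaSelectorSketch {
    static final int PRIMARY_REPLICA_ID = 0;

    // Chooses among replica ids 0..numReplicas-1, so the primary shares the read load.
    // Uniform random choice is an assumption made for this sketch.
    static int selectReplica(int numReplicas, Random random) {
        return random.nextInt(numReplicas);
    }

    // Tries the chosen replica first; if a non-primary replica fails, retries on the primary.
    // Returns the id of the replica that served the read, or -1 if the read failed.
    static int replicaThatServes(int chosenReplica, IntPredicate readSucceeds) {
        if (readSucceeds.test(chosenReplica)) {
            return chosenReplica;
        }
        if (chosenReplica != PRIMARY_REPLICA_ID && readSucceeds.test(PRIMARY_REPLICA_ID)) {
            return PRIMARY_REPLICA_ID;
        }
        return -1;
    }

    public static void main(String[] args) {
        int chosen = selectReplica(3, new Random());
        // Model a cluster in which only the primary answers successfully.
        int served = replicaThatServes(chosen, replicaId -> replicaId == PRIMARY_REPLICA_ID);
        System.out.println("chosen replica " + chosen + ", served by replica " + served);
    }
}
```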
[jira] [Created] (HBASE-26864) Region Server does not send Ack back to master after receiving an OpenRegionReq for open regions, causing OpenRegionProcedure stuck forever.
Huaxiang Sun created HBASE-26864:

Summary: Region Server does not send Ack back to master after receiving an OpenRegionReq for open regions, causing OpenRegionProcedure stuck forever.
Key: HBASE-26864
URL: https://issues.apache.org/jira/browse/HBASE-26864
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 2.4.10
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

In some upgrade cases, we found that the master issues a RegionOpen for an already open region and the Region Server simply logs
{code:java}
2022-03-17 22:16:55,595 WARN org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler: Received OPEN for foo,b2875fcb-7bc0-4fa9-a980-e902faf7f151,1631771037620.def199cc7208615b783b285f582ddfa4. which is already online
{code}
and neither acks nor nacks the master. The OpenRegionProcedure is then stuck forever. In this specific case, the Region Server needs to ack to the master that the region is open. Why the master sent an OpenRegion request for an already open region will be tracked in a separate issue.
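The fix proposed in HBASE-26864 amounts to always answering the master, even when the region is already online. A minimal sketch of that control flow, with hypothetical names throughout (`OpenRegionAckSketch`, `Reply`, `handleOpen`) and no relation to the real AssignRegionHandler code:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of an open-region handler; not the AssignRegionHandler implementation.
final class OpenRegionAckSketch {

    // Every branch produces a reply, so the requesting procedure can always make progress.
    enum Reply { ACK_ALREADY_OPEN, ACK_OPENED }

    static Reply handleOpen(String encodedRegionName, Set<String> onlineRegions) {
        if (onlineRegions.contains(encodedRegionName)) {
            // Reported behavior: only a WARN log here and no reply to the master.
            // Proposed behavior: still tell the master that the region is open.
            return Reply.ACK_ALREADY_OPEN;
        }
        onlineRegions.add(encodedRegionName);
        return Reply.ACK_OPENED;
    }

    public static void main(String[] args) {
        Set<String> online = new HashSet<>();
        System.out.println(handleOpen("def199cc7208615b783b285f582ddfa4", online));
        // A second OPEN for the same region is acknowledged instead of being dropped.
        System.out.println(handleOpen("def199cc7208615b783b285f582ddfa4", online));
    }
}
```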
[jira] [Created] (HBASE-26649) Support meta replica LoadBalance mode for RegionLocator#getAllRegionLocations()
Huaxiang Sun created HBASE-26649:

Summary: Support meta replica LoadBalance mode for RegionLocator#getAllRegionLocations()
Key: HBASE-26649
URL: https://issues.apache.org/jira/browse/HBASE-26649
Project: HBase
Issue Type: Improvement
Components: meta replicas
Affects Versions: 2.4.9
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

When an HBase application restarts, its meta cache is empty. Normally, it fills the meta cache one region at a time by scanning the meta region. This puts huge pressure on the region server hosting meta during application restart.

The application can instead prefetch all region locations by calling RegionLocator#getAllRegionLocations(). Meta replica LoadBalance mode is supported in 2.4, so it would be nice to load balance RegionLocator#getAllRegionLocations() across all meta replica regions so that the batch scan is spread across them.
[jira] [Resolved] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxiang Sun resolved HBASE-26590.
Fix Version/s: 2.5.0, 2.4.10
Resolution: Fixed

Resolved for now; will reopen if there are new findings.

> Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
>
> Key: HBASE-26590
> URL: https://issues.apache.org/jira/browse/HBASE-26590
> Project: HBase
> Issue Type: Improvement
> Components: meta
> Affects Versions: 2.4.0, 2.5.0, 2.3.7, 2.6.0
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
> Fix For: 2.5.0, 2.4.10
>
> One of our users complained of higher latency during app restart after upgrading their application from the hbase-1.2 client (CDH-5.16.2) to the hbase-2.4.5 client with meta replica Load Balance mode. I reproduced the regression with a meta lookup test.
> On my test cluster, the test table has 160k regions, so there are 160k entries in the meta region. The test used one thread to do 1 million meta lookups against the meta region server.
>
> ||Version ||Meta Replica Load Balance Enabled||Time ||
> ||2.4.5-with-fixed||Yes||336458ms||
> ||2.4.5-with-fixed||No||333253ms||
> ||2.4.5||Yes||469980ms||
> ||2.4.5||No||470515ms||
> | *cdh-5.16.2*| *No* | *323412ms*|
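To put the HBASE-26590 timings in perspective, the arithmetic below converts the totals for 1 million lookups into a per-lookup latency and a slowdown relative to the cdh-5.16.2 baseline. The class and method names are made up for this calculation; the input numbers are the ones in the table. By this calculation the unpatched 2.4.5 client with Load Balance enabled is roughly 45% slower than the baseline, and the patched client roughly 4% slower.

```java
// Worked arithmetic on the reported timings; names are hypothetical.
final class MetaLookupRegression {

    // Average latency of a single lookup in microseconds.
    static double perLookupMicros(long totalMillis, long lookups) {
        return totalMillis * 1000.0 / lookups;
    }

    // How much slower a run is than the baseline, as a percentage of the baseline.
    static double slowdownPercent(long totalMillis, long baselineMillis) {
        return (totalMillis - baselineMillis) * 100.0 / baselineMillis;
    }

    public static void main(String[] args) {
        long lookups = 1_000_000L;
        long baseline = 323_412L; // cdh-5.16.2, Load Balance off
        long unpatched = 469_980L; // 2.4.5, Load Balance on
        long patched = 336_458L;   // 2.4.5-with-fixed, Load Balance on

        System.out.printf("baseline:  %.2f us per lookup%n", perLookupMicros(baseline, lookups));
        System.out.printf("unpatched: %.2f us per lookup, %.1f%% slower%n",
            perLookupMicros(unpatched, lookups), slowdownPercent(unpatched, baseline));
        System.out.printf("patched:   %.2f us per lookup, %.1f%% slower%n",
            perLookupMicros(patched, lookups), slowdownPercent(patched, baseline));
    }
}
```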
[jira] [Created] (HBASE-26618) Involving primary meta region in meta scan with Meta Replica Mode
Huaxiang Sun created HBASE-26618:

Summary: Involving primary meta region in meta scan with Meta Replica Mode
Key: HBASE-26618
URL: https://issues.apache.org/jira/browse/HBASE-26618
Project: HBase
Issue Type: Improvement
Components: meta replicas
Affects Versions: 2.4.9
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

In the current release with Meta replica LoadBalance mode, the primary meta region does not serve the meta scan (only meta replica regions serve the read). When the result from a meta replica region is stale, the client goes to the primary meta region for the up-to-date location.

In our experience, the primary meta region serves very little read traffic, so it would be better to load balance read traffic across the primary meta region as well.
[jira] [Created] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
Huaxiang Sun created HBASE-26590:

Summary: Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
Key: HBASE-26590
URL: https://issues.apache.org/jira/browse/HBASE-26590
Project: HBase
Issue Type: Improvement
Components: meta
Affects Versions: 2.3.7, 3.0.0-alpha-1
Environment:
||Version ||Meta Replica Load Balance Enabled||Time ||
||2.4.5-with-fixed||Yes||336458ms||
||2.4.5-with-fixed||No||333253ms||
||2.4.5||Yes||469980ms||
||2.4.5||No||470515ms||
| *cdh-5.16.2*| *No* | *323412ms*|
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

One of our users complained of higher latency during app restart after upgrading their application from the hbase-1.2 client (CDH-5.16.2) to the hbase-2.4.5 client with meta replica Load Balance mode. I reproduced the regression with a meta lookup test.

On my test cluster, the test table has 160k regions, so there are 160k entries in the meta region. The test used one thread to do 1 million meta lookups against the meta region server.
[jira] [Resolved] (HBASE-26338) hbck2 setRegionState cannot set replica region state
[ https://issues.apache.org/jira/browse/HBASE-26338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxiang Sun resolved HBASE-26338.
Fix Version/s: hbase-operator-tools-1.2.0
Release Note: Setting a replica region's state requires the primary region's encoded region name and the replica id; the command will be "setRegionState , ".
Resolution: Fixed

> hbck2 setRegionState cannot set replica region state
>
> Key: HBASE-26338
> URL: https://issues.apache.org/jira/browse/HBASE-26338
> Project: HBase
> Issue Type: Bug
> Components: hbck2
> Affects Versions: hbase-operator-tools-1.1.0
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
> Fix For: hbase-operator-tools-1.2.0
>
> Currently, there is no way to use hbck2 setRegionState to set a replica region's state, which makes it hard to fix inconsistencies related to replica regions.
[jira] [Created] (HBASE-26338) hbck2 setRegionState cannot set replica region state
Huaxiang Sun created HBASE-26338:

Summary: hbck2 setRegionState cannot set replica region state
Key: HBASE-26338
URL: https://issues.apache.org/jira/browse/HBASE-26338
Project: HBase
Issue Type: Bug
Components: hbck2
Affects Versions: hbase-operator-tools-1.1.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun

Currently, there is no way to use hbck2 setRegionState to set a replica region's state, which makes it hard to fix inconsistencies related to replica regions.
[jira] [Resolved] (HBASE-26255) Add an option to use region location from meta table in TableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HBASE-26255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-26255. -- Fix Version/s: 2.4.7 2.3.7 3.0.0-alpha-2 Resolution: Fixed > Add an option to use region location from meta table in > TableSnapshotInputFormat > > > Key: HBASE-26255 > URL: https://issues.apache.org/jira/browse/HBASE-26255 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.3.6 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 3.0.0-alpha-2, 2.3.7, 2.4.7 > > > TableSnapshotInputFormat currently calculates block locality of a region to > decide the best location to run the task. While this works for a small scale > table snapshot, we found that for a table snapshot with many regions, the > locality calculation takes too much time. > In the case of a table with high locality, we can use region location from > meta table to decide a snapshot region's location. Add an option to support > it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26272) TestTableMapReduceUntil failure in branch-2
Huaxiang Sun created HBASE-26272: Summary: TestTableMapReduceUntil failure in branch-2 Key: HBASE-26272 URL: https://issues.apache.org/jira/browse/HBASE-26272 Project: HBase Issue Type: Test Components: test Reporter: Huaxiang Sun {code:java} [ERROR] org.apache.hadoop.hbase.mapreduce.TestTableMapReduceUtil.testInitCredentialsForCluster3 Time elapsed: 8.122 s <<< ERROR! org.apache.hadoop.security.KerberosAuthException: Login failure for user: hsun/localh...@example.com from keytab /Users/hsun/work/hbase-hs/hbase-1/hbase-mapreduce/target/test-data/b12f4926-d8ec-1129-0101-1ba76e65f3c2/keytab javax.security.auth.login.LoginException: java.lang.IllegalArgumentException: Illegal principal name hsun/localh...@example.com: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hsun/localh...@example.com at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1104) at org.apache.hadoop.hbase.mapreduce.TestTableMapReduceUtil.testInitCredentialsForCluster3(TestTableMapReduceUtil.java:233) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Caused by: javax.security.auth.login.LoginException: java.lang.IllegalArgumentException: Illegal principal name hsun/localh...@example.com: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hsun/localh...@example.com at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:224) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:588) at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1095) ... 25 more Caused by: java.lang.IllegalArgumentException: Illegal principal name hsun/localh...@example.com: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hsun/localh...@example.com at org.apache.hadoop.security.User.(User.java:50) at org.apache.hadoop.security.User.(User.java:43) at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:222) ... 37 more Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hsun/localh...@example.com at org.apache.hadoop.security.authentication.u
[jira] [Created] (HBASE-26255) Add an option to use region location from meta table in TableSnapshotInputFormat
Huaxiang Sun created HBASE-26255: Summary: Add an option to use region location from meta table in TableSnapshotInputFormat Key: HBASE-26255 URL: https://issues.apache.org/jira/browse/HBASE-26255 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 2.3.6 Reporter: Huaxiang Sun Assignee: Huaxiang Sun TableSnapshotInputFormat currently calculates block locality of a region to decide the best location to run the task. While this works for a small scale table snapshot, we found that for a table snapshot with many regions, the locality calculation takes too much time. In the case of a table with high locality, we can use region location from meta table to decide a snapshot region's location. Add an option to support it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-26108) add option to disable scanMetrics in TableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HBASE-26108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxiang Sun resolved HBASE-26108.
Fix Version/s: 2.4.5, 3.0.0-alpha-2, 2.3.6
Resolution: Fixed

> add option to disable scanMetrics in TableSnapshotInputFormat
>
> Key: HBASE-26108
> URL: https://issues.apache.org/jira/browse/HBASE-26108
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.3.5
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
> Fix For: 2.3.6, 3.0.0-alpha-2, 2.4.5
>
> When running a Spark job with TableSnapshotInputFormat, we found that scans are very slow. scanMetrics is hardcoded as enabled, and Spark's newAPIHadoopRDD uses Hadoop's DummyReporter, which causes the following exception; 80% of CPU time is spent handling this exception.
> We need to provide an option to disable scanMetrics.
> java.base@11.0.5/java.lang.Throwable.fillInStackTrace(Native Method)
> java.base@11.0.5/java.lang.Throwable.fillInStackTrace(Throwable.java:787) => holding Monitor(java.util.MissingResourceException@258206255})
> java.base@11.0.5/java.lang.Throwable.<init>(Throwable.java:292)
> java.base@11.0.5/java.lang.Exception.<init>(Exception.java:84)
> java.base@11.0.5/java.lang.RuntimeException.<init>(RuntimeException.java:80)
> java.base@11.0.5/java.util.MissingResourceException.<init>(MissingResourceException.java:85)
> java.base@11.0.5/java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:2055)
> java.base@11.0.5/java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1689)
> java.base@11.0.5/java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1593)
> java.base@11.0.5/java.util.ResourceBundle.getBundle(ResourceBundle.java:1284)
> app//org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
> app//org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56) => holding Monitor(java.lang.Class@545605549})
> app//org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77)
> app//org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94)
> app//org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:227)
> app//org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
> app//org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110)
> app//org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76)
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:311)
> org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.nextKeyValue(TableSnapshotInputFormat.java:167)
[jira] [Created] (HBASE-26108) add option to disable scanMetrics in TableSnapshotInputFormat
Huaxiang Sun created HBASE-26108: Summary: add option to disable scanMetrics in TableSnapshotInputFormat Key: HBASE-26108 URL: https://issues.apache.org/jira/browse/HBASE-26108 Project: HBase Issue Type: Improvement Affects Versions: 2.3.5 Reporter: Huaxiang Sun Assignee: Huaxiang Sun When running a Spark job with TableSnapshotInputFormat, we found that the scan is very slow. scanMetrics is hardcoded as enabled, and Spark's newAPIHadoopRDD uses DummyReporter in Hadoop, which causes the following exception; about 80% of CPU time is spent handling this exception. We need to provide an option to disable scanMetrics. java.base@11.0.5/java.lang.Throwable.fillInStackTrace(Native Method) java.base@11.0.5/java.lang.Throwable.fillInStackTrace(Throwable.java:787) => holding Monitor(java.util.MissingResourceException@258206255}) java.base@11.0.5/java.lang.Throwable.<init>(Throwable.java:292) java.base@11.0.5/java.lang.Exception.<init>(Exception.java:84) java.base@11.0.5/java.lang.RuntimeException.<init>(RuntimeException.java:80) java.base@11.0.5/java.util.MissingResourceException.<init>(MissingResourceException.java:85) java.base@11.0.5/java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:2055) java.base@11.0.5/java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1689) java.base@11.0.5/java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1593) java.base@11.0.5/java.util.ResourceBundle.getBundle(ResourceBundle.java:1284) app//org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37) app//org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56) => holding Monitor(java.lang.Class@545605549}) app//org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77) app//org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94) app//org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:227) 
app//org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154) app//org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110) app//org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76) org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:311) org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.nextKeyValue(TableSnapshotInputFormat.java:167) -- This message was sent by Atlassian Jira (v8.3.4#803005)
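The requested option for HBASE-26108 could look roughly like the sketch below. It is only an illustration under stated assumptions: the class is a simplified stand-in rather than the HBase record reader, and the key name `scan.metrics.enabled` is hypothetical, not an actual HBase setting. It shows that the counter update, which is where the exception in the stack trace originates, is skipped entirely when the flag is off.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a record reader; not the HBase class. */
final class ScanMetricsToggleSketch {
    // Hypothetical configuration key, used for illustration only.
    static final String SCAN_METRICS_ENABLED = "scan.metrics.enabled";

    private final boolean metricsEnabled;
    private long rowsRead = 0;
    private long counterUpdates = 0;

    ScanMetricsToggleSketch(Map<String, String> conf) {
        // Default to "true" so the existing behaviour is unchanged unless the flag is set.
        this.metricsEnabled =
            Boolean.parseBoolean(conf.getOrDefault(SCAN_METRICS_ENABLED, "true"));
    }

    /** Reads one row; touches the counters only when metrics are enabled. */
    void nextKeyValue() {
        rowsRead++;
        if (metricsEnabled) {
            updateCounters();
        }
    }

    /** Stands in for the per-row counter lookup that is expensive with DummyReporter. */
    private void updateCounters() {
        counterUpdates++;
    }

    long rowsRead() { return rowsRead; }

    long counterUpdates() { return counterUpdates; }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SCAN_METRICS_ENABLED, "false");
        ScanMetricsToggleSketch reader = new ScanMetricsToggleSketch(conf);
        reader.nextKeyValue();
        System.out.println(reader.rowsRead() + " row(s), " + reader.counterUpdates() + " counter update(s)");
    }
}
```

Keeping the default at enabled means existing MapReduce jobs would see no change; only callers such as Spark that cannot use the counters would opt out.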
[jira] [Created] (HBASE-26092) JVM core dump in the replication path
Huaxiang Sun created HBASE-26092: Summary: JVM core dump in the replication path Key: HBASE-26092 URL: https://issues.apache.org/jira/browse/HBASE-26092 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 2.3.5 Reporter: Huaxiang Sun When replication is turned on, we found the following core dump in the region server. I checked the core dump and have an idea of the cause. For replication, when an RS receives WAL edits from the remote cluster, it needs to send them out to the final RS. In this case NettyRpcConnection is used: calls are queued while they still refer to a ByteBuffer that belongs to the context of the replication handler, and that buffer is returned to the pool once the handler returns. The core dump happens because the ByteBuffer has already been reused when the queued call is written. This asynchronous processing needs reference counting. Feel free to take it; otherwise, I will try to work on a patch later. {code:java} Stack: [0x7fb1bf039000,0x7fb1bf13a000], sp=0x7fb1bf138560, free space=1021k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) J 28175 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I (21 bytes) @ 0x7fd2663c [0x7fd263c0+0x27c] J 14912 C2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.writeRequest(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Lorg/apache/hadoop/hbase/ipc/Call;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (370 bytes) @ 0x7fdbbb94b590 [0x7fdbbb949c00+0x1990] J 14911 C2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (30 bytes) @ 0x7fdbb972d1d4 [0x7fdbb972d1a0+0x34] J 30476 C2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (149 bytes) @ 0x7fdbbd4e7084 [0x7fdbbd4e6900+0x784] J 14914 C2 
org.apache.hadoop.hbase.ipc.NettyRpcConnection$6$1.run()V (22 bytes) @ 0x7fdbbb9344ec [0x7fdbbb934280+0x26c] J 23528 C2 org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z (106 bytes) @ 0x7fdbbcbb0efc [0x7fdbbcbb0c40+0x2bc] J 15987% C2 org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (461 bytes) @ 0x7fdbbbaf1580 [0x7fdbbbaf1360+0x220] j org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44 j org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run()V+11 j org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
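The reference counting suggested in HBASE-26092 can be sketched as follows. This is a generic illustration, not the HBase or Netty implementation (both have their own reference-counted buffer types); the class name and the recycler callback are hypothetical. The idea is that the queued RPC call takes its own reference before the handler returns, so the buffer goes back to the pool only after the last user releases it.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted wrapper around a pooled buffer. */
final class RefCountedBufferSketch {
    private final ByteBuffer buffer;
    private final Runnable recycler; // returns the buffer to its pool (assumed callback)
    private final AtomicInteger refCount = new AtomicInteger(1); // the creator holds one reference

    RefCountedBufferSketch(ByteBuffer buffer, Runnable recycler) {
        this.buffer = buffer;
        this.recycler = recycler;
    }

    /** Takes an extra reference, e.g. before a call that uses the buffer is queued. */
    RefCountedBufferSketch retain() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("buffer already released");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return this;
            }
        }
    }

    /** Drops one reference; returns true if this call recycled the buffer. */
    boolean release() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than retained");
        }
        if (remaining == 0) {
            recycler.run();
            return true;
        }
        return false;
    }

    /** Gives access to the buffer while at least one reference is held. */
    ByteBuffer buffer() {
        if (refCount.get() <= 0) {
            throw new IllegalStateException("buffer already released");
        }
        return buffer;
    }
}
```

In this scheme the replication handler would call release() when it returns, and the asynchronous writer would call release() after the call has been written; whichever happens last recycles the buffer.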
[jira] [Created] (HBASE-25724) update download area for 2.3.5 as new stable build
Huaxiang Sun created HBASE-25724: Summary: update download area for 2.3.5 as new stable build Key: HBASE-25724 URL: https://issues.apache.org/jira/browse/HBASE-25724 Project: HBase Issue Type: Sub-task Components: community Reporter: Sean Busbey Assignee: Sean Busbey * update the stable symlink to point to 2.3.4 * Remove the 2.3.3 release from downloads.a.o -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25721) Add 2.3.5 to the downloads page
[ https://issues.apache.org/jira/browse/HBASE-25721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25721. -- Resolution: Fixed > Add 2.3.5 to the downloads page > --- > > Key: HBASE-25721 > URL: https://issues.apache.org/jira/browse/HBASE-25721 > Project: HBase > Issue Type: Task > Components: community >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25722) Update reporter tool with new release, 2.3.5
Huaxiang Sun created HBASE-25722: Summary: Update reporter tool with new release, 2.3.5 Key: HBASE-25722 URL: https://issues.apache.org/jira/browse/HBASE-25722 Project: HBase Issue Type: Sub-task Reporter: Huaxiang Sun Assignee: Viraj Jasani Reporter tool: [https://reporter.apache.org/addrelease.html?hbase] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25721) Add 2.3.5 to the downloads page
Huaxiang Sun created HBASE-25721: Summary: Add 2.3.5 to the downloads page Key: HBASE-25721 URL: https://issues.apache.org/jira/browse/HBASE-25721 Project: HBase Issue Type: Task Components: community Reporter: Huaxiang Sun Assignee: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25590) Bulkload replication HFileRefs cannot be cleared in some cases where set exclude-namespace/exclude-table-cfs
[ https://issues.apache.org/jira/browse/HBASE-25590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25590. -- Resolution: Fixed Resolving it for the 2.3.5 release. Please reopen when landing the 2.2 patch. > Bulkload replication HFileRefs cannot be cleared in some cases where set > exclude-namespace/exclude-table-cfs > > > Key: HBASE-25590 > URL: https://issues.apache.org/jira/browse/HBASE-25590 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.5 > > > In > [ReplicationSource#addHFileRefs|https://github.com/apache/hbase/blob/ed90a14995acd87111d2b9849f07d84418ca43d4/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L264], > we may add unwanted hfiles to the _HFileRefs_ if a peer has > _replicate_all_ set to true and also sets _exclude-namespace/exclude-table-cfs_. > These unwanted _HFileRefs_ will not be replicated to the remote cluster and will never > be cleared. > Two problems are caused by this bug: > # The metric sizeOfHFileRefsQueue cannot be zeroed. > # Referenced HFiles cannot be deleted by _ReplicationHFileCleaner._ -- This message was sent by Atlassian Jira (v8.3.4#803005)
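The check that HBASE-25590 implies for a peer with replicate_all set to true can be sketched like this. It is a simplified illustration and not the code in ReplicationSource: the class, the method name, and the shape of the exclusion maps (an empty family set meaning the whole table) are assumptions made for the example. HFile refs would be recorded only when the method returns false.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical exclusion check for a peer that replicates everything except listed items. */
final class HFileRefFilterSketch {
    private HFileRefFilterSketch() {}

    /**
     * @param excludeNamespaces namespaces the peer excludes
     * @param excludeTableCfs   "namespace:table" mapped to excluded families;
     *                          an empty set is assumed to mean the whole table
     * @return true if bulk-loaded hfiles of this family must not be tracked for the peer
     */
    static boolean isExcluded(Set<String> excludeNamespaces,
                              Map<String, Set<String>> excludeTableCfs,
                              String namespace, String table, String family) {
        if (excludeNamespaces.contains(namespace)) {
            return true;
        }
        Set<String> families = excludeTableCfs.get(namespace + ":" + table);
        if (families == null) {
            return false; // table is not mentioned, so it is replicated
        }
        return families.isEmpty() || families.contains(family);
    }
}
```

Skipping excluded families at the point where refs are added keeps the refs queue limited to files that will actually be shipped, which is what allows the queue metric to return to zero.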
[jira] [Resolved] (HBASE-25691) Test failure: TestVerifyBucketCacheFile.testRetrieveFromFile
[ https://issues.apache.org/jira/browse/HBASE-25691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25691. -- Resolution: Fixed > Test failure: TestVerifyBucketCacheFile.testRetrieveFromFile > > > Key: HBASE-25691 > URL: https://issues.apache.org/jira/browse/HBASE-25691 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.3.4 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.5, 2.4.3 > > > Saw this test failure from 2.3 nightly. > https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestVerifyBucketCacheFile/health_checks___yetus_jdk8_hadoop2_checks___testRetrieveFromFile_1__blockSize_16_384__bucketSizes__I_371a67ec_/ > h1. Regression > health checks / yetus jdk8 hadoop2 checks / > org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile[1: > blockSize=16,384, bucketSizes=[I@371a67ec] > Failing for the past 1 build (Since > [!https://ci-hadoop.apache.org/static/e247241e/images/16x16/red.png! > #190|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/] > ) > [Took 0.32 > sec.|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestVerifyBucketCacheFile/health_checks___yetus_jdk8_hadoop2_checks___testRetrieveFromFile_1__blockSize_16_384__bucketSizes__I_371a67ec_/history] > > h3. Stacktrace > java.lang.AssertionError at > org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile(TestVerifyBucketCacheFile.java:136) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25691) Test failure: TestVerifyBucketCacheFile.testRetrieveFromFile
Huaxiang Sun created HBASE-25691: Summary: Test failure: TestVerifyBucketCacheFile.testRetrieveFromFile Key: HBASE-25691 URL: https://issues.apache.org/jira/browse/HBASE-25691 Project: HBase Issue Type: Test Components: test Affects Versions: 2.3.4 Reporter: Huaxiang Sun Assignee: Huaxiang Sun Saw this test failure from 2.3 nightly. https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestVerifyBucketCacheFile/health_checks___yetus_jdk8_hadoop2_checks___testRetrieveFromFile_1__blockSize_16_384__bucketSizes__I_371a67ec_/ h1. Regression health checks / yetus jdk8 hadoop2 checks / org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile[1: blockSize=16,384, bucketSizes=[I@371a67ec] Failing for the past 1 build (Since [!https://ci-hadoop.apache.org/static/e247241e/images/16x16/red.png! #190|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/] ) [Took 0.32 sec.|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestVerifyBucketCacheFile/health_checks___yetus_jdk8_hadoop2_checks___testRetrieveFromFile_1__blockSize_16_384__bucketSizes__I_371a67ec_/history] h3. Stacktrace java.lang.AssertionError at org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile(TestVerifyBucketCacheFile.java:136) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25639) meta replica state is not respected during active master switch
[ https://issues.apache.org/jira/browse/HBASE-25639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25639. -- Fix Version/s: 2.3.5 Resolution: Fixed > meta replica state is not respected during active master switch > --- > > Key: HBASE-25639 > URL: https://issues.apache.org/jira/browse/HBASE-25639 > Project: HBase > Issue Type: Bug > Components: meta replicas >Affects Versions: 2.0.6, 2.1.9, 2.2.6, 2.3.4 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Critical > Fix For: 2.3.5 > > > We saw this warning in the master log. > WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: No > RegionStateNode for hbase:meta,,1_0003 but reported as up on > server1.example.com,16020,1614958467735; closing... > > The root cause is that the states of meta replica regions are kept in ZooKeeper, and the > new active master does not iterate over them, so it loses track of these regions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25640) Support hbase rpc compression for remote rpc only
Huaxiang Sun created HBASE-25640: Summary: Support hbase rpc compression for remote rpc only Key: HBASE-25640 URL: https://issues.apache.org/jira/browse/HBASE-25640 Project: HBase Issue Type: Improvement Components: rpc Affects Versions: 2.3.4 Reporter: Huaxiang Sun Assignee: Huaxiang Sun The purpose of RPC compression is to save network bandwidth. For local communication (both the HBase client and the RS are on the same node), RPC compression is unnecessary because local communication is a memory copy only and does not go through the NIC. Compressing local RPCs wastes CPU, since compression and decompression are CPU intensive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
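One way to express the "remote only" decision from HBASE-25640 is sketched below. It is not the HBase RPC client code; the class and method names are hypothetical, and treating a peer as local when its address is loopback or equals this host's address is an assumption of the example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper that enables RPC compression only for remote peers. */
final class RemoteOnlyCompressionSketch {
    private RemoteOnlyCompressionSketch() {}

    /** A peer counts as local if it is a loopback address or this host's own address. */
    static boolean isLocal(InetAddress peer, InetAddress localHost) {
        return peer.isLoopbackAddress() || peer.equals(localHost);
    }

    /** Compress only when compression is configured and the peer is on another node. */
    static boolean shouldCompress(boolean compressionConfigured,
                                  InetAddress peer, InetAddress localHost) {
        return compressionConfigured && !isLocal(peer, localHost);
    }

    /** Builds an IPv4 address from its four octets without any DNS lookup. */
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            // Cannot happen for a 4-byte array, but the exception is checked.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress self = ip(10, 0, 0, 5);
        System.out.println("same host:  " + shouldCompress(true, self, self));
        System.out.println("other host: " + shouldCompress(true, ip(10, 0, 0, 6), self));
    }
}
```

A real implementation would also have to consider hosts with several network interfaces; the sketch compares against a single local address only.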
[jira] [Created] (HBASE-25639) meta replica state is not respected during active master switch
Huaxiang Sun created HBASE-25639: Summary: meta replica state is not respected during active master switch Key: HBASE-25639 URL: https://issues.apache.org/jira/browse/HBASE-25639 Project: HBase Issue Type: Bug Components: meta replicas Reporter: Huaxiang Sun Assignee: Huaxiang Sun We saw this warning in the master log. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: No RegionStateNode for hbase:meta,,1_0003 but reported as up on server1.example.com,16020,1614958467735; closing... The root cause is that the states of meta replica regions are kept in ZooKeeper, and the new active master does not iterate over them, so it loses track of these regions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25537) Misleading Range metrics
Huaxiang Sun created HBASE-25537: Summary: Misleading Range metrics Key: HBASE-25537 URL: https://issues.apache.org/jira/browse/HBASE-25537 Project: HBase Issue Type: Bug Components: metrics Reporter: Huaxiang Sun Assignee: Huaxiang Sun Fix For: 2.3.4 Attachments: Screen Shot 2021-01-27 at 1.09.32 PM.png Found some cases where the max value is included in a smaller range, which is confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25417) Send announce email
[ https://issues.apache.org/jira/browse/HBASE-25417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25417. -- Resolution: Fixed > Send announce email > --- > > Key: HBASE-25417 > URL: https://issues.apache.org/jira/browse/HBASE-25417 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25409) Release 2.3.4
[ https://issues.apache.org/jira/browse/HBASE-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25409. -- Resolution: Fixed > Release 2.3.4 > - > > Key: HBASE-25409 > URL: https://issues.apache.org/jira/browse/HBASE-25409 > Project: HBase > Issue Type: Task > Components: community >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25416) Add 2.3.4 to the downloads page
[ https://issues.apache.org/jira/browse/HBASE-25416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25416. -- Fix Version/s: (was: 3.0.0-alpha-1) Resolution: Fixed > Add 2.3.4 to the downloads page > --- > > Key: HBASE-25416 > URL: https://issues.apache.org/jira/browse/HBASE-25416 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25412) Release version 2.3.4 in Jira
[ https://issues.apache.org/jira/browse/HBASE-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25412. -- Resolution: Fixed > Release version 2.3.4 in Jira > - > > Key: HBASE-25412 > URL: https://issues.apache.org/jira/browse/HBASE-25412 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25411) "Release" staged nexus repository
[ https://issues.apache.org/jira/browse/HBASE-25411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25411. -- Resolution: Fixed > "Release" staged nexus repository > - > > Key: HBASE-25411 > URL: https://issues.apache.org/jira/browse/HBASE-25411 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25410) Spin RCs
[ https://issues.apache.org/jira/browse/HBASE-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25410. -- Resolution: Fixed > Spin RCs > > > Key: HBASE-25410 > URL: https://issues.apache.org/jira/browse/HBASE-25410 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25413) Promote 2.3.4 RC artifacts in svn
[ https://issues.apache.org/jira/browse/HBASE-25413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25413. -- Resolution: Fixed > Promote 2.3.4 RC artifacts in svn > - > > Key: HBASE-25413 > URL: https://issues.apache.org/jira/browse/HBASE-25413 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25415) Push signed release tag
[ https://issues.apache.org/jira/browse/HBASE-25415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25415. -- Resolution: Fixed > Push signed release tag > --- > > Key: HBASE-25415 > URL: https://issues.apache.org/jira/browse/HBASE-25415 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25368) Filter out more invalid encoded name in isEncodedRegionName(byte[] regionName)
[ https://issues.apache.org/jira/browse/HBASE-25368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25368. -- Fix Version/s: 3.0.0-alpha-1 Resolution: Fixed > Filter out more invalid encoded name in isEncodedRegionName(byte[] > regionName) > --- > > Key: HBASE-25368 > URL: https://issues.apache.org/jira/browse/HBASE-25368 > Project: HBase > Issue Type: Improvement > Components: Client >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 3.0.0-alpha-1 > > > {code:java} > public static boolean isEncodedRegionName(byte[] regionName) { > // If not parseable as region name, presume encoded. TODO: add stringency; > e.g. if hex. > return parseRegionNameOrReturnNull(regionName) == null && regionName.length > <= MD5_HEX_LENGTH; > } > {code} > Right now, if a table name is passed in, the method still treats it as an encoded > region name, which results in an unnecessary registry query for meta regions. > This can be avoided if table names are filtered out early in this > method. -- This message was sent by Atlassian Jira (v8.3.4#803005)
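The "add stringency; e.g. if hex" idea from the TODO in HBASE-25368 might look like the sketch below. It is not the change that was committed for the issue. It assumes that an encoded region name is exactly 32 lowercase hexadecimal characters (an MD5 hex digest); any other encoded-name format in use would need separate handling. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stricter check for encoded region names. */
final class EncodedRegionNameSketch {
    static final int MD5_HEX_LENGTH = 32;

    private EncodedRegionNameSketch() {}

    /** Returns true only for names that are exactly 32 lowercase hex characters. */
    static boolean looksLikeEncodedRegionName(byte[] name) {
        if (name == null || name.length != MD5_HEX_LENGTH) {
            return false;
        }
        for (byte b : name) {
            boolean digit = b >= '0' && b <= '9';
            boolean hexLetter = b >= 'a' && b <= 'f';
            if (!digit && !hexLetter) {
                return false; // e.g. ':' or 'm' in a table name such as hbase:meta
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] tableName = "hbase:meta".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeEncodedRegionName(tableName));
    }
}
```

With a check of this kind, a table name is rejected before any registry or meta lookup is attempted.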
[jira] [Resolved] (HBASE-25418) Run a correctness test with ITBLL
[ https://issues.apache.org/jira/browse/HBASE-25418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25418. -- Resolution: Fixed Ran ITBLL with chaos monkey against 2.3.4RC4: inserted 3 billion rows, and they were verified successfully. > Run a correctness test with ITBLL > - > > Key: HBASE-25418 > URL: https://issues.apache.org/jira/browse/HBASE-25418 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-25371) When openRegion fails during initial verification(before initializing and setting seq num), exception is observed during region close.
[ https://issues.apache.org/jira/browse/HBASE-25371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun reopened HBASE-25371: -- I just found that this Jira has not been merged to branch-2, branch-2.3,branch-2.4 yet. Can you do backport and set the release version correctly? Thanks. > When openRegion fails during initial verification(before initializing and > setting seq num), exception is observed during region close. > -- > > Key: HBASE-25371 > URL: https://issues.apache.org/jira/browse/HBASE-25371 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 2.2.3 >Reporter: Ajeet Rai >Assignee: Mohammad Arshad >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.1, 2.3.5 > > > When openRegion fails during initial verification(before initializing and > setting seq num), exception is observed during region close: > > 2020-12-03 16:34:47,133 ERROR > [RS_OPEN_REGION-regionserver/AA:16040-0] handler.OpenRegionHandler: > Failed open of > region=ns2:testtable4,15,1606912406234.cd386135276b7d3c57416df3666e4aea.2020-12-03 > 16:34:47,133 ERROR [RS_OPEN_REGION-regionserver/blrphispra01054:16040-0] > handler.OpenRegionHandler: Failed open of > region=ns2:testtable4,15,1606912406234.cd386135276b7d3c57416df3666e4aea.java.io.IOException: > The new max sequence id 1 is less than the old max sequence id 7134 at > org.apache.hadoop.hbase.wal.WALSplitUtil.writeRegionSequenceIdFile(WALSplitUtil.java:418) > at > org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1253) > at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1793) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1606) at > org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1552) at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7522) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7467) > at > 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7439) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7397) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7348) > at > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:286) > at > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:111) > at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25083) make sure the next hbase 1.y release has Hadoop 2.10 as a minimum version
[ https://issues.apache.org/jira/browse/HBASE-25083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25083. -- Resolution: Fixed > make sure the next hbase 1.y release has Hadoop 2.10 as a minimum version > - > > Key: HBASE-25083 > URL: https://issues.apache.org/jira/browse/HBASE-25083 > Project: HBase > Issue Type: Task > Components: documentation, hadoop2 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Major > Fix For: 3.0.0-alpha-1, 1.7.0, 2.3.4, 2.5.0, 2.4.1 > > > Our reference guide list of prerequisites still has Hadoop 2.8 and 2.9 listed > for HBase 1 releases. > * [hadoop 2.8 is > EOM|https://lists.apache.org/thread.html/r348f7bc93a522f05b7cce78a911854d128a6b1b8bd8124bad4d06ce6%40%3Cuser.hadoop.apache.org%3E] > * [hadoop 2.9 is > EOM|https://lists.apache.org/thread.html/r16b14cce9504f7a9d228612c6b808e72d8dd20863c78be51a7e04ed5%40%3Cuser.hadoop.apache.org%3E] > The current list in the reference guide for HBase 1.6 is just the 1.5 list > copied. we should update it to remove 2.8 and 2.9 and make sure we're no > longer doing build/test based on those versions for branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25356) HBaseAdmin#getRegion() needs to filter out non-regionName and non-encodedRegionName
[ https://issues.apache.org/jira/browse/HBASE-25356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25356. -- Fix Version/s: 2.4.1 2.5.0 2.3.4 Resolution: Fixed > HBaseAdmin#getRegion() needs to filter out non-regionName and > non-encodedRegionName > --- > > Key: HBASE-25356 > URL: https://issues.apache.org/jira/browse/HBASE-25356 > Project: HBase > Issue Type: Bug > Components: shell >Affects Versions: 2.3.3, 2.4.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 2.3.4, 2.5.0, 2.4.1 > > > I was running the shell command to major compact the meta table. The implementation > is wrong because it tries to search the meta table with the meta table name. This > also results in an unnecessary scan of the meta table. > > majorCompactRegion() calls HBaseAdmin#getRegion(), which basically scans the > meta table itself. > Operators use this command quite often, so we need to correct it. > > This applies to the split/flush commands as well, which call getRegion() with > tableName as an input. > > The solution is for getRegion() to filter out input that is neither a regionName nor > an encodedRegionName; this will save a query of the meta table and a heavy scan > of the meta table. If the meta table is large, the overhead is huge. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25470) Add unit test for HBASE-25445 - SplitWALRemoteProcedure failed to archive split WAL
Huaxiang Sun created HBASE-25470: Summary: Add unit test for HBASE-25445 - SplitWALRemoteProcedure failed to archive split WAL Key: HBASE-25470 URL: https://issues.apache.org/jira/browse/HBASE-25470 Project: HBase Issue Type: Bug Components: wal Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.2 Reporter: mokai Assignee: Anjan Das Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.4, 2.5.0, 2.4.1 If 'hbase.wal.dir' and 'hbase.rootdir' are configured on different filesystems, SplitWALRemoteProcedure fails to archive the split WAL because SplitWALManager uses the wrong fs instance. SplitWALManager should use the fs instance that corresponds to the WAL. Steps to Reproduce: * Configure 'hbase.wal.dir' and 'hbase.rootdir' so that they point to different fs instances. * Start HBase with multiple RS. * Create a couple of tables and some rows in them so that the RSs get assigned some regions. * Take any RS with a non-zero number of regions offline. * Check master logs for the "Wrong FS" error as shown in the screenshot attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
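The filesystem mismatch described in HBASE-25470 can be illustrated without Hadoop by comparing URI scheme and authority, as in the sketch below. It is only an illustration: the class, the enum, and the example URIs are hypothetical, and the real code works with Hadoop FileSystem instances rather than plain URIs.

```java
import java.net.URI;

/** Hypothetical helper that decides which filesystem a path belongs to. */
final class WalFsSelectionSketch {
    enum Target { WAL_FS, ROOT_FS }

    private WalFsSelectionSketch() {}

    /** Null-safe, case-insensitive comparison of URI components. */
    private static boolean same(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    /** Two URIs are on the same filesystem if scheme and authority match. */
    static boolean sameFileSystem(URI a, URI b) {
        return same(a.getScheme(), b.getScheme()) && same(a.getAuthority(), b.getAuthority());
    }

    /** Picks the filesystem for a path, preferring the WAL filesystem for WAL paths. */
    static Target fileSystemFor(URI path, URI walDir, URI rootDir) {
        if (sameFileSystem(path, walDir)) {
            return Target.WAL_FS;
        }
        if (sameFileSystem(path, rootDir)) {
            return Target.ROOT_FS;
        }
        throw new IllegalArgumentException("Wrong FS: " + path);
    }

    public static void main(String[] args) {
        URI rootDir = URI.create("hdfs://nn1:8020/hbase");   // stands in for hbase.rootdir
        URI walDir = URI.create("hdfs://nn2:8020/hbasewal");  // stands in for hbase.wal.dir
        URI splitWal = URI.create("hdfs://nn2:8020/hbasewal/WALs/rs1/wal.1");
        System.out.println(fileSystemFor(splitWal, walDir, rootDir));
    }
}
```

Archiving a split WAL through the root filesystem in a setup like this is the situation that produces the "Wrong FS" error mentioned in the reproduction steps.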
[jira] [Resolved] (HBASE-25293) Followup jira to address the client handling issue when changing from meta replica to non-meta-replica at the server side.
[ https://issues.apache.org/jira/browse/HBASE-25293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25293. -- Fix Version/s: 2.4.1 Resolution: Fixed > Followup jira to address the client handling issue when changing from meta > replica to non-meta-replica at the server side. > - > > Key: HBASE-25293 > URL: https://issues.apache.org/jira/browse/HBASE-25293 > Project: HBase > Issue Type: Sub-task >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Minor > Fix For: 2.4.1 > > > [https://github.com/apache/hbase/pull/2643] > > {quote} > With my operator hat on, I'd assume that LOAD_BALANCE with 1 replica count > works like no read replicas configured (logic wise at least, even though the > code paths are different). > {quote}If the server side does not support meta replica, the client side > cannot be configured to support this mode > {quote} > Since clients are usually long running (meaning we may not be able to restart > the client, or they are using a cached HBase connection) and meta replica count can be > altered on the service side on the fly, I'd expect the client to work across > these changes without any configuration changes. WDYT? > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25418) Run a correctness test with ITBLL
Huaxiang Sun created HBASE-25418: Summary: Run a correctness test with ITBLL Key: HBASE-25418 URL: https://issues.apache.org/jira/browse/HBASE-25418 Project: HBase Issue Type: Task Reporter: Huaxiang Sun Assignee: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25411) CLONE - "Release" staged nexus repository
Huaxiang Sun created HBASE-25411: Summary: CLONE - "Release" staged nexus repository Key: HBASE-25411 URL: https://issues.apache.org/jira/browse/HBASE-25411 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Viraj Jasani -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25414) CLONE - Update reporter tool with new release
Huaxiang Sun created HBASE-25414: Summary: CLONE - Update reporter tool with new release Key: HBASE-25414 URL: https://issues.apache.org/jira/browse/HBASE-25414 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Nick Dimiduk Reporter tool: [https://reporter.apache.org/addrelease.html?hbase] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25415) CLONE - Push signed release tag
Huaxiang Sun created HBASE-25415: Summary: CLONE - Push signed release tag Key: HBASE-25415 URL: https://issues.apache.org/jira/browse/HBASE-25415 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Viraj Jasani -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25410) CLONE - Spin RCs
Huaxiang Sun created HBASE-25410: Summary: CLONE - Spin RCs Key: HBASE-25410 URL: https://issues.apache.org/jira/browse/HBASE-25410 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Viraj Jasani -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25417) CLONE - Send announce email
Huaxiang Sun created HBASE-25417: Summary: CLONE - Send announce email Key: HBASE-25417 URL: https://issues.apache.org/jira/browse/HBASE-25417 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Viraj Jasani -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25412) CLONE - Release version 2.3.2 in Jira
Huaxiang Sun created HBASE-25412: Summary: CLONE - Release version 2.3.2 in Jira Key: HBASE-25412 URL: https://issues.apache.org/jira/browse/HBASE-25412 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Viraj Jasani -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25416) CLONE - Add 2.3.2 to the downloads page
Huaxiang Sun created HBASE-25416: Summary: CLONE - Add 2.3.2 to the downloads page Key: HBASE-25416 URL: https://issues.apache.org/jira/browse/HBASE-25416 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Viraj Jasani Fix For: 3.0.0-alpha-1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25413) CLONE - Promote 2.3.2 RC artifacts in svn
Huaxiang Sun created HBASE-25413: Summary: CLONE - Promote 2.3.2 RC artifacts in svn Key: HBASE-25413 URL: https://issues.apache.org/jira/browse/HBASE-25413 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Nick Dimiduk -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25409) Release 2.3.4
Huaxiang Sun created HBASE-25409: Summary: Release 2.3.4 Key: HBASE-25409 URL: https://issues.apache.org/jira/browse/HBASE-25409 Project: HBase Issue Type: Task Components: community Reporter: Viraj Jasani Assignee: Viraj Jasani -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25358) meta replica regions are assigned to the same region server during SCP.
[ https://issues.apache.org/jira/browse/HBASE-25358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25358. -- Resolution: Invalid I checked the code, and there is a guard that avoids assigning two replica regions to the same region server. So I was kind of lost, went back to rerun itbll, and was able to reproduce it. It is the itbll actions that move regions around, and in some cases they move meta replica regions to the same region server. This is not a bug, so I am resolving it. > meta replica regions are assigned to the same region server during SCP. > --- > > Key: HBASE-25358 > URL: https://issues.apache.org/jira/browse/HBASE-25358 > Project: HBase > Issue Type: Bug > Components: read replicas >Affects Versions: 2.4.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > > When running 2.4.0 RC1 with meta replica enabled, during SCP, meta replica > regions are assigned to the same region server. I think the reason is that > SCP uses round robin algo to assign meta replicas and do not exclude region > servers hosting replica regions. This is not a new issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25368) Filter out more invalid encoded name in isEncodedRegionName(byte[] regionName)
Huaxiang Sun created HBASE-25368: Summary: Filter out more invalid encoded name in isEncodedRegionName(byte[] regionName) Key: HBASE-25368 URL: https://issues.apache.org/jira/browse/HBASE-25368 Project: HBase Issue Type: Improvement Components: Client Reporter: Huaxiang Sun {code:java} public static boolean isEncodedRegionName(byte[] regionName) { // If not parseable as region name, presume encoded. TODO: add stringency; e.g. if hex. return parseRegionNameOrReturnNull(regionName) == null && regionName.length <= MD5_HEX_LENGTH; } {code} Right now, if a table name is passed in, the method still treats it as an encoded region name, which results in an unnecessary registry query for meta regions. This can be avoided if table names are filtered out early in this method. -- This message was sent by Atlassian Jira (v8.3.4#803005)
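A minimal sketch of the kind of stricter check HBASE-25368 asks for, assuming that new-format encoded region names are exactly 32 lowercase hex characters (an MD5 digest). The class name `EncodedNameCheck` and the method `looksLikeMd5Hex` are hypothetical and not part of HBase; older numeric encoded names would need a separate branch that is not shown here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not HBase code: rejects names that cannot be an MD5 hex string.
final class EncodedNameCheck {
  // Assumption: an MD5 digest rendered as hex is 32 characters long.
  static final int MD5_HEX_LENGTH = 32;

  static boolean looksLikeMd5Hex(byte[] name) {
    if (name == null || name.length != MD5_HEX_LENGTH) {
      return false;
    }
    for (byte b : name) {
      boolean digit = b >= '0' && b <= '9';
      boolean lowerHex = b >= 'a' && b <= 'f';
      if (!digit && !lowerHex) {
        return false; // e.g. '_' or ':' from a table name
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] table = "my_table".getBytes(StandardCharsets.UTF_8);
    byte[] hex = "0123456789abcdef0123456789abcdef".getBytes(StandardCharsets.UTF_8);
    System.out.println(looksLikeMd5Hex(table));
    System.out.println(looksLikeMd5Hex(hex));
  }
}
```

With a check like this in front, a short table name such as `my_table` would be rejected before any registry lookup is attempted.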
[jira] [Created] (HBASE-25358) meta replica regions are assigned to
Huaxiang Sun created HBASE-25358: Summary: meta replica regions are assigned to Key: HBASE-25358 URL: https://issues.apache.org/jira/browse/HBASE-25358 Project: HBase Issue Type: Bug Components: read replicas Reporter: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25356) shell command major_compact misbehave for hbase:meta
Huaxiang Sun created HBASE-25356: Summary: shell command major_compact misbehave for hbase:meta Key: HBASE-25356 URL: https://issues.apache.org/jira/browse/HBASE-25356 Project: HBase Issue Type: Bug Components: shell Affects Versions: 1.6.0, 2.4.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun I was running the shell command to major compact the meta table. The implementation is wrong because it searches the meta table using the meta table name. This also results in an unnecessary scan of the meta table: majorCompactRegion() calls getRegion(), which basically scans the meta table itself. Operators use this command quite often, so we need to correct it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-25343) Avoid the failed meta replica region temporarily in Load Balance mode
[ https://issues.apache.org/jira/browse/HBASE-25343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun reopened HBASE-25343: -- Reopen to reflect the new scope. > Avoid the failed meta replica region temporarily in Load Balance mode > - > > Key: HBASE-25343 > URL: https://issues.apache.org/jira/browse/HBASE-25343 > Project: HBase > Issue Type: Sub-task > Components: meta replicas >Affects Versions: 2.4.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 2.4.1 > > > This is a follow-up enhancement with Stack, Duo. With the newly introduced > meta replica LoadBalance mode, if there is something wrong with one of meta > replica regions, the current logic is that it keeps trying until the meta > replica region is onlined again or it reports error, i.e, there is no HA at > LoadBalance mode. HA can be implemented if it reports timeout with one meta > replica region and tries another meta replica region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25343) Add HA support on top of Load Balance mode
[ https://issues.apache.org/jira/browse/HBASE-25343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-25343. -- Resolution: Won't Do > Add HA support on top of Load Balance mode > -- > > Key: HBASE-25343 > URL: https://issues.apache.org/jira/browse/HBASE-25343 > Project: HBase > Issue Type: Sub-task > Components: meta replicas >Affects Versions: 2.4.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 2.4.1 > > > This is a follow-up enhancement with Stack, Duo. With the newly introduced > meta replica LoadBalance mode, if there is something wrong with one of meta > replica regions, the current logic is that it keeps trying until the meta > replica region is onlined again or it reports error, i.e, there is no HA at > LoadBalance mode. HA can be implemented if it reports timeout with one meta > replica region and tries another meta replica region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25343) Add HA support on top of Load Balance mode
Huaxiang Sun created HBASE-25343: Summary: Add HA support on top of Load Balance mode Key: HBASE-25343 URL: https://issues.apache.org/jira/browse/HBASE-25343 Project: HBase Issue Type: Sub-task Components: meta replicas Affects Versions: 2.4.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun Fix For: 2.4.1 This is a follow-up enhancement with Stack, Duo. With the newly introduced meta replica LoadBalance mode, if there is something wrong with one of meta replica regions, the current logic is that it keeps trying until the meta replica region is onlined again or it reports error, i.e, there is no HA at LoadBalance mode. HA can be implemented if it reports timeout with one meta replica region and tries another meta replica region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
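The failover idea in HBASE-25343 (report a timeout on one meta replica region, then try another) could look roughly like the sketch below. Everything here is illustrative: `ReplicaFailover`, `ReplicaCall`, and `callWithFailover` are hypothetical names and not the HBase client API, and a real client would also need per-attempt timeouts and some way to remember a bad replica for a while.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not HBase code: try each replica once, starting at a chosen index.
final class ReplicaFailover {
  interface ReplicaCall<T> {
    T call(int replicaId) throws TimeoutException;
  }

  static <T> T callWithFailover(List<Integer> replicaIds, int start, ReplicaCall<T> call)
      throws TimeoutException {
    TimeoutException last = null;
    for (int i = 0; i < replicaIds.size(); i++) {
      int replicaId = replicaIds.get((start + i) % replicaIds.size());
      try {
        return call.call(replicaId);
      } catch (TimeoutException e) {
        last = e; // remember the failure and move on to the next replica
      }
    }
    if (last == null) {
      throw new TimeoutException("no replicas to try");
    }
    throw last;
  }

  public static void main(String[] args) throws TimeoutException {
    String result = callWithFailover(Arrays.asList(0, 1, 2), 1, id -> {
      if (id == 1) {
        throw new TimeoutException("replica 1 timed out");
      }
      return "replica-" + id;
    });
    System.out.println(result);
  }
}
```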
[jira] [Created] (HBASE-25293) Followup jira to address the client handling issue when changing from meta replica to non-meta-replica at the server side.
Huaxiang Sun created HBASE-25293: Summary: Followup jira to address the client handling issue when changing from meta replica to non-meta-replica at the server side. Key: HBASE-25293 URL: https://issues.apache.org/jira/browse/HBASE-25293 Project: HBase Issue Type: Sub-task Reporter: Huaxiang Sun [https://github.com/apache/hbase/pull/2643] {quote} With my operator hat on, I'd assume that LOAD_BALANCE with 1 replica count works like no read replicas configured (logic wise at-least, even though the code paths are different). {quote}If the server side does not support meta replica, the client side cannot be configured to support this mode {quote} Since clients are usually long running (meaning we may not be able to restart client or they using cached HBase connection) and meta replica count can be altered on the service side on the fly, I'd expect client to work across these changes without any configuration changes. WDYT? {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25291) Document how to enable the meta replica load balance mode for the client
Huaxiang Sun created HBASE-25291: Summary: Document how to enable the meta replica load balance mode for the client Key: HBASE-25291 URL: https://issues.apache.org/jira/browse/HBASE-25291 Project: HBase Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun Need to document how to enable meta replica Load Balance mode for clients. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25248) Followup jira to create single thread ScheduledExecutorService in AsyncConnImpl, and schedule all these periodic tasks
Huaxiang Sun created HBASE-25248: Summary: Followup jira to create single thread ScheduledExecutorService in AsyncConnImpl, and schedule all these periodic tasks Key: HBASE-25248 URL: https://issues.apache.org/jira/browse/HBASE-25248 Project: HBase Issue Type: Sub-task Reporter: Huaxiang Sun This is a followup Jira for comments in [https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0.] {quote} h4. *[saintstack|https://github.com/saintstack]* [18 hours ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517040579] Member So, implements Stoppable rather than do what the likes of AuthUtil does where it does createDummyStoppable and then has an internal do-nothing Stoppable? Makes sense. Perhaps add comment that it is a do-nothing stop required by ScheduledChore impls. s/isStopped/stopped/ h4. *[huaxiangsun|https://github.com/huaxiangsun]* [18 hours ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517042290] Author Member Will do. h4. *[ndimiduk|https://github.com/ndimiduk]* [17 hours ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517057141] Member Maybe in the future we can put a default empty implementation on the interface, and then implementers who don't need it can ignore it. h4. *[Apache9|https://github.com/Apache9]* [17 hours ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517057999] Member Maybe we could just use a ScheduledExecutorService at client side, the ChoreService is designed to be used at server side I believe. Anyway, not a blocker for now. {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25247) Followup jira to encap all meta replica mode/selector processing into CatalogReplicaModeManager
Huaxiang Sun created HBASE-25247: Summary: Followup jira to encap all meta replica mode/selector processing into CatalogReplicaModeManager Key: HBASE-25247 URL: https://issues.apache.org/jira/browse/HBASE-25247 Project: HBase Issue Type: Sub-task Components: meta Reporter: Huaxiang Sun Assignee: Huaxiang Sun This is a follow-up to Stack's comments in [https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0.] {quote} h4. *[saintstack|https://github.com/saintstack]* [6 days ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r514558880] Member Yeah, said this before but in follow-on, would be good to shove all this stuff into a CatalogReplicaMode class. Internally this class would figure which policy to run. It would have a method that took a Scan that allowed decorating the Scan w/ whatever the mode needed to implement its policy. Later. h4. *[huaxiangsun|https://github.com/huaxiangsun]* [6 days ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r514587250] Author Member Now I thought about it, it makes sense. Maybe a CatalogReplicaModeManager class which encaps mode and selector? Let me create a followup jira after this is merged. {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25241) Add integration test for meta replica load balance mode
Huaxiang Sun created HBASE-25241: Summary: Add integration test for meta replica load balance mode Key: HBASE-25241 URL: https://issues.apache.org/jira/browse/HBASE-25241 Project: HBase Issue Type: Sub-task Components: integration tests Reporter: Huaxiang Sun We need to create an integration test which has meta replica load balance mode enabled and make sure its correctness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25158) Enhance balancer to make sure no meta primary/replica regions are going to be assigned to one same region server.
Huaxiang Sun created HBASE-25158: Summary: Enhance balancer to make sure no meta primary/replica regions are going to be assigned to one same region server. Key: HBASE-25158 URL: https://issues.apache.org/jira/browse/HBASE-25158 Project: HBase Issue Type: Sub-task Reporter: Huaxiang Sun Region replica has an enhancement in the balancer so that a primary region and its replicas are not assigned to the same region server. Today, there is only one meta region, so this enhancement is still enough. With split meta coming in, the balancer needs to make sure that no meta region/replicas are assigned to the same region server, in order to avoid hotspot issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25129) serial replication, addReplicationBarrier is writing to rep_barrier family even there is no serial replication peer.
Huaxiang Sun created HBASE-25129: Summary: serial replication, addReplicationBarrier is writing to rep_barrier family even there is no serial replication peer. Key: HBASE-25129 URL: https://issues.apache.org/jira/browse/HBASE-25129 Project: HBase Issue Type: Bug Reporter: Huaxiang Sun Assignee: Huaxiang Sun We found that there is quite a lot of data in the rep_barrier family even though no serial replication is enabled. Checking the code, it only checks whether the table has replication enabled. I think another check is needed (i.e., whether any serial replication peers are configured). [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java#L215] -- This message was sent by Atlassian Jira (v8.3.4#803005)
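The extra condition suggested in HBASE-25129 can be sketched as below. This is only an illustration of the decision logic: `SerialPeerCheck`, `Peer`, and `needsBarrier` are hypothetical names, and the real peer configuration model in HBase (namespaces, table-CF maps, replicate-all flags) is richer than the simple table set assumed here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not HBase code: write a barrier only if a serial peer covers the table.
final class SerialPeerCheck {
  static final class Peer {
    final boolean serial;
    final Set<String> tables; // assumption: a peer simply lists the tables it replicates

    Peer(boolean serial, Set<String> tables) {
      this.serial = serial;
      this.tables = tables;
    }
  }

  static boolean needsBarrier(String table, boolean tableReplicationEnabled, List<Peer> peers) {
    if (!tableReplicationEnabled) {
      return false; // the existing check described in the issue
    }
    for (Peer peer : peers) {
      if (peer.serial && peer.tables.contains(table)) {
        return true; // the additional check proposed in the issue
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Peer plain = new Peer(false, Collections.singleton("t1"));
    Peer serial = new Peer(true, Collections.singleton("t1"));
    System.out.println(needsBarrier("t1", true, Arrays.asList(plain)));
    System.out.println(needsBarrier("t1", true, Arrays.asList(plain, serial)));
  }
}
```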
[jira] [Created] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.
Huaxiang Sun created HBASE-25127: Summary: Enhance PerformanceEvaluation to profile meta replica performance. Key: HBASE-25127 URL: https://issues.apache.org/jira/browse/HBASE-25127 Project: HBase Issue Type: Sub-task Reporter: Huaxiang Sun Assignee: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25126) Add load balance logic in hbase-client to distribute read load over meta replica regions.
Huaxiang Sun created HBASE-25126: Summary: Add load balance logic in hbase-client to distribute read load over meta replica regions. Key: HBASE-25126 URL: https://issues.apache.org/jira/browse/HBASE-25126 Project: HBase Issue Type: Sub-task Affects Versions: 3.0.0-alpha-1 Reporter: Huaxiang Sun Assignee: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25125) Create a ReplicationEndPoint for meta/look table.
Huaxiang Sun created HBASE-25125: Summary: Create a ReplicationEndPoint for meta/look table. Key: HBASE-25125 URL: https://issues.apache.org/jira/browse/HBASE-25125 Project: HBase Issue Type: Sub-task Components: read replicas Affects Versions: 3.0.0-alpha-1 Reporter: Huaxiang Sun Assignee: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24563) Make hbck chore aware of replica region and check/fix replica region consistency
[ https://issues.apache.org/jira/browse/HBASE-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-24563. -- Resolution: Duplicate It is covered by other jiras, no need for this one. > Make hbck chore aware of replica region and check/fix replica region > consistency > > > Key: HBASE-24563 > URL: https://issues.apache.org/jira/browse/HBASE-24563 > Project: HBase > Issue Type: Improvement > Components: read replicas >Affects Versions: 2.3.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > > Hbck1 checks/fix only primary region consistency and ignores replica region. > In hbase 2, hbck chore needs to be aware of replica region and check its > consistency as well. Hbck2 needs to fix replica region inconsistency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24824) Add more stats in PE for read replica
[ https://issues.apache.org/jira/browse/HBASE-24824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-24824. -- Fix Version/s: 2.4.0 2.3.1 3.0.0-alpha-1 Resolution: Fixed > Add more stats in PE for read replica > - > > Key: HBASE-24824 > URL: https://issues.apache.org/jira/browse/HBASE-24824 > Project: HBase > Issue Type: Improvement > Components: PE, read replicas >Affects Versions: 2.3.1 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0 > > Attachments: Screen Shot 2020-08-05 at 5.04.56 PM.png > > > Add more stats for read replica PE test. Currently, there is read replica > tests in PE, but it does not provide details for how many requests to replica > regions, and how many replica results win. > Also, want to add a latency histogram for replica reads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24824) Add more stats in PE for read replica
Huaxiang Sun created HBASE-24824: Summary: Add more stats in PE for read replica Key: HBASE-24824 URL: https://issues.apache.org/jira/browse/HBASE-24824 Project: HBase Issue Type: Improvement Components: PE, read replicas Affects Versions: 2.3.1 Reporter: Huaxiang Sun Assignee: Huaxiang Sun Add more stats for read replica PE test. Currently, there is read replica tests in PE, but it does not provide details for how many requests to replica regions, and how many replica results win. Also, want to add a latency histogram for replica reads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24804) Follow up work add client side scan metrics for read replica
Huaxiang Sun created HBASE-24804: Summary: Follow up work add client side scan metrics for read replica Key: HBASE-24804 URL: https://issues.apache.org/jira/browse/HBASE-24804 Project: HBase Issue Type: New Feature Components: read replicas Affects Versions: 2.4.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun This is a followup work for HBASE-18436, which adds client metrics for read replica get. Will add metrics for scan as well. This metrics will be used in PE and any interested applications. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24705) MetaFixer#fixHoles() does not include the case for read replicas (i.e, replica regions are not created)
[ https://issues.apache.org/jira/browse/HBASE-24705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-24705. -- Fix Version/s: 2.4.0 2.3.1 3.0.0-alpha-1 Resolution: Fixed > MetaFixer#fixHoles() does not include the case for read replicas (i.e, > replica regions are not created) > --- > > Key: HBASE-24705 > URL: https://issues.apache.org/jira/browse/HBASE-24705 > Project: HBase > Issue Type: Bug > Components: read replicas >Affects Versions: 2.3.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24708) Flaky Test TestRegionReplicas#testVerifySecondaryAbilityToReadWithOnFiles
Huaxiang Sun created HBASE-24708: Summary: Flaky Test TestRegionReplicas#testVerifySecondaryAbilityToReadWithOnFiles Key: HBASE-24708 URL: https://issues.apache.org/jira/browse/HBASE-24708 Project: HBase Issue Type: Test Components: test Affects Versions: 2.3.0 Reporter: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24707) Fix Empty region in meta with read replica
Huaxiang Sun created HBASE-24707: Summary: Fix Empty region in meta with read replica Key: HBASE-24707 URL: https://issues.apache.org/jira/browse/HBASE-24707 Project: HBase Issue Type: Improvement Components: hbck2 Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun Currently, CatalogJanitor has a case that checks whether the default region info is missing in metaRow and reports it as EmptyRegionInfoList. For read replica, this entry needs to be dealt with. In hbase-1, this was caused by a region server opening an orphan replica region. In hbase-2, it will not happen since checks were added to defend against this case. The hbck2 fix is still needed for upgrades: issues could be brought into hbase-2 post upgrade, and hbck2 needs to handle them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24706) When merging a non-empty region with empty regions, skip the reference creation to avoid compaction.
Huaxiang Sun created HBASE-24706: Summary: When merging a non-empty region with empty regions, skip the reference creation to avoid compaction. Key: HBASE-24706 URL: https://issues.apache.org/jira/browse/HBASE-24706 Project: HBase Issue Type: Improvement Components: master Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24705) MetaFixer#fixHoles() does not include the case for read replicas (i.e, replica regions are not created)
Huaxiang Sun created HBASE-24705: Summary: MetaFixer#fixHoles() does not include the case for read replicas (i.e, replica regions are not created) Key: HBASE-24705 URL: https://issues.apache.org/jira/browse/HBASE-24705 Project: HBase Issue Type: Bug Components: read replicas Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24688) AssignRegionHandler uses EventType.M_RS_CLOSE_META instead of EventType.M_RS_OPEN_META
Huaxiang Sun created HBASE-24688: Summary: AssignRegionHandler uses EventType.M_RS_CLOSE_META instead of EventType.M_RS_OPEN_META Key: HBASE-24688 URL: https://issues.apache.org/jira/browse/HBASE-24688 Project: HBase Issue Type: Bug Reporter: Huaxiang Sun This results in openMetaRegion always be executed in closeMetaExecutor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24661) TestHeapSize.testSizes failure
Huaxiang Sun created HBASE-24661: Summary: TestHeapSize.testSizes failure Key: HBASE-24661 URL: https://issues.apache.org/jira/browse/HBASE-24661 Project: HBase Issue Type: Bug Components: test Affects Versions: 3.0.0-alpha-1 Reporter: Huaxiang Sun {code:java} INFO] [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hbase-server --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hbase.io.TestHeapSize [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.884 s <<< FAILURE! - in org.apache.hadoop.hbase.io.TestHeapSize [ERROR] org.apache.hadoop.hbase.io.TestHeapSize.testSizes Time elapsed: 0.308 s <<< FAILURE! java.lang.AssertionError: expected:<368> but was:<360> at org.apache.hadoop.hbase.io.TestHeapSize.testSizes(TestHeapSize.java:493) [INFO] [INFO] Results: [INFO] [ERROR] Failures: [ERROR] TestHeapSize.testSizes:493 expected:<368> but was:<360> [INFO] [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0 [INFO] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24552) Replica region needs to check if primary region directory exists at file system in TransitRegionStateProcedure
[ https://issues.apache.org/jira/browse/HBASE-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxiang Sun resolved HBASE-24552. -- Fix Version/s: 2.3.0 3.0.0-alpha-1 Resolution: Fixed > Replica region needs to check if primary region directory exists at file > system in TransitRegionStateProcedure > > > Key: HBASE-24552 > URL: https://issues.apache.org/jira/browse/HBASE-24552 > Project: HBase > Issue Type: Bug > Components: read replicas >Affects Versions: 2.3.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0 > > > In hbase-1, it always runs into the situation that primary region has been > closed/removed and replica region still stays in master's in-memory db and > open at one of the region servers. Balancer can move this replica region to a > new region server. During the region open, replica region does not check if > primary region has been removed and moves forward. During store open, it will > recreates primary region directory at hdfs and caused inconsistency. > > In hbase-2, things get much better. To prevent the above inconsistency from > happening, it adds more checks for a replica region, i.e, if primary regions' > directory exists and there is a .regioninfo under. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24643) Replace Cluster#primariesOfRegionsPerServer from int array to treemap
Huaxiang Sun created HBASE-24643: Summary: Replace Cluster#primariesOfRegionsPerServer from int array to treemap Key: HBASE-24643 URL: https://issues.apache.org/jira/browse/HBASE-24643 Project: HBase Issue Type: Improvement Components: Balancer Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun Currently, primariesOfRegionsPerServer is an int array. moveRegion does heavy work by searching the array linearly, and inserting or removing an element requires allocating and copying the whole array. -- This message was sent by Atlassian Jira (v8.3.4#803005)
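To illustrate the data-structure change proposed in HBASE-24643, here is a small sketch of a per-server counted map backed by `java.util.TreeMap`, where add and remove are O(log n) and need no array copy. The class `PrimaryCounts` and its methods are hypothetical; how the balancer's Cluster class would actually store and query this is not shown in the issue.

```java
import java.util.TreeMap;

// Hypothetical sketch, not HBase code: primary-region index -> number of
// regions on one server that share that primary.
final class PrimaryCounts {
  private final TreeMap<Integer, Integer> counts = new TreeMap<>();

  void add(int primary) {
    counts.merge(primary, 1, Integer::sum);
  }

  void remove(int primary) {
    // Drop the entry entirely when the last region for this primary leaves.
    counts.computeIfPresent(primary, (k, v) -> v == 1 ? null : v - 1);
  }

  int count(int primary) {
    return counts.getOrDefault(primary, 0);
  }

  // Number of regions co-hosted with another replica of the same primary.
  int colocated() {
    int total = 0;
    for (int v : counts.values()) {
      total += v - 1;
    }
    return total;
  }

  public static void main(String[] args) {
    PrimaryCounts server = new PrimaryCounts();
    server.add(5);
    server.add(5);
    server.add(7);
    System.out.println(server.colocated());
    server.remove(5);
    System.out.println(server.colocated());
  }
}
```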
[jira] [Created] (HBASE-24633) Remove data locality and StoreFileCostFunction for replica regions out of balancer's cost calculation
Huaxiang Sun created HBASE-24633: Summary: Remove data locality and StoreFileCostFunction for replica regions out of balancer's cost calculation Key: HBASE-24633 URL: https://issues.apache.org/jira/browse/HBASE-24633 Project: HBase Issue Type: Improvement Components: Balancer Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun We found that one of the clusters with read replica enabled always balances lots of replica regions. Going through the balancer's cost functions, we found that data locality and StoreFileCost have the same multiplier for both primary and replica regions. That is something we can improve. Data locality for replica regions should not be a dominant factor for the balancer. We can either remove it from the balancer's picture for now or give it a small multiplier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
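One way to picture the proposal in HBASE-24633 is a locality cost in which replica regions carry their own multiplier. The sketch below is an assumption-laden toy, not the stochastic balancer's actual cost function: `LocalityCost`, `cost`, and `replicaMultiplier` are hypothetical, and replica id 0 is taken to mean the primary region.

```java
// Hypothetical sketch, not HBase code: locality cost with a separate weight for replicas.
final class LocalityCost {
  /**
   * @param locality          per-region locality in [0, 1]
   * @param replicaIds        per-region replica id; 0 is assumed to be the primary
   * @param replicaMultiplier weight for replica regions; 0 removes them from the cost
   * @return weighted average of (1 - locality), in [0, 1]
   */
  static double cost(double[] locality, int[] replicaIds, double replicaMultiplier) {
    double weighted = 0;
    double totalWeight = 0;
    for (int i = 0; i < locality.length; i++) {
      double weight = replicaIds[i] == 0 ? 1.0 : replicaMultiplier;
      weighted += weight * (1.0 - locality[i]);
      totalWeight += weight;
    }
    return totalWeight == 0 ? 0 : weighted / totalWeight;
  }

  public static void main(String[] args) {
    double[] locality = {1.0, 0.0}; // primary is fully local, replica is not
    int[] replicaIds = {0, 1};
    System.out.println(cost(locality, replicaIds, 1.0));
    System.out.println(cost(locality, replicaIds, 0.0));
  }
}
```

With equal multipliers the poorly located replica dominates the cost; with a multiplier of 0 it no longer gives the balancer a reason to move anything.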
[jira] [Created] (HBASE-24582) The current implementation of assignMetaReplicas() may assign replica meta regions to the same server hosting primary meta region.
Huaxiang Sun created HBASE-24582: Summary: The current implementation of assignMetaReplicas() may assign replica meta regions to the same server hosting primary meta region. Key: HBASE-24582 URL: https://issues.apache.org/jira/browse/HBASE-24582 Project: HBase Issue Type: Bug Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun We need to take the approach similar to SplitTableRegionProcedure, which uses round robin algo to assign replica regions and excludes the primary server. '''return AssignmentManagerUtil.createAssignProceduresForOpeningNewRegions(env, hris, getRegionReplication(env), getParentRegionServerName(env));''' -- This message was sent by Atlassian Jira (v8.3.4#803005)
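The approach named in HBASE-24582 (round robin over servers, excluding the one hosting the primary) can be sketched independently of the procedure framework. `RoundRobinAssign` and `assignReplicas` below are hypothetical names, and servers are plain strings for brevity; the real code path goes through AssignmentManagerUtil as quoted in the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HBase code: spread replicas round robin, skipping the primary's server.
final class RoundRobinAssign {
  static Map<Integer, String> assignReplicas(int regionReplication, List<String> servers,
      String primaryServer) {
    List<String> candidates = new ArrayList<>(servers);
    candidates.remove(primaryServer); // exclude the server hosting the primary
    if (candidates.isEmpty()) {
      throw new IllegalStateException("no server available besides the primary's");
    }
    Map<Integer, String> plan = new LinkedHashMap<>();
    // Replica id 0 is the primary; ids 1..regionReplication-1 are the replicas.
    for (int replicaId = 1; replicaId < regionReplication; replicaId++) {
      plan.put(replicaId, candidates.get((replicaId - 1) % candidates.size()));
    }
    return plan;
  }

  public static void main(String[] args) {
    System.out.println(assignReplicas(3, Arrays.asList("rs1", "rs2", "rs3"), "rs1"));
  }
}
```

Note that with fewer candidate servers than replicas, the modulo wraps and two replicas can still land on the same server; only co-location with the primary is ruled out in this sketch.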
[jira] [Created] (HBASE-24581) Replica regions should not trigger any compaction
Huaxiang Sun created HBASE-24581: Summary: Replica regions should not trigger any compaction Key: HBASE-24581 URL: https://issues.apache.org/jira/browse/HBASE-24581 Project: HBase Issue Type: Bug Components: read replicas Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun I found that in certain cases replica regions can trigger compaction, such as {code:java} @Override public void postOpenDeployTasks(final PostOpenDeployContext context) throws IOException { HRegion r = context.getRegion(); long openProcId = context.getOpenProcId(); long masterSystemTime = context.getMasterSystemTime(); rpcServices.checkOpen(); LOG.info("Post open deploy tasks for {}, openProcId={}, masterSystemTime={}", r.getRegionInfo().getRegionNameAsString(), openProcId, masterSystemTime); // Do checks to see if we need to compact (references or too many files) // TODO: SHX, do not do this for replica regions? Otherwise, it is going to lost data locality for primary regions. for (HStore s : r.stores.values()) { if (s.hasReferences() || s.needsCompaction()) { this.compactSplitThread.requestSystemCompaction(r, s, "Opening Region"); } } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
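The TODO in the snippet above suggests guarding the compaction request by replica id. A minimal sketch of that guard as a pure function follows, assuming replica id 0 denotes the primary region; `CompactionGuard` and `shouldRequestCompaction` are hypothetical names and this is not the fix that was eventually committed.

```java
// Hypothetical sketch, not HBase code: only primaries may request a compaction on open.
final class CompactionGuard {
  // Assumption: replica id 0 identifies the primary (default) region.
  static final int PRIMARY_REPLICA_ID = 0;

  static boolean shouldRequestCompaction(int replicaId, boolean hasReferences,
      boolean needsCompaction) {
    if (replicaId != PRIMARY_REPLICA_ID) {
      return false; // replicas read the primary's files and should not rewrite them
    }
    return hasReferences || needsCompaction;
  }

  public static void main(String[] args) {
    System.out.println(shouldRequestCompaction(0, true, false));
    System.out.println(shouldRequestCompaction(1, true, true));
  }
}
```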
[jira] [Created] (HBASE-24563) Make hbck chore aware of replica region and check/fix replica region consistency
Huaxiang Sun created HBASE-24563: Summary: Make hbck chore aware of replica region and check/fix replica region consistency Key: HBASE-24563 URL: https://issues.apache.org/jira/browse/HBASE-24563 Project: HBase Issue Type: Improvement Components: read replicas Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun Hbck1 checks/fix only primary region consistency and ignores replica region. In hbase 2, hbck chore needs to be aware of replica region and check its consistency as well. Hbck2 needs to fix replica region inconsistency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24554) Improve/stable read replica
Huaxiang Sun created HBASE-24554: Summary: Improve/stable read replica Key: HBASE-24554 URL: https://issues.apache.org/jira/browse/HBASE-24554 Project: HBase Issue Type: Task Components: read replicas Affects Versions: 2.3.0 Reporter: Huaxiang Sun Assignee: Huaxiang Sun Tracing some read replica issues recently, this is the umbrella Jira to track this effort. A few observations so far: # balancer balances replica regions too often, need to spend time on it. Replica region does not serve write and rarely serve reads (unless the client specifically selects the replica region). So data locality should be a very minimum factor for replica regions. # Need to study split/merge for regions with replica, need to make them more robust. With proc-v2, probably it is already robust. -- This message was sent by Atlassian Jira (v8.3.4#803005)