[jira] [Commented] (HBASE-27571) Get supports RAW
[ https://issues.apache.org/jira/browse/HBASE-27571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677684#comment-17677684 ] Bo Cui commented on HBASE-27571: Hi [~zhangduo], could we push this restriction down to RawScanQueryMatcher, or remove it? > Get supports RAW > > > Key: HBASE-27571 > URL: https://issues.apache.org/jira/browse/HBASE-27571 > Project: HBase > Issue Type: Improvement > Reporter: Bo Cui > Priority: Major > > [https://github.com/apache/hbase/blob/da261344cc55e7812dfe22d86d5fa88c93ed79b9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L234] > > I used a `Get` to query all puts and deletes in a column, but I got this error: > *Cannot specify any column for a raw scan.* > Why was this restriction added? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-27571) Get supports RAW
[ https://issues.apache.org/jira/browse/HBASE-27571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677680#comment-17677680 ] Bo Cui commented on HBASE-27571: I then removed the `if (columns != null && scan.isRaw())` check and got the correct result. My guess is that the restriction exists because it is not known whether the family has a delete operation (`delete 't1','r1','f1',1673945640117`). But if the caller knows that the family has no delete operation, we should allow such a `Get`. > Get supports RAW > > > Key: HBASE-27571 > URL: https://issues.apache.org/jira/browse/HBASE-27571 > Project: HBase > Issue Type: Improvement > Reporter: Bo Cui > Priority: Major > > [https://github.com/apache/hbase/blob/da261344cc55e7812dfe22d86d5fa88c93ed79b9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L234] > > I used a `Get` to query all puts and deletes in a column, but I got this error: > *Cannot specify any column for a raw scan.* > Why was this restriction added?
[jira] [Created] (HBASE-27571) Get supports RAW
Bo Cui created HBASE-27571: -- Summary: Get supports RAW Key: HBASE-27571 URL: https://issues.apache.org/jira/browse/HBASE-27571 Project: HBase Issue Type: Improvement Reporter: Bo Cui [https://github.com/apache/hbase/blob/da261344cc55e7812dfe22d86d5fa88c93ed79b9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L234] I used a `Get` to query all puts and deletes in a column, but I got this error: *Cannot specify any column for a raw scan.* Why was this restriction added?
[jira] [Commented] (HBASE-20727) Persist FlushedSequenceId to speed up WAL split after cluster restart
[ https://issues.apache.org/jira/browse/HBASE-20727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266480#comment-17266480 ] Bo Cui commented on HBASE-20727: [~allan163] Hi, can we optimize this feature? 1) Write the new file to the tmp directory first, and delete the old file only after the new file has been written and moved successfully, because writing the new file may fail. 2) Can we write the new file in batches? If HBase has too many regions, the FlushedSequenceIdFlusher occupies the master ChoreService for a long time, as in HBASE-25506. > Persist FlushedSequenceId to speed up WAL split after cluster restart > - > > Key: HBASE-20727 > URL: https://issues.apache.org/jira/browse/HBASE-20727 > Project: HBase > Issue Type: New Feature > Affects Versions: 2.0.0 > Reporter: Allan Yang > Assignee: Allan Yang > Priority: Major > Fix For: 3.0.0-alpha-1 > > Attachments: HBASE-20727.002.patch, HBASE-20727.003.patch, HBASE-20727.004.patch, HBASE-20727.005.patch, HBASE-20727.patch > > > We use flushedSequenceIdByRegion and storeFlushedSequenceIdsByRegion in ServerManager to record the latest flushed seqids of regions and stores. During log split, we can therefore use the seqids stored in those maps to filter out the edits that do not need to be replayed. However, those maps are not persisted. After a cluster restart or master restart, the flushed seqid information is lost. > Here I offer a way to persist that information to HDFS, so that even after a master restart we can still use it to filter WAL edits and speed up replay. -- This message was sent by Atlassian Jira (v8.3.4#803005)
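The first suggestion in the HBASE-20727 comment above (write the new file to a temporary location and replace the old file only after the write has succeeded) could look roughly like the sketch below. It is a minimal illustration on a local filesystem with java.nio, not the HBase or HDFS implementation. The class name `TmpThenSwapWriter`, the `.tmp` suffix and the file names are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not HBase code: it only illustrates the
// "write to tmp, then swap" order on a local filesystem.
final class TmpThenSwapWriter {

  // Writes payload to "<target>.tmp" first and replaces target only after
  // the temporary file has been written completely.
  static void write(Path target, byte[] payload) {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    try {
      Files.write(tmp, payload);
      // If the write above fails, this line is never reached and the
      // previous target file is left untouched.
      Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Writes two generations of a file and returns what is left on disk.
  static String writeTwice(String first, String second) {
    try {
      Path dir = Files.createTempDirectory("flushed-seqid-demo");
      Path target = dir.resolve("flushedSeqIds");
      write(target, first.getBytes(StandardCharsets.UTF_8));
      write(target, second.getBytes(StandardCharsets.UTF_8));
      return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeTwice("generation-1", "generation-2"));
  }
}
```

The point of this order is that a failed write can only damage the temporary file. A real implementation on HDFS would use the Hadoop FileSystem API instead and would need to decide how to clean up leftover temporary files.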
[jira] [Created] (HBASE-25506) ServerManager affects MTTR of HMaster
Bo Cui created HBASE-25506: -- Summary: ServerManager affects MTTR of HMaster Key: HBASE-25506 URL: https://issues.apache.org/jira/browse/HBASE-25506 Project: HBase Issue Type: Improvement Components: MTTR Reporter: Bo Cui Attachments: image-2021-01-14-17-44-16-091.png, image-2021-01-14-17-44-42-181.png https://github.com/apache/hbase/blob/3488c44a21612aae1835fc3e91a4a12ed2abb8b7/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L925 If a cluster has N+W regions, this removeDeletedRegionFromLoadedFlushedSequenceIds takes a long time... !image-2021-01-14-17-44-16-091.png! !image-2021-01-14-17-44-42-181.png!
[jira] [Assigned] (HBASE-25506) ServerManager affects MTTR of HMaster
[ https://issues.apache.org/jira/browse/HBASE-25506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-25506: -- Assignee: Bo Cui > ServerManager affects MTTR of HMaster > - > > Key: HBASE-25506 > URL: https://issues.apache.org/jira/browse/HBASE-25506 > Project: HBase > Issue Type: Improvement > Components: MTTR >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2021-01-14-17-44-16-091.png, > image-2021-01-14-17-44-42-181.png > > > https://github.com/apache/hbase/blob/3488c44a21612aae1835fc3e91a4a12ed2abb8b7/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L925 > If a cluster has N+W regions, this > removeDeletedRegionFromLoadedFlushedSequenceIds takes a long time... > !image-2021-01-14-17-44-16-091.png! > !image-2021-01-14-17-44-42-181.png!
[jira] [Updated] (HBASE-25506) ServerManager affects MTTR of HMaster
[ https://issues.apache.org/jira/browse/HBASE-25506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25506: --- Affects Version/s: 3.0.0-alpha-1 > ServerManager affects MTTR of HMaster > - > > Key: HBASE-25506 > URL: https://issues.apache.org/jira/browse/HBASE-25506 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 3.0.0-alpha-1 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2021-01-14-17-44-16-091.png, > image-2021-01-14-17-44-42-181.png > > > https://github.com/apache/hbase/blob/3488c44a21612aae1835fc3e91a4a12ed2abb8b7/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L925 > If a cluster has N+W regions, this > removeDeletedRegionFromLoadedFlushedSequenceIds takes a long time... > !image-2021-01-14-17-44-16-091.png! > !image-2021-01-14-17-44-42-181.png!
[jira] [Assigned] (HBASE-25483) set the loadMeta log level to debug.
[ https://issues.apache.org/jira/browse/HBASE-25483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-25483: -- Assignee: Bo Cui > set the loadMeta log level to debug. > > > Key: HBASE-25483 > URL: https://issues.apache.org/jira/browse/HBASE-25483 > Project: HBase > Issue Type: Improvement > Components: MTTR, Region Assignment > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > > https://github.com/apache/hbase/blob/2444d268901644d90def3fca39505627ff956b40/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java#L167 > In a test with 100w (1,000,000) regions, loading metadata takes more than 250 seconds when this message is logged at info level, and more than 100 seconds when it is logged at debug level.
[jira] [Created] (HBASE-25483) set the loadMeta log level to debug.
Bo Cui created HBASE-25483: -- Summary: set the loadMeta log level to debug. Key: HBASE-25483 URL: https://issues.apache.org/jira/browse/HBASE-25483 Project: HBase Issue Type: Improvement Components: MTTR, Region Assignment Affects Versions: 2.2.3, 3.0.0-alpha-1 Reporter: Bo Cui https://github.com/apache/hbase/blob/2444d268901644d90def3fca39505627ff956b40/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java#L167 In a test with 100w (1,000,000) regions, loading metadata takes more than 250 seconds when this message is logged at info level, and more than 100 seconds when it is logged at debug level.
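The effect described in HBASE-25483 (a per-region message is cheap once it is logged below the configured level) can be illustrated with a small sketch. It uses java.util.logging as a stand-in for the logging framework HBase actually uses, and the class name `PerRegionLogCost`, the logger name and the message text are all assumptions made for this example, not code from RegionStateStore.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo, not HBase code: counts how many per-region log
// messages are actually built for a given configured log level.
final class PerRegionLogCost {
  private static final Logger LOG = Logger.getLogger("loadMetaDemo");

  // Logs one message per region at FINE (roughly "debug") and returns
  // how many message strings were built.
  static int messagesBuilt(Level configuredLevel, int regionCount) {
    LOG.setUseParentHandlers(false); // keep the demo output quiet
    LOG.setLevel(configuredLevel);
    int[] built = {0};
    for (int i = 0; i < regionCount; i++) {
      final int region = i;
      // The supplier runs only if FINE is enabled for this logger.
      LOG.fine(() -> {
        built[0]++;
        return "loaded region " + region;
      });
    }
    return built[0];
  }

  public static void main(String[] args) {
    System.out.println(messagesBuilt(Level.INFO, 1000));
    System.out.println(messagesBuilt(Level.FINE, 1000));
  }
}
```

With the logger configured at INFO, no message is built or written for the per-region line, which is the saving the issue is after. The timings quoted in the issue come from the reporter's test and are not reproduced by this sketch.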
[jira] [Assigned] (HBASE-25461) when the cluster has many tables, UI can be opened quickly
[ https://issues.apache.org/jira/browse/HBASE-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-25461: -- Assignee: Bo Cui > when the cluster has many tables, UI can be opened quickly > -- > > Key: HBASE-25461 > URL: https://issues.apache.org/jira/browse/HBASE-25461 > Project: HBase > Issue Type: Improvement > Components: UI > Affects Versions: 3.0.0-alpha-1 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > > My cluster has 60K+ tables, and the UI opens slowly. > Judging from the following code, we can reduce the number of class conversion steps: > rsgroup.jsp > https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/resources/hbase-webapps/master/rsgroup.jsp#L439 > snapshot.jsp > https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp#L42 > RegionStates.java > https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L538 > RSStatusTmpl.jamon > https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon#L47
[jira] [Created] (HBASE-25461) when the cluster has many tables, UI can be opened quickly
Bo Cui created HBASE-25461: -- Summary: when the cluster has many tables, UI can be opened quickly Key: HBASE-25461 URL: https://issues.apache.org/jira/browse/HBASE-25461 Project: HBase Issue Type: Improvement Components: UI Affects Versions: 3.0.0-alpha-1 Reporter: Bo Cui My cluster has 60K+ tables, and the UI opens slowly. Judging from the following code, we can reduce the number of class conversion steps: rsgroup.jsp https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/resources/hbase-webapps/master/rsgroup.jsp#L439 snapshot.jsp https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp#L42 RegionStates.java https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L538 RSStatusTmpl.jamon https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon#L47
[jira] [Commented] (HBASE-25447) remoteProc is suspended due to OOM ERROR
[ https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255772#comment-17255772 ] Bo Cui commented on HBASE-25447: 1. The timeoutThread rarely encounters exceptions. If it does throw an exception, the master node may have a serious problem, such as a resource leak, so stopping the master is better than having the timeoutThread retry. 2. In our production environment we have two masters, one active and one standby. The standby is likely to be fine, so HBase can recover quickly. So I think aborting the master is better than having the timeoutThread retry. > remoteProc is suspended due to OOM ERROR > > > Key: HBASE-25447 > URL: https://issues.apache.org/jira/browse/HBASE-25447 > Project: HBase > Issue Type: Bug > Components: proc-v2 > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > Attachments: image-2020-12-26-11-49-38-018.png > > > https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 > If resource leakage occurs due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... > !image-2020-12-26-11-49-38-018.png!
[jira] [Updated] (HBASE-25447) remoteProc is suspended due to OOM ERROR
[ https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25447: --- Description: https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 If resource leakage occurs due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... !image-2020-12-26-11-49-38-018.png! was: https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 If a node leaks resources due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... !image-2020-12-26-11-49-38-018.png! > remoteProc is suspended due to OOM ERROR > > > Key: HBASE-25447 > URL: https://issues.apache.org/jira/browse/HBASE-25447 > Project: HBase > Issue Type: Bug > Components: proc-v2 > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > Attachments: image-2020-12-26-11-49-38-018.png > > > https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 > If resource leakage occurs due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... > !image-2020-12-26-11-49-38-018.png!
[jira] [Issue Comment Deleted] (HBASE-25447) remoteProc is suspended due to OOM ERROR
[ https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25447: --- Comment: was deleted (was: Two solutions: 1. HMaster aborts. 2. ProcedureDispatcherTimeoutThread does not exit.) > remoteProc is suspended due to OOM ERROR > > > Key: HBASE-25447 > URL: https://issues.apache.org/jira/browse/HBASE-25447 > Project: HBase > Issue Type: Bug > Components: proc-v2 > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > Attachments: image-2020-12-26-11-49-38-018.png > > > https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 > If a node leaks resources due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... > !image-2020-12-26-11-49-38-018.png!
[jira] [Commented] (HBASE-25447) remoteProc is suspended due to OOM ERROR
[ https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254942#comment-17254942 ] Bo Cui commented on HBASE-25447: Two solutions: 1. HMaster aborts. 2. ProcedureDispatcherTimeoutThread does not exit. > remoteProc is suspended due to OOM ERROR > > > Key: HBASE-25447 > URL: https://issues.apache.org/jira/browse/HBASE-25447 > Project: HBase > Issue Type: Bug > Components: proc-v2 > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > Attachments: image-2020-12-26-11-49-38-018.png > > > https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 > If a node leaks resources due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... > !image-2020-12-26-11-49-38-018.png!
[jira] [Updated] (HBASE-25447) remoteProc is suspended due to OOM ERROR
[ https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25447: --- Description: https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 If a node leaks resources due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... !image-2020-12-26-11-49-38-018.png! was: https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 If a node leaks resources due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... !image-2020-12-26-11-49-38-018.png! > remoteProc is suspended due to OOM ERROR > > > Key: HBASE-25447 > URL: https://issues.apache.org/jira/browse/HBASE-25447 > Project: HBase > Issue Type: Bug > Components: proc-v2 > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > Attachments: image-2020-12-26-11-49-38-018.png > > > https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 > If a node leaks resources due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... > !image-2020-12-26-11-49-38-018.png!
[jira] [Assigned] (HBASE-25447) remoteProc is suspended due to OOM ERROR
[ https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-25447: -- Assignee: Bo Cui > remoteProc is suspended due to OOM ERROR > > > Key: HBASE-25447 > URL: https://issues.apache.org/jira/browse/HBASE-25447 > Project: HBase > Issue Type: Bug > Components: proc-v2 > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > Attachments: image-2020-12-26-11-49-38-018.png > > > https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 > If a node leaks resources due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... > !image-2020-12-26-11-49-38-018.png!
[jira] [Created] (HBASE-25447) remoteProc is suspended due to OOM ERROR
Bo Cui created HBASE-25447: -- Summary: remoteProc is suspended due to OOM ERROR Key: HBASE-25447 URL: https://issues.apache.org/jira/browse/HBASE-25447 Project: HBase Issue Type: Bug Components: proc-v2 Affects Versions: 2.2.3, 3.0.0-alpha-1 Reporter: Bo Cui Attachments: image-2020-12-26-11-49-38-018.png https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317 If a node leaks resources due to other components or reasons, BufferNode#dispatch() may fail. TimeoutExecutorThread will then exit the while (running.get()) loop, and some procs will get stuck... !image-2020-12-26-11-49-38-018.png!
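The two options discussed in the HBASE-25447 thread above (abort when the dispatch loop hits an error, or keep the loop alive) can be compared with a small single-threaded sketch. This is not the RemoteProcedureDispatcher code: the class `TimeoutLoopSketch`, its methods and the simulated `OutOfMemoryError` are assumptions made for this example, and a real master abort is only represented by a flag.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Hypothetical model of a dispatch loop, not HBase code.
final class TimeoutLoopSketch {

  // Runs queued tasks in order and returns a short summary of what happened.
  static String drain(Queue<Runnable> tasks, boolean keepRunningOnError) {
    int completed = 0;
    boolean aborted = false;
    Runnable task;
    while (!aborted && (task = tasks.poll()) != null) {
      try {
        task.run();
        completed++;
      } catch (Throwable t) {
        if (keepRunningOnError) {
          // Option 2: swallow the failure so the tasks behind it still run.
          continue;
        }
        // Option 1: stop the loop; this flag stands in for a master abort
        // that lets a standby take over.
        aborted = true;
      }
    }
    return "completed=" + completed + " aborted=" + aborted
        + " pending=" + tasks.size();
  }

  // Three tasks, the second of which fails with a simulated error.
  static String simulate(boolean keepRunningOnError) {
    Queue<Runnable> tasks = new ArrayDeque<>(Arrays.<Runnable>asList(
        () -> { },
        () -> { throw new OutOfMemoryError("simulated"); },
        () -> { }));
    return drain(tasks, keepRunningOnError);
  }

  public static void main(String[] args) {
    System.out.println(simulate(false));
    System.out.println(simulate(true));
  }
}
```

What the issue describes is a third outcome that this sketch deliberately avoids: the loop exits without either aborting or continuing, so the pending work is never picked up by anyone.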
[jira] [Commented] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which results in old
[ https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248504#comment-17248504 ] Bo Cui commented on HBASE-23340: [~zhangduo] [~vjasani] Thanks for the review. [~zhangduo] I have submitted a new PR for branch-2. > hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which results in oldWALs too large (more than 2TB) > - > > Key: HBASE-23340 > URL: https://issues.apache.org/jira/browse/HBASE-23340 > Project: HBase > Issue Type: Improvement > Components: master > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: jackylau > Assignee: Bo Cui > Priority: Major > Attachments: Snipaste_2019-11-21_10-39-25.png, Snipaste_2019-11-21_14-10-36.png > > > When the HMaster session for /hbase/replication/rs expires (hbase replication defaults to true, although we do not use it), the log cleaner cannot clean oldWALs, so oldWALs grows too large (more than 2TB). > !Snipaste_2019-11-21_10-39-25.png! > > !Snipaste_2019-11-21_14-10-36.png! > > We can solve it in the following ways: > 1) Increase the session timeout (but I do not think this is a good idea, because we do not know what value is suitable). > 2) Disable HBase replication. This is not a good idea either, because some of our users use this feature. > 3) Add a retry limit: for example, once the failure has happened three times, stop the ReplicationLogCleaner and SnapShotCleaner. > Those are all my ideas. I do not know whether they are suitable. If they are, could I submit a PR? > Does anyone have a good idea?
[jira] [Assigned] (HBASE-24395) ServerName#getHostname() is case sensitive
[ https://issues.apache.org/jira/browse/HBASE-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-24395: -- Assignee: Bo Cui > ServerName#getHostname() is case sensitive > -- > > Key: HBASE-24395 > URL: https://issues.apache.org/jira/browse/HBASE-24395 > Project: HBase > Issue Type: Sub-task > Components: Balancer > Affects Versions: 1.3.1 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > Attachments: HBase-24395.patch, image-2020-05-18-17-42-57-119.png > > > In the ServerName class, getServerName(String hostName, int port, long startcode), equals and compareTo are case insensitive, but getHostname() is case sensitive. > If hostName is HOSTNAME1, the ServerName is hostname1,1,1589615319931, but getHostname() returns HOSTNAME1. > BaseLoadBalancer#retainAssignment() then uses ServerName#getHostname(): all keys of serversByHostname are upper case (HOSTNAME1, HOSTNAME2, HOSTNAME3, HOSTNAME4...) from ServerManager#createDestinationServersList, but oldServerName.getHostname() is lower case (hostname1, hostname2, hostname3...) from the walLog dir. > !image-2020-05-18-17-42-57-119.png! > As a result, all regions of the old ServerName are assigned to random hosts.
[jira] [Commented] (HBASE-24395) ServerName#getHostname() is case sensitive
[ https://issues.apache.org/jira/browse/HBASE-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236622#comment-17236622 ] Bo Cui commented on HBASE-24395: [https://github.com/apache/hbase/pull/2690] update > ServerName#getHostname() is case sensitive > -- > > Key: HBASE-24395 > URL: https://issues.apache.org/jira/browse/HBASE-24395 > Project: HBase > Issue Type: Sub-task > Components: Balancer > Affects Versions: 1.3.1 > Reporter: Bo Cui > Priority: Major > Attachments: HBase-24395.patch, image-2020-05-18-17-42-57-119.png > > > In the ServerName class, getServerName(String hostName, int port, long startcode), equals and compareTo are case insensitive, but getHostname() is case sensitive. > If hostName is HOSTNAME1, the ServerName is hostname1,1,1589615319931, but getHostname() returns HOSTNAME1. > BaseLoadBalancer#retainAssignment() then uses ServerName#getHostname(): all keys of serversByHostname are upper case (HOSTNAME1, HOSTNAME2, HOSTNAME3, HOSTNAME4...) from ServerManager#createDestinationServersList, but oldServerName.getHostname() is lower case (hostname1, hostname2, hostname3...) from the walLog dir. > !image-2020-05-18-17-42-57-119.png! > As a result, all regions of the old ServerName are assigned to random hosts.
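The lookup miss described in HBASE-24395 above comes down to using a case-sensitive map key for hostnames that differ only in case. The sketch below reproduces that effect with a plain `HashMap` and shows one possible remedy, normalizing the key on both insert and lookup. It is not the code from the linked PR, whose actual fix may differ; the class `HostnameIndex` and its methods are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of a hostname-keyed lookup, not HBase code.
final class HostnameIndex {

  // Builds a hostname -> server map, optionally lower-casing the key.
  static Map<String, String> index(List<String> liveHosts, boolean normalize) {
    Map<String, String> byHost = new HashMap<>();
    for (String host : liveHosts) {
      byHost.put(normalize ? host.toLowerCase(Locale.ROOT) : host, host);
    }
    return byHost;
  }

  // Looks up the old hostname, applying the same normalization as index().
  static String lookup(Map<String, String> byHost, String oldHost,
      boolean normalize) {
    return byHost.get(normalize ? oldHost.toLowerCase(Locale.ROOT) : oldHost);
  }

  public static void main(String[] args) {
    List<String> live = Arrays.asList("HOSTNAME1", "HOSTNAME2");
    // Case-sensitive keys: the lower-case name from the old server misses.
    System.out.println(lookup(index(live, false), "hostname1", false));
    // Normalized keys: the same name is found.
    System.out.println(lookup(index(live, true), "hostname1", true));
  }
}
```

When the lookup misses, a retain-assignment step has no previous host to return to, which matches the random reassignment reported in the issue.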
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236614#comment-17236614 ] Bo Cui commented on HBASE-24924: {quote} You still do not answer my question... How could this happen without data corruption from outside? {quote} If the data had not been corrupted, the problem would not have occurred. But the data corruption exists, and we do not know how the data was damaged. We have a lot of clusters and should consider how to recover automatically. > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > If procWAL has an InitMetaProcedure whose state is finished, and the meta znode does not exist, finishActiveMasterInitialization will get stuck, because during startup, if InitMetaProcedure exists, it recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif!
[jira] [Updated] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which results in oldWA
[ https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-23340: --- Status: Patch Available (was: Open) > hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which results in oldWALs too large (more than 2TB) > - > > Key: HBASE-23340 > URL: https://issues.apache.org/jira/browse/HBASE-23340 > Project: HBase > Issue Type: Improvement > Components: master > Affects Versions: 2.2.3, 3.0.0-alpha-1 > Reporter: jackylau > Assignee: Bo Cui > Priority: Major > Attachments: Snipaste_2019-11-21_10-39-25.png, Snipaste_2019-11-21_14-10-36.png > > > When the HMaster session for /hbase/replication/rs expires (hbase replication defaults to true, although we do not use it), the log cleaner cannot clean oldWALs, so oldWALs grows too large (more than 2TB). > !Snipaste_2019-11-21_10-39-25.png! > > !Snipaste_2019-11-21_14-10-36.png! > > We can solve it in the following ways: > 1) Increase the session timeout (but I do not think this is a good idea, because we do not know what value is suitable). > 2) Disable HBase replication. This is not a good idea either, because some of our users use this feature. > 3) Add a retry limit: for example, once the failure has happened three times, stop the ReplicationLogCleaner and SnapShotCleaner. > Those are all my ideas. I do not know whether they are suitable. If they are, could I submit a PR? > Does anyone have a good idea?
[jira] [Updated] (HBASE-25311) ui throws NPE
[ https://issues.apache.org/jira/browse/HBASE-25311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25311: --- Status: Patch Available (was: Open) > ui throws NPE > - > > Key: HBASE-25311 > URL: https://issues.apache.org/jira/browse/HBASE-25311 > Project: HBase > Issue Type: Bug > Affects Versions: 2.2.3, 3.0.0-alpha-1 > Reporter: Bo Cui > Assignee: Bo Cui > Priority: Major > > https://github.com/apache/hbase/blob/eca904e0fb438461a8da3f37cea3eaf496988be9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3624 > If an rs has an invalid znode and the master is restarted, the UI throws an NPE. > I encountered this problem during an upgrade. > Workaround: restart HBase.
[jira] [Updated] (HBASE-25317) [github]rename HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf
[ https://issues.apache.org/jira/browse/HBASE-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25317: --- Description: git pull fails with an exception because the filename contains ':' !image-2020-11-21-14-48-23-794.png! Workaround: git config core.protectNTFS false [~stack] I think we can rename the file... was: git pull fails with an exception !image-2020-11-21-14-48-23-794.png! Workaround: git config core.protectNTFS false [~stack] I think we can rename the file... > [github]rename HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf > -- > > Key: HBASE-25317 > URL: https://issues.apache.org/jira/browse/HBASE-25317 > Project: HBase > Issue Type: Bug > Reporter: Bo Cui > Priority: Minor > Attachments: image-2020-11-21-14-48-23-794.png > > > git pull fails with an exception because the filename contains ':' > !image-2020-11-21-14-48-23-794.png! > Workaround: git config core.protectNTFS false > [~stack] I think we can rename the file...
[jira] [Created] (HBASE-25317) [github]rename HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf
Bo Cui created HBASE-25317: -- Summary: [github]rename HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf Key: HBASE-25317 URL: https://issues.apache.org/jira/browse/HBASE-25317 Project: HBase Issue Type: Bug Reporter: Bo Cui Attachments: image-2020-11-21-14-48-23-794.png git pull fails with an exception !image-2020-11-21-14-48-23-794.png! Workaround: git config core.protectNTFS false [~stack] I think we can rename the file...
[jira] [Assigned] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which results in oldW
[ https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-23340: -- Assignee: Bo Cui (was: jackylau) > hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which results in oldWALs too large (more than 2TB) > - > > Key: HBASE-23340 > URL: https://issues.apache.org/jira/browse/HBASE-23340 > Project: HBase > Issue Type: Improvement > Components: master > Affects Versions: 3.0.0-alpha-1, 2.2.3 > Reporter: jackylau > Assignee: Bo Cui > Priority: Major > Attachments: Snipaste_2019-11-21_10-39-25.png, Snipaste_2019-11-21_14-10-36.png > > > When the HMaster session for /hbase/replication/rs expires (hbase replication defaults to true, although we do not use it), the log cleaner cannot clean oldWALs, so oldWALs grows too large (more than 2TB). > !Snipaste_2019-11-21_10-39-25.png! > > !Snipaste_2019-11-21_14-10-36.png! > > We can solve it in the following ways: > 1) Increase the session timeout (but I do not think this is a good idea, because we do not know what value is suitable). > 2) Disable HBase replication. This is not a good idea either, because some of our users use this feature. > 3) Add a retry limit: for example, once the failure has happened three times, stop the ReplicationLogCleaner and SnapShotCleaner. > Those are all my ideas. I do not know whether they are suitable. If they are, could I submit a PR? > Does anyone have a good idea?
[jira] [Updated] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldWA
[ https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-23340: --- Affects Version/s: 3.0.0-alpha-1 2.2.3 > hmaster /hbase/replication/rs session expired (hbase replication default > value is true, we don't use ) causes logcleaner can not clean oldWALs, which > resulits in oldWALs too large (more than 2TB) > - > > Key: HBASE-23340 > URL: https://issues.apache.org/jira/browse/HBASE-23340 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: jackylau >Assignee: jackylau >Priority: Major > Attachments: Snipaste_2019-11-21_10-39-25.png, > Snipaste_2019-11-21_14-10-36.png > > > The hmaster session for /hbase/replication/rs expired (HBase replication is enabled by default, although we do not use it). As a result, the log cleaner could not clean oldWALs, and oldWALs grew too large (more than 2TB). > !Snipaste_2019-11-21_10-39-25.png! > > !Snipaste_2019-11-21_14-10-36.png! > > Possible solutions: > 1) Increase the session timeout. I do not think this is a good idea, because we do not know what value would be suitable. > 2) Disable HBase replication. This is not a good idea either, because our users may use this feature. > 3) Add a retry limit: for example, once the failure has happened three times, stop the ReplicationLogCleaner and SnapShotCleaner. > Those are all my ideas. I do not know whether they are suitable; if they are, could I submit a PR? > Does anyone have a better idea? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25311) ui throws NPE
[ https://issues.apache.org/jira/browse/HBASE-25311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25311: --- Summary: ui throws NPE (was: hbase ui throws NPE) > ui throws NPE > - > > Key: HBASE-25311 > URL: https://issues.apache.org/jira/browse/HBASE-25311 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > https://github.com/apache/hbase/blob/eca904e0fb438461a8da3f37cea3eaf496988be9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3624 > If a region server has an invalid znode and the master is restarted, the UI throws an NPE. > I encountered this problem during an upgrade. > Workaround: restart HBase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-25311) hbase ui throws NPE
[ https://issues.apache.org/jira/browse/HBASE-25311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-25311: -- Assignee: Bo Cui > hbase ui throws NPE > --- > > Key: HBASE-25311 > URL: https://issues.apache.org/jira/browse/HBASE-25311 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > https://github.com/apache/hbase/blob/eca904e0fb438461a8da3f37cea3eaf496988be9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3624 > If a region server has an invalid znode and the master is restarted, the UI throws an NPE. > I encountered this problem during an upgrade. > Workaround: restart HBase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25311) hbase ui throws NPE
Bo Cui created HBASE-25311: -- Summary: hbase ui throws NPE Key: HBASE-25311 URL: https://issues.apache.org/jira/browse/HBASE-25311 Project: HBase Issue Type: Bug Affects Versions: 2.2.3, 3.0.0-alpha-1 Reporter: Bo Cui https://github.com/apache/hbase/blob/eca904e0fb438461a8da3f37cea3eaf496988be9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3624 If a region server has an invalid znode and the master is restarted, the UI throws an NPE. I encountered this problem during an upgrade. Workaround: restart HBase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in old
[ https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236011#comment-17236011 ] Bo Cui commented on HBASE-23340: I'll submit the PR. > hmaster /hbase/replication/rs session expired (hbase replication default > value is true, we don't use ) causes logcleaner can not clean oldWALs, which > resulits in oldWALs too large (more than 2TB) > - > > Key: HBASE-23340 > URL: https://issues.apache.org/jira/browse/HBASE-23340 > Project: HBase > Issue Type: Improvement > Components: master >Reporter: jackylau >Assignee: jackylau >Priority: Major > Attachments: Snipaste_2019-11-21_10-39-25.png, > Snipaste_2019-11-21_14-10-36.png > > > The hmaster session for /hbase/replication/rs expired (HBase replication is enabled by default, although we do not use it). As a result, the log cleaner could not clean oldWALs, and oldWALs grew too large (more than 2TB). > !Snipaste_2019-11-21_10-39-25.png! > > !Snipaste_2019-11-21_14-10-36.png! > > Possible solutions: > 1) Increase the session timeout. I do not think this is a good idea, because we do not know what value would be suitable. > 2) Disable HBase replication. This is not a good idea either, because our users may use this feature. > 3) Add a retry limit: for example, once the failure has happened three times, stop the ReplicationLogCleaner and SnapShotCleaner. > Those are all my ideas. I do not know whether they are suitable; if they are, could I submit a PR? > Does anyone have a better idea? -- This message was sent by Atlassian Jira (v8.3.4#803005)
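Option 3 in the HBASE-23340 description (stop the cleaners after the failure has happened three times) could be sketched roughly as below. This is only an illustration, not the eventual patch: the class name `CleanerFailureGuardSketch` and its methods are hypothetical, and counting consecutive failures (resetting on success) is an assumption that the issue text does not state.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not an HBase class: trips after N consecutive failures.
public class CleanerFailureGuardSketch {
    private final int maxFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile boolean stopped;

    public CleanerFailureGuardSketch(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    // Call when the cleaner fails, e.g. because its session has expired.
    public void recordFailure() {
        if (consecutiveFailures.incrementAndGet() >= maxFailures) {
            stopped = true; // the caller is expected to stop the cleaner
        }
    }

    // Assumption: a successful run resets the counter.
    public void recordSuccess() {
        consecutiveFailures.set(0);
    }

    public boolean isStopped() {
        return stopped;
    }

    public static void main(String[] args) {
        CleanerFailureGuardSketch guard = new CleanerFailureGuardSketch(3);
        guard.recordFailure();
        guard.recordFailure();
        System.out.println(guard.isStopped()); // false
        guard.recordFailure();
        System.out.println(guard.isStopped()); // true
    }
}
```

Whether a stopped cleaner should also let oldWALs be deleted is a separate decision that the issue leaves open.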
[jira] [Updated] (HBASE-25092) RSGroupBalancer#assignments lost some regionPlans
[ https://issues.apache.org/jira/browse/HBASE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25092: --- Status: Patch Available (was: Open) > RSGroupBalancer#assignments lost some regionPlans > - > > Key: HBASE-25092 > URL: https://issues.apache.org/jira/browse/HBASE-25092 > Project: HBase > Issue Type: Bug > Components: rsgroup >Affects Versions: 2.2.3, 2.3.1 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216 > When fallback is enabled and servers contains no region servers of the current group but does contain region servers of another group, the region is assigned to the other group. If assignments already contains targetRS, assignments.putAll then overwrites the old entry. > {code:java} > this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList) > .forEach((serverName, regionInfos) -> { > assignments.computeIfAbsent(serverName, s -> new ArrayList<>()) > .addAll(regionInfos); > }); > {code} > The issue exists only in branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-25111) Admin supports multi region merge
[ https://issues.apache.org/jira/browse/HBASE-25111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-25111: -- Assignee: Bo Cui > Admin supports multi region merge > - > > Key: HBASE-25111 > URL: https://issues.apache.org/jira/browse/HBASE-25111 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > https://github.com/apache/hbase/blob/68b56beab744e983df0877eec9f576ef884a2807/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L889 > Judging from MasterRpcServices and the merge procedure, the master supports merging multiple regions, > but Admin does not. > We can enhance it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25111) Admin supports multi region merge
Bo Cui created HBASE-25111: -- Summary: Admin supports multi region merge Key: HBASE-25111 URL: https://issues.apache.org/jira/browse/HBASE-25111 Project: HBase Issue Type: Improvement Affects Versions: 2.2.3, 3.0.0-alpha-1 Reporter: Bo Cui https://github.com/apache/hbase/blob/68b56beab744e983df0877eec9f576ef884a2807/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L889 Judging from MasterRpcServices and the merge procedure, the master supports merging multiple regions, but Admin does not. We can enhance it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-25092) RSGroupBalancer#assignments lost some regionPlans
[ https://issues.apache.org/jira/browse/HBASE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-25092: -- Assignee: Bo Cui > RSGroupBalancer#assignments lost some regionPlans > - > > Key: HBASE-25092 > URL: https://issues.apache.org/jira/browse/HBASE-25092 > Project: HBase > Issue Type: Bug > Components: rsgroup >Affects Versions: 2.3.1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216 > When fallback is enabled and servers contains no region servers of the current group but does contain region servers of another group, the region is assigned to the other group. If assignments already contains targetRS, assignments.putAll then overwrites the old entry. > {code:java} > this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList) > .forEach((serverName, regionInfos) -> { > assignments.computeIfAbsent(serverName, s -> new ArrayList<>()) > .addAll(regionInfos); > }); > {code} > The issue exists only in branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25093) the RSGroupBasedLoadBalancer#retainAssignment throws NPE
Bo Cui created HBASE-25093: -- Summary: the RSGroupBasedLoadBalancer#retainAssignment throws NPE Key: HBASE-25093 URL: https://issues.apache.org/jira/browse/HBASE-25093 Project: HBase Issue Type: Bug Components: rsgroup Affects Versions: 2.2.3, 2.3.1, 3.0.0-alpha-1 Reporter: Bo Cui BaseLoadBalancer#retainAssignment: https://github.com/apache/hbase/blob/8bfa2cb2eedcf050b26a28961e1b77dbf3cd8c95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java#L1433 If BaseLoadBalancer#retainAssignment returns null, RSGroupBasedLoadBalancer#retainAssignment throws an NPE: https://github.com/apache/hbase/blob/8bfa2cb2eedcf050b26a28961e1b77dbf3cd8c95/hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L206 -- This message was sent by Atlassian Jira (v8.3.4#803005)
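The failure mode described in HBASE-25093 can be illustrated with a small stand-alone sketch. It does not use HBase classes: `innerRetainAssignment` is a hypothetical stand-in for the inner balancer call, and the assumption that it returns null when no candidate servers exist is mine, not taken from the issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RetainAssignmentGuardSketch {
    // Hypothetical inner balancer: returns null when there are no candidate servers.
    static Map<String, List<String>> innerRetainAssignment(List<String> regions,
                                                            List<String> servers) {
        if (servers == null || servers.isEmpty()) {
            return null;
        }
        Map<String, List<String>> plan = new HashMap<>();
        plan.put(servers.get(0), new ArrayList<>(regions));
        return plan;
    }

    // Outer caller: checks for null before iterating, so no NPE is thrown.
    static Map<String, List<String>> retainAssignment(List<String> regions,
                                                      List<String> servers) {
        Map<String, List<String>> assignments = new HashMap<>();
        Map<String, List<String>> inner = innerRetainAssignment(regions, servers);
        if (inner != null) {
            inner.forEach((server, regionList) ->
                assignments.computeIfAbsent(server, s -> new ArrayList<>()).addAll(regionList));
        }
        return assignments;
    }

    public static void main(String[] args) {
        List<String> regions = new ArrayList<>();
        regions.add("regionA");
        // With no servers, the guarded call returns an empty map instead of throwing.
        System.out.println(retainAssignment(regions, new ArrayList<>()));
    }
}
```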
[jira] [Updated] (HBASE-25092) RSGroupBalancer#assignments lost some regionPlans
[ https://issues.apache.org/jira/browse/HBASE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-25092: --- Description: https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216 when fallback is enabled, servers does not contain the current group's rs, and contains the rs of other group, region will be assigend to other group, but assignments already contains targetRS, and then assignments.putAll overwrites old entry {code:java} this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList) .forEach((serverName, regionInfos) -> { assignments.computeIfAbsent(serverName, s -> new ArrayList<>()) .addAll(regionInfos); }); {code} the issue exists only in the branch-2. was: https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216 when fallbak is enabled, servers does not contain the current group's rs, and contains the rs of other group, region will be assigend to other group, but assignments already contains targetRS, and then assignments.putAll overwrites old entry {code:java} this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList) .forEach((serverName, regionInfos) -> { assignments.computeIfAbsent(serverName, s -> new ArrayList<>()) .addAll(regionInfos); }); {code} the issue exists only in the branch-2. 
> RSGroupBalancer#assignments lost some regionPlans > - > > Key: HBASE-25092 > URL: https://issues.apache.org/jira/browse/HBASE-25092 > Project: HBase > Issue Type: Bug > Components: rsgroup >Affects Versions: 2.3.1, 2.2.3 >Reporter: Bo Cui >Priority: Major > > https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216 > when fallback is enabled, servers does not contain the current group's rs, > and contains the rs of other group, region will be assigend to other group, > but assignments already contains targetRS, and then assignments.putAll > overwrites old entry > {code:java} > this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList) > .forEach((serverName, regionInfos) -> { > assignments.computeIfAbsent(serverName, s -> new > ArrayList<>()) > .addAll(regionInfos); > }); > {code} > the issue exists only in the branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25092) RSGroupBalancer#assignments lost some regionPlans
Bo Cui created HBASE-25092: -- Summary: RSGroupBalancer#assignments lost some regionPlans Key: HBASE-25092 URL: https://issues.apache.org/jira/browse/HBASE-25092 Project: HBase Issue Type: Bug Components: rsgroup Affects Versions: 2.2.3, 2.3.1 Reporter: Bo Cui https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216 When fallback is enabled and servers contains no region servers of the current group but does contain region servers of another group, the region is assigned to the other group. If assignments already contains targetRS, assignments.putAll then overwrites the old entry. {code:java} this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList) .forEach((serverName, regionInfos) -> { assignments.computeIfAbsent(serverName, s -> new ArrayList<>()) .addAll(regionInfos); }); {code} The issue exists only in branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
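To make the HBASE-25092 description concrete, here is a minimal stand-alone sketch of how `Map.putAll` drops an existing entry while the `computeIfAbsent(...).addAll(...)` pattern from the snippet keeps it. Server and region names are plain strings chosen for illustration; the class and method names are hypothetical and not part of HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AssignmentMergeSketch {
    // putAll replaces the whole list when the server key already exists.
    static Map<String, List<String>> combineWithPutAll(Map<String, List<String>> assignments,
                                                        Map<String, List<String>> extra) {
        Map<String, List<String>> result = new HashMap<>(assignments);
        result.putAll(extra);
        return result;
    }

    // computeIfAbsent + addAll appends to the list that is already there.
    static Map<String, List<String>> combineWithMerge(Map<String, List<String>> assignments,
                                                       Map<String, List<String>> extra) {
        Map<String, List<String>> result = new HashMap<>();
        assignments.forEach((server, regions) -> result.put(server, new ArrayList<>(regions)));
        extra.forEach((server, regions) ->
            result.computeIfAbsent(server, s -> new ArrayList<>()).addAll(regions));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignments = new HashMap<>();
        assignments.put("rs1", Arrays.asList("regionA"));
        Map<String, List<String>> fallback = new HashMap<>();
        fallback.put("rs1", Arrays.asList("regionB"));

        System.out.println(combineWithPutAll(assignments, fallback)); // {rs1=[regionB]}
        System.out.println(combineWithMerge(assignments, fallback));  // {rs1=[regionA, regionB]}
    }
}
```

With `putAll`, the plan for regionA is lost, which matches the "lost some regionPlans" symptom in the summary.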
[jira] [Updated] (HBASE-24962) Optimize BufferNode Lock
[ https://issues.apache.org/jira/browse/HBASE-24962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24962: --- Status: Patch Available (was: Open) > Optimize BufferNode Lock > > > Key: HBASE-24962 > URL: https://issues.apache.org/jira/browse/HBASE-24962 > Project: HBase > Issue Type: Bug > Components: MTTR >Affects Versions: 2.2.3, 3.0.0-alpha-1 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L373] > During startup, a large number of OpenRegionProcedures are generated and added to the BufferNode. However, the BufferNode has some "synchronized" methods, and these methods may affect MTTR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-24962) Optimize BufferNode Lock
[ https://issues.apache.org/jira/browse/HBASE-24962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-24962: -- Assignee: Bo Cui > Optimize BufferNode Lock > > > Key: HBASE-24962 > URL: https://issues.apache.org/jira/browse/HBASE-24962 > Project: HBase > Issue Type: Bug > Components: MTTR >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L373] > During startup, a large number of OpenRegionProcedures are generated and added to the BufferNode. However, the BufferNode has some "synchronized" methods, and these methods may affect MTTR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24960) reduce invalid subprocedure task
[ https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24960: --- Status: Patch Available (was: Open) > reduce invalid subprocedure task > > > Key: HBASE-24960 > URL: https://issues.apache.org/jira/browse/HBASE-24960 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 2.2.3, 3.0.0-alpha-1 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146] > > if involvedRegions is null or empty, rs should skip subprocedure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-24960) reduce invalid subprocedure task
[ https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-24960: -- Assignee: Bo Cui > reduce invalid subprocedure task > > > Key: HBASE-24960 > URL: https://issues.apache.org/jira/browse/HBASE-24960 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146] > > if involvedRegions is null or empty, rs should skip subprocedure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24960) reduce invalid subprocedure task
[ https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24960: --- Priority: Major (was: Minor) > reduce invalid subprocedure task > > > Key: HBASE-24960 > URL: https://issues.apache.org/jira/browse/HBASE-24960 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Major > > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146] > > if involvedRegions is null or empty, rs should skip subprocedure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24680) Refactor the checkAndMutate code on the server side
[ https://issues.apache.org/jira/browse/HBASE-24680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24680: --- Description: # Refactor the checkAndMutate code on the server side by using the CheckAndMutate class (introduced in HBASE-8458) and the CheckAndMutateResult class (introduced in HBASE-24650). (was: Refactor the checkAndMutate code on the server side by using the CheckAndMutate class (introduced in HBASE-8458) and the CheckAndMutateResult class (introduced in HBASE-24650).) > Refactor the checkAndMutate code on the server side > --- > > Key: HBASE-24680 > URL: https://issues.apache.org/jira/browse/HBASE-24680 > Project: HBase > Issue Type: Sub-task >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > # Refactor the checkAndMutate code on the server side by using the > CheckAndMutate class (introduced in HBASE-8458) and the CheckAndMutateResult > class (introduced in HBASE-24650). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-24925) SCP reduce unnecessary get requests
[ https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186924#comment-17186924 ] Bo Cui edited comment on HBASE-24925 at 8/29/20, 10:50 AM: --- !image-2020-08-29-17-46-00-900.png! Without the thread pool, loading the table state of 10K tables takes 170+ s was (Author: bo cui): !image-2020-08-29-17-46-00-900.png! Without the thread pool, loading the table state takes 170+ s > SCP reduce unnecessary get requests > --- > > Key: HBASE-24925 > URL: https://issues.apache.org/jira/browse/HBASE-24925 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-29-17-46-00-900.png > > > SCP should reduce unnecessary Get requests. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] > During startup, the tableName2State map of TableStateManager has not yet been loaded with table state data from the meta table. If there are 50 procedure threads and HBase has 10K tables, then in the worst case the master needs to query the meta table 500K times (50*10K, when the regions whose table state all SCPs check simultaneously belong to the same table). > > I think the master can reduce these Get requests, and AM#loadMeta can load regions and all tables through asynchronous threads. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24925) SCP reduce unnecessary get requests
[ https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186924#comment-17186924 ] Bo Cui commented on HBASE-24925: !image-2020-08-29-17-46-00-900.png! Without the thread pool, loading the table state takes 170+ s > SCP reduce unnecessary get requests > --- > > Key: HBASE-24925 > URL: https://issues.apache.org/jira/browse/HBASE-24925 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-29-17-46-00-900.png > > > SCP should reduce unnecessary Get requests. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] > During startup, the tableName2State map of TableStateManager has not yet been loaded with table state data from the meta table. If there are 50 procedure threads and HBase has 10K tables, then in the worst case the master needs to query the meta table 500K times (50*10K, when the regions whose table state all SCPs check simultaneously belong to the same table). > > I think the master can reduce these Get requests, and AM#loadMeta can load regions and all tables through asynchronous threads. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24925) SCP reduce unnecessary get requests
[ https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24925: --- Attachment: image-2020-08-29-17-46-00-900.png > SCP reduce unnecessary get requests > --- > > Key: HBASE-24925 > URL: https://issues.apache.org/jira/browse/HBASE-24925 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-29-17-46-00-900.png > > > SCP should reduce unnecessary Get requests. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] > During startup, the tableName2State map of TableStateManager has not yet been loaded with table state data from the meta table. If there are 50 procedure threads and HBase has 10K tables, then in the worst case the master needs to query the meta table 500K times (50*10K, when the regions whose table state all SCPs check simultaneously belong to the same table). > > I think the master can reduce these Get requests, and AM#loadMeta can load regions and all tables through asynchronous threads. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24937) table.rb use LocalDateTime to replace Instant
[ https://issues.apache.org/jira/browse/HBASE-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24937: --- Status: Patch Available (was: Open) > table.rb use LocalDateTime to replace Instant > - > > Key: HBASE-24937 > URL: https://issues.apache.org/jira/browse/HBASE-24937 > Project: HBase > Issue Type: Improvement > Components: shell >Affects Versions: 2.2.3, 3.0.0-alpha-1 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Minor > > https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754 > we can use timeZone to improve readability. > {code:java} > return java.time.LocalDateTime.ofInstant(instant, > java.time.ZoneId.systemDefault()).toString > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24962) Optimize BufferNode Lock
Bo Cui created HBASE-24962: -- Summary: Optimize BufferNode Lock Key: HBASE-24962 URL: https://issues.apache.org/jira/browse/HBASE-24962 Project: HBase Issue Type: Bug Components: MTTR Affects Versions: 2.2.3, 3.0.0-alpha-1 Reporter: Bo Cui [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L373] During startup, a large number of OpenRegionProcedures are generated and added to the BufferNode. However, the BufferNode has some "synchronized" methods, and these methods may affect MTTR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-24937) table.rb use LocalDateTime to replace Instant
[ https://issues.apache.org/jira/browse/HBASE-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-24937: -- Assignee: Bo Cui > table.rb use LocalDateTime to replace Instant > - > > Key: HBASE-24937 > URL: https://issues.apache.org/jira/browse/HBASE-24937 > Project: HBase > Issue Type: Improvement > Components: shell >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Minor > > https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754 > we can use timeZone to improve readability. > {code:java} > return java.time.LocalDateTime.ofInstant(instant, > java.time.ZoneId.systemDefault()).toString > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
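The readability point in HBASE-24937 can be seen in a small plain-Java comparison of the two renderings. The class and method names here are hypothetical, and a fixed +08:00 offset is used in place of `ZoneId.systemDefault()` only to keep the example deterministic.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampFormatSketch {
    // Instant#toString always renders in UTC with a trailing 'Z'.
    static String asInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    // LocalDateTime.ofInstant renders the same moment as wall-clock time in the given zone.
    static String asLocalDateTime(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone).toString();
    }

    public static void main(String[] args) {
        long ts = 0L;
        System.out.println(asInstant(ts));                               // 1970-01-01T00:00:00Z
        System.out.println(asLocalDateTime(ts, ZoneOffset.ofHours(8)));  // 1970-01-01T08:00
        // What the issue proposes for the shell: use the operator's own time zone.
        System.out.println(asLocalDateTime(ts, ZoneId.systemDefault()));
    }
}
```

Note that `LocalDateTime#toString` carries no zone suffix, so the output is only unambiguous to a reader who knows which zone the shell runs in.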
[jira] [Assigned] (HBASE-24925) SCP reduce unnecessary get requests
[ https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-24925: -- Assignee: Bo Cui > SCP reduce unnecessary get requests > --- > > Key: HBASE-24925 > URL: https://issues.apache.org/jira/browse/HBASE-24925 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > SCP should reduce unnecessary Get requests. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] > During startup, the tableName2State map of TableStateManager has not yet been loaded with table state data from the meta table. If there are 50 procedure threads and HBase has 10K tables, then in the worst case the master needs to query the meta table 500K times (50*10K, when the regions whose table state all SCPs check simultaneously belong to the same table). > > I think the master can reduce these Get requests, and AM#loadMeta can load regions and all tables through asynchronous threads. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
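The 50*10K worst case in HBASE-24925 comes from every procedure thread issuing its own Get for a table whose state is not cached yet. The sketch below shows one way to bound that to a single lookup per table. It is not the approach proposed in the issue (which is to preload through asynchronous threads in AM#loadMeta); the class, the `loadFromMeta` stand-in and the "ENABLED" value are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TableStateCacheSketch {
    private final ConcurrentHashMap<String, String> tableName2State = new ConcurrentHashMap<>();
    private final AtomicInteger metaGets = new AtomicInteger();

    // Hypothetical stand-in for a Get against the meta table.
    private String loadFromMeta(String tableName) {
        metaGets.incrementAndGet();
        return "ENABLED";
    }

    // computeIfAbsent runs the loader at most once per key, even under contention.
    public String getTableState(String tableName) {
        return tableName2State.computeIfAbsent(tableName, this::loadFromMeta);
    }

    public int metaGetCount() {
        return metaGets.get();
    }

    public static void main(String[] args) {
        TableStateCacheSketch cache = new TableStateCacheSketch();
        // 50 callers each asking about the same 10 tables: 500 lookups, 10 meta Gets.
        for (int caller = 0; caller < 50; caller++) {
            for (int table = 0; table < 10; table++) {
                cache.getTableState("table" + table);
            }
        }
        System.out.println(cache.metaGetCount()); // 10
    }
}
```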
[jira] [Commented] (HBASE-24960) reduce invalid subprocedure task
[ https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185731#comment-17185731 ] Bo Cui commented on HBASE-24960: [~wenfeiyi666] Hi, do you want to fix this issue? I am already working on it and will raise a PR soon. > reduce invalid subprocedure task > > > Key: HBASE-24960 > URL: https://issues.apache.org/jira/browse/HBASE-24960 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: wenfeiyi666 >Priority: Minor > > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146] > > if involvedRegions is null or empty, rs should skip subprocedure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24960) reduce invalid subprocedure task
[ https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24960: --- Description: [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146] if involvedRegions is null or empty, rs should skip subprocedure. was: [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] if involvedRegions is null or empty, rs should skip subprocedure. > reduce invalid subprocedure task > > > Key: HBASE-24960 > URL: https://issues.apache.org/jira/browse/HBASE-24960 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Minor > > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146] > > if involvedRegions is null or empty, rs should skip subprocedure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24960) reduce invalid subprocedure task
[ https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24960: --- Summary: reduce invalid subprocedure task (was: reduce invalid snapshot task) > reduce invalid subprocedure task > > > Key: HBASE-24960 > URL: https://issues.apache.org/jira/browse/HBASE-24960 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Minor > > [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] > > if involvedRegions is null or empty, rs should skip subprocedure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24960) reduce invalid snapshot task
Bo Cui created HBASE-24960: -- Summary: reduce invalid snapshot task Key: HBASE-24960 URL: https://issues.apache.org/jira/browse/HBASE-24960 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 2.2.3, 3.0.0-alpha-1 Reporter: Bo Cui [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165] if involvedRegions is null or empty, rs should skip subprocedure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
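The guard proposed in this issue can be sketched as below. This is only an illustration under stated assumptions, not the HBase code at the linked lines: the class `SubprocedureGuard`, the method `shouldSkip`, and the use of plain strings for regions are hypothetical names chosen for the example.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase. It only illustrates the rule
// "skip the subprocedure when involvedRegions is null or empty".
final class SubprocedureGuard {
  private SubprocedureGuard() {}

  // Returns true when this region server has no regions to act on,
  // so building and running a subprocedure would be wasted work.
  static boolean shouldSkip(List<String> involvedRegions) {
    return involvedRegions == null || involvedRegions.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(shouldSkip(null));
    System.out.println(shouldSkip(Collections.emptyList()));
    System.out.println(shouldSkip(Collections.singletonList("region-a")));
  }
}
```

A caller would check `shouldSkip(...)` before constructing the subprocedure and return early when it is true.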
[jira] [Created] (HBASE-24939) Invalid addFsRegionsMissingInMeta -d
Bo Cui created HBASE-24939: -- Summary: Invalid addFsRegionsMissingInMeta -d Key: HBASE-24939 URL: https://issues.apache.org/jira/browse/HBASE-24939 Project: HBase Issue Type: Bug Components: hbck2 Affects Versions: 2.2.3, 3.0.0-alpha-1 Reporter: Bo Cui [https://github.com/apache/hbase-operator-tools/blob/87878aada3354514050f5a2df11f27b317efd42d/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java#L436] -d is invalid. addFsRegionsMissingInMeta does not use -d -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24925) SCP reduce unnecessary get requests
[ https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24925: --- Description: SCP should reduce unnecessary Get requests [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] During startup, the tableName2State map of TableStateManager has not yet loaded the tableState data from the meta table. If the procedure thread count is 50 and HBase has 10K tables, then in the worst case the master needs to query the meta table 500K times (50*10K, when the regions whose tableState all SCPs check simultaneously belong to the same table). I think the master can reduce Get requests, and AM#loadMeta can load regions and all tables through asynchronous threads. [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] was: SCP should reduce unnecessary Get requests [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] During startup, the tableName2State map of TableStateManager has not yet loaded the tableState data from the meta table. If the procedure thread count is 50 and HBase has 10K tables, then in the worst case the master needs to query the meta table 500K times (50*10K, when the regions whose tableState all SCPs check simultaneously belong to the same table). I think the master can reduce Get requests, and AM#loadMeta can load regions and all tables [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] > SCP reduce unnecessary get requests > --- > > Key: HBASE-24925 > URL: https://issues.apache.org/jira/browse/HBASE-24925 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Major > > SCP should reduce unnecessary Get request > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] > during startup, the tableNam2State of tableStateManager is not loading > tableState data form metaTable yet. if procThread num is 50 and hbase has > 10K tables, in the worst case, the master needs to query meta table 500K > times(50*10K. and the regions that all SCPs simultaneously check tableState > belong to the same table ) > > i think master can reduce Get request, and AM#loadMeta can load regions and > all tables through asynchronous threads. > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24937) table.rb use LocalDateTime to replace Instant
[ https://issues.apache.org/jira/browse/HBASE-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24937: --- Affects Version/s: 3.0.0-alpha-1 2.2.3 > table.rb use LocalDateTime to replace Instant > - > > Key: HBASE-24937 > URL: https://issues.apache.org/jira/browse/HBASE-24937 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Minor > > https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754 > we can use timeZone to improve readability. > {code:java} > return java.time.LocalDateTime.ofInstant(instant, > java.time.ZoneId.systemDefault()).toString > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24937) table.rb use LocalDateTime to replace Instant
[ https://issues.apache.org/jira/browse/HBASE-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24937: --- Component/s: shell > table.rb use LocalDateTime to replace Instant > - > > Key: HBASE-24937 > URL: https://issues.apache.org/jira/browse/HBASE-24937 > Project: HBase > Issue Type: Improvement > Components: shell >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Minor > > https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754 > we can use timeZone to improve readability. > {code:java} > return java.time.LocalDateTime.ofInstant(instant, > java.time.ZoneId.systemDefault()).toString > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24937) table.rb use LocalDateTime to replace Instant
Bo Cui created HBASE-24937: -- Summary: table.rb use LocalDateTime to replace Instant Key: HBASE-24937 URL: https://issues.apache.org/jira/browse/HBASE-24937 Project: HBase Issue Type: Improvement Reporter: Bo Cui https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754 we can use timeZone to improve readability. {code:java} return java.time.LocalDateTime.ofInstant(instant, java.time.ZoneId.systemDefault()).toString {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
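To make the readability argument concrete, the sketch below prints the same timestamp both ways. It assumes, as the issue title implies, that the shell currently renders the string form of an `Instant`; the class `TimestampFormat` and its method names are hypothetical and exist only for this example. A fixed offset is used instead of `ZoneId.systemDefault()` so that the result does not depend on the machine running it.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical example class; it is not part of the HBase shell.
final class TimestampFormat {
  private TimestampFormat() {}

  // Instant.toString() always renders in UTC (ISO-8601 with a trailing 'Z').
  static String asInstant(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).toString();
  }

  // LocalDateTime.ofInstant renders the same moment as wall-clock time in the given zone.
  static String asLocal(long epochMillis, ZoneId zone) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone).toString();
  }

  public static void main(String[] args) {
    long ts = 0L; // the Unix epoch, chosen so the output is easy to check by hand
    System.out.println(asInstant(ts));                        // 1970-01-01T00:00:00Z
    System.out.println(asLocal(ts, ZoneOffset.ofHours(8)));   // 1970-01-01T08:00
    System.out.println(asLocal(ts, ZoneId.systemDefault()));  // depends on the local time zone
  }
}
```

Note that `LocalDateTime.toString()` carries no zone suffix, so the reader has to know which zone the shell ran in; that is the trade-off of the proposed change.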
[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181776#comment-17181776 ] Bo Cui edited comment on HBASE-24924 at 8/21/20, 10:00 AM: --- Yes, the root cause is the same: the meta znode does not exist. But after {quote}next startup, if znode does not exist, hbase will recreate new InitMetaProcedure {quote} HBase can be started, whereas after {quote}if procWAL has completed InitMetaProcedure and znode does not exist {quote} HBase gets stuck. was (Author: bo cui): Yes, the root cause is the same: the meta znode does not exist. But after {quote}next startup, if znode does not exist, hbase will recreate new InitMetaProcedure {quote} HBase is OK, whereas after {quote}if procWAL has completed InitMetaProcedure and znode does not exist {quote} HBase gets stuck. > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181776#comment-17181776 ] Bo Cui commented on HBASE-24924: Yes, the root cause is the same: the meta znode does not exist. But after {quote}next startup, if znode does not exist, hbase will recreate new InitMetaProcedure {quote} HBase is OK, whereas after {quote}if procWAL has completed InitMetaProcedure and znode does not exist {quote} HBase gets stuck. > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181757#comment-17181757 ] Bo Cui commented on HBASE-24924: Thanks [~zhangduo]. My question and the PR differ slightly. In the normal scenario: 1. On the first startup, HBase creates an InitMetaProcedure. 2. On the next startup, if the meta znode does not exist, HBase creates a new InitMetaProcedure. My question: if procWAL contains a completed InitMetaProcedure and the znode does not exist, HBase should start up as in case 2 and should not get stuck. > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181738#comment-17181738 ] Bo Cui commented on HBASE-24924: {quote} The problem is who deletes the meta znode? Why? {quote} In an actual production environment, ZK data may be corrupted or lost, the znode may be deleted manually, ZK may be reinstalled, and so on. > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181711#comment-17181711 ] Bo Cui commented on HBASE-24924: But by the time we learn that the znode has been deleted, HBase has already started, submitted the InitMetaProcedure, and recreated the znode. Since we cannot know about the deletion in time, I think we should minimize its impact. After an InitMetaProcedure is resubmitted, how do we ensure that the meta region is not deleted and that the master does not get stuck? > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181698#comment-17181698 ] Bo Cui commented on HBASE-24924: {quote}Anyway, this is not a normal operation, we should deal with this through HBCK2, not in the normal code path.{quote} On startup, how does the master avoid submitting InitMetaProcedure if meta is already there? > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181671#comment-17181671 ] Bo Cui commented on HBASE-24924: {quote}I think the design here is to not submit InitMetaProcedure if meta is already there...{quote} If the znode does not exist, how is meta assigned? > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181661#comment-17181661 ] Bo Cui edited comment on HBASE-24924 at 8/21/20, 7:04 AM: -- {quote}I assume in a normal deployment, we should not delete the meta znode? The guys from AWS has a scenario where they restart a cluster with nothing on zookeeper, then the problem is the InitMetaProcedure will delete the meta region... {quote} yeah, 2.2.3 and branch-master InitMetaProcedure are different. the 2.2.3 InitMetaProcedure does not create metaTable, so master can resubmit InitMetaProcedure. but in branch-master,master executes only INIT_META_ASSIGN_META(if meta dir already exists, InitMetaProcedure should skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES) was (Author: bo cui): {quote}I assume in a normal deployment, we should not delete the meta znode? The guys from AWS has a scenario where they restart a cluster with nothing on zookeeper, then the problem is the InitMetaProcedure will delete the meta region... {quote} yeah, 2.2.3 and master InitMetaProcedure are different. the 2.2.3 InitMetaProcedure does not create metaTable, so master can resubmit InitMetaProcedure. 
but in branch-master,master executes only INIT_META_ASSIGN_META(if meta dir already exists, InitMetaProcedure should skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES) > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181661#comment-17181661 ] Bo Cui edited comment on HBASE-24924 at 8/21/20, 7:03 AM: -- {quote}I assume in a normal deployment, we should not delete the meta znode? The guys from AWS has a scenario where they restart a cluster with nothing on zookeeper, then the problem is the InitMetaProcedure will delete the meta region... {quote} yeah, 2.2.3 and master InitMetaProcedure are different. the 2.2.3 InitMetaProcedure does not create metaTable, so master can resubmit InitMetaProcedure. but in branch-master,master executes only INIT_META_ASSIGN_META(if meta dir already exists, InitMetaProcedure should skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES) was (Author: bo cui): {{{quote}}}I assume in a normal deployment, we should not delete the meta znode? The guys from AWS has a scenario where they restart a cluster with nothing on zookeeper, then the problem is the InitMetaProcedure will delete the meta region... {{{quote}}} yeah, 2.2.3 and master InitMetaProcedure are different. the 2.2.3 InitMetaProcedure does not create metaTable, so master can resubmit InitMetaProcedure. 
but in branch-master,master executes only INIT_META_ASSIGN_META(if meta dir already exists, InitMetaProcedure should skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES) > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181661#comment-17181661 ] Bo Cui commented on HBASE-24924: {{{quote}}}I assume in a normal deployment, we should not delete the meta znode? The guys from AWS has a scenario where they restart a cluster with nothing on zookeeper, then the problem is the InitMetaProcedure will delete the meta region... {{{quote}}} yeah, 2.2.3 and master InitMetaProcedure are different. the 2.2.3 InitMetaProcedure does not create metaTable, so master can resubmit InitMetaProcedure. but in branch-master,master executes only INIT_META_ASSIGN_META(if meta dir already exists, InitMetaProcedure should skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES) > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181518#comment-17181518 ] Bo Cui edited comment on HBASE-24924 at 8/21/20, 3:03 AM: -- i think we can enhance it. [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497] {code:java} message InitMetaStateData { required int32 latch = 1; } {code} if the completed InitMetaProcedure exists, and meta znode does not exist, master should create new InitMetaProcedure [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046] {code:java} filter(p -> (p instanceof InitMetaProcedure && !p.isFinished())) {code} was (Author: bo cui): i think we should strengthen it. [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497] {code:java} message InitMetaStateData { required int32 latch = 1; } {code} if the completed InitMetaProcedure exists, and meta znode does not exist, master should create new InitMetaProcedure https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046 {code:java} filter(p -> (p instanceof InitMetaProcedure && !p.isFinished())) {code} > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, 
finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24925) SCP reduce unnecessary get requests
[ https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24925: --- Component/s: MTTR > SCP reduce unnecessary get requests > --- > > Key: HBASE-24925 > URL: https://issues.apache.org/jira/browse/HBASE-24925 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Major > > SCP should reduce unnecessary Get request > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] > during startup, the tableNam2State of tableStateManager is not loading > tableState data form metaTable yet. if procThread num is 50 and hbase has > 10K tables, in the worst case, the master needs to query meta table 500K > times(50*10K. and the regions that all SCPs simultaneously check tableState > belong to the same table ) > > i think master can reduce Get request, and AM#loadMeta can load regions and > all tables > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24925) SCP reduce unnecessary get requests
[ https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24925: --- Affects Version/s: 3.0.0-alpha-1 2.2.3 > SCP reduce unnecessary get requests > --- > > Key: HBASE-24925 > URL: https://issues.apache.org/jira/browse/HBASE-24925 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Major > > SCP should reduce unnecessary Get request > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] > during startup, the tableNam2State of tableStateManager is not loading > tableState data form metaTable yet. if procThread num is 50 and hbase has > 10K tables, in the worst case, the master needs to query meta table 500K > times(50*10K. and the regions that all SCPs simultaneously check tableState > belong to the same table ) > > i think master can reduce Get request, and AM#loadMeta can load regions and > all tables > [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24924: --- Component/s: master > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24925) SCP reduce unnecessary get requests
Bo Cui created HBASE-24925: -- Summary: SCP reduce unnecessary get requests Key: HBASE-24925 URL: https://issues.apache.org/jira/browse/HBASE-24925 Project: HBase Issue Type: Improvement Reporter: Bo Cui SCP should reduce unnecessary Get requests [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520] During startup, the tableName2State map of TableStateManager has not yet loaded the tableState data from the meta table. If the procedure thread count is 50 and HBase has 10K tables, then in the worst case the master needs to query the meta table 500K times (50*10K, when the regions whose tableState all SCPs check simultaneously belong to the same table). I think the master can reduce Get requests, and AM#loadMeta can load regions and all tables [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532] -- This message was sent by Atlassian Jira (v8.3.4#803005)
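The 50 x 10K = 500K figure, and the saving from not repeating lookups, can be illustrated with a small simulation. This is a sketch under loud assumptions: `TableStateCache`, `countLookups`, and the string-valued states are hypothetical and are not HBase classes, and the technique shown is plain lazy caching, a simpler stand-in for the bulk preload in AM#loadMeta that the issue proposes. The loop is sequential, so it models only the number of lookups, not concurrency.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not HBase code: remembers each table's state after the
// first lookup so repeated questions about the same table cost nothing extra.
final class TableStateCache {
  private final Map<String, String> states = new ConcurrentHashMap<>();
  private final Function<String, String> loader; // stands in for a Get against the meta table
  private final AtomicInteger lookups = new AtomicInteger();

  TableStateCache(Function<String, String> loader) {
    this.loader = loader;
  }

  String getState(String table) {
    // computeIfAbsent calls the loader only when the table is not cached yet.
    return states.computeIfAbsent(table, name -> {
      lookups.incrementAndGet();
      return loader.apply(name);
    });
  }

  int lookups() {
    return lookups.get();
  }

  // Simulates `workers` procedure threads each asking about every one of `tables` tables.
  static int countLookups(int workers, int tables) {
    TableStateCache cache = new TableStateCache(name -> "ENABLED");
    for (int w = 0; w < workers; w++) {
      for (int i = 0; i < tables; i++) {
        cache.getState("table-" + i);
      }
    }
    return cache.lookups();
  }

  public static void main(String[] args) {
    int workers = 50;
    int tables = 10_000;
    System.out.println("uncached worst case: " + (workers * tables)); // 500000
    System.out.println("with cache: " + countLookups(workers, tables)); // 10000
  }
}
```

With the cache, the number of backing lookups is bounded by the number of tables rather than by threads times tables.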
[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181518#comment-17181518 ] Bo Cui edited comment on HBASE-24924 at 8/21/20, 2:28 AM: -- i think we should strengthen it. [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497] {code:java} message InitMetaStateData { required int32 latch = 1; } {code} if the completed InitMetaProcedure exists, and meta znode does not exist, master should create new InitMetaProcedure https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046 {code:java} filter(p -> (p instanceof InitMetaProcedure && !p.isFinished())) {code} was (Author: bo cui): https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497 i think we should strengthen it. {code:java} message InitMetaStateData { required int32 latch = 1; } {code} > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
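The rule stated in the comment above (a finished InitMetaProcedure plus a missing meta znode should lead to a new InitMetaProcedure) can be sketched as follows. The types `Proc` and `InitMetaProc` and the method `needsNewInitMeta` are hypothetical stand-ins invented for this example; only the shape of the `filter` predicate is taken from the comment, and this is not the logic in HMaster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class InitMetaLookup {
  private InitMetaLookup() {}

  // Hypothetical stand-in for a procedure; not an HBase type.
  static class Proc {
    private final boolean finished;
    Proc(boolean finished) { this.finished = finished; }
    boolean isFinished() { return finished; }
  }

  // Hypothetical stand-in for InitMetaProcedure.
  static final class InitMetaProc extends Proc {
    InitMetaProc(boolean finished) { super(finished); }
  }

  // Same predicate shape as in the comment: finished procedures are ignored.
  static Optional<Proc> findPendingInitMeta(List<Proc> procs) {
    return procs.stream()
        .filter(p -> p instanceof InitMetaProc && !p.isFinished())
        .findAny();
  }

  // A new InitMetaProc is needed when the meta znode is missing and no
  // unfinished InitMetaProc is already in the procedure list.
  static boolean needsNewInitMeta(List<Proc> procs, boolean metaZnodeExists) {
    return !metaZnodeExists && !findPendingInitMeta(procs).isPresent();
  }

  public static void main(String[] args) {
    List<Proc> onlyFinished = Arrays.asList(new InitMetaProc(true), new Proc(false));
    List<Proc> stillRunning = Arrays.asList(new InitMetaProc(false));
    System.out.println(needsNewInitMeta(onlyFinished, false)); // true
    System.out.println(needsNewInitMeta(stillRunning, false)); // false
    System.out.println(needsNewInitMeta(onlyFinished, true));  // false
  }
}
```

The first case is the scenario reported in this issue: the only InitMetaProcedure in procWAL is finished and the znode is gone, so waiting on the old procedure would never complete.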
[jira] [Updated] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24924: --- Description: if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, and meta znode does not exist, finishActiveMasterInitialization will stuck because during startup,If InitMetaProcedure exists, InitMetaProcedure recreates a new CountDownLatch. master jstack !image-2020-08-21-09-12-33-894.png! master log !masterLog.gif! was: if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, and meta znode does not exist, finishActiveMasterInitialization will stuck master jstack !image-2020-08-21-09-12-33-894.png! master log !masterLog.gif! > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > because during startup,If InitMetaProcedure exists, InitMetaProcedure > recreates a new CountDownLatch. > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
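The hang described in the updated HBASE-24924 description can be modelled with a plain `java.util.concurrent.CountDownLatch`. This is a simplified model, not the InitMetaProcedure code: `ReloadedProc` is a hypothetical class that assumes the latch is rebuilt on every reload and is only counted down by a procedure that still executes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class RecreatedLatchSketch {
    /** Hypothetical stand-in for a procedure whose latch is rebuilt when it is reloaded. */
    static final class ReloadedProc {
        final boolean finished;
        // A fresh latch is created on every reload, even for a finished procedure.
        final CountDownLatch latch = new CountDownLatch(1);

        ReloadedProc(boolean finished) {
            this.finished = finished;
        }

        /** Only a procedure that still has work to do reaches the point of releasing waiters. */
        void runIfNeeded() {
            if (!finished) {
                latch.countDown();
            }
        }
    }

    /** Returns true if the waiter was released within the timeout, false if it would hang. */
    static boolean waitFor(ReloadedProc proc, long timeoutMillis) {
        proc.runIfNeeded();
        try {
            return proc.latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unfinished released: " + waitFor(new ReloadedProc(false), 100));
        System.out.println("finished released:   " + waitFor(new ReloadedProc(true), 100));
    }
}
```

In this model the waiter on a reloaded, already finished procedure is never released, which matches the stuck finishActiveMasterInitialization reported in the issue.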
[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181518#comment-17181518 ] Bo Cui edited comment on HBASE-24924 at 8/21/20, 2:09 AM: -- https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497 i think we should strengthen it. {code:java} message InitMetaStateData { required int32 latch = 1; } {code} was (Author: bo cui): [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046] i think we should strengthen it. {code:java} filter(p -> (p instanceof InitMetaProcedure && !p.isFinished())) {code} > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181521#comment-17181521 ] Bo Cui commented on HBASE-24924: Reproduction steps: 1. Start a new HBase cluster; the master creates the meta table and submits an InitMetaProcedure to the procedureExecutor. 2. Wait until the InitMetaProcedure is complete, then kill the master and all RSs. 3. Delete the meta znode. 4. Start HBase again; the master loads the procWAL, including the completed InitMetaProcedure, and gets stuck. > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-24924: -- Assignee: Bo Cui > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181518#comment-17181518 ] Bo Cui commented on HBASE-24924: [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046] i think we should strengthen it. {code:java} filter(p -> (p instanceof InitMetaProcedure && !p.isFinished())) {code} > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24924: --- Affects Version/s: 3.0.0-alpha-1 2.2.3 > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
[ https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24924: --- Description: if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, and meta znode does not exist, finishActiveMasterInitialization will stuck master jstack !image-2020-08-21-09-12-33-894.png! master log !masterLog.gif! was: if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, and meta znode does not exist, finishActiveMasterInitialization will stuck master jstack !image-2020-08-21-09-12-33-894.png! master log > stuck InitMetaProcedure in finishActiveMasterInitialization > --- > > Key: HBASE-24924 > URL: https://issues.apache.org/jira/browse/HBASE-24924 > Project: HBase > Issue Type: Bug >Reporter: Bo Cui >Priority: Major > Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif > > > if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, > and meta znode does not exist, finishActiveMasterInitialization will stuck > master jstack > !image-2020-08-21-09-12-33-894.png! > master log > !masterLog.gif! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization
Bo Cui created HBASE-24924: -- Summary: stuck InitMetaProcedure in finishActiveMasterInitialization Key: HBASE-24924 URL: https://issues.apache.org/jira/browse/HBASE-24924 Project: HBase Issue Type: Bug Reporter: Bo Cui Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, and meta znode does not exist, finishActiveMasterInitialization will stuck master jstack !image-2020-08-21-09-12-33-894.png! master log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24885) STUCK RIT by hbck2 assigns
[ https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179303#comment-17179303 ] Bo Cui commented on HBASE-24885: {quote}Want me to take this [~Bo Cui]. I had seen the second case occur dealing w/ a Replica that failed assign and had added investigation to my todo list. Thanks for digging in here.{quote} [~stack] thx, i have assigned to u. > STUCK RIT by hbck2 assigns > -- > > Key: HBASE-24885 > URL: https://issues.apache.org/jira/browse/HBASE-24885 > Project: HBase > Issue Type: Bug > Components: hbck2, Region Assignment >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Michael Stack >Priority: Major > > If a region has been assign to rs1 and then client assigns region again by > "hbck2 assigns" > 1、if regionPlan is region to be assign to rs2,the region will be opened on > rs1 and rs2. > master log: > {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: > rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on > server=rs1 but state has otherwise > {quote} > 2、if regionPlan is region to be assign to rs1, the > TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 > is not responding to master > rslog: > {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN > - ignoring this new request for this region. > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-24885) STUCK RIT by hbck2 assigns
[ https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-24885: -- Assignee: Michael Stack (was: Bo Cui) > STUCK RIT by hbck2 assigns > -- > > Key: HBASE-24885 > URL: https://issues.apache.org/jira/browse/HBASE-24885 > Project: HBase > Issue Type: Bug > Components: hbck2, Region Assignment >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Michael Stack >Priority: Major > > If a region has been assign to rs1 and then client assigns region again by > "hbck2 assigns" > 1、if regionPlan is region to be assign to rs2,the region will be opened on > rs1 and rs2. > master log: > {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: > rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on > server=rs1 but state has otherwise > {quote} > 2、if regionPlan is region to be assign to rs1, the > TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 > is not responding to master > rslog: > {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN > - ignoring this new request for this region. > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24885) STUCK RIT by hbck2 assigns
[ https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178833#comment-17178833 ] Bo Cui commented on HBASE-24885: bq. Is this only a problem of HBCK2? Or we could meet the same problem when calling admin.assign? yeah, TRSP can judge current regionState in TRSP#executeFromState(REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE/REGION_STATE_TRANSITION_CLOSE) https://github.com/apache/hbase/blob/7335dbc8345298c57b8da4ccba640d5432b3fde9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L340 > STUCK RIT by hbck2 assigns > -- > > Key: HBASE-24885 > URL: https://issues.apache.org/jira/browse/HBASE-24885 > Project: HBase > Issue Type: Bug > Components: hbck2, Region Assignment >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > If a region has been assign to rs1 and then client assigns region again by > "hbck2 assigns" > 1、if regionPlan is region to be assign to rs2,the region will be opened on > rs1 and rs2. > master log: > {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: > rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on > server=rs1 but state has otherwise > {quote} > 2、if regionPlan is region to be assign to rs1, the > TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 > is not responding to master > rslog: > {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN > - ignoring this new request for this region. > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-24885) STUCK RIT by hbck2 assigns
[ https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui reassigned HBASE-24885: -- Assignee: Bo Cui > STUCK RIT by hbck2 assigns > -- > > Key: HBASE-24885 > URL: https://issues.apache.org/jira/browse/HBASE-24885 > Project: HBase > Issue Type: Bug > Components: hbck2, Region Assignment >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > > If a region has been assign to rs1 and then client assigns region again by > "hbck2 assigns" > 1、if regionPlan is region to be assign to rs2,the region will be opened on > rs1 and rs2. > master log: > {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: > rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on > server=rs1 but state has otherwise > {quote} > 2、if regionPlan is region to be assign to rs1, the > TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 > is not responding to master > rslog: > {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN > - ignoring this new request for this region. > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178685#comment-17178685 ] Bo Cui commented on HBASE-23035: [https://github.com/apache/hbase/blob/c2e0cf989e4a86169219161d4d889db80288e636/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L556] [~anoop.hbase] u are talking about it? > Retain region to the last RegionServer make the failover slower > --- > > Key: HBASE-23035 > URL: https://issues.apache.org/jira/browse/HBASE-23035 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2 > > > Now if one RS crashed, the regions will try to use the old location for the > region deploy. But one RS only have 3 threads to open region by default. If a > RS have hundreds of regions, the failover is very slower. Assign to same RS > may have good locality if the Datanode is deploied on same host. But slower > failover make the availability worse. And the locality is not big deal when > deploy HBase on cloud. > This was introduced by HBASE-18946. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24885) STUCK RIT by hbck2 assigns
[ https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24885: --- Description: If a region has been assign to rs1 and then client assigns region again by "hbck2 assigns" 1、if regionPlan is region to be assign to rs2,the region will be opened on rs1 and rs2. master log: {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on server=rs1 but state has otherwise {quote} 2、if regionPlan is region to be assign to rs1, the TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 is not responding to master rslog: {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN - ignoring this new request for this region. {quote} was: If a region has been assign to rs1 and then client assigns region again by "hbck2 assigns" 1、if regionPlan is region to be assign to rs2,the region will be opened on rs1 and rs2. master log: bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 reported OPEN on server=rs1 but state has otherwise 2、if regionPlan is region to be assign to rs1, the TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 is not responding to master rslog: bq. Receiving OPEN for the region:{}, which we are already trying to OPEN - ignoring this new request for this region. > STUCK RIT by hbck2 assigns > -- > > Key: HBASE-24885 > URL: https://issues.apache.org/jira/browse/HBASE-24885 > Project: HBase > Issue Type: Bug > Components: hbck2, Region Assignment >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Major > > If a region has been assign to rs1 and then client assigns region again by > "hbck2 assigns" > 1、if regionPlan is region to be assign to rs2,the region will be opened on > rs1 and rs2. 
> master log: > {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: > rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on > server=rs1 but state has otherwise > {quote} > 2、if regionPlan is region to be assign to rs1, the > TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 > is not responding to master > rslog: > {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN > - ignoring this new request for this region. > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24885) STUCK RIT by hbck2 assigns
[ https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24885: --- Affects Version/s: 3.0.0-alpha-1 > STUCK RIT by hbck2 assigns > -- > > Key: HBASE-24885 > URL: https://issues.apache.org/jira/browse/HBASE-24885 > Project: HBase > Issue Type: Bug > Components: hbck2, Region Assignment >Affects Versions: 3.0.0-alpha-1, 2.2.3 >Reporter: Bo Cui >Priority: Major > > If a region has been assign to rs1 and then client assigns region again by > "hbck2 assigns" > 1、if regionPlan is region to be assign to rs2,the region will be opened on > rs1 and rs2. > master log: > bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: > rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 > reported OPEN on server=rs1 but state has otherwise > 2、if regionPlan is region to be assign to rs1, the > TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 > is not responding to master > rslog: > bq. Receiving OPEN for the region:{}, which we are already trying to OPEN - > ignoring this new request for this region. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24885) STUCK RIT by hbck2 assigns
[ https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HBASE-24885: --- Description: If a region has been assign to rs1 and then client assigns region again by "hbck2 assigns" 1、if regionPlan is region to be assign to rs2,the region will be opened on rs1 and rs2. master log: bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 reported OPEN on server=rs1 but state has otherwise 2、if regionPlan is region to be assign to rs1, the TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 is not responding to master rslog: bq. Receiving OPEN for the region:{}, which we are already trying to OPEN - ignoring this new request for this region. was: If a region has been assign to rs1 and then client assigns region again by "hbck2 assigns" 1、if regionPlan is region to be assign to rs2,the region will be opened on rs1 and rs2. master log: bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 reported OPEN on server=rs1 but state has otherwise 2、if regionPlan is region to be assign to rs1, the TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 is not responding to master rslog: bq. Receiving OPEN for the region:{}, which we are already trying to OPEN - ignoring this new request for this region. > STUCK RIT by hbck2 assigns > -- > > Key: HBASE-24885 > URL: https://issues.apache.org/jira/browse/HBASE-24885 > Project: HBase > Issue Type: Bug > Components: hbck2, Region Assignment >Affects Versions: 2.2.3 >Reporter: Bo Cui >Priority: Major > > If a region has been assign to rs1 and then client assigns region again by > "hbck2 assigns" > 1、if regionPlan is region to be assign to rs2,the region will be opened on > rs1 and rs2. > master log: > bq. 
WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: > rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 > reported OPEN on server=rs1 but state has otherwise > 2、if regionPlan is region to be assign to rs1, the > TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 > is not responding to master > rslog: > bq. Receiving OPEN for the region:{}, which we are already trying to OPEN - > ignoring this new request for this region. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24885) STUCK RIT by hbck2 assigns
Bo Cui created HBASE-24885: -- Summary: STUCK RIT by hbck2 assigns Key: HBASE-24885 URL: https://issues.apache.org/jira/browse/HBASE-24885 Project: HBase Issue Type: Bug Components: hbck2, Region Assignment Affects Versions: 2.2.3 Reporter: Bo Cui If a region has been assign to rs1 and then client assigns region again by "hbck2 assigns" 1、if regionPlan is region to be assign to rs2,the region will be opened on rs1 and rs2. master log: bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 reported OPEN on server=rs1 but state has otherwise 2、if regionPlan is region to be assign to rs1, the TransitRegionStateProcedure and OpenRegionProcedure will stuck. because rs1 is not responding to master rslog: bq. Receiving OPEN for the region:{}, which we are already trying to OPEN - ignoring this new request for this region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176744#comment-17176744 ] Bo Cui commented on HBASE-23035: [~zghao] During startup, hbase needs to assign region to previous rs without affecting the scan performance, so we can add conf to solve this problem > Retain region to the last RegionServer make the failover slower > --- > > Key: HBASE-23035 > URL: https://issues.apache.org/jira/browse/HBASE-23035 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2 > > > Now if one RS crashed, the regions will try to use the old location for the > region deploy. But one RS only have 3 threads to open region by default. If a > RS have hundreds of regions, the failover is very slower. Assign to same RS > may have good locality if the Datanode is deploied on same host. But slower > failover make the availability worse. And the locality is not big deal when > deploy HBase on cloud. > This was introduced by HBASE-18946. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174189#comment-17174189 ] Bo Cui commented on HBASE-23035: {quote}And the locality is not big deal when deploy HBase on cloud. {quote} [~zghao] hi, but some HBase clusters are not on the cloud. > Retain region to the last RegionServer make the failover slower > --- > > Key: HBASE-23035 > URL: https://issues.apache.org/jira/browse/HBASE-23035 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2 > > > Now if one RS crashed, the regions will try to use the old location for the > region deploy. But one RS only have 3 threads to open region by default. If a > RS have hundreds of regions, the failover is very slower. Assign to same RS > may have good locality if the Datanode is deploied on same host. But slower > failover make the availability worse. And the locality is not big deal when > deploy HBase on cloud. > This was introduced by HBASE-18946. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-21721) reduce write#syncs() times
[ https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173534#comment-17173534 ] Bo Cui commented on HBASE-21721: [~anoop.hbase] master has been updated > reduce write#syncs() times > -- > > Key: HBASE-21721 > URL: https://issues.apache.org/jira/browse/HBASE-21721 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.3.1, 2.1.1, master, 2.2.3 >Reporter: Bo Cui >Assignee: Bo Cui >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6 > > > the number of write#syncs can be reduced by updating the > highestUnsyncedSequence: > before write#sync(), get the current highestUnsyncedSequence > after write#sync, highestSyncedSequence=highestUnsyncedSequence > > {code:title=FSHLog.java|borderStyle=solid} > // Some comments here > public void run() > { > long currentSequence; > while (!isInterrupted()) { > int syncCount = 0; > try { > while (true) { > ... > try { > Trace.addTimelineAnnotation("syncing writer"); > long unSyncedFlushSeq = highestUnsyncedSequence; > writer.sync(); > Trace.addTimelineAnnotation("writer synced"); > if( unSyncedFlushSeq > currentSequence ) currentSequence = > unSyncedFlushSeq; > currentSequence = updateHighestSyncedSequence(currentSequence); > } catch (IOException e) { > LOG.error("Error syncing, request close of WAL", e); > lastException = e; > } catch (Exception e) { >... > } > } > {code} > Add code > long unSyncedFlushSeq = highestUnsyncedSequence; > if( unSyncedFlushSeq > currentSequence ) currentSequence = unSyncedFlushSeq; -- This message was sent by Atlassian Jira (v8.3.4#803005)
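The idea in the HBASE-21721 description (snapshot highestUnsyncedSequence before the sync, then advance the synced sequence to that snapshot afterwards) can be sketched outside FSHLog. Everything below is a hypothetical model: `writerSync()` stands in for `writer.sync()`, and the rule that a request at or below the synced sequence needs no further sync is an assumption of this sketch rather than a quote of the FSHLog code.

```java
import java.util.concurrent.atomic.AtomicLong;

class SyncSequenceSketch {
    final AtomicLong highestUnsyncedSequence = new AtomicLong(); // advanced by appends
    final AtomicLong highestSyncedSequence = new AtomicLong();   // advanced after syncs
    int syncCalls = 0;

    /** Simulates one append reaching the writer. */
    void append() {
        highestUnsyncedSequence.incrementAndGet();
    }

    /** Stand-in for writer.sync(): everything appended so far becomes durable. */
    private void writerSync() {
        syncCalls++;
    }

    /** One pass of the sync loop for a request that needs requestedSequence to be durable. */
    long syncOnce(long requestedSequence) {
        if (requestedSequence <= highestSyncedSequence.get()) {
            return highestSyncedSequence.get(); // already covered by an earlier sync
        }
        long unsyncedSnapshot = highestUnsyncedSequence.get(); // read BEFORE the sync
        writerSync();
        // The sync covered every append up to the snapshot, not only the requested one.
        long covered = Math.max(requestedSequence, unsyncedSnapshot);
        return highestSyncedSequence.accumulateAndGet(covered, Math::max);
    }

    public static void main(String[] args) {
        SyncSequenceSketch wal = new SyncSequenceSketch();
        wal.append();
        wal.append();
        wal.append();
        wal.syncOnce(1);
        wal.syncOnce(2);
        wal.syncOnce(3);
        System.out.println("sync calls: " + wal.syncCalls);
    }
}
```

Reading the unsynced sequence before the sync is what keeps the shortcut safe: only appends that were already handed to the writer when the sync started are treated as durable.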
[jira] [Commented] (HBASE-22263) Master creates duplicate ServerCrashProcedure on initialization, leading to assignment hanging in region-dense clusters
[ https://issues.apache.org/jira/browse/HBASE-22263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170653#comment-17170653 ] Bo Cui commented on HBASE-22263: [~busbey] yeah, the issue affects only branch-1 i will raise a new branch-1 PR and close branch-1.4 PR > Master creates duplicate ServerCrashProcedure on initialization, leading to > assignment hanging in region-dense clusters > --- > > Key: HBASE-22263 > URL: https://issues.apache.org/jira/browse/HBASE-22263 > Project: HBase > Issue Type: Bug > Components: proc-v2, Region Assignment >Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0 >Reporter: Sean Busbey >Assignee: Bo Cui >Priority: Critical > Attachments: HBASE-22263-branch-1.v0.add.patch, > HBASE-22263-branch-1.v0.patch > > > h3. Problem: > During Master initialization we > # restore existing procedures that still need to run from prior active > Master instances > # look for signs that Region Servers have died and need to be recovered > while we were out and schedule a ServerCrashProcedure (SCP) for each them > # turn on the assignment manager > The normal turn of events for a ServerCrashProcedure will attempt to use a > bulk assignment to maintain the set of regions on a RS if possible. However, > we wait around and retry a bit later if the assignment manager isn’t ready > yet. > Note that currently #2 has no notion of wether or not a previous active > Master instances has already done a check. This means we might schedule an > SCP for a ServerName (host, port, start code) that already has an SCP > scheduled. Ideally, such a duplicate should be a no-op. > However, before step #2 schedules the SCP it first marks the region server as > dead and not yet processed, with the expectation that the SCP it just created > will look if there is log splitting work and then mark the server as easy for > region assignment. 
At the same time, any restored SCPs that are past the step > of log splitting will be waiting for the AssignmentManager still. As a part > of restoring themselves, they do not update with the current master instance > to show that they are past the point of WAL processing. > Once the AssignmentManager starts in #3 the restored SCP continues; it will > eventually get to the assignment phase and find that its server is marked as > dead and in need of wal processing. Such assignments are skipped with a log > message. Thus as we iterate over the regions to assign we’ll skip all of > them. This non-intuitively shifts the “no-op” status from the newer SCP we > scheduled at #2 to the older SCP that was restored in #1. > Bulk assignment works by sending the assign calls via a pool to allow more > parallelism. Once we’ve set up the pool we just wait to see if the region > state updates to online. Unfortunately, since all of the assigns got skipped, > we’ll never change the state for any of these regions. That means the bulk > assign, and the older SCP that started it, will wait until it hits a timeout. > By default the timeout for a bulk assignment is the smaller of {{(# Regions > in the plan * 10s)}} or {{(# Regions in the most loaded RS in the plan * 1s + > 60s + # of RegionServers in the cluster * 30s)}}. For even modest clusters > with several hundreds of regions per region server, this means the “no-op” > SCP will end up waiting ~tens-of-minutes (e.g. ~50 minutes for an average > region density of 300 regions per region server on a 100 node cluster. ~11 > minutes for 300 regions per region server on a 10 node cluster). During this > time, the SCP will hold one of the available procedure execution slots for > both the overall pool and for the specific server queue. 
> As previously mentioned, restored SCPs will retry their submission if the > assignment manager has not yet been activated (done in #3); this can cause > them to be scheduled after the newer SCPs (created in #2). Thus the order of > execution of no-op and usable SCPs can vary from run-to-run of master > initialization. > This means that unless you get lucky with SCP ordering, impacted regions will > remain as RIT for an extended period of time. If you get particularly unlucky > and a critical system table is included in the regions that are being > recovered, then master initialization itself will end up blocked on this > sequence of SCP timeouts. If there are enough of them to exceed the master > initialization timeouts, then the situation can be self-sustaining as > additional master failovers cause even more duplicative SCPs to be scheduled. > h3. Indicators: > * Master appears to hang; failing to assign regions to available region > servers. > * Master appears to hang during ini
[jira] [Comment Edited] (HBASE-21721) reduce write#syncs() times
[ https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165159#comment-17165159 ] Bo Cui edited comment on HBASE-21721 at 7/26/20, 4:46 AM: -- # yes, AsyncFsWAL is not in the 1.3 and older hbase # in 2.x version, default is async WAL, but FSHLog exists # the patch does not apply with AsyncFSWAL was (Author: bo cui): # yes, AsyncFsWAL is not in the 1.3 and older hbase # in 2.x version, default is async WAL, but FSWAL exists # the patch does not apply with AsyncFSWAL > reduce write#syncs() times > -- > > Key: HBASE-21721 > URL: https://issues.apache.org/jira/browse/HBASE-21721 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.3.1, 2.1.1, master, 2.2.3 >Reporter: Bo Cui >Priority: Major > > the number of write#syncs can be reduced by updating the > highestUnsyncedSequence: > before write#sync(), get the current highestUnsyncedSequence > after write#sync, highestSyncedSequence=highestUnsyncedSequence > > {code:title=FSHLog.java|borderStyle=solid} > // Some comments here > public void run() > { > long currentSequence; > while (!isInterrupted()) { > int syncCount = 0; > try { > while (true) { > ... > try { > Trace.addTimelineAnnotation("syncing writer"); > long unSyncedFlushSeq = highestUnsyncedSequence; > writer.sync(); > Trace.addTimelineAnnotation("writer synced"); > if( unSyncedFlushSeq > currentSequence ) currentSequence = > unSyncedFlushSeq; > currentSequence = updateHighestSyncedSequence(currentSequence); > } catch (IOException e) { > LOG.error("Error syncing, request close of WAL", e); > lastException = e; > } catch (Exception e) { >... > } > } > {code} > Add code > long unSyncedFlushSeq = highestUnsyncedSequence; > if( unSyncedFlushSeq > currentSequence ) currentSequence = unSyncedFlushSeq; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-21721) reduce write#syncs() times
[ https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165159#comment-17165159 ] Bo Cui commented on HBASE-21721: # yes, AsyncFsWAL is not in the 1.3 and older hbase # in 2.x version, default is async WAL, but FSWAL exists # the patch does not apply with AsyncFSWAL > reduce write#syncs() times > -- > > Key: HBASE-21721 > URL: https://issues.apache.org/jira/browse/HBASE-21721 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.3.1, 2.1.1, master, 2.2.3 >Reporter: Bo Cui >Priority: Major > > the number of write#syncs can be reduced by updating the > highestUnsyncedSequence: > before write#sync(), get the current highestUnsyncedSequence > after write#sync, highestSyncedSequence=highestUnsyncedSequence > > {code:title=FSHLog.java|borderStyle=solid} > // Some comments here > public void run() > { > long currentSequence; > while (!isInterrupted()) { > int syncCount = 0; > try { > while (true) { > ... > try { > Trace.addTimelineAnnotation("syncing writer"); > long unSyncedFlushSeq = highestUnsyncedSequence; > writer.sync(); > Trace.addTimelineAnnotation("writer synced"); > if( unSyncedFlushSeq > currentSequence ) currentSequence = > unSyncedFlushSeq; > currentSequence = updateHighestSyncedSequence(currentSequence); > } catch (IOException e) { > LOG.error("Error syncing, request close of WAL", e); > lastException = e; > } catch (Exception e) { >... > } > } > {code} > Add code > long unSyncedFlushSeq = highestUnsyncedSequence; > if( unSyncedFlushSeq > currentSequence ) currentSequence = unSyncedFlushSeq; -- This message was sent by Atlassian Jira (v8.3.4#803005)
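The idea in the description above (read highestUnsyncedSequence before the sync, then treat everything up to that value as synced afterwards) can be illustrated with a toy, single-threaded model. This is a sketch and not the FSHLog code: the class and its methods are hypothetical, the writer sync is replaced by a counter, and the "skip when the request is already covered" check is an assumption about the part of the loop that the quoted snippet elides.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the proposal; not HBase code. Every name below is hypothetical
// except the two sequence fields, which are named after the quoted snippet.
final class SyncSequenceSketch {

    // Highest sequence appended to the writer, possibly not yet synced.
    private volatile long highestUnsyncedSequence = 0;
    // Highest sequence known to be covered by a completed sync.
    private final AtomicLong highestSyncedSequence = new AtomicLong(0);
    // Stand-in for the number of real writer.sync() calls.
    private int writerSyncCalls = 0;

    void append(long sequence) {
        highestUnsyncedSequence = sequence;
    }

    /** Handles one sync request; returns true only if a writer sync was issued. */
    boolean requestSync(long requestedSequence) {
        // Assumption: a request already covered by an earlier sync is skipped.
        if (requestedSequence <= highestSyncedSequence.get()) {
            return false;
        }
        // Proposed addition: remember what had been appended before syncing.
        long unsyncedBeforeSync = highestUnsyncedSequence;
        writerSyncCalls++; // stand-in for writer.sync()
        // Proposed addition: the sync covered everything appended before it.
        long currentSequence = Math.max(requestedSequence, unsyncedBeforeSync);
        highestSyncedSequence.accumulateAndGet(currentSequence, Math::max);
        return true;
    }

    long highestSynced() {
        return highestSyncedSequence.get();
    }

    int writerSyncCalls() {
        return writerSyncCalls;
    }
}
```

In this model, appending sequences 1 through 5 and then requesting syncs for 3, 4, and 5 issues one writer sync: the first request advances the synced sequence to 5, so the later two are skipped. A real WAL has concurrent appenders and sync runners, which this sketch does not attempt to model.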