Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
bq. A Master startup refactor + WALs-per-system-table sounds like a lot of
change for a minor release.

Yes, we've talked about this offline; it is too big. We plan to revert the
table based replication storage from the master branch first and open a
feature branch for it.

Thanks.

2018-03-17 2:11 GMT+08:00 Stack :
> A Master startup refactor + WALs-per-system-table sounds like a lot of
> change for a minor release.
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
On Thu, Mar 15, 2018 at 8:42 PM, Guanghao Zhang wrote:

> > We've done the work to make sure hbase:meta is up before everything
> > else. It has its own WALs so we can split these ahead of user-space
> > WALs, and so on. We've not done the work for hbase:replication or
> > hbase:namespace, hbase:acl... etc.
>
> If we introduce a new SYSTEM-TABLE-ONLY state for region server startup,
> then every system table (not only hbase:meta) needs its own WALs.

WALs dedicated to system tables would be a new facility. Would be good to
have. Would we have a WAL per system table, or would they share one? The
meta-only WAL was hacked in; it would probably take more work to get
system-dedicated WALs into the mix.

> All system tables have their own WAL and we split these ahead of
> user-space WALs. The WALs of system tables do not need replication, so we
> can start a region server without replication. After all system tables
> are online, the region server can continue from SYSTEM-TABLE-ONLY to
> STARTED.

A refactor of Master startup is definitely needed. I would like to get
considerations other than just assign order considered, but I suppose that
can wait. Your suggested stepped assign sounds fine. What about shutdown,
and when a Master joins an existing cluster, or a host that had system
tables on it goes down? How would stepping work then? Will there be a
hierarchy of assign among system tables? Or will it just be meta first,
then general system tables, and then user-space tables? We need to split
meta. How will that impinge on these plans?

My suggestion of overloading hbase:meta so it can carry the metadata for
replication is less pure, but it keeps our assign simple.

A Master startup refactor + WALs-per-system-table sounds like a lot of
change for a minor release.

Thanks Guanghao,
S
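Stack's point about splitting system-table WALs ahead of user-space WALs can be sketched as a simple ordering rule. This is a hypothetical illustration, not HBase code: the suffix-based test and the non-meta suffixes are invented for the example, by analogy with the meta-only WAL mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy sketch: order WAL files so that system-table WALs are split before
// user-space WALs at startup. Assumes (hypothetically) that each system
// table's WAL carries a distinguishing suffix, the way the meta WAL does.
public class WalSplitOrder {
    static boolean isSystemWal(String walFile) {
        return walFile.endsWith(".meta")
            || walFile.endsWith(".replication")
            || walFile.endsWith(".namespace")
            || walFile.endsWith(".acl");
    }

    // Return WALs in the order they should be split: system tables first,
    // preserving the relative order within each group.
    static List<String> splitOrder(List<String> wals) {
        List<String> ordered = new ArrayList<>();
        for (String w : wals) if (isSystemWal(w)) ordered.add(w);
        for (String w : wals) if (!isSystemWal(w)) ordered.add(w);
        return ordered;
    }

    public static void main(String[] args) {
        List<String> wals = Arrays.asList(
            "rs1%2C16020.1521072000000",              // user-space WAL
            "rs1%2C16020.1521072000000.meta",         // meta WAL
            "rs1%2C16020.1521072000000.replication"); // hypothetical
        System.out.println(splitOrder(wals));
    }
}
```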
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
> We've done the work to make sure hbase:meta is up before everything else.
> It has its own WALs so we can split these ahead of user-space WALs, and
> so on. We've not done the work for hbase:replication or hbase:namespace,
> hbase:acl... etc.

If we introduce a new SYSTEM-TABLE-ONLY state for region server startup,
then every system table (not only hbase:meta) needs its own WALs.

All system tables have their own WAL and we split these ahead of user-space
WALs. The WALs of system tables do not need replication, so we can start a
region server without replication. After all system tables are online, the
region server can continue from SYSTEM-TABLE-ONLY to STARTED.

Thanks.
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
HBASE-15867 will not be introduced in 2.0.0; I expect to introduce it in
the 2.1.0 release.

Thanks.

On Fri, Mar 16, 2018 at 12:45 AM, Mike Drob wrote:

> I'm also +1 for splitting RS startup into multiple steps.
>
> Looking at the linked JIRA and the parent issue it was not immediately
> apparent if this is an issue for 2.0 or not - can somebody clarify?

--
== Openinx blog : http://openinx.github.io  TO BE A GREAT HACKER ! ==
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
On Thu, Mar 15, 2018 at 5:39 PM, 张铎(Duo Zhang) wrote:

> But for other replication data, such as the WAL files we need to
> replicate, the row key will be a peer id, plus the server name, and maybe
> also the file name? This is a completely different thing. If we put this
> in meta, the row key will be messed up...

Is there some artifice that would allow us to shape this info so it fits
the meta schema: e.g. a table that we do not assign (special namespace?
special table name? hbase:replication?). Then perhaps the regions in meta
would have the peer as the start row, etc.

St.Ack
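The circular wait described in this thread — the master waits for region servers to report in, the region servers wait for the hbase:replication table, and that table waits for the master to assign it — can be modeled as a "waits on" graph, where a cycle is exactly the startup dead loop. A toy sketch with made-up step names, not HBase code:

```java
import java.util.*;

// Model each startup step as a node; an edge a -> b means "a waits on b".
// A cycle in this graph means no startup order exists: a deadlock.
public class StartupDeadlock {
    static boolean hasCycle(Map<String, List<String>> waitsOn) {
        Set<String> visiting = new HashSet<>(), done = new HashSet<>();
        for (String n : waitsOn.keySet())
            if (dfs(n, waitsOn, visiting, done)) return true;
        return false;
    }

    static boolean dfs(String n, Map<String, List<String>> g,
                       Set<String> visiting, Set<String> done) {
        if (done.contains(n)) return false;
        if (!visiting.add(n)) return true; // back edge => cycle
        for (String m : g.getOrDefault(n, List.of()))
            if (dfs(m, g, visiting, done)) return true;
        visiting.remove(n);
        done.add(n);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> waitsOn = new HashMap<>();
        waitsOn.put("master-finishes-init", List.of("rs-report-in"));
        waitsOn.put("rs-report-in", List.of("hbase:replication-online"));
        waitsOn.put("hbase:replication-online", List.of("master-finishes-init"));
        System.out.println(hasCycle(waitsOn)); // prints "true": the dead loop
    }
}
```

Breaking any one edge — e.g. letting the RS start without reading hbase:replication — makes the graph acyclic, which is what both proposed solutions amount to.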
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
The problem with putting more things in meta is that the row key patterns
are different. For example, when re-implementing the serial replication
feature, the 'replication barrier', which is actually the sequence of open
sequence numbers for a region, is stored in meta with the region name as
the row key, so that is OK. But for other replication data, such as the
WAL files we need to replicate, the row key will be a peer id, plus the
server name, and maybe also the file name? This is a completely different
thing. If we put this in meta, the row key will be messed up...

2018-03-16 3:01 GMT+08:00 Stack :

> We have to have a new system table? Can't we add a column family on
> hbase:meta that keeps offsets?
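The row-key mismatch Duo Zhang describes can be made concrete with a small sketch. Both layouts below are illustrative only (neither the exact meta row key format nor any hbase:replication schema is taken from this thread): meta rows are keyed per region, while replication queue entries need a compound peer/server/file key.

```java
// Illustrative row key shapes; not real HBase encodings.
public class RowKeys {
    // hbase:meta style: one row per region, keyed by region name.
    static String metaRowKey(String table, String startKey, long regionId) {
        return table + "," + startKey + "," + regionId;
    }

    // Replication queue style: peer id x server name x WAL file name,
    // a shape that has nothing to do with region names.
    static String queueRowKey(String peerId, String serverName, String walFile) {
        return peerId + "-" + serverName + "-" + walFile;
    }

    public static void main(String[] args) {
        System.out.println(metaRowKey("t1", "row0", 1521072000000L));
        System.out.println(queueRowKey("peer1", "rs1,16020,1521072000000",
            "rs1%2C16020.1521072000001"));
    }
}
```

Mixing the two key spaces in one table would interleave unrelated rows, which is the "messed up" row key problem above.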
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
On Wed, Mar 14, 2018 at 11:55 PM, OpenInx wrote:

> Hi:
>
> (Paste from https://issues.apache.org/jira/browse/HBASE-20166?focusedCommentId=16399886&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16399886)
>
> There's a really big problem here if we use table based replication to
> start an hbase cluster.
>
> The HMaster process works as follows:
> 1. Start active master initialization.
> 2. The master waits for region servers to report in.
> 3. The master assigns the meta region to one of the region servers.
> 4. The master creates the hbase:replication table if it does not exist.

We have to have a new system table? Can't we add a column family on
hbase:meta that keeps offsets? We've done the work to make sure hbase:meta
is up before everything else. It has its own WALs so we can split these
ahead of user-space WALs, and so on. We've not done the work for
hbase:replication or hbase:namespace, hbase:acl... etc.

Means more loading on hbase:meta and it is going to get bigger, but I'd
rather work on splitting meta than on figuring out how to preassign
miscellaneous system tables, one per feature.

> But the RS needs to finish initializing the replication source & sink
> before it finishes startup (and the initialization of the replication
> source & sink must finish before opening regions, because we need to
> listen to WAL events, otherwise our replication may lose data). When
> initializing the source & sink, we need to read the hbase:replication
> table, which isn't available because our master is waiting for the RSes
> to be OK, and the RSes are waiting for hbase:replication to be OK... a
> dead loop again...
>
> After discussing with Guanghao Zhang offline, I'm considering trying to
> assign all system tables to an RS which only accepts regions of system
> tables (that RS will skip initializing the replication source and sink)...

Can we avoid this sort of special-casing?

St.Ack

> I've tried to start a mini cluster by setting
> hbase.balancer.tablesOnMaster.systemTablesOnly=true
> & hbase.balancer.tablesOnMaster=true, but it seems not to work, because
> currently we initialize the master logic first, then the region logic for
> the HMaster process, and it should be ...
>
> Any suggestion?
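The balancer settings OpenInx mentions trying correspond to an hbase-site.xml fragment like the following (property names exactly as given in the thread). As reported, this does not solve the problem, because the master logic initializes before the HMaster's own region-serving logic:

```xml
<!-- hbase-site.xml: keep system tables on the master's own region server.
     As reported in this thread, this alone does not break the startup
     deadlock. -->
<property>
  <name>hbase.balancer.tablesOnMaster</name>
  <value>true</value>
</property>
<property>
  <name>hbase.balancer.tablesOnMaster.systemTablesOnly</name>
  <value>true</value>
</property>
```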
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
I'm also +1 for splitting RS startup into multiple steps.

Looking at the linked JIRA and the parent issue it was not immediately
apparent if this is an issue for 2.0 or not - can somebody clarify?

On Thu, Mar 15, 2018 at 5:14 AM, 张铎(Duo Zhang) wrote:

> I'm +1 on the second solution.
>
> 2018-03-15 16:59 GMT+08:00 Guanghao Zhang :
>
> > From a more general perspective, this may be a general problem, as we
> > may move more and more data from zookeeper to system tables, or we may
> > have more features that create new system tables. But if the RS relies
> > on some system table to start up, we will hit a deadlock...
> >
> > One solution is to let the master serve system tables only. Cluster
> > startup would then have two steps: first start the master to serve
> > system tables, then start the region servers. But the problem is that
> > the master will have more responsibility and may become a bottleneck.
> >
> > Another solution is to break RS startup into two steps. The first step
> > is "serve system tables only"; the second is "fully started, serving
> > any table". This means introducing a new state for RS startup: an RS's
> > startup progress would be STOPPED ==> SYSTEM-TABLE-ONLY ==> STARTED.
> > But this may need more refactoring of our RS code.
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
I'm +1 on the second solution.

2018-03-15 16:59 GMT+08:00 Guanghao Zhang :
> From a more general perspective, this may be a general problem, as we may
> move more and more data from ZooKeeper to system tables, or add more
> features that create new system tables. But if the RS relies on some
> system table to start up, we will hit a deadlock...
>
> One solution is to let the master serve system tables only. Cluster
> startup would then have two steps: first start the master to serve the
> system tables, then start the region servers. The problem is that the
> master gets more responsibility and may become a bottleneck.
>
> Another solution is to break the RS startup process into two steps. The
> first step is "serve system tables only"; the second is "fully started,
> serving any table". This means introducing a new state for RS startup: an
> RS's startup progress would be STOPPED ==> SYSTEM-TABLE-ONLY ==> STARTED.
> But this may need more refactoring of our RS code.
>
> Thanks.
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
From a more general perspective, this may be a general problem, as we may move more and more data from ZooKeeper to system tables, or add more features that create new system tables. But if the RS relies on some system table to start up, we will hit a deadlock...

One solution is to let the master serve system tables only. Cluster startup would then have two steps: first start the master to serve the system tables, then start the region servers. The problem is that the master gets more responsibility and may become a bottleneck.

Another solution is to break the RS startup process into two steps. The first step is "serve system tables only"; the second is "fully started, serving any table". This means introducing a new state for RS startup: an RS's startup progress would be STOPPED ==> SYSTEM-TABLE-ONLY ==> STARTED. But this may need more refactoring of our RS code.

Thanks.

2018-03-15 15:57 GMT+08:00 张铎(Duo Zhang) :
> Oh, it should be 'The replication peer related data is small'.
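The stepped startup in the second solution could be sketched roughly as below. The state names come from the proposal itself; the class, method names, and the transition rule are hypothetical illustrations, not code from the HBase code base:

```java
// Hypothetical sketch of a stepped region server startup, assuming the
// proposed states STOPPED ==> SYSTEM-TABLE-ONLY ==> STARTED. Not real
// HBase code; names and transition rules are illustrative only.
class RegionServerStartup {

  enum State { STOPPED, SYSTEM_TABLE_ONLY, STARTED }

  private State state = State.STOPPED;

  /** Only the forward transitions STOPPED -> SYSTEM_TABLE_ONLY -> STARTED are legal. */
  boolean transitionTo(State next) {
    if (next.ordinal() == state.ordinal() + 1) {
      state = next;
      return true;
    }
    return false;
  }

  /**
   * In SYSTEM_TABLE_ONLY the RS may open system-table regions but has not
   * yet initialized replication sources/sinks, so user-space regions (whose
   * WAL edits must be tracked for replication) are refused until STARTED.
   */
  boolean canOpenRegion(boolean isSystemTable) {
    return state == State.STARTED
        || (state == State.SYSTEM_TABLE_ONLY && isSystemTable);
  }

  State state() {
    return state;
  }
}
```

The point of the guard in `canOpenRegion` is exactly the constraint raised earlier in the thread: replication source/sink initialization must happen before any user-space region opens, but system-table regions carry no replication requirement and can open earlier.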
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
> I think unless we can make the region server start without replication
> and initialize it later, we cannot break the tie

Yes, what we thought before was: we can assign all system tables to the master, because it runs as a region server now in 2.0. The problem is that once we restart the master, availability may be affected, so the master would have to be always available.

> I believe that we only need the ReplicationPeerStorage to be available
> when starting a region server, so we can keep this data in zk, and store
> the queue related data in the hbase:replication table?

Yes. If we still keep the peer config & state in ZooKeeper, cluster startup will have no problem, and it will be a minor change in the code base - just not as elegant.

On Thu, Mar 15, 2018 at 3:57 PM, 张铎(Duo Zhang) wrote:
> Oh, it should be 'The replication peer related data is small'.

--
== Openinx blog : http://openinx.github.io TO BE A GREAT HACKER ! ==
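The storage split agreed on above - peer config/state in ZooKeeper (readable before any table is online), queue data in the hbase:replication table (only needed once regions are open) - could be sketched like this. All interface and method names below are simplified stand-ins for illustration, not the actual HBase APIs:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed storage split. Peer data stays small
// and lives in ZooKeeper, so a region server can read it during startup;
// queue data lives in hbase:replication and can be initialized lazily.
interface PeerStorage {            // would be backed by ZooKeeper
  List<String> listPeerIds();
  boolean isPeerEnabled(String peerId);
}

interface QueueStorage {           // would be backed by hbase:replication
  void addWAL(String serverName, String peerId, String walName);
  List<String> getWALsInQueue(String serverName, String peerId);
}

// Minimal in-memory stand-in, just to make the shape of the API concrete.
class InMemoryQueueStorage implements QueueStorage {
  private final Map<String, List<String>> queues = new HashMap<>();

  private String key(String serverName, String peerId) {
    return serverName + "/" + peerId;
  }

  @Override
  public void addWAL(String serverName, String peerId, String walName) {
    queues.computeIfAbsent(key(serverName, peerId), k -> new ArrayList<>())
        .add(walName);
  }

  @Override
  public List<String> getWALsInQueue(String serverName, String peerId) {
    return queues.getOrDefault(key(serverName, peerId), Collections.emptyList());
  }
}
```

The design point is the dependency direction: startup only ever touches `PeerStorage`, so nothing in the boot path reads a table that does not exist yet.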
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
Oh, it should be 'The replication peer related data is small'.

2018-03-15 15:56 GMT+08:00 张铎(Duo Zhang) :
> I think this is a bit awkward... A region server does not even need the
> meta table to be online when starting, but it needs another system table
> to be online when starting...
>
> I think unless we can make the region server start without replication
> and initialize it later, we cannot break the tie. Having a special
> 'region server' seems like a bad smell to me. What's the advantage
> compared to zk?
>
> BTW, I believe that we only need the ReplicationPeerStorage to be
> available when starting a region server, so we can keep this data in zk,
> and store the queue related data in the hbase:replication table? The
> replication related data is small, so I think this is OK.
>
> Thanks.
Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
I think this is a bit awkward... A region server does not even need the meta table to be online when starting, but it needs another system table to be online when starting...

I think unless we can make the region server start without replication and initialize it later, we cannot break the tie. Having a special 'region server' seems like a bad smell to me. What's the advantage compared to zk?

BTW, I believe that we only need the ReplicationPeerStorage to be available when starting a region server, so we can keep this data in zk, and store the queue related data in the hbase:replication table? The replication related data is small, so I think this is OK.

Thanks.

2018-03-15 14:55 GMT+08:00 OpenInx :
[DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication
Hi:

(Paste from https://issues.apache.org/jira/browse/HBASE-20166?focusedCommentId=16399886&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16399886)

There's a really big problem here if we use table based replication to start an HBase cluster.

The HMaster process works as follows:
1. Start active master initialization.
2. Master waits for RSs to report in.
3. Master assigns the meta region to one of the region servers.
4. Master creates the hbase:replication table if it does not exist.

But the RS needs to finish initializing the replication source & sink before it finishes startup (and the initialization of the replication source & sink must finish before opening regions, because we need to listen for WAL events, otherwise our replication may lose data). When initializing the source & sink, we need to read the hbase:replication table, which is not yet available because our master is waiting for the RSs to be OK, while the RSs are waiting for hbase:replication to be OK ... a dead loop again ...

After discussing with Guanghao Zhang offline, I'm considering trying to assign all system tables to an RS which only accepts system table region assignments (that RS will skip initializing the replication source and sink)...

I've tried to start a mini cluster by setting hbase.balancer.tablesOnMaster.systemTablesOnly=true & hbase.balancer.tablesOnMaster=true, but it doesn't seem to work, because currently we initialize the master logic first, then the region logic, for the HMaster process, and it should be ...

Any suggestion?
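For reference, the two balancer settings tried above go in hbase-site.xml as below. As described, they only influence where the balancer places regions; they do not reorder master-vs-regionserver initialization, which is why they did not break the cycle:

```xml
<!-- hbase-site.xml: the settings tried in the experiment above. -->
<property>
  <name>hbase.balancer.tablesOnMaster</name>
  <value>true</value>
</property>
<property>
  <name>hbase.balancer.tablesOnMaster.systemTablesOnly</name>
  <value>true</value>
</property>
```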