bq. A Master startup refactor + WALs-per-system-table sounds like a lot of change for a minor release.
Yes, we've talked offline about this, it is too big. We plan to revert the
table based replication storage from the master branch first and open a
feature branch for it. Thanks.

2018-03-17 2:11 GMT+08:00 Stack <st...@duboce.net>:

> On Thu, Mar 15, 2018 at 8:42 PM, Guanghao Zhang <zghao...@gmail.com> wrote:
>
> > > We've done the work to make sure hbase:meta is up before everything
> > > else. It has its own WALs so we can split these ahead of user-space
> > > WALs, and so on. We've not done the work for hbase:replication or
> > > hbase:namespace, hbase:acl... etc.
> >
> > If we introduce a new SYSTEM-TABLE-ONLY state for region server startup,
> > then it is necessary for all system tables (not only hbase:meta) to have
> > their own WALs.
>
> WALs dedicated to system tables would be a new facility. Would be good to
> have. Would we have a WAL per system table, or would they share a WAL? The
> meta-only WAL was hacked in. It would probably take more work to get
> system-dedicated WALs into the mix.
>
> > All system tables would have their own WALs, and we split these ahead of
> > user-space WALs. The WALs of system tables do not need replication, so
> > we can start a region server without replication. After all system
> > tables are online, the region server can continue its startup from
> > SYSTEM-TABLE-ONLY to STARTED.
>
> A refactor of Master startup is definitely needed. I would like to get
> considerations other than just assign order taken into account, but I
> suppose that can wait. Your suggested stepped assign sounds fine. What
> about shutdown, and when a Master joins an existing cluster, or a host
> that had system tables on it goes down? How would stepping work then? Will
> there be a hierarchy of assign amongst system tables? Or will it just be
> meta first, then general system tables, and then user-space tables? We
> need to split meta. How will that impinge on these plans?
>
> My suggestion of overloading hbase:meta so it can carry the metadata for
> replication is less pure but keeps our assign simple.
>
> A Master startup refactor + WALs-per-system-table sounds like a lot of
> change for a minor release.
>
> Thanks Guanghao,
> S
>
> > Thanks.
> >
> > 2018-03-16 10:12 GMT+08:00 OpenInx <open...@gmail.com>:
> >
> > > HBASE-15867 will not be introduced into 2.0.0; I expect to introduce
> > > it in the 2.1.0 release.
> > >
> > > Thanks.
> > >
> > > On Fri, Mar 16, 2018 at 12:45 AM, Mike Drob <md...@apache.org> wrote:
> > >
> > > > I'm also +1 for splitting RS startup into multiple steps.
> > > >
> > > > Looking at the linked JIRA and the parent issue, it was not
> > > > immediately apparent whether this is an issue for 2.0 or not - can
> > > > somebody clarify?
> > > >
> > > > On Thu, Mar 15, 2018 at 5:14 AM, 张铎(Duo Zhang)
> > > > <palomino...@gmail.com> wrote:
> > > >
> > > > > I'm +1 on the second solution.
> > > > >
> > > > > 2018-03-15 16:59 GMT+08:00 Guanghao Zhang <zghao...@gmail.com>:
> > > > >
> > > > > > From a more general perspective, this may be a general problem,
> > > > > > as we may move more and more data from zookeeper to system
> > > > > > tables, or we may have more features that create new system
> > > > > > tables. But if the RS relies on some system table to start up,
> > > > > > we will hit a deadlock...
> > > > > >
> > > > > > One solution is to let the master serve system tables only.
> > > > > > Cluster startup would then have two steps: first start the
> > > > > > master to serve the system tables, then start the region
> > > > > > servers. But the problem is that the master will have more
> > > > > > responsibility and may become a bottleneck.
> > > > > >
> > > > > > Another solution is to break the RS startup progress into two
> > > > > > steps. The first step is "serve system tables only". The second
> > > > > > step is "fully started, serving any table". It means we would
> > > > > > introduce a new state for RS startup: an RS's startup progress
> > > > > > would be STOPPED ==> SYSTEM-TABLE-ONLY ==> STARTED. But this may
> > > > > > need more refactoring of our RS code.
> > > > > >
> > > > > > Thanks.
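For illustration, the second solution's stepped startup might look roughly
like the sketch below. This is a hypothetical outline, not code from any
patch; the names (StartupState, canOpenRegion) are invented here.

  // Hypothetical sketch of the stepped region server startup described above.
  public class SteppedStartupSketch {

    enum StartupState {
      STOPPED,            // process not running
      SYSTEM_TABLE_ONLY,  // may open system-table regions only; replication not initialized yet
      STARTED             // replication source/sink initialized; may open any region
    }

    private volatile StartupState state = StartupState.STOPPED;

    /** Master-side check before assigning a region to this server. */
    boolean canOpenRegion(boolean isSystemTable) {
      switch (state) {
        case SYSTEM_TABLE_ONLY: return isSystemTable;
        case STARTED:           return true;
        default:                return false;
      }
    }

    void startup() {
      // Step 1: come up without replication, so the system tables (including
      // hbase:replication itself) can be assigned and brought online first.
      state = StartupState.SYSTEM_TABLE_ONLY;

      // ... wait until hbase:replication and the other system tables are online ...

      // Step 2: the replication source & sink can now read hbase:replication,
      // so initialize them and start accepting user-space regions.
      initializeReplicationSourceAndSink();
      state = StartupState.STARTED;
    }

    private void initializeReplicationSourceAndSink() { /* elided */ }
  }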
> > > > > > 2018-03-15 15:57 GMT+08:00 张铎(Duo Zhang) <palomino...@gmail.com>:
> > > > > >
> > > > > > > Oh, it should be 'The replication peer related data is small'.
> > > > > > >
> > > > > > > 2018-03-15 15:56 GMT+08:00 张铎(Duo Zhang) <palomino...@gmail.com>:
> > > > > > >
> > > > > > > > I think this is a bit awkward... A region server does not
> > > > > > > > even need the meta table to be online when starting, but it
> > > > > > > > would need another system table when starting...
> > > > > > > >
> > > > > > > > I think unless we can make the region server start without
> > > > > > > > replication and initialize it later, we cannot break the
> > > > > > > > tie. Having a special 'region server' seems a bad smell to
> > > > > > > > me. What's the advantage compared to zk?
> > > > > > > >
> > > > > > > > BTW, I believe that we only need the ReplicationPeerStorage
> > > > > > > > to be available when starting a region server, so we could
> > > > > > > > keep this data in zk and store the queue related data in the
> > > > > > > > hbase:replication table? The replication peer related data
> > > > > > > > is small, so I think this is OK.
> > > > > > > >
> > > > > > > > Thanks.
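To make the suggested split concrete: the peer metadata is small and needed
at region server startup, so it can live in ZooKeeper, which is reachable
before any table is online; the bulkier queue state is only needed once
regions are open, so it can move to the hbase:replication table. A
simplified sketch follows - the method names are illustrative, not the
actual HBase interfaces.

  import java.util.List;

  // Peer metadata: small, read at region server startup, kept in ZooKeeper
  // because zk is reachable before any hbase table is online.
  interface ReplicationPeerStorage {
    List<String> listPeerIds();
    String getPeerClusterKey(String peerId);
  }

  // Queue state: grows with the number of WALs, so it benefits from living
  // in a table -- and it is only needed after regions are open, so hosting
  // it in the hbase:replication table does not create a startup cycle.
  interface ReplicationQueueStorage {
    void addWAL(String serverName, String peerId, String walName);
    List<String> getWALsInQueue(String serverName, String peerId);
  }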
> > > > > > > > 2018-03-15 14:55 GMT+08:00 OpenInx <open...@gmail.com>:
> > > > > > > >
> > > > > > > >> Hi:
> > > > > > > >>
> > > > > > > >> (Paste from https://issues.apache.org/jira/browse/HBASE-20166?focusedCommentId=16399886&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16399886)
> > > > > > > >>
> > > > > > > >> There's a really big problem here if we use table based
> > > > > > > >> replication to start an hbase cluster.
> > > > > > > >>
> > > > > > > >> The HMaster process works as follows:
> > > > > > > >> 1. Start active master initialization.
> > > > > > > >> 2. Master waits for region servers to report in.
> > > > > > > >> 3. Master assigns the meta region to one of the region
> > > > > > > >>    servers.
> > > > > > > >> 4. Master creates the hbase:replication table if it does
> > > > > > > >>    not exist.
> > > > > > > >>
> > > > > > > >> But the RS needs to finish initializing the replication
> > > > > > > >> source & sink before it finishes startup (and the
> > > > > > > >> initialization of the replication source & sink must finish
> > > > > > > >> before opening any region, because we need to listen for
> > > > > > > >> the WAL events; otherwise our replication may lose data).
> > > > > > > >> And when initializing the source & sink, we need to read
> > > > > > > >> the hbase:replication table, which is not yet available
> > > > > > > >> because our master is waiting for the rs to be OK, and the
> > > > > > > >> rs is waiting for hbase:replication to be OK ... a dead
> > > > > > > >> loop happens again ...
> > > > > > > >>
> > > > > > > >> After discussing with Guanghao Zhang offline, I'm
> > > > > > > >> considering trying to assign all system tables to an rs
> > > > > > > >> which only accepts regions of system tables (that rs would
> > > > > > > >> skip initializing the replication source and sink)...
> > > > > > > >>
> > > > > > > >> I've tried to start a mini cluster by setting
> > > > > > > >> hbase.balancer.tablesOnMaster.systemTablesOnly=true
> > > > > > > >> & hbase.balancer.tablesOnMaster=true, and it does not seem
> > > > > > > >> to work, because currently we initialize the master logic
> > > > > > > >> first, then the region logic, for the HMaster process, and
> > > > > > > >> it should be ...
> > > > > > > >>
> > > > > > > >> Any suggestion ?

> > > --
> > > ==============================
> > > Openinx blog : http://openinx.github.io
> > >
> > > TO BE A GREAT HACKER !
> > > ==============================
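For reference, the two settings OpenInx mentions trying can be set
programmatically as below. This is a minimal sketch: the property names are
the ones quoted above, while the surrounding class is illustrative only.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class TablesOnMasterAttempt {
    public static void main(String[] args) {
      Configuration conf = HBaseConfiguration.create();
      // Keep system-table regions on the master, per the attempted workaround.
      conf.setBoolean("hbase.balancer.tablesOnMaster", true);
      conf.setBoolean("hbase.balancer.tablesOnMaster.systemTablesOnly", true);
      // As described in the thread, this alone does not break the cycle,
      // because the HMaster initializes its master logic before its
      // region-serving logic.
    }
  }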