Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-16 Thread Duo Zhang
bq. A Master startup refactor + WALs-per-system-table sounds like a lot of
change for a minor release.

Yes, we've talked about this offline, and it is too big. We plan to revert the
table based replication storage from the master branch first and open a
feature branch for it.

Thanks.


Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-16 Thread Stack
On Thu, Mar 15, 2018 at 8:42 PM, Guanghao Zhang  wrote:

> >
> > We've done the work to make sure hbase:meta
> > is up before everything else. It has its own WALs so we can split these
> > ahead of user-space WALs, and so on. We've not done the work for
> > hbase:replication or hbase:namespace, hbase:acl... etc.
>
> If we introduce a new SYSTEM-TABLE-ONLY state for region server startup, then
> it is necessary to have dedicated WALs for all system tables (not only hbase:meta).
>

WALs dedicated to system tables would be a new facility. It would be good to
have. Would we have a WAL per system table, or would they share one? The
meta-only WAL was hacked in; it would probably take more work to get
system-dedicated WALs into the mix.



> All system tables would have their own WALs, and we split these ahead of
> user-space WALs. The WALs of system tables need no replication, so we can
> start a region server without replication. After all system tables are
> online, the region server can continue its startup from SYSTEM-TABLE-ONLY to
> STARTED.
>
>
A refactor of Master startup is definitely needed. I'd like to get
considerations other than just assign order taken into account, but I suppose
that can wait. Your suggested stepped assign sounds fine. What about shutdown,
or when a Master joins an existing cluster, or when a host that had system
tables on it goes down? How would stepping work then? Will there be a
hierarchy of assign amongst system tables? Or will it just be meta first,
then general system tables, and then user-space tables (see the sketch
below)? We need to split meta. How will that impinge on these plans?
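
As a rough sketch, a stepped assign order could be as simple as a priority
function like the one below (illustrative names only, not the real
AssignmentManager API):

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    // Hypothetical sketch of a stepped assign order: meta first, then other
    // system tables, then user-space tables.
    public class AssignOrderSketch {
      static int priority(String table) {
        if (table.equals("hbase:meta")) return 0;  // always first
        if (table.startsWith("hbase:")) return 1;  // other system tables
        return 2;                                  // user-space tables
      }

      public static void main(String[] args) {
        List<String> tables = Arrays.asList(
            "mytable", "hbase:replication", "hbase:meta", "hbase:namespace");
        tables.sort(Comparator.comparingInt(AssignOrderSketch::priority));
        // prints [hbase:meta, hbase:replication, hbase:namespace, mytable]
        System.out.println(tables);
      }
    }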

My suggestion of overloading hbase:meta so it can carry the metadata for
replication is less pure but keeps our assign simple.

A Master startup refactor + WALs-per-system-table sounds like a lot of
change for a minor release.

Thanks Guanghao,
S




Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Guanghao Zhang
>
> We've done the work to make sure hbase:meta
> is up before everything else. It has its own WALs so we can split these
> ahead of user-space WALs, and so on. We've not done the work for
> hbase:replication or hbase:namespace, hbase:acl... etc.

If we introduce a new SYSTEM-TABLE-ONLY state for region server startup, then
it is necessary to have dedicated WALs for all system tables (not only
hbase:meta). All system tables would have their own WALs, and we split these
ahead of user-space WALs. The WALs of system tables need no replication, so we
can start a region server without replication. After all system tables are
online, the region server can continue its startup from SYSTEM-TABLE-ONLY to
STARTED.
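
A minimal sketch of that routing rule, assuming a stand-in Wal type (this is
not the real WALProvider machinery, just the idea): each system table gets a
dedicated WAL, user tables share one.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative only: one dedicated WAL per system table, a shared WAL
    // for user-space tables, so system-table logs can be split first.
    public class WalRouterSketch {
      static class Wal {
        final String name;
        Wal(String name) { this.name = name; }
      }

      private final Wal userWal = new Wal("user-space");
      private final Map<String, Wal> systemWals = new ConcurrentHashMap<>();

      Wal walFor(String table) {
        if (table.startsWith("hbase:")) {
          // dedicated WAL, replayable ahead of (and without) user-space WALs
          return systemWals.computeIfAbsent(table, Wal::new);
        }
        return userWal;
      }
    }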

Thanks.

Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread OpenInx
HBASE-15867 will not be introduced in 2.0.0; I expect to introduce it in the
2.1.0 release.

Thanks.


-- 
==
Openinx  blog : http://openinx.github.io

TO BE A GREAT HACKER !
==


Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Stack
On Thu, Mar 15, 2018 at 5:39 PM, 张铎(Duo Zhang) 
wrote:

>
> But for other replication data, such as the WAL files we need to replicate,
> the row key will be a peer id crossed with the server name, and maybe also
> the file name? This is a completely different thing. If we put this in meta,
> the row keys will be messed up...
>
>
Is there some artifice that would allow us to shape this info so it fits the
meta schema, e.g. a table that we do not assign (special namespace? special
tablename? hbase:replication?)? Then perhaps the regions in meta would have
the peer as the start row, etc.

St.Ack





Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Duo Zhang
The problem with putting more things in meta is that the row key patterns are
different. For example, when re-implementing the serial replication feature,
the 'replication barrier', which is actually the open sequence number for a
region, is stored in meta with the region name as the row key, so it is OK.

But for other replication data, such as the WAL files we need to replicate,
the row key will be a peer id crossed with the server name, and maybe also
the file name? This is a completely different thing. If we put this in meta,
the row keys will be messed up...
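
To make the clash concrete, here is a sketch of the two key shapes (the
delimiter and the exact fields are assumptions, not an actual schema):

    import java.nio.charset.StandardCharsets;

    public class RowKeySketch {
      // serial replication barrier: keyed by region name, so it fits the
      // layout meta already uses
      static byte[] barrierKey(String encodedRegionName) {
        return encodedRegionName.getBytes(StandardCharsets.UTF_8);
      }

      // replication queue entry: peer id x server name x wal file name,
      // a completely different shape from meta's region-name keys
      static byte[] queueKey(String peerId, String serverName, String walFile) {
        return (peerId + '\0' + serverName + '\0' + walFile)
            .getBytes(StandardCharsets.UTF_8);
      }
    }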



Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Stack
On Wed, Mar 14, 2018 at 11:55 PM, OpenInx  wrote:

> Hi:
>
> (Paste from https://issues.apache.org/jira/browse/HBASE-20166?
> focusedCommentId=16399886&page=com.atlassian.jira.
> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16399886)
>
> There's a really big problem here if we use table based replication to
> start an HBase cluster:
>
> The HMaster process works as follows:
> 1. Start active master initialization.
> 2. Master waits for region servers to report in.
> 3. Master assigns the meta region to one of the region servers.
> 4. Master creates the hbase:replication table if it does not exist.
>
>
We have to have a new system table? Can't we add a column family on
hbase:meta that keeps offsets? We've done the work to make sure hbase:meta
is up before everything else. It has its own WALs so we can split these
ahead of user-space WALs, and so on. We've not done the work for
hbase:replication or hbase:namespace, hbase:acl... etc.

It means more load on hbase:meta, and it is going to get bigger, but I'd
rather work on splitting meta than on figuring out how to preassign
miscellaneous system tables, one per feature.



> But the RS needs to finish initializing the replication source & sink before
> it finishes startup (and the initialization of the replication source & sink
> must finish before opening regions, because we need to listen to the WAL
> events, otherwise our replication may lose data). When initializing the
> source & sink, we need to read the hbase:replication table, which isn't
> available yet because our master is waiting for the region servers to be OK,
> and the region servers are waiting for hbase:replication to be OK ... a
> deadlock, again ...
>
> After discussing with Guanghao Zhang offline, I'm considering trying to
> assign all system tables to a region server that only accepts regions of
> system tables (that region server would skip initializing the replication
> source and sink)...
>
>
Can we avoid this sort of special-casing?

St.Ack



> I've tried to start a mini cluster by setting
> hbase.balancer.tablesOnMaster.systemTablesOnly=true
> & hbase.balancer.tablesOnMaster=true, but it seems not to work, because
> currently we initialize the master logic first, then the region logic, in
> the HMaster process, and it should be ...
>
>
> Any suggestions?
>


Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Mike Drob
I'm also +1 for splitting RS startup into multiple steps.

Looking at the linked JIRA and the parent issue, it was not immediately
apparent whether this is an issue for 2.0 or not - can somebody clarify?



Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Duo Zhang
I'm +1 on the second solution.



Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Guanghao Zhang
From a more general perspective, this may be a general problem, as we may
move more and more data from ZooKeeper to system tables, or add more features
that create new system tables. But if the RS relies on some system table to
start up, we will meet a deadlock...

One solution is to let the master serve system tables only. The cluster
startup would then have two steps: first start the master to serve the system
tables, then start the region servers. But the problem is that the master
will have more responsibility and may become a bottleneck.

Another solution is to break the RS startup process into two steps. The first
step is "serve system tables only". The second step is "fully started and
serving any tables". It means we will introduce a new state for RS startup.
An RS's startup progress would be STOPPED ==> SYSTEM-TABLE-ONLY ==> STARTED.
But this may need more refactoring of our RS code.
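
A tiny sketch of the proposed states (the state names come from this thread;
the transition and serving rules below are only illustrative, not real
HRegionServer code):

    public enum RsStartupState {
      STOPPED, SYSTEM_TABLE_ONLY, STARTED;

      RsStartupState next() {
        switch (this) {
          case STOPPED:           return SYSTEM_TABLE_ONLY; // system tables only
          case SYSTEM_TABLE_ONLY: return STARTED;           // replication ready
          default:                return this;
        }
      }

      boolean canServe(String table) {
        switch (this) {
          case SYSTEM_TABLE_ONLY: return table.startsWith("hbase:");
          case STARTED:           return true;
          default:                return false;
        }
      }
    }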

Thanks.



Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread OpenInx
> I think unless we can make the regionserver start without replication and
> initialize it later, we cannot break the tie

Yes, what we thought before was: we can assign all system tables on the
master, because it runs as a region server now in 2.0. The problem is that
once we restart the master, availability may be affected, so the master
should always be available.

> I believe that we only need the ReplicationPeerStorage to be available when
> starting a region server, so we can keep this data in zk and store the
> queue related data in the hbase:replication table?

Yes. If we still keep the peer config & state in ZooKeeper, cluster startup
will be no problem, and it will be a minor change in the code base, though
not as elegant.





-- 
==
Openinx  blog : http://openinx.github.io

TO BE A GREAT HACKER !
==


Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Duo Zhang
Oh, it should be 'The replication peer related data is small'.


Re: [DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-15 Thread Duo Zhang
I think this is a bit awkward... A region server does not even need the meta
table to be online when starting, but it would need another system table when
starting...

I think unless we can make the regionserver start without replication and
initialize it later, we cannot break the tie. Having a special 'region
server' seems like a bad smell to me. What's the advantage compared to zk?

BTW, I believe that we only need the ReplicationPeerStorage to be available
when starting a region server, so we can keep this data in zk and store the
queue related data in the hbase:replication table? The replication related
data is small, so I think this is OK.
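
A sketch of that split (these interfaces are illustrative stand-ins, not an
actual replication storage API): the small peer data stays on zk, so an RS
can start with no table dependency, while the bulkier queue data moves to the
hbase:replication table.

    import java.util.List;

    // zk-backed; must be available when a region server starts
    interface PeerStorageSketch {
      List<String> listPeerIds();
      boolean isPeerEnabled(String peerId);
    }

    // hbase:replication-backed; only needed once tables are online
    interface QueueStorageSketch {
      void addWal(String peerId, String serverName, String walFile);
      List<String> listWals(String peerId, String serverName);
    }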

Thanks.



[DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication

2018-03-14 Thread OpenInx
Hi:

(Paste from https://issues.apache.org/jira/browse/HBASE-20166?
focusedCommentId=16399886&page=com.atlassian.jira.
plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16399886)

There's a really big problem here if we use table based replication to start
an HBase cluster:

The HMaster process works as follows:
1. Start active master initialization.
2. Master waits for region servers to report in.
3. Master assigns the meta region to one of the region servers.
4. Master creates the hbase:replication table if it does not exist.

But the RS needs to finish initializing the replication source & sink before
it finishes startup (and the initialization of the replication source & sink
must finish before opening regions, because we need to listen to the WAL
events, otherwise our replication may lose data). When initializing the
source & sink, we need to read the hbase:replication table, which isn't
available yet because our master is waiting for the region servers to be OK,
and the region servers are waiting for hbase:replication to be OK ... a
deadlock, again ...
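
The circular wait can be reduced to a toy example (all names are made up;
with no timeouts the two threads would block forever):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    public class StartupDeadlockSketch {
      public static void main(String[] args) throws Exception {
        CountDownLatch rsReportedIn = new CountDownLatch(1);
        CountDownLatch replicationTableOnline = new CountDownLatch(1);

        // master: waits for RSes to report in before it can put
        // hbase:replication online
        Thread master = new Thread(() -> {
          try {
            if (!rsReportedIn.await(2, TimeUnit.SECONDS)) {
              System.out.println("master: still waiting for RS ... deadlock");
              return;
            }
            replicationTableOnline.countDown();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });

        // region server: waits for hbase:replication before it can finish
        // startup and report in
        Thread regionServer = new Thread(() -> {
          try {
            if (!replicationTableOnline.await(2, TimeUnit.SECONDS)) {
              System.out.println("rs: still waiting for table ... deadlock");
              return;
            }
            rsReportedIn.countDown();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });

        master.start();
        regionServer.start();
        master.join();
        regionServer.join();
      }
    }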

After discussing with Guanghao Zhang offline, I'm considering trying to
assign all system tables to a region server that only accepts regions of
system tables (that region server would skip initializing the replication
source and sink)...

I've tried to start a mini cluster by setting
hbase.balancer.tablesOnMaster.systemTablesOnly=true
& hbase.balancer.tablesOnMaster=true, but it seems not to work, because
currently we initialize the master logic first, then the region logic, in the
HMaster process, and it should be ...


Any suggestions?