[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206109#comment-17206109 ]

Nicholas Jiang commented on HBASE-24753:

[~zhangduo], I would recommend the raft library https://github.com/sofastack/sofa-jraft. For JRaft benchmarks, see https://www.sofastack.tech/projects/sofa-jraft/benchmark-performance/.

> HA masters based on raft
>
> Key: HBASE-24753
> URL: https://issues.apache.org/jira/browse/HBASE-24753
> Project: HBase
> Issue Type: New Feature
> Components: master
> Reporter: Duo Zhang
> Priority: Major
>
> For better availability, and to move bootstrap information from zookeeper to
> our own service, so that we can eventually remove the dependency on zookeeper
> completely.
> This has been on my mind for a long time. Since there is a discussion
> in HBASE-11288 about how to store the root table, and also in HBASE-24749
> we want better performance on filesystems that do not support list and
> rename well, which requires a storage engine at the bottom to store the
> storefile information for the meta table, I think it is time to throw this
> idea out.
> The basic solution is to build a raft group to store the bootstrap
> information; for now that is the cluster id (it is on the file system already?)
> and the root table. Region servers will always go to the leader to ask
> for the information, so they always see the newest data. For clients,
> we enable 'follower read' to reduce the load on the leader (and there are
> some solutions that let 'follower read' always get the newest data in
> raft).
> With this solution in place, as long as the root table is not in the format of
> a region (we could just use rocksdb to store it locally), the cyclic dependency
> in HBASE-24749 is also solved, as we no longer need to find a place to
> store the storefile information for the root table.
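The leader-read versus 'follower read' distinction in the issue description can be illustrated with a toy model. The sketch below is not HBase, Ratis, or JRaft code; the `Leader` and `Follower` classes and all method names are hypothetical. It assumes the "newest data" variant of follower read works along the lines of the ReadIndex approach commonly described for raft: the follower asks the leader for its current commit index, waits until it has applied up to that index, and then answers locally. The leadership confirmation (a heartbeat round to a majority) that a real implementation needs before handing out a read index is omitted, and every appended entry is treated as committed.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Toy leader holding the committed log of (key, value) writes."""
    log: list = field(default_factory=list)

    def write(self, key, value):
        # Real raft commits an entry only after a majority acknowledges it;
        # this toy model treats every appended entry as committed.
        self.log.append((key, value))

    @property
    def commit_index(self):
        return len(self.log)

    def read(self, key):
        # Leader read: the path region servers would take in the proposal.
        return dict(self.log).get(key)


@dataclass
class Follower:
    """Toy follower that applies the leader's log into a local dict."""
    leader: Leader
    applied_index: int = 0
    state: dict = field(default_factory=dict)

    def replicate(self, up_to=None):
        # Apply committed entries, optionally only up to a given index.
        commit = self.leader.commit_index
        target = commit if up_to is None else min(up_to, commit)
        for key, value in self.leader.log[self.applied_index:target]:
            self.state[key] = value
        self.applied_index = max(self.applied_index, target)

    def stale_read(self, key):
        # Plain follower read: may return old data if replication lags.
        return self.state.get(key)

    def read_index_read(self, key):
        # 1. Ask the leader for its current commit index (the read index).
        read_index = self.leader.commit_index
        # 2. Wait until the local state has applied up to that index;
        #    "waiting" is modelled here by catching up synchronously.
        if self.applied_index < read_index:
            self.replicate(up_to=read_index)
        # 3. Serve the read from local state.
        return self.state.get(key)
```

For example, if the follower replicated after a first write of `"root"` but not after a second one, `stale_read("root")` still returns the first value while `read_index_read("root")` returns the second. The leader only has to answer a small index query, which is the load reduction the description refers to.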
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172521#comment-17172521 ]

Nick Dimiduk commented on HBASE-24753:

bq. Actually, no... It is about HBASE-24286, where the AWS guys want to redeploy an HBase cluster with the old data on S3 but all new virtual machines as region server. So I say if you want to do this on cloud, you could also use EBS.

I see. I would classify HBASE-24286 as an operator error, in the sense that they are attempting to re-hydrate a cluster from partial state (missing WAL files/directories)... maybe a bug, except that there has never been (I've never seen) an explicit design that completely isolates the root dir filesystem from the master filesystem. What we're talking about in this JIRA is a design choice that explicitly changes what was intentionally decided before, intentionally introducing a persistence dependency on something in addition to the root dir on the shared namespace filesystem.

bq. You could build the storage of the raft store on a HDFS? Or even just on a dynamodb as it is a just a KV? I do not see any problems here...

Consensus storage on HDFS would resolve the concern I attempted to clarify.
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171878#comment-17171878 ]

Duo Zhang commented on HBASE-24753:

{quote}
Duo, my concern was not on avoid zk usage and all HM have own consensus for leader election (As what the jira title says). My worry was on the line which says move the root table data (meta as of today) away from storage but to local and HM handle it in special way.
{quote}

You could build the storage of the raft store on HDFS? Or even just on dynamodb, as it is just a KV? I do not see any problems here...
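The point that the raft store is "just a KV" can be sketched as a small storage interface with interchangeable backends. Everything below is hypothetical and for illustration only: `BootstrapStore`, `InMemoryStore`, `DirectoryStore`, and the `cluster_id` key are invented names, not HBase or raft-library APIs. The directory-backed variant merely stands in for a path on a shared filesystem, and keys are assumed to be valid file names.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional


class BootstrapStore(ABC):
    """Hypothetical minimal KV interface for bootstrap information."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store value under key, replacing any previous value."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the value for key, or None if the key is absent."""


class InMemoryStore(BootstrapStore):
    """Backend that keeps everything in a dict (for example, for tests)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class DirectoryStore(BootstrapStore):
    """One file per key under a directory; stands in for a shared path."""

    def __init__(self, root):
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key, value):
        (self._root / key).write_bytes(value)

    def get(self, key):
        path = self._root / key
        return path.read_bytes() if path.exists() else None


def load_cluster_id(store: BootstrapStore) -> Optional[str]:
    # Callers depend only on the interface, not on where the bytes live.
    raw = store.get("cluster_id")
    return raw.decode("utf-8") if raw is not None else None
```

Code that reads the cluster id through `load_cluster_id` behaves the same with either backend, which is the sense in which the choice of local disk, HDFS, or a hosted KV service would be an implementation detail behind the consensus layer.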
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171877#comment-17171877 ]

Anoop Sam John commented on HBASE-24753:

Thanks Nick... Yes, what he mentioned is also possible: cloning a cluster. Dropping and recreating a cluster based on saved data is a very common thing. HBase has always used a FS like HDFS or cloud storage for storing any persistent data. This is really great for cloud cases. Any system which deals with local storage (replicated, with raft-style consensus) won't be easy to make work in the cloud.

bq. And even for now, it is not safe to just restart a new cluster with data on HDFS but no data on zookeeper.

Duo, my concern was not about avoiding zk usage and having all HMs run their own consensus for leader election (as the jira title says). My worry was about the line which says to move the root table data (meta as of today) away from storage to local disk, with the HM handling it in a special way.
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171864#comment-17171864 ]

Duo Zhang commented on HBASE-24753:

{quote}
I believe Anoop Sam John's point is that today we can clone the root directory from hdfs to a new path and from that new path standup an independent cluster.
{quote}

Actually, no... It is about HBASE-24286, where the AWS guys want to redeploy an HBase cluster with the old data on S3 but all new virtual machines as region servers. So I say if you want to do this on cloud, you could also use EBS. And on a normal deployment in a DC with physical machines, if you just want to change the machines, raft has built-in support for adding new nodes and removing old nodes.

And even for now, it is not safe to just restart a new cluster with data on HDFS but no data on zookeeper. You need to use HBCK2 to repair the cluster. I do not think there is much difference if we just move the data from zookeeper to our own raft-based master store.
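The remark that raft has built-in support for adding and removing nodes rests on a quorum-overlap property: if the membership changes by one server at a time, every majority of the old configuration shares at least one node with every majority of the new one, so two disjoint groups can never both make decisions. The sketch below only checks that property by brute force. It is not the API of any raft library, and the master names `m1` to `m4` are hypothetical.

```python
from itertools import combinations


def majorities(members):
    """Yield every subset of members that forms a strict majority."""
    members = sorted(members)
    quorum = len(members) // 2 + 1
    for size in range(quorum, len(members) + 1):
        for subset in combinations(members, size):
            yield set(subset)


def quorums_always_overlap(old, new):
    """True if each majority of old shares a node with each majority of new."""
    return all(a & b for a in majorities(old) for b in majorities(new))


# Replacing machine m3 with m4, as in "you just want to change the machines".
old = {"m1", "m2", "m3"}
grown = {"m1", "m2", "m3", "m4"}   # step 1: add the new node
final = {"m1", "m2", "m4"}         # step 2: remove the old node

safe_add = quorums_always_overlap(old, grown)
safe_remove = quorums_always_overlap(grown, final)
direct_swap = quorums_always_overlap(old, final)
```

Here `safe_add` and `safe_remove` are both true, while `direct_swap` is false because `{m1, m3}` and `{m2, m4}` are disjoint majorities. That is why a machine replacement would be done as an add followed by a remove (or through a joint-consensus phase) rather than as a single swap.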
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171807#comment-17171807 ]

Nick Dimiduk commented on HBASE-24753:

I believe [~anoop.hbase]'s point is that today we can clone the root directory from hdfs to a new path and, from that new path, stand up an independent cluster. The cluster's persistent state resides exclusively in the configured root directory, on HDFS. Introducing non-ephemeral consensus changes this story: the consensus implementation's data is then also required. To date, this has been undesirable. I don't think we should assume all deployments have an EBS-like local volume management system available.
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166371#comment-17166371 ]

Duo Zhang commented on HBASE-24753:

{quote}
There is one interesting usecase by Cloud to drop a cluster and recreate it later on existing data. This was/is possible because we never store any persisting data locally but always on FS. I would say lets not break that. I read in another jira also says that the Root table data can be stored locally (RAFT will be in place) not on FS. I would say lets not do that. Let us continue to have the storage isolation.
{quote}

I do not think the problem here is local storage; it is HDFS, and also ZooKeeper. We could use EBS as local storage, and it also supports snapshots, so recreating a cluster is easy.
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166294#comment-17166294 ]

Tamas Penzes commented on HBASE-24753:

[~zhangduo] Ozone uses Ratis 1.0.0 according to their pom.xml [https://github.com/apache/hadoop-ozone/blob/master/pom.xml#L82]
Looks like it's stable enough for them.
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166154#comment-17166154 ]

Anoop Sam John commented on HBASE-24753:

bq. With this solution in place, as long as root table will not be in a format of region(we could just use rocksdb to store it locally),

There is one interesting use case in the cloud: dropping a cluster and recreating it later on existing data. This was/is possible because we never store any persistent data locally but always on the FS. I would say let's not break that. I read in another jira that the root table data can be stored locally (with RAFT in place) rather than on the FS. I would say let's not do that. Let us continue to have the storage isolation.
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166058#comment-17166058 ]

Duo Zhang commented on HBASE-24753:

Update: I was trying to make use of sofa-jraft, but it depends on protobuf 3.x, so I tried to set hadoop.version to 3.3.0, as hadoop 3.3.0 has shaded protobuf, so we can purge all the protobuf 2.5 dependencies. But then I noticed that there is a problem supporting hadoop 3.3.0 because of a conflict on the jetty version. So I'm currently working on shading jetty to solve that problem first.
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17162461#comment-17162461 ]

Duo Zhang commented on HBASE-24753:

Another java raft library: https://github.com/sofastack/sofa-jraft
[jira] [Commented] (HBASE-24753) HA masters based on raft
[ https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17162445#comment-17162445 ]

Duo Zhang commented on HBASE-24753:

A possible problem is which library to use. Is ratis stable enough? Is it used in ozone?