[jira] [Created] (HBASE-24192) Let ChaosMonkeyRunner expose the chaos monkey runner it creates for branch-1
Lokesh Khurana created HBASE-24192: -- Summary: Let ChaosMonkeyRunner expose the chaos monkey runner it creates for branch-1 Key: HBASE-24192 URL: https://issues.apache.org/jira/browse/HBASE-24192 Project: HBase Issue Type: Improvement Reporter: Lokesh Khurana -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24193) BackPort (ChaosMonkeyRunner expose the chaos monkey runner it creates
Lokesh Khurana created HBASE-24193: -- Summary: BackPort (ChaosMonkeyRunner expose the chaos monkey runner it creates Key: HBASE-24193 URL: https://issues.apache.org/jira/browse/HBASE-24193 Project: HBase Issue Type: Improvement Reporter: Lokesh Khurana Assignee: Lokesh Khurana Backport Jira : [HBASE-18651|https://issues.apache.org/jira/browse/HBASE-18651] to branch-1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24194) Refactor BufferedEncodedSeeker anonymous classes to named inner class
Viraj Jasani created HBASE-24194: Summary: Refactor BufferedEncodedSeeker anonymous classes to named inner class Key: HBASE-24194 URL: https://issues.apache.org/jira/browse/HBASE-24194 Project: HBase Issue Type: Task Reporter: Viraj Jasani Assignee: Viraj Jasani BufferedEncodedSeeker has multiple anonymous inner sub-classes with large bodies of code. It would be better to refactor them into named inner classes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24195) Admin.getRegionServers() should return live servers excluding decom RS optionally
Viraj Jasani created HBASE-24195: Summary: Admin.getRegionServers() should return live servers excluding decom RS optionally Key: HBASE-24195 URL: https://issues.apache.org/jira/browse/HBASE-24195 Project: HBase Issue Type: Improvement Reporter: Viraj Jasani Assignee: Viraj Jasani Admin.getRegionServers() returns all live RS of the cluster. It should optionally exclude decommissioned RS, so that operators can get the list of live, non-decommissioned RS from a single API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24112) [RSGroup] Support renaming rsgroup
[ https://issues.apache.org/jira/browse/HBASE-24112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan resolved HBASE-24112. --- Hadoop Flags: Reviewed Resolution: Fixed > [RSGroup] Support renaming rsgroup > -- > > Key: HBASE-24112 > URL: https://issues.apache.org/jira/browse/HBASE-24112 > Project: HBase > Issue Type: Improvement > Components: rsgroup > Reporter: Reid Chan > Assignee: Reid Chan > Priority: Major > Fix For: 3.0.0, 2.3.0, 1.7.0, 2.2.5 > > > Once an rsgroup name is decided at the beginning, it is difficult to rename it. > The current approach is moving all tables and servers back to the default rsgroup, > then deleting it and adding an rsgroup with the new name, after which regions > and servers are moved back. Or, if machine resources are ample, without moving > them back. Either way, it is an expensive operation: moving regions breaks region > locality. > And given that an rsgroup is one kind of cluster management, and management > sometimes changes, renaming is necessary. > It is simple to implement. I'm working on it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24196) [Shell] Add rsgroup command in hbase shell
Reid Chan created HBASE-24196: - Summary: [Shell] Add rsgroup command in hbase shell Key: HBASE-24196 URL: https://issues.apache.org/jira/browse/HBASE-24196 Project: HBase Issue Type: Improvement Components: rsgroup, shell Reporter: Reid Chan Assignee: Reid Chan -- This message was sent by Atlassian Jira (v8.3.4#803005)
[DISCUSS] Arrange Events for 10-year Anniversary
Dear all, Since our project has reached its 10th birthday, and 10 years is definitely a great milestone, I propose to arrange some special (virtual) events for celebration. What comes to mind includes: * Open threads to collect voices from our dev/user mailing list, like "what do you want to say to HBase for its 10th birthday" (as well as our twitter accounts maybe, if any) * Arrange some online interviews with both PMC members and our customers. Some of us have been in this project all the way and there must be some good stories to tell, as well as expectations for the future. * Join the Apache Feathercast as suggested in another thread. * Form a blogpost covering all the above events as an official celebration. What do you think? Any other good ideas? Looking forward to more voices (smile). Best Regards, Yu
[jira] [Created] (HBASE-24197) TestHttpServer.testBindAddress failure with latest jetty
Istvan Toth created HBASE-24197: --- Summary: TestHttpServer.testBindAddress failure with latest jetty Key: HBASE-24197 URL: https://issues.apache.org/jira/browse/HBASE-24197 Project: HBase Issue Type: Bug Affects Versions: master Reporter: Istvan Toth Assignee: Istvan Toth The latest jetty version (tested with 9.4.28.v20200408 which is not included in HBase yet) wraps BindException into an IOException when it fails to bind to a port. This breaks HttpServer's findPort functionality, which manifests in TestHttpServer.testBindAddress failing. The proposed patch handles both the old and the new jetty behaviour correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
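The compatibility fix Istvan describes can be sketched as a cause-chain walk. The helper below is hypothetical (not the actual HBASE-24197 patch): it recognizes a failed bind whether jetty throws the `BindException` directly (old behaviour) or wraps it in an `IOException` (9.4.28.v20200408 behaviour).

```java
// Hypothetical helper, not the actual HBASE-24197 patch: walk the cause
// chain so both the old jetty behaviour (BindException thrown directly)
// and the new one (BindException wrapped in IOException) are recognized
// when probing for a free port.
import java.io.IOException;
import java.net.BindException;

public class BindCheck {
  static boolean isBindException(Throwable t) {
    while (t != null) {
      if (t instanceof BindException) {
        return true;
      }
      t = t.getCause();
    }
    return false;
  }

  public static void main(String[] args) {
    // old behaviour: BindException thrown directly
    if (!isBindException(new BindException("in use"))) throw new AssertionError();
    // new behaviour: BindException wrapped in an IOException
    if (!isBindException(new IOException(new BindException("in use")))) throw new AssertionError();
    // unrelated IOException must not be treated as a bind failure
    if (isBindException(new IOException("unrelated"))) throw new AssertionError();
  }
}
```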
[jira] [Reopened] (HBASE-24175) [Flakey Tests] TestSecureExportSnapshot FileNotFoundException
[ https://issues.apache.org/jira/browse/HBASE-24175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack reopened HBASE-24175: --- The patch here doesn't get all places where yarn is using /tmp. The 2.3 build failed with the below last night: {code} Error Message org.apache.hadoop.service.ServiceStateException: java.io.FileNotFoundException: File file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing does not exist Stacktrace org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.service.ServiceStateException: java.io.FileNotFoundException: File file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing does not exist at org.apache.hadoop.hbase.snapshot.TestExportSnapshotAdjunct.setUpBeforeClass(TestExportSnapshotAdjunct.java:70) Caused by: org.apache.hadoop.service.ServiceStateException: java.io.FileNotFoundException: File file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing does not exist at org.apache.hadoop.hbase.snapshot.TestExportSnapshotAdjunct.setUpBeforeClass(TestExportSnapshotAdjunct.java:70) Caused by: java.io.FileNotFoundException: File file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing does not exist at org.apache.hadoop.hbase.snapshot.TestExportSnapshotAdjunct.setUpBeforeClass(TestExportSnapshotAdjunct.java:70) {code} > [Flakey Tests] TestSecureExportSnapshot FileNotFoundException > - > > Key: HBASE-24175 > URL: https://issues.apache.org/jira/browse/HBASE-24175 > Project: HBase > Issue Type: Sub-task > Components: flakies >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.5 > > Attachments: > 0001-HBASE-24175-Flakey-Tests-TestSecureExportSnapshot-Fi.patch > > > Why we writing '/tmp' dir? 
> {code} > Error Message > org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > Stacktrace > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > Caused by: org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > Caused by: java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Arrange Events for 10-year Anniversary
2020 - 10 = 2010. As far as I remember I joined HBase community in 2009 :) and I am pretty sure that Mr. Stack did it even earlier. Best regards, Vlad On Wed, Apr 15, 2020 at 5:57 AM Yu Li wrote: > Dear all, > > Since our project has reached its 10th birthday, and 10 years is definitely > a great milestone, I propose to arrange some special (virtual) events for > celebration. What comes into my mind include: > > * Open threads to collect voices from our dev/user mailing list, like "what > do you want to say to HBase for its 10th birthday" (as well as our twitter > accounts maybe, if any) > > * Arrange some online interviews to both PMC members and our customers. > Some of us have been in this project all the way and there must be some > good stories to tell, as well as expectations for the future. > > * Join the Apache Feathercast as suggested in another thread. > > * Form a blogpost to include all above events as an official celebration. > > What do you think? Any other good ideas? Looking forward to more voices > (smile). > > Best Regards, > Yu >
[DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
Hi folks, I'd like to bring up the topic of the experience of new users as it pertains to use of the `LocalFileSystem` and its associated (lack of) data durability guarantees. By default, an unconfigured HBase runs with its root directory on a `file:///` path. This path is picked up as an instance of `LocalFileSystem`. Hadoop has long offered this class, but it has never supported `hsync` or `hflush` stream characteristics. Thus, when HBase runs in this configuration, it is unable to ensure that WAL writes are durable, and so will ACK a write without this assurance. This is the case even when running in a fully durable WAL mode. This impacts a new user, someone kicking the tires on HBase following our Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase will WARN and carry on. On Hadoop 2.10+, HBase will refuse to start. The book describes a process of disabling stream capability enforcement as a first step. This is a mandatory configuration for running HBase directly out of our binary distribution. HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on 2.8: log a warning and carry on. The critique of this approach is that it's far too subtle, too quiet for a system operating in a state known to not provide data durability. I have two assumptions/concerns around the state of things, which prompted my solution on HBASE-24086 and the associated doc update on HBASE-24106. 1. No one should be running a production system on `LocalFileSystem`. The initial implementation checked both for `LocalFileSystem` and `hbase.cluster.distributed`. When running on the former and the latter is false, we assume the user is running a non-production deployment and carry on with the warning. When the latter is true, we assume the user intended a production deployment and the process terminates due to stream capability enforcement. 
Subsequent code review resulted in skipping the `hbase.cluster.distributed` check and simply warning, as was done on 2.8 and earlier. (As I understand it, we've long used the `hbase.cluster.distributed` configuration to decide if the user intends this runtime to be a production deployment or not.) Is this a faulty assumption? Is there a use-case we support where we condone running production deployment on the non-durable `LocalFileSystem`? 2. The Quick Start experience should require no configuration at all. Our stack is difficult enough to run in a fully durable production environment. We should make it a priority to ensure it's as easy as possible to try out HBase. Forcing a user to make decisions about data durability before they even launch the web ui is a terrible experience, in my opinion, and should be a non-starter for us as a project. (In my opinion, the need to configure either `hbase.rootdir` or `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started experience. It is a second, more subtle question of data durability that we should avoid out of the box. But I'm happy to leave that for another thread.) Thank you for your time, Nick
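The initial implementation Nick describes (check for `LocalFileSystem` plus `hbase.cluster.distributed`) reduces to a small decision table. The sketch below is hypothetical, not HBase's actual code, but captures the stated logic:

```java
// Sketch of the decision logic described above (names hypothetical, not
// the actual HBase implementation). On a non-durable local filesystem:
// warn for a standalone deployment, fail for a distributed one.
public class DurabilityCheck {
  enum Action { CARRY_ON, WARN, FAIL }

  static Action check(boolean onLocalFileSystem, boolean clusterDistributed) {
    if (!onLocalFileSystem) {
      return Action.CARRY_ON; // durable filesystem, nothing to flag
    }
    // LocalFileSystem cannot hsync/hflush, so WAL writes are not durable.
    return clusterDistributed ? Action.FAIL : Action.WARN;
  }

  public static void main(String[] args) {
    if (check(false, true) != Action.CARRY_ON) throw new AssertionError();
    if (check(true, false) != Action.WARN) throw new AssertionError(); // quickstart: warn and carry on
    if (check(true, true) != Action.FAIL) throw new AssertionError();  // intended production: refuse to start
  }
}
```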
Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
Quick Start and Production are exclusive configurations. Quick Start, as you say, should have as few steps to up and running as possible. Production requires a real distributed filesystem for persistence and that means HDFS and that means, whatever the provisioning and deployment and process management (Ambari or k8s or...) choices are going to be, they will not be a Quick Start. We’ve always had this problem. The Quick Start simply can’t produce a system capable of durability because prerequisites for durability are not quick to set up. Specifically about /tmp... I agree that’s not a good default. Time and again I’ve heard people complain that the tmp cleaner has removed their test data. It shouldn’t be surprising but is and that is real feedback on mismatch of user expectation to what we are providing in that configuration. Addressing this aspect of the Quick Start experience would be a simple change: make the default a new directory in $HOME, perhaps “.hbase” . > On Apr 15, 2020, at 9:40 AM, Nick Dimiduk wrote: > > Hi folks, > > I'd like to bring up the topic of the experience of new users as it > pertains to use of the `LocalFileSystem` and its associated (lack of) data > durability guarantees. By default, an unconfigured HBase runs with its root > directory on a `file:///` path. This patch is picked up as an instance of > `LocalFileSystem`. Hadoop has long offered this class, but it has never > supported `hsync` or `hflush` stream characteristics. Thus, when HBase runs > on this configuration, it is unable to ensure that WAL writes are durable, > and thus will ACK a write without this assurance. This is the case, even > when running in a fully durable WAL mode. > > This impacts a new user, someone kicking the tires on HBase following our > Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase will > WARN and cary on. Hadoop 2.10+, HBase will refuse to start. 
The book > describes a process of disabling enforcement of stream capability > enforcement as a first step. This is a mandatory configuration for running > HBase directly out of our binary distribution. > > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on > 2.8: log a warning and cary on. The critique of this approach is that it's > far too subtle, too quiet for a system operating in a state known to not > provide data durability. > > I have two assumptions/concerns around the state of things, which prompted > my solution on HBASE-24086 and the associated doc update on HBASE-24106. > > 1. No one should be running a production system on `LocalFileSystem`. > > The initial implementation checked both for `LocalFileSystem` and > `hbase.cluster.distributed`. When running on the former and the latter is > false, we assume the user is running a non-production deployment and carry > on with the warning. When the latter is true, we assume the user intended a > production deployment and the process terminates due to stream capability > enforcement. Subsequent code review resulted in skipping the > `hbase.cluster.distributed` check and simply warning, as was done on 2.8 > and earlier. > > (As I understand it, we've long used the `hbase.cluster.distributed` > configuration to decide if the user intends this runtime to be a production > deployment or not.) > > Is this a faulty assumption? Is there a use-case we support where we > condone running production deployment on the non-durable `LocalFileSystem`? > > 2. The Quick Start experience should require no configuration at all. > > Our stack is difficult enough to run in a fully durable production > environment. We should make it a priority to ensure it's as easy as > possible to try out HBase. Forcing a user to make decisions about data > durability before they even launch the web ui is a terrible experience, in > my opinion, and should be a non-starter for us as a project. 
> > (In my opinion, the need to configure either `hbase.rootdir` or > `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started > experience. It is a second, more subtle question of data durability that we > should avoid out of the box. But I'm happy to leave that for another > thread.) > > Thank you for your time, > Nick
Re: [DISCUSS] Arrange Events for 10-year Anniversary
On Wed, Apr 15, 2020 at 9:25 AM Vladimir Rodionov wrote: > 2020 - 10 = 2010. As far as I remember I joined HBase community in 2009 :) > and I am pretty sure that Mr. Stack did it even earlier. > IIRC, 2010 is when HBase graduated from being a Hadoop sub-project and became an Apache Top-Level Project. On Wed, Apr 15, 2020 at 5:57 AM Yu Li wrote: > > > Dear all, > > > > Since our project has reached its 10th birthday, and 10 years is > definitely > > a great milestone, I propose to arrange some special (virtual) events for > > celebration. What comes into my mind include: > > > > * Open threads to collect voices from our dev/user mailing list, like > "what > > do you want to say to HBase for its 10th birthday" (as well as our > twitter > > accounts maybe, if any) > > > > * Arrange some online interviews to both PMC members and our customers. > > Some of us have been in this project all the way and there must be some > > good stories to tell, as well as expectations for the future. > > > > * Join the Apache Feathercast as suggested in another thread. > > > > * Form a blogpost to include all above events as an official celebration. > > > > What do you think? Any other good ideas? Looking forward to more voices > > (smile). > > > > Best Regards, > > Yu > > >
[DISCUSS] Change the Location of hbase.rootdir to improve the Quick Start User Experience (was Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086))
Branching off this subject from the original thread. On Wed, Apr 15, 2020 at 9:56 AM Andrew Purtell wrote: > Quick Start and Production are exclusive configurations. > Yes absolutely. Quick Start, as you say, should have as few steps to up and running as > possible. > > Production requires a real distributed filesystem for persistence and that > means HDFS and that means, whatever the provisioning and deployment and > process management (Ambari or k8s or...) choices are going to be, they will > not be a Quick Start. > > We’ve always had this problem. The Quick Start simply can’t produce a > system capable of durability because prerequisites for durability are not > quick to set up. > Is this exclusively due to the implementation of `LocalFileSystem` or are there other issues at play? I've seen there's also `RawLocalFileSystem` but I haven't investigated their relationship, its capabilities, or if we might profit from its use for the Quick Start experience. Specifically about /tmp... I agree that’s not a good default. Time and > again I’ve heard people complain that the tmp cleaner has removed their > test data. It shouldn’t be surprising but is and that is real feedback on > mismatch of user expectation to what we are providing in that > configuration. Addressing this aspect of the Quick Start experience would > be a simple change: make the default a new directory in $HOME, perhaps > “.hbase” . > I propose changing the default value of `hbase.tmp.dir` as shipped in the default hbase-site.xml to be simply `tmp`, as I documented in my change on HBASE-24106. That way it's not hidden somewhere and it's self-contained to this unpacking of the source/binary distribution. I.e., there's no need to worry about upgrading the data stored there when a user experiments with a new version. 
> On Apr 15, 2020, at 9:40 AM, Nick Dimiduk wrote: > > > > Hi folks, > > > > I'd like to bring up the topic of the experience of new users as it > > pertains to use of the `LocalFileSystem` and its associated (lack of) > data > > durability guarantees. By default, an unconfigured HBase runs with its > root > > directory on a `file:///` path. This patch is picked up as an instance of > > `LocalFileSystem`. Hadoop has long offered this class, but it has never > > supported `hsync` or `hflush` stream characteristics. Thus, when HBase > runs > > on this configuration, it is unable to ensure that WAL writes are > durable, > > and thus will ACK a write without this assurance. This is the case, even > > when running in a fully durable WAL mode. > > > > This impacts a new user, someone kicking the tires on HBase following our > > Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase > will > > WARN and cary on. Hadoop 2.10+, HBase will refuse to start. The book > > describes a process of disabling enforcement of stream capability > > enforcement as a first step. This is a mandatory configuration for > running > > HBase directly out of our binary distribution. > > > > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on > > 2.8: log a warning and cary on. The critique of this approach is that > it's > > far too subtle, too quiet for a system operating in a state known to not > > provide data durability. > > > > I have two assumptions/concerns around the state of things, which > prompted > > my solution on HBASE-24086 and the associated doc update on HBASE-24106. > > > > 1. No one should be running a production system on `LocalFileSystem`. > > > > The initial implementation checked both for `LocalFileSystem` and > > `hbase.cluster.distributed`. When running on the former and the latter is > > false, we assume the user is running a non-production deployment and > carry > > on with the warning. 
When the latter is true, we assume the user > intended a > > production deployment and the process terminates due to stream capability > > enforcement. Subsequent code review resulted in skipping the > > `hbase.cluster.distributed` check and simply warning, as was done on 2.8 > > and earlier. > > > > (As I understand it, we've long used the `hbase.cluster.distributed` > > configuration to decide if the user intends this runtime to be a > production > > deployment or not.) > > > > Is this a faulty assumption? Is there a use-case we support where we > > condone running production deployment on the non-durable > `LocalFileSystem`? > > > > 2. The Quick Start experience should require no configuration at all. > > > > Our stack is difficult enough to run in a fully durable production > > environment. We should make it a priority to ensure it's as easy as > > possible to try out HBase. Forcing a user to make decisions about data > > durability before they even launch the web ui is a terrible experience, > in > > my opinion, and should be a non-starter for us as a project. > > > > (In my opinion, the need to configure either `hbase.rootdir` or > > `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started
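Concretely, Nick's `hbase.tmp.dir` proposal could look like the following fragment in the shipped hbase-site.xml. This is a sketch only; the value shown is illustrative of "a relative `tmp` under the unpacked distribution", not a committed default:

```xml
<!-- Sketch of the proposed quickstart default: keep data under the
     unpacked distribution directory rather than /tmp. Illustrative only. -->
<property>
  <name>hbase.tmp.dir</name>
  <value>./tmp</value>
</property>
```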
Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
I think the first assumption no longer holds. Especially with the move to flexible compute environments I regularly get asked by folks what the smallest HBase they can start with for production. I can keep saying 3/5/7 nodes or whatever but I guarantee there are folks who want to and will run HBase with a single node. Probably those deployments won't want to have the distributed flag set. None of them really have a good option for where the WALs go, and failing loud when they try to go to LocalFileSystem is the best option I've seen so far to make sure folks realize they are getting into muddy waters. I agree with the second assumption. Our quickstart in general is too complicated. Maybe if we include big warnings in the guide itself, we could make a quickstart specific artifact to download that has the unsafe disabling config in place? Last fall I toyed with the idea of adding an "hbase-local" module to the hbase-filesystem repo that could start us out with some optimizations for single node set ups. We could start with a fork of RawLocalFileSystem (which will call OutputStream flush operations in response to hflush/hsync) that properly advertises its StreamCapabilities to say that it supports the operations we need. Alternatively we could make our own implementation of FileSystem that uses NIO stuff. Either of these approaches would solve both problems. On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk wrote: > > Hi folks, > > I'd like to bring up the topic of the experience of new users as it > pertains to use of the `LocalFileSystem` and its associated (lack of) data > durability guarantees. By default, an unconfigured HBase runs with its root > directory on a `file:///` path. This patch is picked up as an instance of > `LocalFileSystem`. Hadoop has long offered this class, but it has never > supported `hsync` or `hflush` stream characteristics. 
Thus, when HBase runs > on this configuration, it is unable to ensure that WAL writes are durable, > and thus will ACK a write without this assurance. This is the case, even > when running in a fully durable WAL mode. > > This impacts a new user, someone kicking the tires on HBase following our > Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase will > WARN and cary on. Hadoop 2.10+, HBase will refuse to start. The book > describes a process of disabling enforcement of stream capability > enforcement as a first step. This is a mandatory configuration for running > HBase directly out of our binary distribution. > > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on > 2.8: log a warning and cary on. The critique of this approach is that it's > far too subtle, too quiet for a system operating in a state known to not > provide data durability. > > I have two assumptions/concerns around the state of things, which prompted > my solution on HBASE-24086 and the associated doc update on HBASE-24106. > > 1. No one should be running a production system on `LocalFileSystem`. > > The initial implementation checked both for `LocalFileSystem` and > `hbase.cluster.distributed`. When running on the former and the latter is > false, we assume the user is running a non-production deployment and carry > on with the warning. When the latter is true, we assume the user intended a > production deployment and the process terminates due to stream capability > enforcement. Subsequent code review resulted in skipping the > `hbase.cluster.distributed` check and simply warning, as was done on 2.8 > and earlier. > > (As I understand it, we've long used the `hbase.cluster.distributed` > configuration to decide if the user intends this runtime to be a production > deployment or not.) > > Is this a faulty assumption? Is there a use-case we support where we > condone running production deployment on the non-durable `LocalFileSystem`? > > 2. 
The Quick Start experience should require no configuration at all. > > Our stack is difficult enough to run in a fully durable production > environment. We should make it a priority to ensure it's as easy as > possible to try out HBase. Forcing a user to make decisions about data > durability before they even launch the web ui is a terrible experience, in > my opinion, and should be a non-starter for us as a project. > > (In my opinion, the need to configure either `hbase.rootdir` or > `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started > experience. It is a second, more subtle question of data durability that we > should avoid out of the box. But I'm happy to leave that for another > thread.) > > Thank you for your time, > Nick
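Sean's fork idea might be sketched, very roughly, as a stream that actually flushes on hflush/hsync and advertises those capabilities. All names below are hypothetical; a real implementation would extend Hadoop's RawLocalFileSystem and implement its StreamCapabilities interface, which this self-contained toy deliberately does not:

```java
// Toy sketch of the idea above (hypothetical names, not hbase-filesystem
// code): a local output stream that honors hflush/hsync by flushing the
// wrapped stream, and reports those capabilities, unlike the checksum
// streams of Hadoop's LocalFileSystem.
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Set;

public class DurableLocalOutputStream extends FilterOutputStream {
  private static final Set<String> CAPABILITIES = Set.of("hflush", "hsync");

  public DurableLocalOutputStream(OutputStream out) {
    super(out);
  }

  // Analogue of StreamCapabilities.hasCapability
  public boolean hasCapability(String capability) {
    return CAPABILITIES.contains(capability.toLowerCase());
  }

  public void hflush() throws IOException {
    out.flush(); // push buffered bytes down to the wrapped stream
  }

  public void hsync() throws IOException {
    out.flush(); // a real impl would also force to disk, e.g. FileChannel.force
  }

  public static void main(String[] args) throws IOException {
    java.io.ByteArrayOutputStream sink = new java.io.ByteArrayOutputStream();
    DurableLocalOutputStream s =
        new DurableLocalOutputStream(new java.io.BufferedOutputStream(sink));
    s.write("wal-entry".getBytes());
    s.hflush(); // without this, bytes could still sit in the buffer
    if (sink.size() == 0) throw new AssertionError("hflush did not flush");
    if (!s.hasCapability("hflush")) throw new AssertionError("capability missing");
  }
}
```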
Re: [DISCUSS] Arrange Events for 10-year Anniversary
That was probably when our community was still covered under the Hadoop project. Essentially our version of being in the incubator. This is when we became our own TLP: https://whimsy.apache.org/board/minutes/HBase.html#2010-04-21 On Wed, Apr 15, 2020 at 11:25 AM Vladimir Rodionov wrote: > > 2020 - 10 = 2010. As far as I remember I joined HBase community in 2009 :) > and I am pretty sure that Mr. Stack did it even earlier. > > Best regards, > Vlad > > On Wed, Apr 15, 2020 at 5:57 AM Yu Li wrote: > > > Dear all, > > > > Since our project has reached its 10th birthday, and 10 years is definitely > > a great milestone, I propose to arrange some special (virtual) events for > > celebration. What comes into my mind include: > > > > * Open threads to collect voices from our dev/user mailing list, like "what > > do you want to say to HBase for its 10th birthday" (as well as our twitter > > accounts maybe, if any) > > > > * Arrange some online interviews to both PMC members and our customers. > > Some of us have been in this project all the way and there must be some > > good stories to tell, as well as expectations for the future. > > > > * Join the Apache Feathercast as suggested in another thread. > > > > * Form a blogpost to include all above events as an official celebration. > > > > What do you think? Any other good ideas? Looking forward to more voices > > (smile). > > > > Best Regards, > > Yu > >
Re: [DISCUSS] Change the Location of hbase.rootdir to improve the Quick Start User Experience (was Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086))
On Wed, Apr 15, 2020 at 12:03 PM Nick Dimiduk wrote: > > Branching off this subject from the original thread. > > On Wed, Apr 15, 2020 at 9:56 AM Andrew Purtell > wrote: > > > Quick Start and Production are exclusive configurations. > > > > Yes absolutely. > > Quick Start, as you say, should have as few steps to up and running as > > possible. > > > > Production requires a real distributed filesystem for persistence and that > > means HDFS and that means, whatever the provisioning and deployment and > > process management (Ambari or k8s or...) choices are going to be, they will > > not be a Quick Start. > > > > We’ve always had this problem. The Quick Start simply can’t produce a > > system capable of durability because prerequisites for durability are not > > quick to set up. > > > > Is this exclusively due to the implementation of `LocalFileSystem` or are > there other issues at play? I've seen there's also `RawLocalFileSystem` but > I haven't investigated their relationship, it's capabilities, or if we > might profit from its use for the Quick Start experience. There's a difference between a production system that can survive a single node failure without an outage and a production system that can recover given admin intervention when a single node fails. The quickstart guide cannot produce the former. It could produce the latter, but currently does not. LocalFileSystem is the main problem. You cannot use RawLocalFileSystem just by setting configuration because both it and LocalFileSystem use the "file://" scheme. I *think* current HBase code looks for LocalFileSystem and attempts to unwrap the RawLocalFileSystem inside. I know the ability to do this is not a supported Hadoop thing. I do not know how robust our handling of doing it is.
Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
> We could start with a fork of RawLocalFileSystem (which will call
> OutputStream flush operations in response to hflush/hsync) that properly
> advertises its StreamCapabilities to say that it supports the operations
> we need.

This is a worthy option to pursue.

Nick's mail doesn't make a distinction between avoiding data loss via typical tmp cleaner configurations, unfortunately adjacent to mention of "durability", and real data durability, which implies more than what a single system configuration can offer, no matter how many tweaks we make to LocalFileSystem. Maybe I'm being pedantic, but this is something to be really clear about, IMHO.

On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey wrote:

> I think the first assumption no longer holds. Especially with the move to
> flexible compute environments, I regularly get asked by folks what the
> smallest HBase they can start with for production. I can keep saying
> 3/5/7 nodes or whatever, but I guarantee there are folks who want to and
> will run HBase with a single node. Probably those deployments won't want
> to have the distributed flag set. None of them really have a good option
> for where the WALs go, and failing loud when they try to go to
> LocalFileSystem is the best option I've seen so far to make sure folks
> realize they are getting into muddy waters.
>
> I agree with the second assumption. Our quickstart in general is too
> complicated. Maybe if we include big warnings in the guide itself, we
> could make a quickstart-specific artifact to download that has the unsafe
> disabling config in place?
>
> Last fall I toyed with the idea of adding an "hbase-local" module to the
> hbase-filesystem repo that could start us out with some optimizations for
> single-node setups. We could start with a fork of RawLocalFileSystem
> (which will call OutputStream flush operations in response to
> hflush/hsync) that properly advertises its StreamCapabilities to say that
> it supports the operations we need. Alternatively, we could make our own
> implementation of FileSystem that uses NIO stuff. Either of these
> approaches would solve both problems.
>
> On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk wrote:
> >
> > Hi folks,
> >
> > I'd like to bring up the topic of the experience of new users as it
> > pertains to use of the `LocalFileSystem` and its associated (lack of)
> > data durability guarantees. By default, an unconfigured HBase runs with
> > its root directory on a `file:///` path. This path is picked up as an
> > instance of `LocalFileSystem`. Hadoop has long offered this class, but
> > it has never supported `hsync` or `hflush` stream characteristics.
> > Thus, when HBase runs on this configuration, it is unable to ensure
> > that WAL writes are durable, and thus will ACK a write without this
> > assurance. This is the case even when running in a fully durable WAL
> > mode.
> >
> > This impacts a new user, someone kicking the tires on HBase following
> > our Getting Started docs. On Hadoop 2.8 and before, an unconfigured
> > HBase will WARN and carry on. On Hadoop 2.10+, HBase will refuse to
> > start. The book describes a process of disabling stream capability
> > enforcement as a first step. This is a mandatory configuration for
> > running HBase directly out of our binary distribution.
> >
> > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
> > 2.8: log a warning and carry on. The critique of this approach is that
> > it's far too subtle, too quiet for a system operating in a state known
> > to not provide data durability.
> >
> > I have two assumptions/concerns around the state of things, which
> > prompted my solution on HBASE-24086 and the associated doc update on
> > HBASE-24106.
> >
> > 1. No one should be running a production system on `LocalFileSystem`.
> >
> > The initial implementation checked both for `LocalFileSystem` and
> > `hbase.cluster.distributed`. When running on the former and the latter
> > is false, we assume the user is running a non-production deployment and
> > carry on with the warning. When the latter is true, we assume the user
> > intended a production deployment, and the process terminates due to
> > stream capability enforcement. Subsequent code review resulted in
> > skipping the `hbase.cluster.distributed` check and simply warning, as
> > was done on 2.8 and earlier.
> >
> > (As I understand it, we've long used the `hbase.cluster.distributed`
> > configuration to decide whether the user intends this runtime to be a
> > production deployment or not.)
> >
> > Is this a faulty assumption? Is there a use-case we support where we
> > condone running a production deployment on the non-durable
> > `LocalFileSystem`?
> >
> > 2. The Quick Start experience should require no configuration at all.
> >
> > Our stack is difficult enough to run in a fully durable production
> > environment. We should make it a priority to ensure it's as easy as
> > possible to try out HBase. Forcing a user to
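The RawLocalFileSystem-fork idea discussed in this thread can be sketched without any Hadoop dependency. Everything below (the class name, the `hasCapability` method, the capability strings) is a hypothetical stand-in modeling Hadoop's StreamCapabilities contract, not actual hbase-filesystem code:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Path;

// Hypothetical sketch: a local output stream that honors hflush/hsync by
// delegating to FileChannel.force, and advertises those capabilities the
// way Hadoop's StreamCapabilities interface would.
public class DurableLocalOutputStream extends OutputStream {
    private final FileOutputStream out;

    public DurableLocalOutputStream(Path path) throws IOException {
        this.out = new FileOutputStream(path.toFile());
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
    }

    // hflush: push buffered bytes out of the process.
    public void hflush() throws IOException {
        out.flush();
    }

    // hsync: additionally ask the OS to persist the data to the device.
    public void hsync() throws IOException {
        out.flush();
        out.getChannel().force(false); // false: force data only, not metadata
    }

    // Mirrors StreamCapabilities.hasCapability(String): unlike
    // LocalFileSystem, this stream can truthfully claim both capabilities.
    public boolean hasCapability(String capability) {
        return "hflush".equals(capability) || "hsync".equals(capability);
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

The point of the sketch is only that the advertised capabilities and the actual flush behavior agree, which is exactly what the stream capability enforcement check wants to see.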
Re: [DISCUSS] Change the Location of hbase.rootdir to improve the Quick Start User Experience (was Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086))
> I propose changing the default value of `hbase.tmp.dir` as shipped in the
> default hbase-site.xml to be simply `tmp`, as I documented in my change on
> HBASE-24106. That way it's not hidden somewhere, and it's self-contained
> to this unpacking of the source/binary distribution.

+1, great choice

On Wed, Apr 15, 2020 at 10:03 AM Nick Dimiduk wrote:

> Branching off this subject from the original thread.
>
> On Wed, Apr 15, 2020 at 9:56 AM Andrew Purtell wrote:
>
> > Quick Start and Production are exclusive configurations.
>
> Yes, absolutely.
>
> > Quick Start, as you say, should have as few steps to up and running as
> > possible.
> >
> > Production requires a real distributed filesystem for persistence, and
> > that means HDFS, and that means, whatever the provisioning and
> > deployment and process management (Ambari or k8s or...) choices are
> > going to be, they will not be a Quick Start.
> >
> > We’ve always had this problem. The Quick Start simply can’t produce a
> > system capable of durability because prerequisites for durability are
> > not quick to set up.
>
> Is this exclusively due to the implementation of `LocalFileSystem`, or
> are there other issues at play? I've seen there's also
> `RawLocalFileSystem`, but I haven't investigated their relationship, its
> capabilities, or whether we might profit from its use for the Quick Start
> experience.
>
> > Specifically about /tmp... I agree that’s not a good default. Time and
> > again I’ve heard people complain that the tmp cleaner has removed their
> > test data. It shouldn’t be surprising, but it is, and that is real
> > feedback on the mismatch of user expectation to what we are providing
> > in that configuration. Addressing this aspect of the Quick Start
> > experience would be a simple change: make the default a new directory
> > in $HOME, perhaps “.hbase”.
>
> I propose changing the default value of `hbase.tmp.dir` as shipped in the
> default hbase-site.xml to be simply `tmp`, as I documented in my change
> on HBASE-24106. That way it's not hidden somewhere, and it's
> self-contained to this unpacking of the source/binary distribution. I.e.,
> there's no need to worry about upgrading the data stored there when a
> user experiments with a new version.
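Concretely, the proposal amounts to a one-property change in the shipped hbase-site.xml. A sketch of what that might look like (the relative path would resolve against the working directory of the process, and `hbase.rootdir` follows along via its `${hbase.tmp.dir}/hbase` default; treat the description text as illustrative, not the actual patch on HBASE-24106):

```xml
<configuration>
  <property>
    <name>hbase.tmp.dir</name>
    <value>tmp</value>
    <description>Keep temporary and quick-start data under the unpacked
      distribution instead of /tmp, so the system tmp cleaner cannot
      remove it between runs.</description>
  </property>
</configuration>
```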
Re: [DISCUSS] Arrange Events for 10-year Anniversary
Yeah, the TLP announcement date is good for that.

Thanks,
Esteban.

--
Cloudera, Inc.

On Wed, Apr 15, 2020 at 12:07 PM Sean Busbey wrote:

> That was probably when our community was still covered under the Hadoop
> project. Essentially our version of being in the incubator.
>
> This is when we became our own TLP:
>
> https://whimsy.apache.org/board/minutes/HBase.html#2010-04-21
>
> On Wed, Apr 15, 2020 at 11:25 AM Vladimir Rodionov wrote:
> >
> > 2020 - 10 = 2010. As far as I remember, I joined the HBase community in
> > 2009 :) and I am pretty sure that Mr. Stack did it even earlier.
> >
> > Best regards,
> > Vlad
> >
> > On Wed, Apr 15, 2020 at 5:57 AM Yu Li wrote:
> > >
> > > Dear all,
> > >
> > > Since our project has reached its 10th birthday, and 10 years is
> > > definitely a great milestone, I propose to arrange some special
> > > (virtual) events for celebration. What comes to my mind includes:
> > >
> > > * Open threads to collect voices from our dev/user mailing lists,
> > >   like "what do you want to say to HBase for its 10th birthday" (as
> > >   well as our Twitter accounts maybe, if any)
> > >
> > > * Arrange some online interviews with both PMC members and our
> > >   customers. Some of us have been in this project all the way, and
> > >   there must be some good stories to tell, as well as expectations
> > >   for the future.
> > >
> > > * Join the Apache FeatherCast as suggested in another thread.
> > >
> > > * Form a blog post to include all the above events as an official
> > >   celebration.
> > >
> > > What do you think? Any other good ideas? Looking forward to more
> > > voices (smile).
> > >
> > > Best Regards,
> > > Yu
[jira] [Resolved] (HBASE-24175) [Flakey Tests] TestSecureExportSnapshot FileNotFoundException
[ https://issues.apache.org/jira/browse/HBASE-24175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-24175.
-----------------------------------
    Resolution: Fixed

Pushed addendum to branch-2.2+. Re-resolving. Let's see if this catches all of the '/tmp' references.

> [Flakey Tests] TestSecureExportSnapshot FileNotFoundException
> -------------------------------------------------------------
>
>                 Key: HBASE-24175
>                 URL: https://issues.apache.org/jira/browse/HBASE-24175
>             Project: HBase
>          Issue Type: Sub-task
>          Components: flakies
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 2.2.5
>
>         Attachments: 0001-HBASE-24175-Flakey-Tests-TestSecureExportSnapshot-Fi.addendum.patch,
>                      0001-HBASE-24175-Flakey-Tests-TestSecureExportSnapshot-Fi.patch
>
> Why are we writing to the '/tmp' dir?
> {code}
> Error Message
> org.apache.hadoop.service.ServiceStateException:
> java.io.FileNotFoundException: File
> file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing
> does not exist
> Stacktrace
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
> org.apache.hadoop.service.ServiceStateException:
> java.io.FileNotFoundException: File
> file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing
> does not exist
>   at org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56)
> Caused by: org.apache.hadoop.service.ServiceStateException:
> java.io.FileNotFoundException: File
> file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing
> does not exist
>   at org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56)
> Caused by: java.io.FileNotFoundException: File
> file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing
> does not exist
>   at org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey wrote:

> I think the first assumption no longer holds. Especially with the move to
> flexible compute environments, I regularly get asked by folks what the
> smallest HBase they can start with for production. I can keep saying
> 3/5/7 nodes or whatever, but I guarantee there are folks who want to and
> will run HBase with a single node. Probably those deployments won't want
> to have the distributed flag set. None of them really have a good option
> for where the WALs go, and failing loud when they try to go to
> LocalFileSystem is the best option I've seen so far to make sure folks
> realize they are getting into muddy waters.

I think this is where we disagree. My answer to this same question is 12 nodes: 3 "coordinator" hosts for HA ZK, HDFS, and HBase master, plus 9 "worker" hosts for replicated data serving and storage. Tweak the number of workers and the replication factor if you like, but that's how you get a durable, available deployment suitable for an online production solution. Anything smaller than this and you're in the "muddy waters" of under-replicated distributed-system failure domains.

> I agree with the second assumption. Our quickstart in general is too
> complicated. Maybe if we include big warnings in the guide itself, we
> could make a quickstart-specific artifact to download that has the unsafe
> disabling config in place?

I'm not a fan of the dedicated artifact as a binary tarball. I think that approach fractures the brand of our product and emphasizes the idea that it's even more complicated. If we want a dedicated quick start experience, I would advocate investing the resources into something more like a learning laboratory that is accompanied by a runtime image in a VM or container.

> Last fall I toyed with the idea of adding an "hbase-local" module to the
> hbase-filesystem repo that could start us out with some optimizations for
> single-node setups. We could start with a fork of RawLocalFileSystem
> (which will call OutputStream flush operations in response to
> hflush/hsync) that properly advertises its StreamCapabilities to say that
> it supports the operations we need. Alternatively, we could make our own
> implementation of FileSystem that uses NIO stuff. Either of these
> approaches would solve both problems.

I find this approach more palatable than a custom quick start binary tarball.
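A side note on the "unsafe disabling config" referenced in this thread: to the best of my recollection, the property the book describes is the one sketched below (treat the exact name and description as an assumption to verify against the version of the book you are reading). A quickstart-specific hbase-site.xml would carry something like:

```xml
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
  <description>Allow HBase to start on a FileSystem that does not
    advertise hflush/hsync support (e.g. LocalFileSystem). Unsafe:
    WAL writes are not durable. Standalone/quickstart use only.</description>
</property>
```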
Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
On Wed, Apr 15, 2020 at 10:28 AM Andrew Purtell wrote:

> Nick's mail doesn't make a distinction between avoiding data loss via
> typical tmp cleaner configurations, unfortunately adjacent to mention of
> "durability", and real data durability, which implies more than what a
> single system configuration can offer, no matter how many tweaks we make
> to LocalFileSystem. Maybe I'm being pedantic, but this is something to be
> really clear about, IMHO.

I prefer to focus the attention of this thread on the question of data durability via `FileSystem` characteristics. I agree that there are concerns of durability (and others) around the use of the path under /tmp. Let's keep that discussion in the other thread.
Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
This thread talks of “durability” via filesystem characteristics, but also of single-system Quick Start type deployments. For durability we need multi-server deployments. No amount of hacking a single-system deployment is going to give us durability as users will expect (“don’t lose my data”). I believe my comments are on topic.

> On Apr 15, 2020, at 11:03 AM, Nick Dimiduk wrote:
>
> I prefer to focus the attention of this thread on the question of data
> durability via `FileSystem` characteristics. I agree that there are
> concerns of durability (and others) around the use of the path under
> /tmp. Let's keep that discussion in the other thread.
Re: [DISCUSS] Arrange Events for 10-year Anniversary
Thank you, Yu.

Hello, everyone -- my apologies for seemingly ignoring you. Misty Linville contacted me earlier about this, but I've been snagged with some urgent business, thus my delayed response.

I'll be happy to support your anniversary, and can publish a Foundation announcement, similar to https://s.apache.org/ApacheSVN20, but likely with fewer/shorter testimonials. We can incorporate some of the suggested bullet points below into this document. If the community would like to pursue this, we'll need to coordinate with the PMC. If you can please confirm the date of the anniversary, I'd appreciate it.

Warmly,
Sally

+ copying press@ to keep everyone in the loop

- - -
Vice President Marketing & Publicity
Vice President Sponsor Relations
The Apache Software Foundation
Tel +1 617 921 8656 | s...@apache.org

On Wed, Apr 15, 2020, at 08:49, Yu Li wrote:

> Dear all,
>
> Since our project has reached its 10th birthday, and 10 years is
> definitely a great milestone, I propose to arrange some special (virtual)
> events for celebration. What comes to my mind includes:
>
> * Open threads to collect voices from our dev/user mailing lists, like
>   "what do you want to say to HBase for its 10th birthday" (as well as
>   our Twitter accounts maybe, if any)
>
> * Arrange some online interviews with both PMC members and our
>   customers. Some of us have been in this project all the way, and there
>   must be some good stories to tell, as well as expectations for the
>   future.
>
> * Join the Apache FeatherCast as suggested in another thread.
>
> * Form a blog post to include all the above events as an official
>   celebration.
>
> What do you think? Any other good ideas? Looking forward to more voices
> (smile).
>
> Best Regards,
> Yu
[jira] [Resolved] (HBASE-24183) [flakey test] replication.TestAddToSerialReplicationPeer
[ https://issues.apache.org/jira/browse/HBASE-24183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huaxiang Sun resolved HBASE-24183.
----------------------------------
    Resolution: Fixed

Pushed the patch, will monitor the flaky test board.

> [flakey test] replication.TestAddToSerialReplicationPeer
> --------------------------------------------------------
>
>                 Key: HBASE-24183
>                 URL: https://issues.apache.org/jira/browse/HBASE-24183
>             Project: HBase
>          Issue Type: Test
>          Components: Client
>    Affects Versions: 3.0.0, 2.3.0, 2.4.0
>            Reporter: Huaxiang Sun
>            Assignee: Hua Xiang
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
> On both the 2.3 and branch-2 flaky test boards, it constantly runs into
> the following failure:
>
> {code:java}
> org.apache.hadoop.hbase.replication.TestAddToSerialReplicationPeer.testAddToSerialPeerFailing
> for the past 1 build (Since #6069)  Took 15 sec.
> Error Message
> Sequence id go backwards from 122 to 24
> Stacktrace
> java.lang.AssertionError: Sequence id go backwards from 122 to 24
>   at org.apache.hadoop.hbase.replication.TestAddToSerialReplicationPeer.testAddToSerialPeer(TestAddToSerialReplicationPeer.java:176)
> Standard Output
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
FileOutputStream.getFileChannel().force(true) will get all durability we need. Just a simple code change? On Wed, Apr 15, 2020 at 12:32 PM Andrew Purtell wrote: > This thread talks of “durability” via filesystem characteristics but also > for single system quick Start type deployments. For durability we need > multi server deployments. No amount of hacking a single system deployment > is going to give us durability as users will expect (“don’t lose my data”). > I believe my comments are on topic. > > > > On Apr 15, 2020, at 11:03 AM, Nick Dimiduk wrote: > > > > On Wed, Apr 15, 2020 at 10:28 AM Andrew Purtell > wrote: > > > >> Nick's mail doesn't make a distinction between avoiding data loss via > >> typical tmp cleaner configurations, unfortunately adjacent to mention of > >> "durability", and real data durability, which implies more than what a > >> single system configuration can offer, no matter how many tweaks we > make to > >> LocalFileSystem. Maybe I'm being pedantic but this is something to be > >> really clear about IMHO. > >> > > > > I prefer to focus the attention of this thread to the question of data > > durability via `FileSystem` characteristics. I agree that there are > > concerns of durability (and others) around the use of the path under > /tmp. > > Let's keep that discussion in the other thread. > > > >> On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey wrote: > >> > >>> I think the first assumption no longer holds. Especially with the move > >>> to flexible compute environments I regularly get asked by folks what > >>> the smallest HBase they can start with for production. I can keep > >>> saying 3/5/7 nodes or whatever but I guarantee there are folks who > >>> want to and will run HBase with a single node. Probably those > >>> deployments won't want to have the distributed flag set. 
None of them > >>> really have a good option for where the WALs go, and failing loud when > >>> they try to go to LocalFileSystem is the best option I've seen so far > >>> to make sure folks realize they are getting into muddy waters. > >>> > >>> I agree with the second assumption. Our quickstart in general is too > >>> complicated. Maybe if we include big warnings in the guide itself, we > >>> could make a quickstart specific artifact to download that has the > >>> unsafe disabling config in place? > >>> > >>> Last fall I toyed with the idea of adding an "hbase-local" module to > >>> the hbase-filesystem repo that could start us out with some > >>> optimizations for single node set ups. We could start with a fork of > >>> RawLocalFileSystem (which will call OutputStream flush operations in > >>> response to hflush/hsync) that properly advertises its > >>> StreamCapabilities to say that it supports the operations we need. > >>> Alternatively we could make our own implementation of FileSystem that > >>> uses NIO stuff. Either of these approaches would solve both problems. > >>> > >>> On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk > >> wrote: > > Hi folks, > > I'd like to bring up the topic of the experience of new users as it > pertains to use of the `LocalFileSystem` and its associated (lack of) > >>> data > durability guarantees. By default, an unconfigured HBase runs with its > >>> root > directory on a `file:///` path. This path is picked up as an instance > >> of > `LocalFileSystem`. Hadoop has long offered this class, but it has > never > supported `hsync` or `hflush` stream characteristics. Thus, when HBase > >>> runs > on this configuration, it is unable to ensure that WAL writes are > >>> durable, > and thus will ACK a write without this assurance. This is the case, > >> even > when running in a fully durable WAL mode. > > This impacts a new user, someone kicking the tires on HBase following > >> our > Getting Started docs. 
On Hadoop 2.8 and before, an unconfigured HBase > >>> will > WARN and carry on. On Hadoop 2.10+, HBase will refuse to start. The book > describes a process of disabling stream capability > enforcement as a first step. This is a mandatory configuration for > >>> running > HBase directly out of our binary distribution. > > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running > on > 2.8: log a warning and carry on. The critique of this approach is that > >>> it's > far too subtle, too quiet for a system operating in a state known to > >> not > provide data durability. > > I have two assumptions/concerns around the state of things, which > >>> prompted > my solution on HBASE-24086 and the associated doc update on > >> HBASE-24106. > > 1. No one should be running a production system on `LocalFileSystem`. > > The initial implementation checked both for `LocalFileSystem` and > `hbase.cluster.distributed`. When running on the former and the latter > >> is > false, we assume the user is running a n
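Vladimir's suggestion above can be sketched with plain JDK classes (note the accessor is `FileOutputStream.getChannel()`; the JDK has no `getFileChannel()` method). This is a hedged, self-contained illustration of a durable local write, not HBase's actual WAL code; the `ForceDemo` class and its `writeDurably` helper are names invented here for the sketch:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class ForceDemo {
    // Write bytes, then force both file content and metadata to the storage
    // device. force(true) is the local-file analogue of an HDFS hsync; a plain
    // OutputStream.flush() only drains user-space buffers and gives no such
    // guarantee.
    static void writeDurably(Path path, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path.toFile())) {
            out.write(data);
            FileChannel ch = out.getChannel();
            ch.force(true); // returns only after the OS reports the write stable
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("wal-demo", ".log");
        writeDurably(p, "edit-1".getBytes());
        System.out.println(new String(Files.readAllBytes(p))); // prints "edit-1"
        Files.delete(p);
    }
}
```

As Andrew notes in the thread, this only hardens a single node against process or power loss on locally attached storage; it is not multi-server durability.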
Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)
This should work for locally attached storage for sure. On Wed, Apr 15, 2020 at 3:52 PM Vladimir Rodionov wrote: > FileOutputStream.getChannel().force(true) will get all durability we > need. Just a simple code change?
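Sean's "hbase-local" idea in the thread above (a RawLocalFileSystem fork that truthfully advertises its StreamCapabilities) can be illustrated with a self-contained sketch. The `SyncableStream` interface below is a hypothetical stand-in for Hadoop's `Syncable`/`StreamCapabilities` types, not the real API; it only shows the shape of the fix: map hsync onto `FileChannel.force(true)` and report the capability honestly so callers can fail fast instead of silently losing data.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalDurabilitySketch {

    // Hypothetical stand-in for Hadoop's StreamCapabilities + Syncable: a
    // stream either supports a capability ("hflush"/"hsync") or says it does
    // not, letting callers refuse to start rather than ACK non-durable writes.
    interface SyncableStream extends AutoCloseable {
        boolean hasCapability(String capability);
        void write(byte[] data) throws IOException;
        void hsync() throws IOException; // durable: data reaches the device
        void close() throws IOException;
    }

    // Local-file implementation in the spirit of a RawLocalFileSystem fork:
    // hsync maps onto FileChannel.force(true), which flushes both data and
    // metadata to storage.
    static final class LocalSyncableStream implements SyncableStream {
        private final FileOutputStream out;

        LocalSyncableStream(Path path) throws IOException {
            this.out = new FileOutputStream(path.toFile());
        }

        public boolean hasCapability(String capability) {
            return "hflush".equals(capability) || "hsync".equals(capability);
        }

        public void write(byte[] data) throws IOException { out.write(data); }

        public void hsync() throws IOException {
            // Note: the JDK accessor is getChannel(), not getFileChannel().
            out.getChannel().force(true);
        }

        public void close() throws IOException { out.close(); }
    }

    public static void main(String[] args) throws Exception {
        Path wal = Files.createTempFile("wal-sketch", ".log");
        try (LocalSyncableStream s = new LocalSyncableStream(wal)) {
            if (!s.hasCapability("hsync")) {
                throw new IllegalStateException("refusing to start: WAL not durable");
            }
            s.write("edit-1".getBytes());
            s.hsync(); // ACK the write only after this returns
        }
        System.out.println(new String(Files.readAllBytes(wal))); // prints "edit-1"
        Files.delete(wal);
    }
}
```

The capability check mirrors the enforcement behavior discussed for Hadoop 2.10+: a stream that cannot promise durability is rejected loudly at startup rather than discovered after data loss.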
[jira] [Resolved] (HBASE-24193) BackPort HBASE-18651 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-24193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan resolved HBASE-24193. --- Hadoop Flags: Reviewed Resolution: Fixed > BackPort HBASE-18651 to branch-1 > > > Key: HBASE-24193 > URL: https://issues.apache.org/jira/browse/HBASE-24193 > Project: HBase > Issue Type: Improvement >Reporter: Lokesh Khurana >Assignee: Lokesh Khurana >Priority: Major > Fix For: 1.7.0 > > > Backport Jira : > [HBASE-18651|https://issues.apache.org/jira/browse/HBASE-18651] to branch-1 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-24175) [Flakey Tests] TestSecureExportSnapshot FileNotFoundException
[ https://issues.apache.org/jira/browse/HBASE-24175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack reopened HBASE-24175: --- Reopen to apply addendum. Nightly just failed with this: health checks / yetus jdk8 hadoop3 checks / org.apache.hadoop.hbase.snapshot.TestExportSnapshotAdjunct. Failing for the past 1 build (Since Failed#2610 ) Took 15 ms. Error Message original.hbase.dir /tmp/hbase-jenkins/hbase Stacktrace java.lang.AssertionError: original.hbase.dir /tmp/hbase-jenkins/hbase at org.apache.hadoop.hbase.snapshot.TestExportSnapshotAdjunct.setUpBeforeClass(TestExportSnapshotAdjunct.java:86) > [Flakey Tests] TestSecureExportSnapshot FileNotFoundException > - > > Key: HBASE-24175 > URL: https://issues.apache.org/jira/browse/HBASE-24175 > Project: HBase > Issue Type: Sub-task > Components: flakies >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.5 > > Attachments: > 0001-HBASE-24175-Flakey-Tests-TestSecureExportSnapshot-Fi.addendum.patch, > 0001-HBASE-24175-Flakey-Tests-TestSecureExportSnapshot-Fi.patch > > > Why are we writing to the '/tmp' dir? 
> {code} > Error Message > org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > Stacktrace > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > Caused by: org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > Caused by: java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24175) [Flakey Tests] TestSecureExportSnapshot FileNotFoundException
[ https://issues.apache.org/jira/browse/HBASE-24175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24175. --- Resolution: Fixed Re-resolving after attaching a second one-liner addendum. > [Flakey Tests] TestSecureExportSnapshot FileNotFoundException > - > > Key: HBASE-24175 > URL: https://issues.apache.org/jira/browse/HBASE-24175 > Project: HBase > Issue Type: Sub-task > Components: flakies >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.5 > > Attachments: > 0001-HBASE-24175-Flakey-Tests-TestSecureExportSnapshot-Fi.addendum.patch, > 0001-HBASE-24175-Flakey-Tests-TestSecureExportSnapshot-Fi.addendum2.patch, > 0001-HBASE-24175-Flakey-Tests-TestSecureExportSnapshot-Fi.patch > > > Why are we writing to the '/tmp' dir? > {code} > Error Message > org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > Stacktrace > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > Caused by: org.apache.hadoop.service.ServiceStateException: > java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > Caused by: java.io.FileNotFoundException: File > file:/tmp/hadoop-yarn-jenkins/node-attribute/nodeattribute.mirror.writing > does not exist > at > org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot.setUpBeforeClass(TestSecureExportSnapshot.java:56) > {code} -- This message was sent by Atlassian 
Jira (v8.3.4#803005)