On Sat, Apr 30, 2016 at 6:33 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> What about support for Transparent Data Encryption feature which was
> introduced in Apache Hadoop 2.6.0 ?
>

Transparent: "...(of a process or interface) functioning without the user being aware of its presence."

St.Ack

> On Fri, Apr 29, 2016 at 6:24 PM, 张铎 <palomino...@gmail.com> wrote:
>
> > Yes, it does. There is a test case that enumerates all the possible protection levels (authentication, integrity and privacy) and encryption algorithms (none, 3des, rc4).
> >
> > https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/io/asyncfs/TestSaslFanOutOneBlockAsyncDFSOutput.java
> >
> > I have also tested it in a secure cluster (hbase-2.0.0-SNAPSHOT and hadoop-2.4.0).
> >
> > Thanks.
> >
> > 2016-04-30 2:32 GMT+08:00 Gary Helmling <ghelml...@gmail.com>:
> >
> > > How well has this been tested on secure clusters? I know SASL support was lacking initially, but I believe it had been added? Does AsyncFSWAL support all the HDFS transport encryption options?
> > >
> > > On Fri, Apr 29, 2016 at 12:05 AM Stack <st...@duboce.net> wrote:
> > >
> > > > I'm +1 on enabling asyncfswal as default in 2.0:
> > > >
> > > > + We'll have plenty of time to figure issues if any if we get it in now, early.
> > > > + The improvement in throughput is substantial
> > > > + There are now fewer moving parts
> > > > + A critical piece of our write path is much less opaque in its workings and no longer (effectively) immutable
> > > >
> > > > St.Ack
> > > >
> > > > On Thu, Apr 28, 2016 at 11:53 PM, 张铎 <palomino...@gmail.com> wrote:
> > > >
> > > > > I've dug into the HDFS and HADOOP projects and found that there is an active issue, HADOOP-12910, that is related to an asynchronous FileSystem implementation.
> > > > >
> > > > > I have left some comments on it, maybe we could start from there.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > 2016-04-29 14:42 GMT+08:00 Stack <st...@duboce.net>:
> > > > >
> > > > > > On Thu, Apr 28, 2016 at 8:47 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >
> > > > > > > Last comment on HDFS-916 was from 2010.
> > > > > > >
> > > > > > > Suggest making a new issue or reviving discussion on HDFS-916 (currently assigned to Todd).
> > > > > > >
> > > > > >
> > > > > > Duo is on it. Some mileage and confidence in the new code would be good to have before going to HDFS (Getting stuff into HDFS is a PITA at the best of times... let's have a good case when we go there).
> > > > > >
> > > > > > > bq. The fallback implementation is not aimed at good performance
> > > > > > >
> > > > > > > For more than two weeks, I have been working with Azure Data Lake developers so that all hbase system tests pass on ADLS - there were subtle differences between ADLS and hdfs.
> > > > > > >
> > > > > > > If switching to AsyncWAL gives either WASB or ADLS subpar performance, it would make upgrading to hbase 2.x unacceptable for their users.
> > > > > > >
> > > > > >
> > > > > > Just use FSHLog instead of asyncfswal when up on WASB. It's just a config change.
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > > > On Thu, Apr 28, 2016 at 8:39 PM, 张铎 <palomino...@gmail.com> wrote:
> > > > > > >
> > > > > > > > 2016-04-29 11:35 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > > > > > > >
> > > > > > > > > bq. AsyncFSOutput will be in HDFS-3.0
> > > > > > > > >
> > > > > > > > > Is there HDFS JIRA for the above ? Can you share the number ?
> > > > > > > > >
> > > > > > > > I have not filed a new one but there are a bunch of related issues already, such as this one: https://issues.apache.org/jira/browse/HDFS-916
> > > > > > > >
> > > > > > > > > bq. Just wrap FSDataOutputStream to make it act like an asynchronous output
> > > > > > > > >
> > > > > > > > > Can you be a bit more specific ?
> > > > > > > > >
> > > > > > > > > HBase currently works with WASB and Azure Data Lake. Does the above mean their performance would suffer ?
> > > > > > > > >
> > > > > > > > Yes, the performance will suffer...
> > > > > > > > The fallback implementation is not aimed at good performance, just at compatibility with any FileSystem implementation.
> > > > > > > >
> > > > > > > > > On Thu, Apr 28, 2016 at 8:30 PM, 张铎 <palomino...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Inline comments.
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > 2016-04-29 10:57 GMT+08:00 Sean Busbey <bus...@cloudera.com>:
> > > > > > > > > >
> > > > > > > > > > > I am nervous about having default out-of-the-box new HBase users reliant on a bespoke HDFS client, especially given Hadoop's compatibility promises and history. Answers for these questions would make me more confident:
> > > > > > > > > > >
> > > > > > > > > > > 1) Where are we on getting the client-side changes to HDFS pushed back upstream?
> > > > > > > > > > >
> > > > > > > > > > No progress yet... Here I want to be able to tell a good story: that HBase already uses it as the default :)
> > > > > > > > > >
> > > > > > > > > > > 2) How well do we detect when our FS is not HDFS and what does fallback look like?
> > > > > > > > > > >
> > > > > > > > > > Just wrap FSDataOutputStream to make it act like an asynchronous output (call hflush in a separate thread). The performance is not good, I think.
> > > > > > > > > >
> > > > > > > > > > > 3) Will this mean altering the versions of Hadoop we label as supported for HBase 2.y+?
> > > > > > > > > > >
> > > > > > > > > > I have tested with hadoop versions from 2.4.x to 2.7.x, so I don't think we need to change the supported versions?
> > > > > > > > > >
> > > > > > > > > > > 4) How are we going to ensure our client remains compatible with newer Hadoop releases?
> > > > > > > > > > >
> > > > > > > > > > We cannot ensure that; HDFS always breaks HBase at a new release...
> > > > > > > > > > I need to test AsyncFSWAL on every new 2.x release and make it compatible with that version. And back to #1, I think we should make sure that the AsyncFSOutput will be in HDFS-3.0. And in HBase-3.0, we can introduce a new 'AsyncFSWAL' that uses the AsyncFSOutput in HDFS.
> > > > > > > > > >
> > > > > > > > > > > On Thu, Apr 28, 2016 at 9:42 PM, Duo Zhang <zhang...@apache.org> wrote:
> > > > > > > > > > > > Six months after I filed HBASE-14790...
> > > > > > > > > > > >
> > > > > > > > > > > > Now the AsyncFSWAL is ready. The WALPE result shows that it is *1.4x~3.7x* faster than FSHLog. The ITBLL result shows that it is *no worse* than FSHLog (the master branch is not that stable itself...).
> > > > > > > > > > > >
> > > > > > > > > > > > More details can be found on HBASE-15536.
> > > > > > > > > > > >
> > > > > > > > > > > > So here we propose to change the default WAL from FSHLog to AsyncFSWAL. Suggestions are welcome.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > busbey
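
The fallback described in the thread ("wrap FSDataOutputStream to make it act like an asynchronous output (call hflush in a separate thread)") could look roughly like the sketch below. This is only an illustration under stated assumptions, not the actual HBase code: the class name WrappedAsyncOutput and its methods are hypothetical, and a plain java.io.OutputStream stands in for FSDataOutputStream, with flush() standing in for hflush().

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical wrapper (not an HBase class): makes a blocking stream look
 * asynchronous by doing the blocking write + flush on one background thread.
 */
class WrappedAsyncOutput implements Closeable {
    // Assumption: stands in for FSDataOutputStream; flush() stands in for hflush().
    private final OutputStream out;
    // A single thread keeps flushes in submission order.
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // Only read and written on the flusher thread.
    private long flushedLength = 0;

    WrappedAsyncOutput(OutputStream out) {
        this.out = out;
    }

    /** Buffers data in memory; nothing reaches the underlying stream yet. */
    synchronized void write(byte[] data) {
        buffer.write(data, 0, data.length);
    }

    /**
     * Hands the buffered bytes to the flusher thread and returns at once.
     * The future completes with the total number of bytes flushed so far.
     */
    synchronized CompletableFuture<Long> flush() {
        final byte[] pending = buffer.toByteArray();
        buffer = new ByteArrayOutputStream();
        CompletableFuture<Long> result = new CompletableFuture<>();
        flusher.execute(() -> {
            try {
                out.write(pending);
                out.flush();
                flushedLength += pending.length;
                result.complete(flushedLength);
            } catch (IOException e) {
                result.completeExceptionally(e);
            }
        });
        return result;
    }

    /** Waits briefly for queued flushes, then closes the underlying stream. */
    @Override
    public void close() throws IOException {
        flusher.shutdown();
        try {
            flusher.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        out.close();
    }
}
```

The caller never blocks on the flush itself, but every flush still pays the full cost of a blocking call on the background thread, which is consistent with the remark in the thread that this fallback exists for compatibility rather than performance.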