RE: Apache Hadoop 2.8.3 Release Plan
Thanks Andrew for the comments. Yes, if we're "strictly" following the "maintenance release" practice, that'd be great, and it was never my intent to overload it and cause a mess.

>> If we're struggling with being able to deliver new features in a safe and
>> timely fashion, let's try to address that...

This is interesting. Are you aware of any means to do that? Thanks!

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Tuesday, November 21, 2017 2:22 PM
To: Zheng, Kai
Cc: Junping Du ; common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop 2.8.3 Release Plan

I'm against including new features in maintenance releases, since they're meant to be bug-fix only. If we're struggling with being able to deliver new features in a safe and timely fashion, let's try to address that, not overload the meaning of "maintenance release".

Best,
Andrew

On Mon, Nov 20, 2017 at 5:20 PM, Zheng, Kai wrote:
> Hi Junping,
>
> Thank you for making 2.8.2 happen and now planning the 2.8.3 release.
>
> I have an ask: would it be convenient to include the backport work for
> the OSS connector module? We have some Hadoop users who wish to have it
> by default for convenience, though in the past they backported it
> themselves. I have raised this and got thoughts from Chris and Steve.
> It looks like this is more wanted for 2.9, but I wanted to ask again
> here for broad feedback and thoughts by this chance. The backport patch
> is available for 2.8, and the one for branch-2 is already in. IMO,
> 2.8.x is promising, as we can see some shift from 2.7.x, hence it's
> worth more important features and efforts. What do you think? Thanks!
>
> https://issues.apache.org/jira/browse/HADOOP-14964
>
> Regards,
> Kai
>
> -Original Message-
> From: Junping Du [mailto:j...@hortonworks.com]
> Sent: Tuesday, November 14, 2017 9:02 AM
> To: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Apache Hadoop 2.8.3 Release Plan
>
> Hi,
> We have had several important fixes land on branch-2.8, and I would
> like to cut branch-2.8.3 now to start the 2.8.3 release work. So far, I
> don't see any pending blockers on 2.8.3, so my current plan is to cut
> the 1st RC of 2.8.3 in the next several days:
> - For all coming commits to land on branch-2.8, please mark the fix
> version as 2.8.4.
> - If there is a really important fix for 2.8.3 that is close to getting
> committed, please notify me before landing it on branch-2.8.3.
> Please let me know if you have any thoughts or comments on the plan.
>
> Thanks,
>
> Junping
>
> From: dujunp...@gmail.com on behalf of 俊平堵 <junping...@apache.org>
> Sent: Friday, October 27, 2017 3:33 PM
> To: gene...@hadoop.apache.org
> Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release.
>
> Hi all,
>
> It gives me great pleasure to announce that the Apache Hadoop community
> has voted to release Apache Hadoop 2.8.2, which is now available for
> download from the Apache mirrors [1]. For download instructions please
> refer to the Apache Hadoop Release page [2].
>
> Apache Hadoop 2.8.2 is the first GA release of the Apache Hadoop 2.8
> line and our newest stable release for the entire Apache Hadoop
> project. For major changes included in the Hadoop 2.8 line, please
> refer to the Hadoop 2.8.2 main page [3].
>
> This release has 315 resolved issues since the previous 2.8.1 release,
> with the following breakdown:
>   - 91 in Hadoop Common
>   - 99 in HDFS
>   - 105 in YARN
>   - 20 in MapReduce
>
> Please read the CHANGES log [4] and RELEASENOTES [5] for more details.
>
> The release news is posted on the Hadoop website too; you can go to the
> downloads section directly [6].
>
> Thank you all for contributing to the Apache Hadoop release!
>
> Cheers,
>
> Junping
>
> [1] http://www.apache.org/dyn/closer.cgi/hadoop/common
> [2] http://hadoop.apache.org/releases.html
> [3] http://hadoop.apache.org/docs/r2.8.2/index.html
> [4] http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/CHANGES.2.8.2.html
> [5] http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html
> [6] http://hadoop.apache.org/releases.html#Download
RE: Apache Hadoop 2.8.3 Release Plan
Hi Junping,

Thank you for making 2.8.2 happen and now planning the 2.8.3 release.

I have an ask: would it be convenient to include the backport work for the OSS connector module? We have some Hadoop users who wish to have it by default for convenience, though in the past they backported it themselves. I have raised this and got thoughts from Chris and Steve. It looks like this is more wanted for 2.9, but I wanted to ask again here for broad feedback and thoughts by this chance. The backport patch is available for 2.8, and the one for branch-2 is already in. IMO, 2.8.x is promising, as we can see some shift from 2.7.x, hence it's worth more important features and efforts. What do you think? Thanks!

https://issues.apache.org/jira/browse/HADOOP-14964

Regards,
Kai

-Original Message-
From: Junping Du [mailto:j...@hortonworks.com]
Sent: Tuesday, November 14, 2017 9:02 AM
To: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Apache Hadoop 2.8.3 Release Plan

Hi,
We have had several important fixes land on branch-2.8, and I would like to cut branch-2.8.3 now to start the 2.8.3 release work. So far, I don't see any pending blockers on 2.8.3, so my current plan is to cut the 1st RC of 2.8.3 in the next several days:
- For all coming commits to land on branch-2.8, please mark the fix version as 2.8.4.
- If there is a really important fix for 2.8.3 that is close to getting committed, please notify me before landing it on branch-2.8.3.
Please let me know if you have any thoughts or comments on the plan.

Thanks,

Junping

From: dujunp...@gmail.com on behalf of 俊平堵 <junping...@apache.org>
Sent: Friday, October 27, 2017 3:33 PM
To: gene...@hadoop.apache.org
Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release.

Hi all,

It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.8.2, which is now available for download from the Apache mirrors [1]. For download instructions please refer to the Apache Hadoop Release page [2].

Apache Hadoop 2.8.2 is the first GA release of the Apache Hadoop 2.8 line and our newest stable release for the entire Apache Hadoop project. For major changes included in the Hadoop 2.8 line, please refer to the Hadoop 2.8.2 main page [3].

This release has 315 resolved issues since the previous 2.8.1 release, with the following breakdown:
- 91 in Hadoop Common
- 99 in HDFS
- 105 in YARN
- 20 in MapReduce

Please read the CHANGES log [4] and RELEASENOTES [5] for more details.

The release news is posted on the Hadoop website too; you can go to the downloads section directly [6].

Thank you all for contributing to the Apache Hadoop release!

Cheers,

Junping

[1] http://www.apache.org/dyn/closer.cgi/hadoop/common
[2] http://hadoop.apache.org/releases.html
[3] http://hadoop.apache.org/docs/r2.8.2/index.html
[4] http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/CHANGES.2.8.2.html
[5] http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html
[6] http://hadoop.apache.org/releases.html#Download
RE: [VOTE] Merge yarn-native-services branch into trunk
Cool to have this feature! Thanks Jian and all.

Regards,
Kai

-Original Message-
From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
Sent: Tuesday, November 07, 2017 7:20 AM
To: Jian He
Cc: yarn-...@hadoop.apache.org; common-...@hadoop.apache.org; Hdfs-dev ; mapreduce-...@hadoop.apache.org
Subject: Re: [VOTE] Merge yarn-native-services branch into trunk

Congratulations to all the contributors involved, this is a great step forward!

+Vinod

> On Nov 6, 2017, at 2:40 PM, Jian He wrote:
>
> Okay, I just merged the branch to trunk (108 commits in total!)
> Again, thanks to all who contributed to this feature!
>
> Jian
>
> On Nov 6, 2017, at 1:26 PM, Jian He <j...@hortonworks.com> wrote:
>
> Here’s +1 from myself.
> The vote passes with 7 binding (+1)s and 2 non-binding (+1)s.
>
> Thanks to all who voted. I’ll merge to trunk by the end of today.
>
> Jian
>
> On Nov 6, 2017, at 8:38 AM, Billie Rinaldi <billie.rina...@gmail.com> wrote:
>
> +1 (binding)
>
> On Mon, Oct 30, 2017 at 1:06 PM, Jian He <j...@hortonworks.com> wrote:
> Hi All,
>
> I would like to restart the vote for merging yarn-native-services to trunk.
> Since the last vote, we have been working on several issues in documentation,
> DNS, CLI modifications, etc. We believe the feature is now in much better shape.
>
> Some background:
> At a high level, the following are the key features implemented.
> - YARN-5079 [1]. A native YARN framework (ApplicationMaster) to orchestrate
> existing services onto YARN, either docker or non-docker based.
> - YARN-4793 [2]. A REST API service embedded in the RM (optional) for users
> to deploy a service via a simple JSON spec.
> - YARN-4757 [3]. Extending today's service registry with a simple DNS
> service to enable users to discover services deployed on YARN via
> standard DNS lookup.
> - YARN-6419 [4]. UI support for native services on the new YARN UI.
> All these new services are optional and sit outside of the existing
> system, and have no impact on the existing system if disabled.
>
> Special thanks to a team of folks who worked hard towards this: Billie
> Rinaldi, Gour Saha, Vinod Kumar Vavilapalli, Jonathan Maron, Rohith Sharma K
> S, Sunil G, Akhil PB, Eric Yang. This effort could not have been possible
> without their ideas and hard work.
> Also thanks to Allen for some review and verification.
>
> Thanks,
> Jian
>
> [1] https://issues.apache.org/jira/browse/YARN-5079
> [2] https://issues.apache.org/jira/browse/YARN-4793
> [3] https://issues.apache.org/jira/browse/YARN-4757
> [4] https://issues.apache.org/jira/browse/YARN-6419
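As an aside on the YARN-4757 item above: the point of exposing the registry over DNS is that discovery works with any standard resolver, with no YARN client libraries involved. A minimal, hedged sketch of what a client-side lookup might look like; the DNS name below is purely hypothetical, as the actual naming scheme and zone are defined by the registry DNS design and cluster configuration:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ServiceDnsLookup {
    public static void main(String[] args) throws UnknownHostException {
        // Hypothetical registry name; real names follow the YARN registry
        // DNS naming convention configured by the cluster admin.
        String serviceName = "myservice.myuser.example.com";
        // A plain A-record lookup is all a client needs to find the service.
        for (InetAddress addr : InetAddress.getAllByName(serviceName)) {
            System.out.println(serviceName + " -> " + addr.getHostAddress());
        }
    }
}
```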
RE: [DISCUSS] A final minor release off branch-2?
Thanks Vinod.

>> Off the top of my head, one of the biggest areas is application
>> compatibility. When folks move from 2.x to 3.x, are their apps binary
>> compatible? Source compatible? Or need changes?

I think these are good concerns from an overall perspective. On the other hand, I've discussed this with quite a few potential 3.0 users, and it looks like most of them are interested in the erasure coding feature; a major scenario for that is backing up their large volume of data to save storage cost. They might run analytics workloads using Hive, Spark, Impala and Kylin on the new cluster based on the new version, but it's not a must at first. They understand there might be some gaps, so they'd migrate their workloads incrementally. For the major analytics workloads, we've performed lots of benchmark and integration tests, as have others I believe; we did find some issues, but they should be fixed in the downstream projects. I think the GA release will accelerate the progress and expose the issues, if any. We can't wait for it to fully mature; there is no perfect release.

>> The main goal of the bridging release is to ease transition on stuff that is
>> guaranteed to be broken.

This sounds like a good consideration. I'm thinking, if I were a Hadoop user, for example on 2.7.4 or 2.8.2 or whatever 2.x version, would I first upgrade to this bridging release and then use the bridge support to upgrade to a 3.x version? I'm not sure. On the other hand, I might tend to look for guides or support in the 3.x docs about how to upgrade from 2.7 to 3.x. Frankly speaking, working on a bridging release that doesn't target any feature isn't so attractive to me as a contributor. Overall, a final minor release off branch-2 is good; we should also give 3.x more time to evolve and mature, so it looks to me like we would have to work on two release lines in parallel for some time. I'd like option C), and suggest we focus on the recent releases.

Just some thoughts.

Regards,
Kai

-Original Message-
From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
Sent: Tuesday, November 07, 2017 9:43 AM
To: Andrew Wang
Cc: Arun Suresh ; common-...@hadoop.apache.org; yarn-...@hadoop.apache.org; Hdfs-dev ; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] A final minor release off branch-2?

The main goal of the bridging release is to ease transition on stuff that is guaranteed to be broken.

Off the top of my head, one of the biggest areas is application compatibility. When folks move from 2.x to 3.x, are their apps binary compatible? Source compatible? Or need changes?

In the 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps source compatible. This means relooking at the API compatibility in 3.x and the impact of migrating applications. We will have to revisit and un-deprecate old APIs, un-delete old APIs, and write documentation on how apps can be migrated. Most of this work will be in the 3.x line. The bridging release, on the other hand, will have deprecation for APIs that cannot be undeleted. This may already have been done in many places, but we need to make sure and fill gaps if any.

Other areas that I can recall from the old days:
- Config migration: many configs are deprecated or deleted. We need documentation to help folks move. We also need deprecations in the bridging release for configs that cannot be undeleted.
- You mentioned rolling upgrades: it will be good to exactly outline the type of testing. For e.g., the rolling-upgrade orchestration order has a direct implication on the testing done.
- Story for downgrades?
- Copying data between 2.x clusters and 3.x clusters: does this work already? Is it broken anywhere that we cannot fix? Do we need bridging features for this to work?

+Vinod

> On Nov 6, 2017, at 12:49 PM, Andrew Wang wrote:
>
> What are the known gaps that need bridging between 2.x and 3.x?
>
> From an HDFS perspective, we've tested wire compat, rolling upgrade,
> and rollback.
>
> From a YARN perspective, we've tested wire compat and rolling upgrade.
> Arun just mentioned an NM rollback issue that I'm not familiar with.
>
> Anything else? External to this discussion, these should be documented
> as known issues for 3.0.
>
> Best,
> Andrew
>
> On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh wrote:
>
>> Thanks for starting this discussion, Vinod.
>>
>> I agree (C) is a bad idea.
>> I would prefer (A) given that ATM, branch-2 is still very close to
>> branch-2.9 - and it is a good time to make a collective decision to
>> lock down commits to branch-2.
>>
>> I think we should also clearly define what the 'bridging' release
>> should be.
>> I assume it means the following:
>> * Any 2.x user wanting to move to 3.x must first upgrade to the
>> bridging release first and then upgrade to the 3.x release.
>> * With regard to state store upgrades (at least NM state stores) the
>> bridging state stores should be aware of all new 3.x ke
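On the config-migration point above: Hadoop's Configuration class already provides a programmatic deprecation mechanism, which is the kind of facility a bridging release would lean on. A minimal sketch; both key names here are hypothetical, for illustration only:

```java
import org.apache.hadoop.conf.Configuration;

public class ConfigMigrationExample {
    public static void main(String[] args) {
        // Map a hypothetical retired 2.x key to its 3.x replacement; reads of
        // the old key are transparently redirected, with a deprecation warning.
        Configuration.addDeprecation("dfs.old.example.key", "dfs.new.example.key");

        Configuration conf = new Configuration(false);
        conf.set("dfs.old.example.key", "value");
        // The value set under the old key is visible under the new key.
        System.out.println(conf.get("dfs.new.example.key")); // prints "value"
    }
}
```

Registering deltas like this in the bridging release, alongside documentation, is what lets old client configs keep working while users migrate.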
RE: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0
Thanks Sammi. My non-binding +1 to make the release candidate.

Regards,
Kai

-Original Message-
From: Chen, Sammi
Sent: Friday, September 02, 2016 4:59 PM
To: Zheng, Kai ; Andrew Wang ; Arun Suresh
Cc: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org; Chen, Sammi
Subject: RE: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

+1 (non-binding). Thanks for driving this, Andrew!

* Downloaded and built from source.
* Set up a 10 node cluster (1 name node + 9 data nodes).
* Verified normal HDFS file put/get operation with 3x replication.
* With 2 data node failures, verified HDFS file put/get operation with 3x replication; file integrity is OK.
* Enabled the erasure coding policy "RS-DEFAULT-6-3-64k", verified HDFS file put/get operation.
* Enabled the erasure coding policy "RS-DEFAULT-6-3-64k", with 3 data node failures, verified HDFS file put/get operation; file integrity is OK.

Cheers,
Sammi

-Original Message-
From: Zheng, Kai
Sent: Friday, September 02, 2016 3:25 PM
To: Chen, Sammi
Subject: FW: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

Hi Sammi,

Could you help provide our feedback? I know you did lots of tests. Thanks!

Regards,
Kai

-Original Message-
From: Arun Suresh [mailto:asur...@apache.org]
Sent: Friday, September 02, 2016 11:33 AM
To: Andrew Wang
Cc: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

+1 (binding). Thanks for driving this, Andrew.

* Downloaded and built from source.
* Set up a 5 node cluster.
* Verified that MR works with opportunistic containers.
* Verified that the AMRMClient supports 'allocationRequestId'.

Cheers,
Arun

On Thu, Sep 1, 2016 at 4:31 PM, Aaron Fabbri wrote:

> +1, non-binding.
>
> I built everything on OS X and ran the s3a contract tests successfully:
>
> mvn test -Dtest=org.apache.hadoop.fs.contract.s3a.\*
>
> ...
>
> Results:
>
> Tests run: 78, Failures: 0, Errors: 0, Skipped: 1
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
>
> On Thu, Sep 1, 2016 at 3:39 PM, Andrew Wang wrote:
>
>> Good point Allen, I forgot about `hadoop version`. Since it's populated by
>> a version-info.properties file, people can always cat that file.
>>
>> On Thu, Sep 1, 2016 at 3:21 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>>
>>> On Sep 1, 2016, at 3:18 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>>>
>>>> On Sep 1, 2016, at 2:57 PM, Andrew Wang wrote:
>>>>
>>>> Steve requested a git hash for this release. This led us into a brief
>>>> discussion of our use of git tags, wherein we realized that although
>>>> release tags are immutable (start with "rel/"), RC tags are not. This
>>>> is based on the HowToRelease instructions.
>>>
>>> We should probably embed the git hash in one of the files that gets
>>> gpg signed. That's an easy change to create-release.
>>>
>>> (Well, one more easily accessible than 'hadoop version')
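The put/get-plus-integrity checks reported above can also be scripted against the FileSystem API rather than run by hand. A rough sketch under stated assumptions: the path is hypothetical, and any erasure coding policy is assumed to have been set on the target directory beforehand (e.g. via the hdfs CLI), so the same check covers both the replicated and EC cases:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutGetIntegrityCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/rc-check/data.bin"); // hypothetical path

        // Write 1 MiB of known bytes.
        byte[] written = new byte[1 << 20];
        Arrays.fill(written, (byte) 42);
        try (OutputStream out = fs.create(file, true)) {
            out.write(written);
        }

        // Read it back fully and compare, e.g. after killing data nodes.
        byte[] read = new byte[written.length];
        try (InputStream in = fs.open(file)) {
            int off = 0;
            while (off < read.length) {
                int n = in.read(read, off, read.length - off);
                if (n < 0) break;
                off += n;
            }
        }
        System.out.println("file integrity OK: " + Arrays.equals(written, read));
    }
}
```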
RE: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk
For the leveldb thing, shouldn't we have an alternative option in pure Java for the platforms where leveldb isn't supported yet, for whatever reason? IMO, native libraries are best used for optimization and for performance in production. For development and pure-Java platforms, a pure-Java approach should still be provided and used by default. That is to say, if no Hadoop native code is used, all the functionality should still work and not break. HDFS erasure coding goes this way: we spent much effort developing an ISA-L-compatible erasure coder in pure Java that's used by default, though for performance the native ISA-L one is recommended in production deployments.

Regards,
Kai

-Original Message-
From: Allen Wittenauer [mailto:a...@effectivemachines.com]
Sent: Saturday, July 23, 2016 8:16 AM
To: Sangjin Lee
Cc: Sean Busbey ; common-...@hadoop.apache.org; yarn-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk

But if I don't use ApplicationClassLoader, my java app is basically screwed then, right?

Also: right now, the non-Linux and/or non-x86 platforms have to supply their own leveldbjni jar (or at least the C-level library?) in order to make YARN even functional. How is that going to work with the classpath manipulation?

> On Jul 22, 2016, at 9:57 AM, Sangjin Lee wrote:
>
> The work on HADOOP-13070 and the ApplicationClassLoader are generic and go
> beyond YARN. It can be used in any JVM that uses hadoop. The current use
> cases are MR containers, hadoop's RunJar (as in "hadoop jar"), and the YARN
> node manager auxiliary services. I'm not sure if that's what you were asking,
> but I hope it helps.
>
> Regards,
> Sangjin
>
> On Fri, Jul 22, 2016 at 9:16 AM, Sean Busbey wrote:
> My work on HADOOP-11804 *only* helps processes that sit outside of
> YARN. :)
>
> On Fri, Jul 22, 2016 at 10:48 AM, Allen Wittenauer wrote:
> >
> > Does any of this work actually help processes that sit outside of YARN?
> >
> >> On Jul 21, 2016, at 12:29 PM, Sean Busbey wrote:
> >>
> >> thanks for bringing this up! big +1 on upgrading dependencies for 3.0.
> >>
> >> I have an updated patch for HADOOP-11804 ready to post this week.
> >> I've been updating HBase's master branch to try to make use of it,
> >> but could use some other reviews.
> >>
> >> On Thu, Jul 21, 2016 at 4:30 AM, Tsuyoshi Ozawa wrote:
> >>> Hi developers,
> >>>
> >>> I'd like to discuss how to make an advance towards dependency
> >>> management in the Apache Hadoop trunk code, since there has been lots
> >>> of work on updating dependencies in parallel. Summarizing recent
> >>> work and activities as follows:
> >>>
> >>> 0) Currently, we have merged the minimum dependency updates for
> >>> making Hadoop JDK-8 compatible (compilable and runnable on JDK-8).
> >>> 1) After that, some people suggested that we should update the other
> >>> dependencies on trunk (e.g. protobuf, netty, jackson, etc.).
> >>> 2) In parallel, Sangjin and Sean are working on classpath isolation:
> >>> HADOOP-13070, HADOOP-11804 and HADOOP-11656.
> >>>
> >>> The main problems we try to solve in the activities above are as follows:
> >>>
> >>> * 1) tries to solve dependency hell between user-level jars and
> >>> system (Hadoop)-level jars.
> >>> * 2) tries to solve updating old libraries.
> >>>
> >>> IIUC, 1) and 2) look unrelated, but they are in fact related. 2)
> >>> tries to separate class loaders between client-side dependencies
> >>> and server-side dependencies in Hadoop, so we can change the
> >>> policy of updating libraries after doing 2). We can also decide
> >>> which libraries can be shaded after 2).
> >>>
> >>> Hence, IMHO, a straightforward way to go is doing 2) first.
> >>> After that, we can update both client-side and server-side
> >>> dependencies based on a new policy (maybe we should discuss what kind
> >>> of incompatibility is acceptable, and what is not).
> >>>
> >>> Thoughts?
> >>>
> >>> Thanks,
> >>> - Tsuyoshi
> >>
> >> --
> >> busbey
>
> --
> busbey
RE: Looking to a Hadoop 3 release
Thanks Andrew for driving this. I wonder if it's a good chance for HADOOP-12579 (deprecate and remove WritableRPCEngine) to get in. Note it's not an incompatible change, but it feels better done in a major release.

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: Friday, February 19, 2016 7:04 AM
To: hdfs-dev@hadoop.apache.org; Kihwal Lee
Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta or GA for some number of months. In the meanwhile, it'd be good to keep putting out regular, stable 2.x releases.

Best,
Andrew

On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main
> motivations, are we getting rid of branch-2.8?
>
> Kihwal
>
> From: Andrew Wang
> To: "common-...@hadoop.apache.org"
> Cc: "yarn-...@hadoop.apache.org" ; "mapreduce-...@hadoop.apache.org" ; hdfs-dev
> Sent: Thursday, February 18, 2016 4:35 PM
> Subject: Re: Looking to a Hadoop 3 release
>
> Hi all,
>
> Reviving this thread. I've seen renewed interest in a trunk release
> since HDFS erasure coding has not yet made it to branch-2. Along with
> JDK8, the shell script rewrite, and many other improvements, I think
> it's time to revisit Hadoop 3.0 release plans.
>
> My overall plan is still the same as in my original email: a series of
> regular alpha releases leading up to beta and GA. Alpha releases make
> it easier for downstreams to integrate with our code, and making them
> regular means features can be included when they are ready.
>
> I know there are some incompatible changes waiting in the wings (i.e.
> HDFS-6984 making FileStatus a PB rather than Writable, some of
> HADOOP-9991 bumping dependency versions) that would be good to get in.
> If you have changes like this, please set the target version to 3.0.0
> and mark them "Incompatible". We can use this JIRA query to track:
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
>
> There's some release-related stuff that needs to be sorted out
> (namely, the new CHANGES.txt and release note generation from Yetus),
> but I'd tentatively like to roll the first alpha a month out, so the
> third week of March.
>
> Best,
> Andrew
>
> On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata wrote:
>
> > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > source version to JDK8.
> >
> > Also, note that releasing from trunk is a way of achieving #3, it's
> > not a way of abandoning it.
> >
> > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang wrote:
> > > Hi Raymie,
> > >
> > > Konst proposed just releasing off of trunk rather than cutting a
> > > branch-2, and there was general agreement there. So, consider #3
> > > abandoned. 1&2 can be achieved at the same time; we just need to
> > > avoid using JDK8 language features in trunk so things can be
> > > backported.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata wrote:
> > >
> > >> In this (and the related threads), I see the following three requirements:
> > >>
> > >> 1. "Bump the source JDK version to JDK8" (ie, drop JDK7 support).
> > >>
> > >> 2. "We'll still be releasing 2.x releases for a while, with
> > >> similar feature sets as 3.x."
> > >>
> > >> 3. Avoid the "risk of split-brain behavior" by "minimizing
> > >> backporting headaches. Pulling trunk > branch-2 > branch-2.x is
> > >> already tedious. Adding a branch-3, branch-3.x would be obnoxious."
> > >>
> > >> These three cannot be achieved at the same time. Which do we abandon?
> > >>
> > >> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia wrote:
> > >> >
> > >> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth wrote:
> > >> >>
> > >> >> 2) Simplification of configs - potentially separating client-side
> > >> >> configs and those used by daemons. This is another source of
> > >> >> perpetual confusion for users.
> > >> > +1 on this.
> > >> >
> > >> > sanjay
RE: Hadoop encryption module as Apache Chimera incubator project
The encryption or security thing is surely a good start as the current focus. Considering or having other things like compression would help determine how to envision, position and lay out the new project (on the Hadoop side, in the Apache Commons project, or as a new TLP) and which candidate modules it contains. Yes, at the beginning, only the encryption thing.

Regards,
Kai

-Original Message-
From: Chen, Haifeng [mailto:haifeng.c...@intel.com]
Sent: Thursday, February 04, 2016 10:30 AM
To: hdfs-dev@hadoop.apache.org
Subject: RE: Hadoop encryption module as Apache Chimera incubator project

>> Let's do one step at a time. There is a clear need for common encryption,
>> and let's focus on making that happen.
Strongly agree.

-Original Message-
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Thursday, February 4, 2016 8:50 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

Let's do one step at a time. There is a clear need for common encryption, and let's focus on making that happen.

On Wed, Feb 3, 2016 at 4:48 PM, Zheng, Kai wrote:

> I thought this discussion would switch to common-dev@ now?
>
> >> Would it make sense to also package some of the compression libraries,
> >> and maybe some of the text processing from MapReduce? Evolving some of
> >> this code to a common library with few/no dependencies would be
> >> generally useful. As a subproject, it could have a broader scope that
> >> could evolve into a viable TLP.
>
> Sounds like a great idea to make the potential TLP make more sense!! I
> thought it could be organized as in Apache Commons: the security,
> compression and other common text-related things could be organized in
> independent modules. Perhaps Hadoop conf could also be considered.
> These modules could rely on some common utility module. It can still
> have a Hadoop background, or be Hadoop-powered, and eventually we would
> have a good place for some Hadoop common code to move into, to benefit
> and impact an even broader scope than Hadoop itself.
>
> Regards,
> Kai
>
> -Original Message-
> From: Chris Douglas [mailto:cdoug...@apache.org]
> Sent: Thursday, February 04, 2016 7:26 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>
> I went through the repository, and now understand the reasoning that
> would locate this code in Apache Commons. This isn't proposing to
> extract much of the implementation, and it takes none of the
> integration. It's limited to interfaces to crypto libraries and
> streams/configuration. It might be a reasonable fit for commons-codec,
> but that's a pretty sparse library, and driving the release cadence
> might be more complicated. It'd be worth discussing on their lists
> (please also CC common-dev@).
>
> Chimera would be a boutique TLP, unless we wanted to draw out more of
> the integration and tooling. Is that a goal you're interested in pursuing?
> There's a tension between keeping this focused and including enough
> functionality to make it viable as an independent component. By way of
> example, Hadoop's common project requires too many dependencies and
> carries too much historical baggage for other projects to rely on.
> I agree with Colin/Steve: we don't want this to grow into another
> guava-like dependency that creates more work in conflicts than it
> saves in implementation...
>
> Would it make sense to also package some of the compression libraries,
> and maybe some of the text processing from MapReduce? Evolving some of
> this code to a common library with few/no dependencies would be
> generally useful. As a subproject, it could have a broader scope that
> could evolve into a viable TLP. If the encryption libraries are the
> only ones you're interested in pulling out, then Apache Commons does
> seem like a better target than a separate project. -C
>
> On Wed, Feb 3, 2016 at 1:49 AM, Chris Douglas wrote:
> > On Wed, Feb 3, 2016 at 12:48 AM, Gangumalla, Uma wrote:
> >>> Standing at the point of a shared fundamental piece of code like this,
> >>> I do think Apache Commons might be the best direction which we can
> >>> try as the first effort. In this direction, we still need to work
> >>> with the Apache Commons community for buy-in and acceptance of the
> >>> proposal.
> >> Makes sense.
> >
> > Makes sense how?
> >
> >> For this we should define independent release cycles for this
> >> project, and it would just be placed under the Hadoop tree if we all
> >> conclude with this option at the end.
> >
> > Yes.
RE: Hadoop encryption module as Apache Chimera incubator project
I thought this discussion would switch to common-dev@ now?

>> Would it make sense to also package some of the compression libraries, and
>> maybe some of the text processing from MapReduce? Evolving some of this code
>> to a common library with few/no dependencies would be generally useful. As a
>> subproject, it could have a broader scope that could evolve into a viable
>> TLP.

Sounds like a great idea to make the potential TLP make more sense!! I thought it could be organized as in Apache Commons: the security, compression and other common text-related things could be organized in independent modules. Perhaps Hadoop conf could also be considered. These modules could rely on some common utility module. It can still have a Hadoop background, or be Hadoop-powered, and eventually we would have a good place for some Hadoop common code to move into, to benefit and impact an even broader scope than Hadoop itself.

Regards,
Kai

-Original Message-
From: Chris Douglas [mailto:cdoug...@apache.org]
Sent: Thursday, February 04, 2016 7:26 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

I went through the repository, and now understand the reasoning that would locate this code in Apache Commons. This isn't proposing to extract much of the implementation, and it takes none of the integration. It's limited to interfaces to crypto libraries and streams/configuration. It might be a reasonable fit for commons-codec, but that's a pretty sparse library, and driving the release cadence might be more complicated. It'd be worth discussing on their lists (please also CC common-dev@).

Chimera would be a boutique TLP, unless we wanted to draw out more of the integration and tooling. Is that a goal you're interested in pursuing? There's a tension between keeping this focused and including enough functionality to make it viable as an independent component. By way of example, Hadoop's common project requires too many dependencies and carries too much historical baggage for other projects to rely on. I agree with Colin/Steve: we don't want this to grow into another guava-like dependency that creates more work in conflicts than it saves in implementation...

Would it make sense to also package some of the compression libraries, and maybe some of the text processing from MapReduce? Evolving some of this code to a common library with few/no dependencies would be generally useful. As a subproject, it could have a broader scope that could evolve into a viable TLP. If the encryption libraries are the only ones you're interested in pulling out, then Apache Commons does seem like a better target than a separate project. -C

On Wed, Feb 3, 2016 at 1:49 AM, Chris Douglas wrote:
> On Wed, Feb 3, 2016 at 12:48 AM, Gangumalla, Uma wrote:
>>> Standing at the point of a shared fundamental piece of code like this,
>>> I do think Apache Commons might be the best direction which we can
>>> try as the first effort. In this direction, we still need to work
>>> with the Apache Commons community for buy-in and acceptance of the proposal.
>> Makes sense.
>
> Makes sense how?
>
>> For this we should define independent release cycles for this
>> project, and it would just be placed under the Hadoop tree if we all
>> conclude with this option at the end.
>
> Yes.
>
>> [Chris]
>>> If Chimera is not successful as an independent project or stalls,
>>> Hadoop and/or Spark and/or $project will have to reabsorb it as
>>> maintainers.
>>>
>> I am not so strong on this point. If we assume the project would be
>> unsuccessful, it can be unsuccessful (less maintained) even under Hadoop.
>> But if other projects depend on this piece, then they would get less
>> support. Of course, right now we feel this piece of code is very
>> important and we feel (expect) it can be successful as an independent
>> project, irrespective of whether it is a separate project outside
>> Hadoop or inside. So, I feel this point would not really influence
>> the discussion.
>
> Sure; code can idle anywhere, but that wasn't the point I was after.
> You propose to extract code from Hadoop, but if Chimera fails then
> what recourse do we have among the other projects taking a dependency
> on it? Splitting off another project is feasible, but Chimera should
> be sustainable before this PMC can divest itself of responsibility for
> security libraries. That's a pretty low bar.
>
> Bundling the library with the jar is helpful; I've used that before.
> It should prefer (updated) libraries from the environment, if
> configured. Otherwise it's a pain (or impossible) for ops to patch
> security bugs. -C
>
>>> -Original Message-
>>> From: Colin P. McCabe [mailto:cmcc...@apache.org]
>>> Sent: Wednesday, February 3, 2016 4:56 AM
>>> To: hdfs-dev@hadoop.apache.org
>>> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>>>
>>> It's great to see interest in improving this functionality. I t
RE: Hadoop encryption module as Apache Chimera incubator project
Here is the forgotten reference:

[1] https://github.com/apache/directory-kerby

-Original Message-
From: Zheng, Kai [mailto:kai.zh...@intel.com]
Sent: Friday, January 29, 2016 9:10 AM
To: hdfs-dev@hadoop.apache.org
Subject: RE: Hadoop encryption module as Apache Chimera incubator project

Sounds good to have further discussions. I have some questions, if you don't mind. Thanks.

@Haifeng: Thanks Uma & Haifeng for your answers about how to scope and envision Chimera. It sounds good to me. So I guess we would prefer to use a generic project name like Chimera, to make the project not tightly coupled with the AES encryption things?

Would this new project also consider even more general efforts currently not in the big data scope yet? I mean, in the ASF there are various security-related projects, and many projects that relate to or heavily use security things. It looks like Chimera can focus on and provide high-performance security libraries and facilities; it would be good if these projects could also benefit from Chimera, as well as Hadoop and Spark.

If Chimera resides in Hadoop, I personally wish it could be independent from the main part in codebase and dependency relationships. That means, if another security project would like to use Chimera, it won't have to rely on Hadoop modules, like hadoop-common. Otherwise, it will get messy, because at some future time Hadoop may leverage these security projects to enhance security. For example, I'd like to mention Apache Kerby [1]: it provides almost all the Kerberos encryption types, compatible with the MIT KDC, but the underlying encryption ciphers are mainly from the JRE, and these could be optimized using Chimera.

I understand Hadoop-specific security issues should go to security@hadoop. How about the general ones? I know there is a mailing list in the ASF for security things.

This project may support other platforms like Windows. Will Chimera bundle native libraries like OpenSSL in the JAR? As I went through the discussions in HADOOP-11127, as guided by Chris, it looks like a challenge would be how to build and bundle the various native libraries for the supported platforms, with versioning in mind, wherever it's hosted.

Maybe another option: have Chimera as a separate project, like Yetus? It can still be managed by the committee. :)

Thanks for your answers.

Regards,
Kai

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
Sent: Thursday, January 28, 2016 4:08 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

Thanks for the inputs Owen.

On 1/27/16, 11:31 AM, "Owen O'Malley" wrote:

>On Wed, Jan 27, 2016 at 9:59 AM, Gangumalla, Uma wrote:
>
>> I think Chimera's goal is to enhance even for other use cases.
>
>Naturally.
>
>> For Hadoop, CTR mode should be enough today,
>
>This isn't true. Hadoop should use better encryption for RPC and
>shuffle, both of which should not use CTR.

|| Yes, I said later Hadoop could use other options too.

>> I think a separate module and independent release is a good idea,
>> but I am not so strong on the point of keeping it under Hadoop.
>
>I believe encryption is becoming a core part of Hadoop. I think that
>moving core components out of Hadoop is bad from a project management
>perspective.
>To put it another way, a bug in the encryption routines will likely
>become a security problem that security@hadoop needs to hear about. I
>don't think adding a separate project in the middle of that
>communication chain is a good idea. The same applies to data corruption
>problems, and so on...

|| I agree on the security-related discussion; we'll have a separate one.
|| Thanks for this point.

>> It may be good to keep it at a generalized place (as in the discussion,
>> we thought that place could be Apache Commons).
>
>Apache Commons is a collection of *Java* projects, so Chimera as a
>JNI-based library isn't a natural fit. Furthermore, Apache Commons
>doesn't have its own security list, so problems will go to the generic
>secur...@apache.org.

|| I see some projects including native stuff too. Example: Commons Daemon.
|| But, yeah, I noticed now that Apache Commons proper indicates it's for
|| reusable Java sources.

>Why do you think that Apache Commons is a better home than Hadoop?
>
>.. Owen

@ATM, Andrew, Chris, Yi, do you want to comment on this proposal?

Regards,
Uma
RE: Hadoop encryption module as Apache Chimera incubator project
Sounds good to have further discussions. I have some questions, if you don't mind. Thanks.

@Haifeng: Thanks Uma & Haifeng for your answers about how to scope and envision Chimera. It sounds good to me. So I guess we would prefer to use a generic project name like Chimera, to make the project not tightly coupled with the AES encryption things?

Would this new project also consider even more general efforts currently not in the big data scope yet? I mean, in the ASF there are various security-related projects, and many projects that relate to or heavily use security things. It looks like Chimera can focus on and provide high-performance security libraries and facilities; it would be good if these projects could also benefit from Chimera, as well as Hadoop and Spark.

If Chimera resides in Hadoop, I personally wish it could be independent from the main part in codebase and dependency relationships. That means, if another security project would like to use Chimera, it won't have to rely on Hadoop modules, like hadoop-common. Otherwise, it will get messy, because at some future time Hadoop may leverage these security projects to enhance security. For example, I'd like to mention Apache Kerby [1]: it provides almost all the Kerberos encryption types, compatible with the MIT KDC, but the underlying encryption ciphers are mainly from the JRE, and these could be optimized using Chimera.

I understand Hadoop-specific security issues should go to security@hadoop. How about the general ones? I know there is a mailing list in the ASF for security things.

This project may support other platforms like Windows. Will Chimera bundle native libraries like OpenSSL in the JAR? As I went through the discussions in HADOOP-11127, as guided by Chris, it looks like a challenge would be how to build and bundle the various native libraries for the supported platforms, with versioning in mind, wherever it's hosted.

Maybe another option: have Chimera as a separate project, like Yetus? It can still be managed by the committee. :)

Thanks for your answers.

Regards,
Kai

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
Sent: Thursday, January 28, 2016 4:08 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

Thanks for the inputs Owen.

On 1/27/16, 11:31 AM, "Owen O'Malley" wrote:

>On Wed, Jan 27, 2016 at 9:59 AM, Gangumalla, Uma wrote:
>
>> I think Chimera's goal is to enhance even for other use cases.
>
>Naturally.
>
>> For Hadoop, CTR mode should be enough today,
>
>This isn't true. Hadoop should use better encryption for RPC and
>shuffle, both of which should not use CTR.

|| Yes, I said later Hadoop could use other options too.

>> I think a separate module and independent release is a good idea,
>> but I am not so strong on the point of keeping it under Hadoop.
>
>I believe encryption is becoming a core part of Hadoop. I think that
>moving core components out of Hadoop is bad from a project management
>perspective.
>To put it another way, a bug in the encryption routines will likely
>become a security problem that security@hadoop needs to hear about. I
>don't think adding a separate project in the middle of that
>communication chain is a good idea. The same applies to data corruption
>problems, and so on...

|| I agree on the security-related discussion; we'll have a separate one.
|| Thanks for this point.

>> It may be good to keep it at a generalized place (as in the discussion,
>> we thought that place could be Apache Commons).
>
>Apache Commons is a collection of *Java* projects, so Chimera as a
>JNI-based library isn't a natural fit. Furthermore, Apache Commons
>doesn't have its own security list, so problems will go to the generic
>secur...@apache.org.

|| I see some projects including native stuff too. Example: Commons Daemon.
|| But, yeah, I noticed now that Apache Commons proper indicates it's for
|| reusable Java sources.

>Why do you think that Apache Commons is a better home than Hadoop?
>
>.. Owen

@ATM, Andrew, Chris, Yi, do you want to comment on this proposal?

Regards,
Uma
RE: Hadoop encryption module as Apache Chimera incubator project
Thanks Chris for the pointer, and Uma for the confirmation! I'm happy to know about HADOOP-11127; there are already many solid discussions in it. I will go through it, make my own investigation, and see how I can help with the effort. Sure, let's go back to Chimera, and sorry for the interruption.

Regards,
Kai

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
Sent: Friday, January 22, 2016 8:38 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

>Uma and everyone, thank you for the proposal. +1 to proceed.
Thanks Chris for your feedback.

Kai wrote:
I believe Haifeng had mentioned the problem in a call when discussing the erasure coding work, but only now did I get to understand what the problem is and how Chimera or Snappy Java solved it. It looks like there can be some thin clients that don't rely on a Hadoop installation, so no libhadoop.so is available to use on the client host. The approach mentioned here is to bundle the library file (*.so) into a jar and dynamically extract the file when loading it. When no library file is contained in the jar, it goes to the normal case, loading it from an installation. It's smart and nice! My question is, could we consider adopting the approach for the libhadoop.so library? It might be worth discussing because we're bundling more and more things into the library (recently we just put Intel ISA-L support into it), and such things may be desired for such clients. It may also be helpful for development, because sometimes when running unit tests that involve native code, errors may occur complaining there is no place to find libhadoop.so. Thanks.

[UMA] Good points Kai. It is good to think about and invest some effort in solving the libhadoop.so part. As Chris suggested, taking this discussion to that JIRA, HADOOP-11127, is the more appropriate thing to do.

Regards,
Uma

On 1/21/16, 12:18 PM, "Chris Nauroth" wrote:

>> My question is, could we consider adopting the approach for the
>> libhadoop.so library?
>
>This is something that I have proposed already in HADOOP-11127. There
>is no consensus on proceeding with it from the contributors in that
>discussion. There are some big challenges around how it would impact
>the release process. I also have not had availability to prototype an
>implementation to make a stronger case for feasibility. Kai, if this
>is something that you're interested in, then I encourage you to join
>the discussion in HADOOP-11127, or even pick up prototyping work if
>you'd like. Since we have that existing JIRA, let's keep this mail
>thread focused just on Chimera. Thank you!
>
>--Chris Nauroth
>
>On 1/20/16, 11:16 PM, "Zheng, Kai" wrote:
>
>>Thanks Uma.
>>
>>I have a question, by the way. It's not about the Chimera project, but
>>about the mentioned advantage 1 and the libhadoop.so installation problem.
>>I copied the text below for convenience.
>>
>>>>1. As Chimera embeds the native code in the jar (similar to Snappy
>>>>Java), it solves the current issue in Hadoop that an HDFS client has
>>>>to depend on libhadoop.so if the client needs to read an encryption
>>>>zone in HDFS. This means an HDFS client may have to depend on a
>>>>Hadoop installation on the local machine. For example, HBase depends
>>>>on the HDFS client jar rather than a Hadoop installation and then has
>>>>no access to libhadoop.so. So HBase cannot use an encryption zone, or
>>>>it causes an error.
>>
>>I believe Haifeng had mentioned the problem in a call when discussing
>>the erasure coding work, but only now did I get to understand what the
>>problem is and how Chimera or Snappy Java solved it. It looks like
>>there can be some thin clients that don't rely on a Hadoop
>>installation, so no libhadoop.so is available to use on the client
>>host. The approach mentioned here is to bundle the library file (*.so)
>>into a jar and dynamically extract the file when loading it. When no
>>library file is contained in the jar, it goes to the normal case,
>>loading it from an installation. It's smart and nice! My question is,
>>could we consider adopting the approach for the libhadoop.so library?
>>It might be worth discussing because we're bundling more and more
>>things into the library (recently we just put Intel ISA-L support into
>>it), and such things may be desired for such clients. It may also be
>>helpful for development, because sometimes when running unit tests
>>that involve native code, errors may occur complaining there is no
>>place to find libhadoop.so
RE: Hadoop encryption module as Apache Chimera incubator project
Thanks Uma.

I have a question, by the way. It's not about the Chimera project, but about the mentioned advantage 1 and the libhadoop.so installation problem. I copied the text below for convenience.

>>1. As Chimera embeds the native code in the jar (similar to Snappy Java), it
>>solves the current issue in Hadoop that an HDFS client has to depend on
>>libhadoop.so if the client needs to read an encryption zone in HDFS. This
>>means an HDFS client may have to depend on a Hadoop installation on the local
>>machine. For example, HBase depends on the HDFS client jar rather than a
>>Hadoop installation and then has no access to libhadoop.so. So HBase cannot
>>use an encryption zone, or it causes an error.

I believe Haifeng had mentioned the problem in a call when discussing the erasure coding work, but only now did I get to understand what the problem is and how Chimera or Snappy Java solved it. It looks like there can be some thin clients that don't rely on a Hadoop installation, so no libhadoop.so is available to use on the client host. The approach mentioned here is to bundle the library file (*.so) into a jar and dynamically extract the file when loading it. When no library file is contained in the jar, it goes to the normal case, loading it from an installation. It's smart and nice! My question is, could we consider adopting the approach for the libhadoop.so library? It might be worth discussing because we're bundling more and more things into the library (recently we just put Intel ISA-L support into it), and such things may be desired for such clients. It may also be helpful for development, because sometimes when running unit tests that involve native code, errors may occur complaining there is no place to find libhadoop.so. Thanks.

Regards,
Kai

-Original Message-
From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
Sent: Thursday, January 21, 2016 11:20 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

Hi All,

Thanks Andrew, ATM, Yi, Kai, Larry. Thanks Haifeng for clarifying the release stuff. Please find my responses below.

Andrew wrote: If it becomes part of Apache Commons, could we make Chimera a separate JAR? We have real difficulties bumping dependency versions right now, so ideally we don't need to bump our existing Commons dependencies to use Chimera.
[UMA] Yes, we plan to make a separate jar.

Andrew wrote: With this refactoring, do we have confidence that we can get our desired changes merged and released in a timely fashion? e.g. if we find another bug like HADOOP-11343, we'll first need to get the fix into Chimera, have a new Chimera release, then bump Hadoop's Chimera dependency. This also relates to the previous point; it's easier to do this dependency bump if Chimera is a separate JAR.
[UMA] Yes, and the main target users for this project are Hadoop and Spark right now. So Hadoop requirements would be the priority tasks for it.

ATM wrote: Uma, would you be up for approaching the Apache Commons folks saying that you'd like to contribute Chimera? I'd recommend saying that Hadoop and Spark are both on board to depend on this.
[UMA] Yes, will do that.

Kai wrote: Just a question. Becoming a separate jar/module in Apache Commons means Chimera or the module can be released separately or in a timely manner, not coupled with other modules' releases in the project? Thanks.
[Haifeng] From the Apache Commons project web (https://commons.apache.org/), we see there is already a long list of components in its Apache Commons Proper list. Each component has its own release version and date. To join and be one of the list is the target.

Larry wrote: If what we are looking for is some level of autonomy then it would need to be a module with its own release train - or at least be able to.
[UMA] Yes, agreed.

Kai wrote: So far I saw it's mainly about AES-256. I suggest the scope be expanded a little bit, perhaps to a dedicated high-performance encryption library; then we would have quite a lot to contribute to it, like other ciphers, MACs, PRNGs and so on. Then both Hadoop and Spark can benefit from it.
[UMA] Yes, once development starts as a separate project, it's free to evolve and provide more improvements to support a wider customer/user space for encryption, based on demand. Haifeng, would you add some points here?

Regards,
Uma

On 1/20/16, 4:31 PM, "Andrew Wang" wrote:

>Thanks Uma for putting together this proposal. Overall sounds good to
>me, +1 for these improvements. A few comments/questions:
>
>* If it becomes part of Apache Commons, could we make Chimera a
>separate JAR? We have real difficulties bumping dependency versions
>right now, so ideally we don't need to bump our existing Commons
>dependencies to use Chimera.
>* With this refactoring, do we have confidence that we can get our
>desired changes merged and released in a timely fashion? e.g. if we
>find another bug like HADOOP-11343, we'll first need to get the f
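For illustration, here is a minimal sketch of the Snappy-Java-style loading approach discussed in this thread: try to extract a bundled .so from the jar into a temp file and load it directly, and fall back to the normally installed library when nothing is bundled. All names here are hypothetical, not Chimera's or Snappy Java's actual layout; real loaders also key the resource path off os.name and os.arch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class BundledNativeLoader {
    // Hypothetical resource path inside the jar for one platform.
    private static final String RESOURCE = "/native/linux-x86_64/libexample.so";

    public static void load() throws IOException {
        try (InputStream in = BundledNativeLoader.class.getResourceAsStream(RESOURCE)) {
            if (in == null) {
                // Nothing bundled for this platform: normal case, resolve the
                // library from a local installation via java.library.path.
                System.loadLibrary("example");
                return;
            }
            // Bundled case: copy the .so out of the jar, then load it directly.
            Path tmp = Files.createTempFile("libexample", ".so");
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
        }
    }
}
```

This is what makes a thin client work with just the jar on the classpath, and it is the versioning and per-platform bundling of such resources that HADOOP-11127 identifies as the hard part for libhadoop.so.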
RE: Hadoop encryption module as Apache Chimera incubator project
Thanks Haifeng for the clarifying! I thought it addressed well my question and the concern. I see the features list in the Chimera project site and it looks great. Do we have any description about the project initiative, goal, position or like that? So far I saw it's mainly about AES-256. I suggest the scope can be expanded a little bit, perhaps a dedicated high performance encryption library, then we would have quite much to contribute to it, like other ciphers, MACs, PRNGs and so on. Then both Hadoop and Spark can benefit from it. Regards, Kai -Original Message- From: Chen, Haifeng [mailto:haifeng.c...@intel.com] Sent: Thursday, January 21, 2016 10:53 AM To: hdfs-dev@hadoop.apache.org Subject: RE: Hadoop encryption module as Apache Chimera incubator project Agree that if making Chimera part of Apache commons is the desire, it would be better to be a standalone component in commons with its own release traces. >From apache commons project web (https://commons.apache.org/), we see there is >already a long list of components in its Apache Commons Proper list. Each >component has its own release version and date. To join and be one of the list >is the target. Regards, Haifeng -Original Message- From: Larry McCay [mailto:lmc...@hortonworks.com] Sent: Thursday, January 21, 2016 10:43 AM To: hdfs-dev@hadoop.apache.org Subject: Re: Hadoop encryption module as Apache Chimera incubator project That's a good point, Kai. If what we are looking for is some level of autonomy then it would need to be a module with its own release train - or at least be able to. On Jan 20, 2016, at 9:18 PM, Zheng, Kai wrote: > Just a question. Becoming a separate jar/module in Apache Commons means > Chimera or the module can be released separately or in a timely manner, not > coupling with other modules for release in the project? Thanks. > > Regards, > Kai > > -Original Message- > From: Aaron T. Myers [mailto:a...@cloudera.com] > Sent: Thursday, January 21, 2016 9:44 AM > To: hdfs-dev@hadoop.apache.org > Subject: Re: Hadoop encryption module as Apache Chimera incubator > project > > +1 for Hadoop depending upon Chimera, assuming Chimera can get > hosted/released under some Apache project umbrella. If that's Apache Commons > (which makes a lot of sense to me) then I'm also a big +1 on Andrew's > suggestion that we make it a separate module. > > Uma, would you be up for approaching the Apache Commons folks saying that > you'd like to contribute Chimera? I'd recommend saying that Hadoop and Spark > are both on board to depend on this. > > -- > Aaron T. Myers > Software Engineer, Cloudera > > On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang > > wrote: > >> Thanks Uma for putting together this proposal. Overall sounds good to >> me, >> +1 for these improvements. A few comments/questions: >> >> * If it becomes part of Apache Commons, could we make Chimera a >> separate JAR? We have real difficulties bumping dependency versions >> right now, so ideally we don't need to bump our existing Commons >> dependencies to use Chimera. >> * With this refactoring, do we have confidence that we can get our >> desired changes merged and released in a timely fashion? e.g. if we >> find another bug like HADOOP-11343, we'll first need to get the fix >> into Chimera, have a new Chimera release, then bump Hadoop's Chimera >> dependency. This also relates to the previous point, it's easier to >> do this dependency bump if Chimera is a separate JAR. 
>> >> Best, >> Andrew >> >> On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma < >> uma.ganguma...@intel.com> >> wrote: >> >>> Hi Devs, >>> >>> Some of our Hadoop developers are working with the Spark community to >>> implement shuffle encryption. While implementing that, they realized >>> that much of the Hadoop encryption code would have to be duplicated in >>> their implementation. This led to the idea of creating a separate >>> library, named Chimera (https://github.com/intel-hadoop/chimera). It is >>> an optimized cryptographic library. It provides a Java API at both the >>> cipher level and the Java stream level to help developers implement >>> high performance AES encryption/decryption with minimum code and >>> effort. Chimera was originally based on the Hadoop crypto code but has >>> been improved and generalized a lot to support a wider scope of data >>> encryption needs for more components in the community. >>>
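To make the cipher-level vs. stream-level distinction in the proposal concrete, here is a minimal sketch of stream-style AES/CTR encryption written against only the standard JCE classes. This is not Chimera's actual API; it is just the shape of usage a Chimera-like library would wrap behind an optimized (e.g., AES-NI accelerated) implementation. The key and IV below are zeroed placeholders for illustration.

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CtrStreamSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder key material; a 32-byte key would give AES-256,
        // which on older JDKs also needs the unlimited-strength policy files.
        byte[] key = new byte[16];
        byte[] iv = new byte[16]; // the CTR counter block

        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Stream-level API: callers just write plaintext to a wrapping stream.
        try (CipherOutputStream out = new CipherOutputStream(sink, cipher)) {
            out.write("shuffle spill bytes".getBytes(StandardCharsets.UTF_8));
        }
        // CTR is a stream mode, so ciphertext length equals plaintext length.
        System.out.println("ciphertext bytes: " + sink.size());
    }
}

A dedicated library such as the one proposed would keep this API shape while swapping the underlying cipher for a high performance native one.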
RE: Hadoop encryption module as Apache Chimera incubator project
Just a question: does becoming a separate jar/module in Apache Commons mean that Chimera (or the module) can be released separately and in a timely manner, without being coupled to other modules' releases in the project? Thanks. Regards, Kai -Original Message- From: Aaron T. Myers [mailto:a...@cloudera.com] Sent: Thursday, January 21, 2016 9:44 AM To: hdfs-dev@hadoop.apache.org Subject: Re: Hadoop encryption module as Apache Chimera incubator project +1 for Hadoop depending upon Chimera, assuming Chimera can get hosted/released under some Apache project umbrella. If that's Apache Commons (which makes a lot of sense to me) then I'm also a big +1 on Andrew's suggestion that we make it a separate module. Uma, would you be up for approaching the Apache Commons folks saying that you'd like to contribute Chimera? I'd recommend saying that Hadoop and Spark are both on board to depend on this. -- Aaron T. Myers Software Engineer, Cloudera On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang wrote: > Thanks Uma for putting together this proposal. Overall sounds good to > me, > +1 for these improvements. A few comments/questions: > > * If it becomes part of Apache Commons, could we make Chimera a > separate JAR? We have real difficulties bumping dependency versions > right now, so ideally we don't need to bump our existing Commons > dependencies to use Chimera. > * With this refactoring, do we have confidence that we can get our > desired changes merged and released in a timely fashion? e.g. if we > find another bug like HADOOP-11343, we'll first need to get the fix > into Chimera, have a new Chimera release, then bump Hadoop's Chimera > dependency. This also relates to the previous point, it's easier to do > this dependency bump if Chimera is a separate JAR. > > Best, > Andrew > > On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma < > uma.ganguma...@intel.com> > wrote: > > > Hi Devs, > > > > Some of our Hadoop developers are working with the Spark community to > > implement shuffle encryption. While implementing that, they realized > > that much of the Hadoop encryption code would have to be duplicated in > > their implementation. This led to the idea of creating a separate > > library, named Chimera (https://github.com/intel-hadoop/chimera). It is > > an optimized cryptographic library. It provides a Java API at both the > > cipher level and the Java stream level to help developers implement > > high performance AES encryption/decryption with minimum code and > > effort. Chimera was originally based on the Hadoop crypto code but has > > been improved and generalized a lot to support a wider scope of data > > encryption needs for more components in the community. > > > > So, now the team is thinking of making this library code an open source > > project via Apache incubation. The proposal is for Chimera to join > > Apache as an incubating project, or to join Apache Commons, to > > facilitate its adoption. > > > > In general this will bring the following advantages: > > 1. As Chimera embeds the native code in its jar (similar to > > snappy-java), it solves the current issue in Hadoop that an HDFS client > > has to depend on libhadoop.so if the client needs to read an encryption > > zone in HDFS. This means an HDFS client may have to depend on a Hadoop > > installation on the local machine. For example, HBase depends on the > > HDFS client jar rather than a Hadoop installation and then has no > > access to libhadoop.so, so HBase cannot use an encryption zone, or it > > causes errors. > > 2.
Apache Spark shuffle and spill encryption could be another > > example where we can use Chimera. We see that the stream encryption > > for Spark shuffle and spill doesn't require a stream cipher like > > AES/CTR, although the code shares the common characteristics of a > > stream style API. We also see the need for an optimized cipher for > > non-stream style use cases such as network encryption (e.g., RPC). > > These improvements can actually be shared by more projects in need. > > > > 3. Simplified code in Hadoop by using a dedicated library, which also > > drives more improvements. For example, currently the Hadoop crypto > > code API is totally based on AES/CTR although it has cipher suite > > configurations. AES/CTR suits HDFS data encryption at rest, but it > > doesn't necessarily have to be AES/CTR for all the cases, such as > > data transfer encryption and intermediate file encryption. > > > > So, we wanted to check with the Hadoop community about this proposal. > > Please provide your feedback on it. > > > > Regards, > > Uma > > >
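On advantage 1 above (embedding the native code in the jar the way snappy-java does): the usual technique is to ship the shared library as a classpath resource, extract it to a temporary file at runtime, and System.load it, falling back to java.library.path when no bundled binary is found. Below is a rough sketch of that technique; the resource path and library name are hypothetical, not Chimera's actual layout.

import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public final class NativeLoaderSketch {
    private NativeLoaderSketch() {}

    // Extract a shared library bundled in the jar and load it, so users
    // need no separate libhadoop.so-style installation on the machine.
    public static void load(String resourcePath, String libName) {
        try (InputStream in = NativeLoaderSketch.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                // Not bundled for this platform; fall back to java.library.path.
                System.loadLibrary(libName);
                return;
            }
            File tmp = File.createTempFile(libName, ".so");
            tmp.deleteOnExit();
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.getAbsolutePath());
        } catch (Exception e) {
            throw new UnsatisfiedLinkError("failed to load " + libName + ": " + e);
        }
    }
}

A caller would invoke something like NativeLoaderSketch.load("/native/libchimera.so", "chimera") once from a static initializer (hypothetical names), letting an HDFS client jar carry its own crypto codec without any libhadoop.so installed on the machine.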
RE: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]
Thanks Andrew for pointing this out. It sounds good. Yes, we have umbrella JIRAs for the follow-on tasks: HDFS-8031 for the HDFS side, and HADOOP-11842 for the HADOOP side. -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Tuesday, November 03, 2015 8:49 AM To: hdfs-dev@hadoop.apache.org Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk] If we use an umbrella JIRA to categorize all the ongoing EC work, that will let us more easily change the target version later. For instance, if we decide to bump Phase II out of 2.9, then we just need to change the target version of the Phase II umbrella rather than all the subtasks. On Mon, Nov 2, 2015 at 4:26 PM, Zheng, Kai wrote: > Yeah, so for the issues we recently resolved on trunk and are > addressing as follow-on tasks in Phase I, we would label them with "erasure coding" > and maybe also set the target version as "2.9" for convenience? > > -Original Message- > From: Jing Zhao [mailto:ji...@apache.org] > Sent: Tuesday, November 03, 2015 8:04 AM > To: hdfs-dev@hadoop.apache.org > Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge > HDFS-7285 (erasure coding) branch to trunk] > > +1 for the plan about Phase I & II. > > BTW, maybe out of the scope of this thread, just want to mention we > should either move the jira under HDFS-8031 or update the jira > component as "erasure-coding" when making further improvements or > fixing bugs in EC. In this way it will be easier to later backport EC to > 2.9. > > On Mon, Nov 2, 2015 at 3:48 PM, Vinayakumar B < > vinayakumarb.apa...@gmail.com > > wrote: > > > +1 for the idea. > > On Nov 3, 2015 07:22, "Zheng, Kai" wrote: > > > > > Sounds good to me. When it's determined to include EC in the 2.9 > > > release, it may be good to have a rough release date as Zhe asked, > > > so the scope of EC can be worked out accordingly. We still have > > > quite a few things as Phase I follow-on tasks to do before EC > > > can be deployed in a production system. Phase II, to develop > > > non-striped EC for cold data, would possibly be started after that. > > > We might consider including only Phase I and leaving Phase II for > > > the next release, according to the rough release date. > > > > > > Regards, > > > Kai > > > > > > -Original Message- > > > From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] > > > Sent: Tuesday, November 03, 2015 5:41 AM > > > To: hdfs-dev@hadoop.apache.org > > > Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge > > > HDFS-7285 (erasure coding) branch to trunk] > > > > > > +1 for EC to go into 2.9. Yes, 3.x would be a long way off when we > > > plan to have 2.8 and 2.9 releases. > > > > > > Regards, > > > Uma > > > > > > On 11/2/15, 11:49 AM, "Vinod Vavilapalli" > > > > > wrote: > > > > > > >Forking the thread. Started looking at the 2.8 list, various > > > >features' status, and arrived here. > > > > > > > >While I understand the pervasive nature of EC and a need for a > > > >significant bake-in, moving this to a 3.x release is not a good idea. > > > >We will surely get a 2.8 out this year and, as needed, I can even > > > >spend time getting started on a 2.9. OTOH, 3.x is a long way off, > > > >and given all the incompatibilities there, it would be a while > > > >before users can get their hands on EC if it were to be only on > > > >3.x.
At best, this may force sites that want EC to backport the > > > >entire EC feature to older releases; at worst, this will repeat > > > >the mess of the 0.20 security release forks. > > > > > > > >If we think adding this to 2.8 (even if it is switched off) is too > > > >much risk per our original plan, let's move this to 2.9, thereby > > > >leaving enough time for stability, integration testing and > > > >bake-in, and a realistic chance of having it end up on users' clusters > > > >soonish. > > > > > > > >+Vinod > > > > > > > >> On Oct 19, 2015, at 1:44 PM, Andrew Wang > > > >> > > > >>wrote: > > > >> > > > >> I think our plan thus far has been to target this for 3.0. I'm > > > >>okay with putting it in branch-2
RE: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]
Yeah, so for the issues we recently resolved on trunk and are addressing as follow-on tasks in Phase I, we would label them with "erasure coding" and maybe also set the target version as "2.9" for convenience? -Original Message- From: Jing Zhao [mailto:ji...@apache.org] Sent: Tuesday, November 03, 2015 8:04 AM To: hdfs-dev@hadoop.apache.org Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk] +1 for the plan about Phase I & II. BTW, maybe out of the scope of this thread, just want to mention we should either move the jira under HDFS-8031 or update the jira component as "erasure-coding" when making further improvements or fixing bugs in EC. In this way it will be easier to later backport EC to 2.9. On Mon, Nov 2, 2015 at 3:48 PM, Vinayakumar B wrote: > +1 for the idea. > On Nov 3, 2015 07:22, "Zheng, Kai" wrote: > > > Sounds good to me. When it's determined to include EC in the 2.9 > > release, it may be good to have a rough release date as Zhe asked, > > so the scope of EC can be worked out accordingly. We still have > > quite a few things as Phase I follow-on tasks to do before EC can > > be deployed in a production system. Phase II, to develop non-striped > > EC for cold data, would possibly be started after that. We might > > consider including only Phase I and leaving Phase II for the next > > release, according to the rough release date. > > > > Regards, > > Kai > > > > -Original Message- > > From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] > > Sent: Tuesday, November 03, 2015 5:41 AM > > To: hdfs-dev@hadoop.apache.org > > Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge > > HDFS-7285 (erasure coding) branch to trunk] > > > > +1 for EC to go into 2.9. Yes, 3.x would be a long way off when we > > plan to have 2.8 and 2.9 releases. > > > > Regards, > > Uma > > > > On 11/2/15, 11:49 AM, "Vinod Vavilapalli" > wrote: > > > > >Forking the thread. Started looking at the 2.8 list, various > > >features' status, and arrived here. > > > > > >While I understand the pervasive nature of EC and a need for a > > >significant bake-in, moving this to a 3.x release is not a good idea. > > >We will surely get a 2.8 out this year and, as needed, I can even > > >spend time getting started on a 2.9. OTOH, 3.x is a long way off, > > >and given all the incompatibilities there, it would be a while > > >before users can get their hands on EC if it were to be only on > > >3.x. At best, this may force sites that want EC to backport the > > >entire EC feature to older releases; at worst, this will repeat > > >the mess of the 0.20 security release forks. > > > > > >If we think adding this to 2.8 (even if it is switched off) is too > > >much risk per our original plan, let's move this to 2.9, thereby > > >leaving enough time for stability, integration testing and bake-in, > > >and a realistic chance of having it end up on users' clusters soonish. > > > > > >+Vinod > > > > > >> On Oct 19, 2015, at 1:44 PM, Andrew Wang > > >> > > >>wrote: > > >> > > >> I think our plan thus far has been to target this for 3.0. I'm > > >>okay with putting it in branch-2 if we've given a hard look at > > >>compatibility, though I'll note that 2.8 is already looking > > >>like quite a large release, and our release bandwidth has been > > >>focused on the 2.6 and 2.7 maintenance releases. Adding hundreds > > >>more JIRAs to 2.8 might make it too unwieldy to get out the door.
If we bump EC past that, 3.0 might very well be our > > >>next release vehicle. I do plan to revive the 3.0 schedule some > > >>time next year. With EC and > > >>JDK8 in a good spot, the only big feature remaining is classpath > > >>isolation. > > >> > > >> EC is also a pretty fundamental change to HDFS. Even if it's > > >>compatible, in terms of size and impact it might best belong in a > > >>new major release. > > >> > > >> Best, > > >> Andrew > > >> > > >> On Fri, Oct 16, 2015 at 7:04 PM, Vinayakumar B < > > >> vinayakumarb.apa...@gmail.com> wrote: > > >> > > >>> Does anyone else also think that the feature is ready to go to > > >>>branch-2?
RE: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]
;>>>> feature! >>>>>>> >>>>>>> On Tue, Sep 29, 2015 at 10:44 PM, Zhe Zhang >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>>> Thanks everyone who has participated in this discussion. >>>>>>>> >>>>>>>> With 7 +1's (5 binding and 2 non-binding), and no -1, this vote >>> has >>>>>>> passed. >>>>>>>> I will do a final 'git merge' with trunk and work with Andrew >>>>>>>> to >>>> merge >>>>>>> the >>>>>>>> branch to trunk. I'll update on this thread when the merge is >>> done. >>>>>>>> >>>>>>>> --- >>>>>>>> Zhe Zhang >>>>>>>> >>>>>>>> On Thu, Sep 24, 2015 at 11:08 PM, Liu, Yi A >>>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>>> (Change it to binding.) >>>>>>>>> >>>>>>>>> +1 >>>>>>>>> I have been involved in the development and code review on the >>>>>>> feature >>>>>>>>> branch. It's a great feature and I think it's ready to merge >>>>>>>>> it >>>> into >>>>>>>> trunk. >>>>>>>>> >>>>>>>>> Thanks all for the contribution. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Yi Liu >>>>>>>>> >>>>>>>>> >>>>>>>>> -Original Message- >>>>>>>>> From: Liu, Yi A >>>>>>>>> Sent: Friday, September 25, 2015 1:51 PM >>>>>>>>> To: hdfs-dev@hadoop.apache.org >>>>>>>>> Subject: RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to >>>> trunk >>>>>>>>> >>>>>>>>> +1 (non-binding) >>>>>>>>> I have been involved in the development and code review on the >>>>>>> feature >>>>>>>>> branch. It's a great feature and I think it's ready to merge >>>>>>>>> it >>>> into >>>>>>>> trunk. >>>>>>>>> >>>>>>>>> Thanks all for the contribution. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Yi Liu >>>>>>>>> >>>>>>>>> >>>>>>>>> -Original Message- >>>>>>>>> From: Vinayakumar B [mailto:vinayakum...@apache.org] >>>>>>>>> Sent: Friday, September 25, 2015 12:21 PM >>>>>>>>> To: hdfs-dev@hadoop.apache.org >>>>>>>>> Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to >>>> trunk >>>>>>>>> >>>>>>>>> +1, >>>>>>>>> >>>>>>>>> I've been involved starting from design and development of >>>>>>> ErasureCoding. >>>>>>>>> I think phase 1 of this development is ready to be merged to >>>> trunk. >>>>>>>>> It had come a long way to the current state with significant >>>> effort >>>>>>> of >>>>>>>>> many Contributors and Reviewers for both design and code. >>>>>>>>> >>>>>>>>> Thanks Everyone for the efforts. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vinay >>>>>>>>> >>>>>>>>> On Wed, Sep 23, 2015 at 10:53 PM, Jing Zhao >>>>>>> wrote: >>>>>>>>> >>>>>>>>>> +1 >>>>>>>>>> >>>>>>>>>> I've been involved in both development and review on the >>> branch, >>>>>>> and >>>>>>> I >>>>>>>>>> believe it's now ready to get merged into trunk. Many thanks >>> to >>>>>>> all >>>>&
RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
Non-binding +1. According to our extensive performance tests, striping + ISA-L coder based erasure coding can not only save storage, but also increase the throughput of a client or a cluster. It will be a great addition to HDFS and its users. Based on the latest branch code, we also observed that it's very reliable in concurrent tests. We'll provide the perf test report after it's sorted out and hope it helps. Thanks! Regards, Kai -Original Message- From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com] Sent: Wednesday, September 23, 2015 8:50 AM To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk +1 Great addition to HDFS. Thanks to all contributors for the nice work. Regards, Uma On 9/22/15, 3:40 PM, "Zhe Zhang" wrote: >Hi, > >I'd like to propose a vote to merge the HDFS-7285 feature branch back >to trunk. Since November 2014 we have been designing and developing >this feature under the umbrella JIRAs HDFS-7285 and HADOOP-11264, and >have committed approximately 210 patches. > >The HDFS-7285 feature branch was created to support the first phase of >HDFS erasure coding (HDFS-EC). The objective of HDFS-EC is to >significantly reduce storage space usage in HDFS clusters. Instead of >always creating 3 replicas of each block with 200% storage space >overhead, HDFS-EC provides data durability through parity data blocks. >With most EC configurations, the storage overhead is no more than 50%. >Based on profiling results of production clusters, we decided to >support EC with the striped block layout in the first phase, so that >small files can be better handled. This means dividing each logical >HDFS file block into smaller units (striping cells) and spreading them >on a set of DataNodes in round-robin fashion. Parity cells are >generated for each stripe of original data cells. We have made changes >to NameNode, client, and DataNode to generalize the block concept and >handle the mapping between a logical file block and its internal >storage blocks. For further details please see the design doc on >HDFS-7285. >HADOOP-11264 focuses on providing flexible and high-performance codec >calculation support. > >The nightly Jenkins job of the branch has reported several successful >runs, and doesn't show new flaky tests compared with trunk. We have >posted several versions of the test plan including both unit testing >and cluster testing, and have executed most tests in the plan. The most >basic functionalities have been extensively tested and verified in >several real clusters with different hardware configurations; results >have been very stable. We have created follow-on tasks for more >advanced error handling and optimization under the umbrella HDFS-8031. >We also plan to implement or harden the integration of EC with existing >features such as WebHDFS, snapshot, append, truncate, hflush, hsync, >and so forth. > >Development of this feature has been a collaboration across many >companies and institutions. I'd like to thank J. Andreina, Takanobu >Asanuma, Vinayakumar B, Li Bo, Takuya Fukudome, Uma Maheswara Rao G, >Rui Li, Yi Liu, Colin McCabe, Xinwei Qin, Rakesh R, Gao Rui, Kai >Sasaki, Walter Su, Tsz Wo Nicholas Sze, Andrew Wang, Yong Zhang, Jing >Zhao, Hui Zheng and Kai Zheng for their code contributions and reviews. >Andrew and Kai Zheng also made fundamental contributions to the initial >design. Rui Li, Gao Rui, Kai Sasaki, Kai Zheng and many other >contributors have made great efforts in system testing.
Many thanks go >to Weihua Jiang for proposing the JIRA, and ATM, Todd Lipcon, Silvius >Rus, Suresh, as well as many others for providing helpful feedback. > >Following the community convention, this vote will last for 7 days >(ending September 29th). Votes from Hadoop committers are binding but >non-binding votes are very welcome as well. And here's my non-binding +1. > >Thanks, >--- >Zhe Zhang
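As a worked illustration of the striped layout described in the vote proposal (logical blocks divided into striping cells spread round-robin across DataNodes, with parity cells generated per stripe), the sketch below maps a logical byte offset to its cell and DataNode index. The 6-data/3-parity, 64 KB-cell numbers are assumptions chosen only for illustration; the 50% overhead figure matches the claim above.

public class StripeLayoutSketch {
    // Illustrative numbers, not a normative HDFS-EC policy.
    static final int CELL_SIZE = 64 * 1024; // striping cell size in bytes
    static final int DATA_UNITS = 6;        // data blocks per block group
    static final int PARITY_UNITS = 3;      // parity blocks per block group

    public static void main(String[] args) {
        long offset = 1_000_000L; // a logical byte offset within a block group

        long cellIdx = offset / CELL_SIZE;         // which striping cell
        long stripeIdx = cellIdx / DATA_UNITS;     // which stripe of the group
        int dnIdx = (int) (cellIdx % DATA_UNITS);  // round-robin DataNode index

        // Parity adds 3 cells per 6 data cells: 50% storage overhead,
        // versus 200% for plain 3x replication.
        double overhead = 100.0 * PARITY_UNITS / DATA_UNITS;

        System.out.printf("offset %d -> stripe %d, cell %d, data node #%d; overhead %.0f%%%n",
                offset, stripeIdx, cellIdx, dnIdx, overhead);
    }
}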
RE: IMPORTANT: testing patches for branches
Thanks Allen for the great work. I tried it in HADOOP-11847 (branch HDFS-7285) and it went well and was very helpful! Regards, Kai -Original Message- From: Allen Wittenauer [mailto:a...@altiscale.com] Sent: Thursday, April 23, 2015 7:22 PM To: common-...@hadoop.apache.org Cc: hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: IMPORTANT: testing patches for branches On Apr 22, 2015, at 11:34 PM, Zheng, Kai wrote: > Hi Allen, > > This sounds great. > >>> Naming a patch foo-HDFS-7285.00.patch should get tested on the HDFS-7285 >>> branch. > Does this happen locally on a developer's machine when running test-patch.sh, or does it also apply to Hadoop Jenkins builds when a JIRA becomes patch available? Thanks. Both, now that a fix has been committed last night (there was a bug in the Jenkins handling). Given a patch name or URL, Jenkins and even a local run will try a few different methods to figure out which branch to use. Note that a branch name of 'gitX', where X is a valid git reference, also works to force a patch to start at a particular commit. For local use, you'll want to use a 'spare' copy of the source tree via the -basedir option and use the -resetrepo flag. That will enable Jenkins-like behavior and gives it permission to make modifications and effectively nuke any changes in the source tree you point it at. (Basically the opposite of the -dirty-workspace flag.) If you want to force a branch (for whatever reason, including where the branch can't be figured out), you can use the -branch option. If you don't use -resetrepo, test-patch.sh will warn that it thinks the wrong branch is being used but will push on anyway. In any case, the result of what it thinks the branch is/should be will be in the summary output at the bottom, along with the git ref that it specifically used for the test.
RE: IMPORTANT: testing patches for branches
Hi Allen, This sounds great. >> Naming a patch foo-HDFS-7285.00.patch should get tested on the HDFS-7285 >> branch. Does this happen locally on a developer's machine when running test-patch.sh, or does it also apply to Hadoop Jenkins builds when a JIRA becomes patch available? Thanks. Regards, Kai -Original Message- From: Allen Wittenauer [mailto:a...@altiscale.com] Sent: Thursday, April 23, 2015 3:35 AM To: common-...@hadoop.apache.org Cc: yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org Subject: IMPORTANT: testing patches for branches Hey gang, Just so everyone is aware: if you are working on a patch for either a feature branch or a major branch, and you name the patch with the branch name following the spec in HowToContribute (and a few other ways... test-patch tries to figure it out!), test-patch.sh *should* switch the repo over to that branch for testing. For example, naming a patch foo-branch-2.01.patch should get tested on branch-2. Naming a patch foo-HDFS-7285.00.patch should get tested on the HDFS-7285 branch. This hopefully means that there should really be no more 'blind' +1's to patches that go to branches. The "we only test against trunk" argument is no longer valid. :)
RE: Looking to a Hadoop 3 release
Let me add some comments on this, just to provide my thoughts. Thanks. >> If we start now, it might make it out by 2016. If we start now, >> downstreamers can start aligning themselves to land versions that suit at >> about the same time. This would help not only downstreamers align with the long-term release, but also contributors like me align our future efforts, maybe. In addition to the JDK8 support and classpath isolation, might we add more possible candidates for consideration? How about this one: HADOOP-9797, pluggable and compatible UGI change? https://issues.apache.org/jira/browse/HADOOP-9797 The benefits: 1) allow multiple login sessions/contexts and authentication methods to be used in the same Java application/process without conflicts, providing good isolation by getting rid of globals and statics; 2) allow plugging in new authentication methods for UGI, in a modular, manageable and maintainable manner. Additionally, we would also push the first release of Apache Kerby, preparing a strong, dedicated and clean Kerberos library in Java for both the client and KDC sides; by leveraging the library, we could update Hadoop-MiniKDC and perform more security tests. https://issues.apache.org/jira/browse/DIRKRB-102 Hope this makes sense. Thanks. Regards, Kai -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Thursday, March 05, 2015 2:47 AM To: common-...@hadoop.apache.org Cc: mapreduce-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: Looking to a Hadoop 3 release In general +1 on 3.0.0. It's time. If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time. While two big items have been called out as possible incompatible changes, and there is ongoing discussion as to whether they are or not*, is there any chance of getting a longer list of big differences between the branches? In particular I'd be interested in improvements that are 'off' by default that would be better defaulted 'on'. Thanks, St.Ack * Let me note that 'compatible' around these parts is a trampled concept, seemingly open to interpretation with a definition other than what prevails elsewhere in software. See Allen's list above, and in our downstream project, the recent HBASE-13149 "HBase server MR tools are broken on Hadoop 2.5+ Yarn", among others. Let 3.x be incompatible with 2.x, if only so we can leave behind all current notions of 'compatibility' and just start over (as per Allen). On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang wrote: > Hi devs, > > It's been a year and a half since 2.x went GA, and I think we're about > due for a 3.x release. > Notably, there are two incompatible changes I'd like to call out, that > will have a tremendous positive impact for our users. > > First, classpath isolation being done at HADOOP-11656, which has been > a long-standing request from many downstreams and Hadoop users. > > Second, bumping the source and target JDK version to JDK8 (related to > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two > months from now). In the past, we've had issues with our dependencies > discontinuing support for old JDKs, so this will future-proof us. > > Between the two, we'll also have quite an opportunity to clean up and > upgrade our dependencies, another common user and developer request.
> > I'd like to propose that we start rolling a monthly-ish series of > 3.0 alpha releases ASAP, with myself volunteering to take on the RM > and other cat herding responsibilities. There are already quite a few > changes slated for 3.0 besides the above (for instance the shell > script rewrite) so there's already value in a 3.0 alpha, and the more > time we give downstreams to integrate, the better. > > This opens up discussion about inclusion of other changes, but I'm > hoping to freeze incompatible changes after maybe two alphas, do a > beta (with no further incompat changes allowed), and then finally a > 3.x GA. For those keeping track, that means a 3.x GA in about four months. > > I would also like to stress though that this is not intended to be a > big bang release. For instance, it would be great if we could maintain > wire compatibility between 2.x and 3.x, so rolling upgrades work. > Keeping branch-2 and branch-3 similar also makes backports easier, > since we're likely maintaining 2.x for a while yet. > > Please let me know any comments / concerns related to the above. If > people are friendly to the idea, I'd like to cut a branch-3 and start > working on the first alpha. > > Best, > Andrew >
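To illustrate benefit 1 of HADOOP-9797 mentioned above (multiple login sessions in one process without conflicting global state): UGI can already hand back per-login instances today, but much of the surrounding state remains static and process-wide, which is exactly what the JIRA proposes to isolate. A sketch under purely hypothetical principals and keytab paths:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class MultiLoginSketch {
    public static void main(String[] args) throws Exception {
        // Kerberos must be enabled before keytab logins work; note this is
        // itself static, process-wide state, the kind HADOOP-9797 targets.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Two independent login contexts in the same JVM
        // (hypothetical principals and keytab paths).
        UserGroupInformation alice = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "alice@EXAMPLE.COM", "/etc/security/keytabs/alice.keytab");
        UserGroupInformation bob = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "bob@EXAMPLE.COM", "/etc/security/keytabs/bob.keytab");

        // Work is executed as a particular identity via doAs().
        String who = alice.doAs((PrivilegedExceptionAction<String>) () ->
                UserGroupInformation.getCurrentUser().getUserName());
        System.out.println("ran as " + who + "; other context: " + bob.getUserName());
    }
}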
RE: 2.7 status
Thanks Vinod for the hints. I have updated both patches to align with the latest code, and added more unit tests. The build results look reasonable. Thanks to anyone who can give them further review; I will update them in a timely manner. Regards, Kai -Original Message- From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com] Sent: Tuesday, March 03, 2015 11:31 AM To: Zheng, Kai Cc: mapreduce-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; Hadoop Common; yarn-...@hadoop.apache.org Subject: Re: 2.7 status Kai, please ping the reviewers that were already looking at your patches before. If the patches go in by end of this week, we can include them. Thanks, +Vinod On Mar 2, 2015, at 7:04 PM, Zheng, Kai wrote: > Is there interest in getting the following issues into the release? Thanks! > > HADOOP-10670 > HADOOP-10671 > > Regards, > Kai > > -Original Message- > From: Yongjun Zhang [mailto:yzh...@cloudera.com] > Sent: Monday, March 02, 2015 4:46 AM > To: hdfs-dev@hadoop.apache.org > Cc: Vinod Kumar Vavilapalli; Hadoop Common; > mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org > Subject: Re: 2.7 status > > Hi, > > Thanks for working on the 2.7 release. > > Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is > enabled by default in a hardcoded way. HADOOP-10895 changes the default and > requires applications (such as Oozie) to set a config property or call an API > to enable the fallback. > > This jira has been reviewed, and is "almost" ready to get in. However, there is > a concern that we have to change the relevant applications. Please see my > comment here: > > https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823 > > Any of your comments will be highly appreciated. This jira was postponed from > 2.6. I think it should be no problem to skip 2.7. But your comments would > help us to decide what to do with this jira for future releases. > > Thanks. > > --Yongjun > > > On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy wrote: > >> Sounds good, thanks for the help Vinod! >> >> Arun >> >> >> From: Vinod Kumar Vavilapalli >> Sent: Sunday, March 01, 2015 11:43 AM >> To: Hadoop Common; Jason Lowe; Arun Murthy >> Subject: Re: 2.7 status >> >> Agreed. How about we roll an RC end of this week? As a Java 7+ >> release with features and patches that already got in? >> >> Here's a filter tracking blocker tickets - >> https://issues.apache.org/jira/issues/?filter=12330598. Nine open now. >> >> +Arun >> Arun, I'd like to help get 2.7 out without further delay. Do you mind >> me taking over release duties? >> >> Thanks, >> +Vinod >> >> From: Jason Lowe >> Sent: Friday, February 13, 2015 8:11 AM >> To: common-...@hadoop.apache.org >> Subject: Re: 2.7 status >> >> I'd like to see a 2.7 release sooner rather than later. It has been almost >> 3 months since Hadoop 2.6 was released, and there have already been >> 634 JIRAs committed to 2.7. That's a lot of changes waiting for an official >> release. >> >> https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed >> Jason >> >> From: Sangjin Lee >> To: "common-...@hadoop.apache.org" >> Sent: Tuesday, February 10, 2015 1:30 PM >> Subject: 2.7 status >> >> Folks, >> >> What is the current status of the 2.7 release?
I know initially it >> started out as a "java-7" only release, but looking at the JIRAs that >> is very much not the case. >> >> Do we have a certain timeframe for 2.7 or is it time to discuss it? >> >> Thanks, >> Sangjin >> >>
RE: 2.7 status
Is there interest in getting the following issues into the release? Thanks! HADOOP-10670 HADOOP-10671 Regards, Kai -Original Message- From: Yongjun Zhang [mailto:yzh...@cloudera.com] Sent: Monday, March 02, 2015 4:46 AM To: hdfs-dev@hadoop.apache.org Cc: Vinod Kumar Vavilapalli; Hadoop Common; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: 2.7 status Hi, Thanks for working on the 2.7 release. Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is enabled by default in a hardcoded way. HADOOP-10895 changes the default and requires applications (such as Oozie) to set a config property or call an API to enable the fallback. This jira has been reviewed, and is "almost" ready to get in. However, there is a concern that we have to change the relevant applications. Please see my comment here: https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823 Any of your comments will be highly appreciated. This jira was postponed from 2.6. I think it should be no problem to skip 2.7. But your comments would help us to decide what to do with this jira for future releases. Thanks. --Yongjun On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy wrote: > Sounds good, thanks for the help Vinod! > > Arun > > > From: Vinod Kumar Vavilapalli > Sent: Sunday, March 01, 2015 11:43 AM > To: Hadoop Common; Jason Lowe; Arun Murthy > Subject: Re: 2.7 status > > Agreed. How about we roll an RC end of this week? As a Java 7+ release > with features and patches that already got in? > > Here's a filter tracking blocker tickets - > https://issues.apache.org/jira/issues/?filter=12330598. Nine open now. > > +Arun > Arun, I'd like to help get 2.7 out without further delay. Do you mind > me taking over release duties? > > Thanks, > +Vinod > > From: Jason Lowe > Sent: Friday, February 13, 2015 8:11 AM > To: common-...@hadoop.apache.org > Subject: Re: 2.7 status > > I'd like to see a 2.7 release sooner rather than later. It has been almost 3 > months since Hadoop 2.6 was released, and there have already been 634 > JIRAs committed to 2.7. That's a lot of changes waiting for an official > release. > > https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed > Jason > > From: Sangjin Lee > To: "common-...@hadoop.apache.org" > Sent: Tuesday, February 10, 2015 1:30 PM > Subject: 2.7 status > > Folks, > > What is the current status of the 2.7 release? I know initially it > started out as a "java-7" only release, but looking at the JIRAs that > is very much not the case. > > Do we have a certain timeframe for 2.7 or is it time to discuss it? > > Thanks, > Sangjin > >
RE: Looking to a Hadoop 3 release
Sorry for the noise. I thought I was sending this to my colleagues. By the way, for the JDK8 support, we (Intel) would like to investigate further and help, thanks. Regards, Kai -Original Message- From: Zheng, Kai Sent: Tuesday, March 03, 2015 8:49 AM To: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: RE: Looking to a Hadoop 3 release JDK8 support is under consideration; it looks like many issues have been reported and resolved already. https://issues.apache.org/jira/browse/HADOOP-11090 -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Tuesday, March 03, 2015 7:20 AM To: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Looking to a Hadoop 3 release Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
RE: Looking to a Hadoop 3 release
JDK8 support is under consideration; it looks like many issues have been reported and resolved already. https://issues.apache.org/jira/browse/HADOOP-11090 -Original Message- From: Andrew Wang [mailto:andrew.w...@cloudera.com] Sent: Tuesday, March 03, 2015 7:20 AM To: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Looking to a Hadoop 3 release Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
RE: Anyone know how to mock a secured hdfs for unit test?
Hi Chris, Thanks for the great info. I will paste it into the JIRA for future reference, in case I or somebody else gets the chance to work on it. Regards, Kai -Original Message- From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Saturday, June 28, 2014 4:27 AM To: secur...@hadoop.apache.org Cc: yarn-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; hdfs-iss...@hadoop.apache.org; yarn-iss...@hadoop.apache.org; mapreduce-...@hadoop.apache.org Subject: Re: Anyone know how to mock a secured hdfs for unit test? Hi David and Kai, There are a couple of challenges with this, but I just figured out a pretty decent setup while working on HDFS-2856. That code isn't committed yet, but if you open patch version 5 attached to that issue and look for the TestSaslDataTransfer class, then you'll see how it works. Most of the logic for bootstrapping a MiniKDC and setting up the right HDFS configuration properties is in an abstract base class named SaslDataTransferTestCase. I hope this helps. There are a few other open issues out there related to tests in secure mode. I know of HDFS-4312 and HDFS-5410. It would be great to get more regular test coverage with something that more closely approximates a secured deployment. Chris Nauroth Hortonworks http://hortonworks.com/ On Thu, Jun 26, 2014 at 7:27 AM, Zheng, Kai wrote: > Hi David, > > Quite some time ago I opened HADOOP-9952 and planned to create secured > MiniClusters by making use of MiniKDC. Unfortunately I haven't had the > chance to work on it yet. If you need something like that and would like > to contribute, please let me know and I'll see if there is anything I can > help with. Thanks. > > Regards, > Kai > > -Original Message- > From: Liu, David [mailto:liujion...@gmail.com] > Sent: Thursday, June 26, 2014 10:12 PM > To: hdfs-dev@hadoop.apache.org; hdfs-iss...@hadoop.apache.org; > yarn-...@hadoop.apache.org; yarn-iss...@hadoop.apache.org; > mapreduce-...@hadoop.apache.org; secur...@hadoop.apache.org > Subject: Anyone know how to mock a secured hdfs for unit test? > > Hi all, > > I need to test my code which reads data from secured HDFS. Is there any > library to mock secured HDFS? Can MiniDFSCluster do the work? > Any suggestion is appreciated. > > > Thanks >
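For future reference on HADOOP-9952 and the setup Chris describes, here is a minimal sketch of bootstrapping MiniKdc ahead of a secured mini cluster. The principal names, paths, and exact set of configuration keys are illustrative assumptions rather than a verified recipe; see the SaslDataTransferTestCase class in the HDFS-2856 patch for a working setup.

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.minikdc.MiniKdc;
import org.apache.hadoop.security.UserGroupInformation;

public class SecuredMiniClusterSketch {
    public static void main(String[] args) throws Exception {
        // Scratch directory for the embedded KDC (hypothetical path).
        File workDir = new File("target/minikdc");
        workDir.mkdirs();

        MiniKdc kdc = new MiniKdc(MiniKdc.createConf(), workDir);
        kdc.start();

        // One keytab holding the test principals; names are illustrative.
        File keytab = new File(workDir, "test.keytab");
        kdc.createPrincipal(keytab, "hdfs/localhost", "HTTP/localhost");

        // Point Hadoop security at Kerberos before building a MiniDFSCluster.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("dfs.namenode.kerberos.principal",
                "hdfs/localhost@" + kdc.getRealm());
        conf.set("dfs.namenode.keytab.file", keytab.getAbsolutePath());
        UserGroupInformation.setConfiguration(conf);

        // ... start a MiniDFSCluster with this conf and run the test ...

        kdc.stop();
    }
}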
RE: Anyone know how to mock a secured hdfs for unit test?
Hi David, Quite some time ago I opened HADOOP-9952 and planned to create secured MiniClusters by making use of MiniKDC. Unfortunately I haven't had the chance to work on it yet. If you need something like that and would like to contribute, please let me know and I'll see if there is anything I can help with. Thanks. Regards, Kai -Original Message- From: Liu, David [mailto:liujion...@gmail.com] Sent: Thursday, June 26, 2014 10:12 PM To: hdfs-dev@hadoop.apache.org; hdfs-iss...@hadoop.apache.org; yarn-...@hadoop.apache.org; yarn-iss...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; secur...@hadoop.apache.org Subject: Anyone know how to mock a secured hdfs for unit test? Hi all, I need to test my code which reads data from secured HDFS. Is there any library to mock secured HDFS? Can MiniDFSCluster do the work? Any suggestion is appreciated. Thanks
RE: Replacing the JSP web UIs to HTML 5 applications
> having /JMX for monitoring integration and a /JSON end point for the UI IMHO, this makes sense, especially for the long term. The JMX interface serves as a management console from the admin perspective; the WebUI serves as the end-user interface. Both might share the same underlying functionality code, but that does not justify coupling them together. Thanks & regards, Kai -Original Message- From: Alejandro Abdelnur [mailto:t...@cloudera.com] Sent: Tuesday, October 29, 2013 8:14 AM To: hdfs-dev@hadoop.apache.org Subject: Re: Replacing the JSP web UIs to HTML 5 applications Isn't using JMX to expose JSON for the web UI misusing JMX? I would think a more appropriate approach would be having /JMX for monitoring integration and a /JSON end point for the UI data. Thanks. On Mon, Oct 28, 2013 at 4:58 PM, Haohui Mai wrote: > Alejandro, > > If I understand correctly, that is the exact approach that the new web > UI is taking. The new web UI takes the output from JMX and renders > it as HTML at the client side. > > ~Haohui > > > On Mon, Oct 28, 2013 at 4:18 PM, Alejandro Abdelnur >wrote: > > > Haohui, > > > > If you have NN and DNs producing JSON instead of HTML, then you can > > build JS based web UIs. Take for example Oozie: Oozie produces JSON, > > it has a built-in JS web UI that consumes JSON, and Hue has built an > > external web UI that also consumes JSON. In the case of the Hue UI, > > Oozie didn't have to change anything to get that UI, and improvements > > on the Hue UI don't require changes in Oozie unless it is to produce > > additional information. > > > > Hope this clarifies. > > > > Thx > > > > > > On Mon, Oct 28, 2013 at 4:06 PM, Haohui Mai > wrote: > > > > > Echo my comments on HDFS-5402: > > > > > > bq. If we're going to remove the old web UI, I think the new web > > > UI has to have the same level of unit testing. We shouldn't go > > > backwards in terms of unit testing. > > > > > > I took a look at TestNamenodeJspHelper / TestDatanodeJspHelper / > > > TestClusterJspHelper. It seems to me that we can merge these tests > > > with the unit tests on JMX. > > > > > > bq. If we are going to > > > remove this capability, we need to add some other command-line > > > tools to get the same functionality. These tools could use REST if > > > we have that, or JMX, but they need to exist before we can > > > consider removing the old UI. > > > > > > This is a good point. Since all information is available through > > > JMX, the easiest way to approach it is to write some scripts using > > > Node.js. The architecture of the new Web UIs is ready for this. > > > > > > > > > On Mon, Oct 28, 2013 at 3:57 PM, Alejandro Abdelnur > > > > > >wrote: > > > > > > > Producing JSON would be great. Agree with Colin that we should > > > > leave for now the current JSP based web ui. > > > > > > > > thx > > > > > > > > > > > > On Mon, Oct 28, 2013 at 11:16 AM, Colin McCabe < cmcc...@alumni.cmu.edu > > > > >wrote: > > > > > > > > > This is a really interesting project, Haohui. I think it will > > > > > make our web UI much nicer. > > > > > > > > > > I have a few concerns about removing the old web UI, however: > > > > > > > > > > * If we're going to remove the old web UI, I think the new web > > > > > UI has to have the same level of unit testing. We shouldn't go > > > > > backwards in terms of unit testing. > > > > > > > > > > * Most of the deployments of elinks and links out there don't > > > > > support Javascript.
This is just a reality of life when using CentOS 5 or 6, > > > > > which many users are still using. I have used "links" to > > > > > diagnose problems through the web UI in the past, in systems > > > > > where access to the cluster was available only through telnet. > > > > > If we are going to remove this capability, we need to add some > > > > > other command-line tools to get the same functionality. These > > > > > tools could use REST if we have that, or JMX, but they need to > > > > > exist before we can consider removing the old UI. > > > > > > > > > > best, > > > > > Colin > > > > > > > > > > On Fri, Oct 25, 2013 at 7:31 PM, Haohui Mai wrote: > > > > > > Thanks for the reply, Luke. Here I just echo my response from > > > > > > the jira: > > > > > > > > > > > > bq. this client-side js only approach, which is less secure > > > > > > than a progressively enhanced hybrid approach used by YARN. > > > > > > The recent gmail XSS fiasco highlights the issue. > > > > > > > > > > > > I'm presenting an informal security analysis to compare the > > > > > > security of the old and the new web UIs. > > > > > > > > > > > > An attacker launches an XSS attack by injecting malicious code, > > > > > > usually HTML or JavaScript fragments, into the web page, so > > > > > > that the
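As a footnote to the JMX-vs-JSON discussion: the NameNode already serves its MBeans as JSON over HTTP through the /jmx servlet, which is what both a client-side UI and the command-line tools Colin asks for could consume. A minimal fetch sketch, assuming a NameNode on localhost with the then-default HTTP port 50070:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class JmxJsonFetchSketch {
    public static void main(String[] args) throws Exception {
        // The qry parameter narrows the output to a single MBean.
        URL url = new URL("http://localhost:50070/jmx"
                + "?qry=Hadoop:service=NameNode,name=NameNodeInfo");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw JSON a UI or CLI tool can render
            }
        }
    }
}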