Re: [DISCUSS] Merge HDFS-10467 to (Router-based federation) trunk

Iñigo Goiri Mon, 28 Aug 2017 20:02:30 -0700

Brahma, thank you for the comments.
i) I can send a patch with the diff between branches.
ii) Working with Giovanni for the review.
iii) We had some numbers in our cluster.
iv) We could have a Router just for giving a view of all the namespaces
without giving RPC access. Another case might be only allowing WebHDFS
and not RPC. We could consolidate nevertheless.
I will open a JIRA to extend the documentation with the configuration keys.
v) I'm open to doing more tests. I think the guys from LinkedIn wanted to test
some more frameworks in their dev setup. In addition, before merging, I'd
run the version in trunk for a few days.
v) Good catches, I'll open JIRAs for those.


On Mon, Aug 28, 2017 at 6:12 AM, Brahma Reddy Battula <
[email protected]> wrote:

> Nice feature, great work guys. Looking forward to getting this in, as YARN
> federation is already in.
>
> At first glance I have a few questions:
>
> i) Could we have a consolidated patch for easier review?
>
> ii) I hope "Federation Metrics" and "Federation UI" will be included.
>
> iii) Do we have RPC benchmarks?
>
> iv) As of now "dfs.federation.router.rpc.enable" and
> "dfs.federation.router.store.enable" are set to "true". Do we need to keep
> these configs, given that the Router might not be useful without them?
>
> iv) bq. The rest of the options are documented in [hdfs-default.xml]
>  I feel it would be better to document all the configurations. There are
> quite a few; how about documenting them in tabular format?
>
> v) Downstream projects (Spark, HBase, Hive...) integration testing? It looks
> like you mentioned some; is that enough?
>
> v) mvn install (and package) is failing with the following error:
>
> [INFO]   Adding ignore: *
> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
> Duplicate classes found:
>
>   Found in:
>     org.apache.hadoop:hadoop-client-minicluster:jar:3.0.0-beta1-SNAPSHOT:compile
>     org.apache.hadoop:hadoop-client-runtime:jar:3.0.0-beta1-SNAPSHOT:compile
>   Duplicate classes:
>     org/apache/hadoop/shaded/org/apache/curator/framework/api/DeleteBuilder.class
>     org/apache/hadoop/shaded/org/apache/curator/framework/CuratorFramework.class
>
>
> I added "hadoop-client-minicluster" to the ignore list to get the build to succeed:
>
> hadoop\hadoop-client-modules\hadoop-client-integration-tests\pom.xml
>
>                   <dependencies>
>                     <dependency>
>                       <groupId>org.apache.hadoop</groupId>
>                       <artifactId>hadoop-annotations</artifactId>
>                       <ignoreClasses>
>                         <ignoreClass>*</ignoreClass>
>                       </ignoreClasses>
>                     </dependency>
>                     <dependency>
>                       <groupId>org.apache.hadoop</groupId>
>                       <artifactId>hadoop-client-minicluster</artifactId>
>                       <ignoreClasses>
>                         <ignoreClass>*</ignoreClass>
>                       </ignoreClasses>
>                     </dependency>
>
>
> Please correct me if I am wrong.
>
>
> --Brahma Reddy Battula
>
> -----Original Message-----
> From: Chris Douglas [mailto:[email protected]]
> Sent: 25 August 2017 06:37
> To: Andrew Wang
> Cc: Iñigo Goiri; [email protected]; [email protected]
> Subject: Re: [DISCUSS] Merge HDFS-10467 to (Router-based federation) trunk
>
> On Thu, Aug 24, 2017 at 2:25 PM, Andrew Wang <[email protected]>
> wrote:
> > Do you mind holding this until 3.1? Same reasoning as for the other
> > branch merge proposals, we're simply too late in the 3.0.0 release cycle.
>
> That wouldn't be too dire.
>
> That said, this has the same design and impact as YARN federation.
> Specifically, it sits almost entirely outside core HDFS, so it will not
> affect clusters running without R-BF.
>
> Merging would allow the two router implementations to converge on a common
> backend, which has started with HADOOP-14741 [1]. If the HDFS side only
> exists in 3.1, then that work would complicate maintenance of YARN in
> 3.0.x, which may require bug fixes as it stabilizes.
>
> Merging lowers costs for maintenance with a nominal risk to stability.
> The feature is well tested, deployed, and actively developed. The
> modifications to core HDFS [2] (~23k) are trivial.
>
> So I'd still advocate for this particular merge on those merits. -C
>
> [1] https://issues.apache.org/jira/browse/HADOOP-14741
> [2] git diff --diff-filter=M $(git merge-base apache/HDFS-10467 apache/trunk)..apache/HDFS-10467
>
> > On Thu, Aug 24, 2017 at 1:39 PM, Chris Douglas <[email protected]>
> wrote:
> >>
> >> I'd definitely support merging this to trunk. The implementation is
> >> almost entirely outside of HDFS and, as Inigo detailed, has been
> >> tested at scale. The branch is in a functional state with
> >> documentation and tests. -C
> >>
> >> On Mon, Aug 21, 2017 at 6:11 PM, Iñigo Goiri <[email protected]> wrote:
> >> > Hi all,
> >> >
> >> >
> >> >
> >> > We would like to open a discussion on merging the Router-based
> >> > Federation feature to trunk.
> >> >
> >> > Last week, there was a thread about which branches would go into
> >> > 3.0 and given that YARN federation is going, this might be a good
> >> > time for this to be merged too.
> >> >
> >> >
> >> > We have been running "Router-based federation" in production for a
> >> > year.
> >> >
> >> > Meanwhile, we have been releasing it in a feature branch
> >> > (HDFS-10467 [1]) for a while.
> >> >
> >> > We are reasonably confident that the state of the branch is about
> >> > to meet the criteria to be merged onto trunk.
> >> >
> >> >
> >> > *Feature*:
> >> >
> >> > This feature aggregates multiple namespaces into a single one
> >> > transparently to the user.
> >> >
> >> > It has a similar architecture to YARN federation (YARN-2915).
> >> >
> >> > It consists of Routers that handle requests from the clients, forward
> >> > them to the right subcluster, and expose the same API as the Namenode.
> >> >
> >> > Currently we use a mount table (similar to ViewFs), but it can be
> >> > replaced by other approaches.
> >> >
> >> > The Routers share their state in a State Store.
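As a rough illustration of the mount-table idea described above, here is a minimal sketch of how a Router might map a path in the federated namespace to a subcluster. It is not the code in the HDFS-10467 branch: the class name `MountTableSketch`, its methods, and the nameservice IDs such as `ns0` are all hypothetical, and longest-matching-mount-point resolution is an assumption borrowed from how ViewFs-style mount tables usually behave.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only; these are NOT the classes used in the HDFS-10467 branch.
final class MountTableSketch {

    /** Where a federated path ends up: a nameservice ID and a path inside it. */
    static final class Location {
        final String nameservice;
        final String path;

        Location(String nameservice, String path) {
            this.nameservice = nameservice;
            this.path = path;
        }
    }

    // Mount point in the federated namespace -> destination in a subcluster.
    private final Map<String, Location> entries = new HashMap<>();

    void addEntry(String mountPoint, String nameservice, String destPath) {
        entries.put(normalize(mountPoint), new Location(nameservice, normalize(destPath)));
    }

    /** Resolves a path using the longest mount point that is a whole-component prefix. */
    Optional<Location> resolve(String path) {
        String full = normalize(path);
        String candidate = full;
        while (true) {
            Location target = entries.get(candidate);
            if (target != null) {
                // Part of the path below the mount point (empty on an exact match).
                String rest = candidate.equals("/") ? full : full.substring(candidate.length());
                return Optional.of(new Location(target.nameservice, join(target.path, rest)));
            }
            if (candidate.equals("/")) {
                return Optional.empty(); // no mount point covers this path
            }
            // Drop the last path component and try the parent.
            int slash = candidate.lastIndexOf('/');
            candidate = (slash == 0) ? "/" : candidate.substring(0, slash);
        }
    }

    private static String join(String dest, String rest) {
        if (rest.isEmpty() || rest.equals("/")) {
            return dest;
        }
        return dest.equals("/") ? rest : dest + rest;
    }

    // Requires an absolute path and strips trailing slashes (except for the root).
    private static String normalize(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + path);
        }
        String p = path;
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }
}
```

With entries `/data -> ns0:/data` and `/data/logs -> ns1:/logs`, resolving `/data/logs/app/x.log` in this sketch picks the longer mount point and yields `ns1` with path `/logs/app/x.log`, while `/database` matches nothing because prefixes are compared per path component.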
> >> >
> >> >
> >> >
> >> > The main advantage is that clients interact with the Routers as if
> >> > they were Namenodes, so no changes are required in the client
> >> > other than pointing to the right address.
> >> >
> >> > In addition, all the management is moved to the server side so
> >> > changes to the Mount Table can be done without having to sync the
> >> > clients (pull/push).
> >> >
> >> >
> >> >
> >> > *Status*:
> >> >
> >> > The branch already contains all the features required to work
> >> > end-to-end.
> >> >
> >> > There are a couple of open JIRAs that would be required for the merge
> >> > (i.e., Web UI) but they should be finished soon.
> >> >
> >> > We have been running it in production for the last year and we have
> >> > a paper with some of the details of our production deployment [2].
> >> >
> >> > We have 4 production deployments with the largest one spanning more
> >> > than 20k servers across 6 subclusters.
> >> >
> >> > In addition, the guys at LinkedIn have started testing Router-based
> >> > federation and they will be adding security to the branch.
> >> >
> >> >
> >> >
> >> > The modifications to the rest of HDFS are minimal:
> >> >
> >> >    - Changed visibility for some methods (e.g., MiniDFSCluster)
> >> >    - Added some utilities to extract addresses
> >> >    - Modified hdfs and hdfs.cmd to start the Router and manage the
> >> >    federation
> >> >    - Modified hdfs-default.xml
> >> >
> >> > Everything else is self-contained in a federation package.
> >> >
> >> > In addition, all the functionality is in the Router so it’s
> >> > disabled by default.
> >> >
> >> > Even when enabled, there is no impact on regular HDFS and it would
> >> > only require configuring the trust between the Namenode and the
> >> > Router once security is enabled.
> >> >
> >> >
> >> >
> >> > I have been continuously rebasing the feature branch (updated up to
> >> > 1 week ago) so the merge should be a straightforward cherry-pick.
> >> >
> >> >
> >> >
> >> > *Problems*:
> >> >
> >> > The problems I’m aware of are the following:
> >> >
> >> >    - We implement ClientProtocol so anytime a new method is added there,
> >> >    we would need to add it to the Router. However, it’s straightforward
> >> >    to add unimplemented methods.
> >> >    - There is some argument about naming the feature “Router-based
> >> >    federation” but I’m open to better names.
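To illustrate the first problem above, here is a minimal sketch of what adding an "unimplemented method" could look like when the protocol interface grows. The interface `FileProtocolSketch` is a hypothetical stand-in, not the real `ClientProtocol`, and both method names are made up for the example. Throwing `UnsupportedOperationException` is one common Java convention; the branch may well handle this differently.

```java
// Hypothetical stand-in for a client-facing protocol; NOT the real ClientProtocol.
interface FileProtocolSketch {
    String getFileInfo(String path);

    // Imagine this method was added to the protocol after the Router was written.
    void newlyAddedOperation(String path);
}

// Hypothetical Router-side implementation of the protocol.
final class RouterStubSketch implements FileProtocolSketch {

    @Override
    public String getFileInfo(String path) {
        // A real Router would forward the call to the right subcluster; here we only tag it.
        return "forwarded:" + path;
    }

    @Override
    public void newlyAddedOperation(String path) {
        // Keeps the Router compiling against the new interface until the
        // operation is actually supported.
        throw new UnsupportedOperationException(
                "newlyAddedOperation is not implemented in the Router");
    }
}
```

The stub keeps the build green when the interface changes, at the cost of a runtime failure for clients that call the new operation through the Router.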
> >> >
> >> >
> >> >
> >> > *Credits*:
> >> >
> >> > I’d like to thank the people at Microsoft (especially Jason,
> >> > Ricardo, Chris, Subru, Jakob, Carlo and Giovanni), Twitter (Ming
> >> > and Gera), and LinkedIn (Zhe, Erik and Konstantin) for the discussion
> >> > and the ideas.
> >> >
> >> > Special thanks to Chris Douglas for the thorough reviews!
> >> >
> >> >
> >> >
> >> > Please look through the branch; feedback is welcome. Thanks!
> >> >
> >> >
> >> > Cheers,
> >> >
> >> > Inigo
> >> >
> >> >
> >> >
> >> >
> >> > [1] https://issues.apache.org/jira/browse/HDFS-10467
> >> >
> >> > [2] https://www.usenix.org/conference/atc17/technical-sessions/presentation/misra
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
>
