Hi Rahul,

> I was looking for something more detailed and low-level like how the code
> for the various services in HDFS is organized, entrypoints etc.

I found this book useful for getting a good overall picture of Hadoop:
"Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with
Apache Hadoop™ 2"
<https://www.oreilly.com/library/view/apache-hadooptm-yarn/9780133441925/>.

In my opinion, the way to get into Open Source contributions is to just
start contributing. You don't have to know HDFS in detail before you begin.
Now that you've gone through the Hadoop documentation, try setting up Hadoop
in pseudo-distributed mode. If you notice a glitch, try fixing it and send
out a PR. You never know what issue you'll find. I ran into this one when I
tried compiling Hadoop on Windows: HDFS-15385, "Upgrade boost library to
1.72" <https://issues.apache.org/jira/browse/HDFS-15385> (and yes, this was
my first PR to Hadoop). Next, use Docker to set up a Hadoop cluster with
multiple nodes. Once you're able to do this, try browsing issues.apache.org
and you'll find tons of issues that you can work on. There's always so much
work to do in Open Source, and the thing I like the most is that
"there's no deadline on anything" :) So you can really work on some
awesome stuff, own it, perfect it, and share it with the world.

Best of luck.

Thanks,
--Gautham

On Sun, 12 Jun 2022 at 16:34, Rahul Bhardwaj <rahul265...@gmail.com> wrote:

> Hi all,
> I am a newbie wanting to start contributing to the Hadoop ecosystem. I want
> to start by contributing to HDFS and was looking for resources to
> understand the architecture and I just found this -
>
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> which is fairly high-level documentation. I was looking for something
> more detailed and low-level, like how the code for the various services in
> HDFS is organized, entrypoints etc. Can someone point me to such resources?
> Also, is there a Slack workspace for such discussions? Not sure if this
> mailing list is the right forum for such doubts.
>
