Thank you, Xiaoqiao.

I have been pondering how to foster a better community, one that
encourages more collaboration. It is my intent to make that happen.

Clearly, this community has had a lot of great successes, and it is a
highly professional community.

But we set the bar so high that I feel it is not very friendly to
newbies. And clearly, I see a lot of folks eager to contribute to this
project.
My question is: how can we work together to make this a more
newbie-friendly community?

One observation is that there are only a limited number of active
committers in the HDFS project (take this year for example: only 5
committers have made more than 10 HDFS commits). Limited review bandwidth
means some patches are left unreviewed. We typically nominate a new
committer once he/she has made a sizable amount of contributions. Without
sufficient review bandwidth, it gets harder for a contributor to progress
into a committer. Eventually, the HDFS project dies a slow death if we are
unable to nominate new committers faster than existing committers go
inactive.

JIRA isn't a reviewer-friendly place either. I see a lot of great ideas
left unreviewed or unresolved, because it is hard to track what I am
reviewing or what I plan to do with a jira. We keep a similar spreadsheet
at Cloudera for patches made available by Clouderans, but there's no reason
why we can't do this across all contributors, as long as people find it
useful.

On Sat, Jun 22, 2019 at 3:17 AM Xiaoqiao He <xq.he2...@gmail.com> wrote:

> Thanks Wei-Chiu for your great work.
>
> All the JIRAs listed are very valuable, and I would like to try my best to
> participate in reviewing them and give some feedback.
> On the other hand, I think there are also some helpful JIRAs that have not
> been dug up yet. Can more candidate JIRAs about performance be added to the
> spreadsheet? (to Wei-Chiu)
>
Please feel free to enhance the spreadsheet.

>
> Some other discussion,
> a. I suggest that we go through all JIRAs regularly and report some
> performance improvement JIRAs. Of course it really takes up lots of time,
> but I believe many contributors would like to participate.
> Meanwhile, it may be a good topic for the community sync up (cc @Wangda).

Sounds like a great idea. It would also be a great opportunity to talk
about bigger initiatives. For example, I see a few folks from Xiaomi doing
really good work there, and I'd be interested to learn more.



> b. Beyond that, I think we should also scan some BUG JIRAs (for instance
> HDFS-12862) reported but not fixed up to now.
> Thanks Wei-Chiu again.
>
> Best Regards,
> Hexiaoqiao
>
>
> On Sat, Jun 22, 2019 at 11:47 AM Wei-Chiu Chuang <weic...@apache.org>
> wrote:
>
> > I spent the past week going through most of the jiras with a patch
> > attached in the past, and turned up some really good stuff that helps
> > improve HDFS performance.
> >
> > The jiras are listed in the following spreadsheet. If you are
> > interested in reviewing those jiras, please update the following
> > spreadsheet and add yourself as a reviewer. A reviewer does not need to
> > be a Hadoop committer, but it helps to give the author feedback.
> >
> >
> >
> > https://docs.google.com/spreadsheets/d/1dvLoZ039ZirdZF9p0wWKhFCtD91jfbdkPg4XZ-AnMNg/edit?usp=sharing
> >
> > I am doing this exercise to identify known performance limitations and
> > fixes that were submitted but never got committed. There are cases where
> > a patch was reviewed or even blessed with a +1, but was never pushed to
> > the repo; there are cases where good ideas never got reviewed.
> >
> > I think this is the low-hanging fruit that we as a community should pick.
> >
> > I use this filter to search for Hadoop/HDFS patches, if you are
> > interested:
> >
> >
> > https://issues.apache.org/jira/issues/?filter=12311124&jql=project%20in%20(HADOOP%2C%20HDFS)%20AND%20status%20%3D%20%22Patch%20Available%22%20ORDER%20BY%20updated%20DESC%2C%20key%20DESC
> >
> > Best,
> > Wei-Chiu
> >
>
