Re: [VOTE] Moving Ozone to a separated Apache project

2020-09-25 Thread Weiwei Yang
+1

Weiwei

On Friday, September 25, 2020, Dinesh Chitlangia  wrote:

> +1
>
> Thanks,
> Dinesh
>
> On Fri, Sep 25, 2020 at 2:00 AM Elek, Marton  wrote:
>
> > Hi all,
> >
> > Thank you for all the feedback and requests,
> >
> > As we discussed in the previous thread(s) [1], Ozone is proposed to become a
> > separate Apache Top Level Project (TLP).
> >
> > The proposal with all the details, motivation and history is here:
> >
> >
> > https://cwiki.apache.org/confluence/display/HADOOP/
> Ozone+Hadoop+subproject+to+Apache+TLP+proposal
> >
> > This vote runs for 7 days and will conclude on the 2nd of October, 6 AM
> > GMT.
> >
> > Thanks,
> > Marton Elek
> >
> > [1]:
> >
> > https://lists.apache.org/thread.html/rc6c79463330b3e993e24a564c6817
> aca1d290f186a1206c43ff0436a%40%3Chdfs-dev.hadoop.apache.org%3E
> >
> > -
> > To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
> >
> >
>


Re: [ANNOUNCE] New Apache Hadoop Committer - He Xiaoqiao

2020-06-11 Thread Weiwei Yang
Congratulations Xiaoqiao!

On Thu, Jun 11, 2020 at 11:20 AM Sree Vaddi 
wrote:

> Congratulations, He Xiaoqiao.
>
> Thank you./Sree
>
>
>
> On Thursday, June 11, 2020, 9:54:32 AM PDT, Chao Sun <
> sunc...@apache.org> wrote:
>
>  Congrats Xiaoqiao!
>
> On Thu, Jun 11, 2020 at 9:36 AM Ayush Saxena  wrote:
>
> > Congratulations He Xiaoqiao!!!
> >
> > -Ayush
> >
> > > On 11-Jun-2020, at 9:30 PM, Wei-Chiu Chuang 
> wrote:
> > >
> > > In bcc: general@
> > >
> > > It's my pleasure to announce that He Xiaoqiao has been elected as a
> > > committer on the Apache Hadoop project recognizing his continued
> > > contributions to the
> > > project.
> > >
> > > Please join me in congratulating him.
> > >
> > > Hearty Congratulations & Welcome aboard Xiaoqiao!
> > >
> > > Wei-Chiu Chuang
> > > (On behalf of the Hadoop PMC)
> >
> > -
> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >
> >


Re: [DISCUSS] making Ozone a separate Apache project

2020-05-13 Thread Weiwei Yang
+1
Thanks, Marton for starting the discussion.
One question: for the committers who contributed to Ozone before and were
granted the committer role in the past (like me), will they carry the
committer role over to the new repo?

On Wed, May 13, 2020 at 9:07 AM Dinesh Chitlangia 
wrote:

> +1
> Thank you Marton for writing up the history and supporting comments.
>
> -Dinesh
>
> On Wed, May 13, 2020 at 3:53 AM Elek, Marton  wrote:
>
> >
> >
> > I would like to start a discussion to make a separate Apache project for
> > Ozone
> >
> >
> >
> > ### HISTORY [1]
> >
> >   * Apache Hadoop Ozone development started on a feature branch of the
> > Hadoop repository (HDFS-7240)
> >
> >   * In October 2017 a discussion was started about merging it into the
> > Hadoop main branch
> >
> >   * After a long discussion it was merged into Hadoop trunk in March
> > 2018
> >
> >   * During the discussion of the merge, it was suggested multiple times
> > to create a separate project for Ozone. But at that time:
> >  1) Ozone was tightly integrated with Hadoop/HDFS
> >  2) There was an active plan to use the block layer of Ozone (HDDS or
> > HDSL at that time) as the block level of HDFS
> >  3) The community of Ozone was a subset of the HDFS community
> >
> >   * The first beta of Ozone has just been released. This seems to be a
> > good time, before the first GA, to make a decision about the future.
> >
> >
> >
> > ### WHAT HAS BEEN CHANGED
> >
> >   Over the last few years Ozone has become more and more independent, on
> > both the community and the code side. The separation has been suggested
> > again and again (for example by Owen [2] and Vinod [3]).
> >
> >
> >
> >   From the COMMUNITY point of view:
> >
> >
> >* Fortunately more and more new contributors are helping Ozone.
> > Originally the Ozone community was a subset of the HDFS project, but now a
> > bigger and bigger part of the community is involved with Ozone only.
> >
> >* It seems to be easier to _build_ the community as a separate
> > project.
> >
> >* A new, younger project might have different practices
> > (communication, committer criteria, development style) compared to an old,
> > mature project
> >
> >* It's easier to communicate (and improve) these standards in a
> > separate project with clean boundaries
> >
> >* A separate project/brand can help to increase the adoption rate and
> > attract more individual contributors (AFAIK this has been seen in Submarine
> > after a similar move)
> >
> >   * The contribution process can be communicated more easily, and we can
> > make first-time contributions easier
> >
> >
> >
> >   From the CODE point of view, Ozone has become more and more independent:
> >
> >
> >   * Ozone has a different release cycle
> >
> >   * The code is already separated from the Hadoop code base
> > (apache/hadoop-ozone.git)
> >
> >   * It has separate CI (GitHub Actions)
> >
> >   * Ozone uses a different (stricter) coding style (zero tolerance for
> > unit test / checkstyle errors)
> >
> >   * The code itself has become more and more independent from Hadoop at
> > the Maven level. Originally it was compiled together with the in-tree
> > latest Hadoop snapshot. Now it depends on released Hadoop artifacts (RPC,
> > Configuration...)
> >
> >   * It has started to use multiple versions of Hadoop (on the client side)
> >
> >   * The volume of resolved issues is already very high on the Ozone side
> > (Ozone had slightly more resolved issues than HDFS/YARN/MAPREDUCE/COMMON
> > all together in the last 2-3 months)
> >
> >
> > Summary: Before the first Ozone GA release, it seems to be a good time
> > to discuss the long-term future of Ozone. Managing it as a separate TLP
> > seems to have more benefits.
> >
> >
> > Please let me know what your opinion is...
> >
> > Thanks a lot,
> > Marton
> >
> >
> >
> >
> >
> > [1]: For more details, see:
> > https://github.com/apache/hadoop-ozone/blob/master/HISTORY.md
> >
> > [2]:
> >
> >
> https://lists.apache.org/thread.html/0d0253f6e5fa4f609bd9b917df8e1e4d8848e2b7fdb3099b730095e6%40%3Cprivate.hadoop.apache.org%3E
> >
> > [3]:
> >
> >
> https://lists.apache.org/thread.html/8be74421ea495a62e159f2b15d74627c63ea1f67a2464fa02c85d4aa%40%3Chdfs-dev.hadoop.apache.org%3E
> >
> > -
> > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >
> >
>


Re: [Announcement] DiDi HDFS successfully upgraded from 2.7.2 to 3.2.0

2019-12-09 Thread Weiwei Yang
Awesome! You guys rock!!
Any plan to upgrade YARN to the latest version too?

--
Weiwei
On Dec 9, 2019, 5:20 PM -0800, Hui Fei , wrote:
> Hi Eric,
>
> It was a rolling upgrade without downtime. We have a cluster with thousands of
> nodes and 5 namespaces; it took 1 month to do the upgrade. The upgraded
> services include JournalNode, NameNode, ZKFC and DataNode.
> Related rolling-upgrade issues are HDFS-13596, HDFS-14396, HDFS-14831 and
> HDFS-14509.
> We are watching HDFS-13671 for performance.
>
> Thanks!
>
> Eric Badger  于2019年12月9日周一 上午11:54写道:
>
> > Hi Runlin,
> >
> > Awesome to hear that you've been able to upgrade! Do you mind sharing some
> > thoughts on your experience with the upgrade? Did you take a cluster
> > downtime to do the upgrade? How long did the upgrade take? Did you find any
> > incompatibilities or issues during or after the upgrade? Do you know if
> > your performance is better, worse, or similar?
> >
> > Feel free to answer as many or as few of these as you would like.
> >
> > Thanks!
> >
> > Eric
> >
> > On Sun, Dec 8, 2019 at 9:14 PM Runlin zhang  wrote:
> >
> > > Hi Folks,
> > >
> > > This is Runlin Zhang from DiDi. I'm glad to announce that our HDFS has
> > > just been successfully upgraded from 2.7.2 to 3.2.0. Our cluster has
> > > nearly 10,000 nodes. Now the cluster services are stable and we are using
> > > new features, such as EC, that bring huge benefits! Here I want to thank
> > > my colleagues for their efforts: Fei Hui, Hu Haiyang, Wang Hongbing, Zhu
> > > Linhai, Wu Tong. And thanks to the community.
> > >
> > > It is also highly recommended that you upgrade to 3.2.1. If you have any
> > > problems during the upgrade, you can also communicate and discuss them
> > > with me.
> > >
> > >
> > > Thanks
> > > Runlin Zhang
> > >
> >


Re: [VOTE] Release Hadoop-3.1.3-RC0

2019-09-19 Thread Weiwei Yang
+1 (binding)

- Downloaded source, set up a single-node cluster
- Verified basic HDFS operations: put/get/cat, etc.
- Verified basic YARN REST APIs (cluster/nodes/scheduler), all seem good
- Ran several distributed shell jobs

Thanks
Weiwei
On Sep 19, 2019, 4:28 PM +0800, Sunil Govindan , wrote:
> +1 (binding)
>
> Thanks Zhankun for putting up the release. Thanks for leading this.
>
> - verified signature
> - ran a local cluster from tar ball
> - ran some MR jobs
> - perform CLI ops, and looks good
> - UI seems fine
>
> Thanks
> Sunil
>
> On Thu, Sep 12, 2019 at 1:34 PM Zhankun Tang  wrote:
>
> > Hi folks,
> >
> > Thanks to everyone's help on this release. Special thanks to Rohith,
> > Wei-Chiu, Akira, Sunil, Wangda!
> >
> > I have created a release candidate (RC0) for Apache Hadoop 3.1.3.
> >
> > The RC release artifacts are available at:
> > http://home.apache.org/~ztang/hadoop-3.1.3-RC0/
> >
> > The maven artifacts are staged at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1228/
> >
> > The RC tag in git is here:
> > https://github.com/apache/hadoop/tree/release-3.1.3-RC0
> >
> > And my public key is at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > *This vote will run for 7 days, ending on Sept.19th at 11:59 pm PST.*
> >
> > For the testing, I have run several Spark and distributed shell jobs in my
> > pseudo cluster.
> >
> > My +1 (non-binding) to start.
> >
> > BR,
> > Zhankun
> >
> > On Wed, 4 Sep 2019 at 15:56, zhankun tang  wrote:
> >
> > > Hi all,
> > >
> > > Thanks for everyone helping in resolving all the blockers targeting
> > Hadoop
> > > 3.1.3[1]. We've cleaned all the blockers and moved out non-blockers
> > issues
> > > to 3.1.4.
> > >
> > > I'll cut the branch today and call a release vote soon. Thanks!
> > >
> > >
> > > [1]. https://s.apache.org/5hj5i
> > >
> > > BR,
> > > Zhankun
> > >
> > >
> > > On Wed, 21 Aug 2019 at 12:38, Zhankun Tang  wrote:
> > >
> > > > Hi folks,
> > > >
> > > > We have Apache Hadoop 3.1.2 released on Feb 2019.
> > > >
> > > > More than 6 months have passed and there are
> > > >
> > > > 246 fixes [1], plus 2 blocker and 4 critical issues [2]
> > > >
> > > > (As Wei-Chiu Chuang mentioned, HDFS-13596 will be another blocker)
> > > >
> > > >
> > > > I propose my plan to do a maintenance release of 3.1.3 in the next few
> > > > (one or two) weeks.
> > > >
> > > > Hadoop 3.1.3 release plan:
> > > >
> > > > Code Freezing Date: *25th August 2019 PDT*
> > > >
> > > > Release Date: *31st August 2019 PDT*
> > > >
> > > >
> > > > Please feel free to share your insights on this. Thanks!
> > > >
> > > >
> > > > [1] https://s.apache.org/zw8l5
> > > >
> > > > [2] https://s.apache.org/fjol5
> > > >
> > > >
> > > > BR,
> > > >
> > > > Zhankun
> > > >
> > >
> >


Re: [VOTE] Release Apache Hadoop 3.2.1 - RC0

2019-09-18 Thread Weiwei Yang
+1 (binding)

Downloaded tarball, set up a pseudo cluster manually
Verified basic HDFS operations: copy/view files
Verified basic YARN operations: ran sample DS jobs
Verified basic YARN REST APIs, e.g. cluster/nodes info, etc.
Set and verified YARN node attributes, including the CLI

Thanks
Weiwei
On Sep 18, 2019, 11:41 AM +0800, zhankun tang , wrote:
> +1 (non-binding).
> Installed and verified it by running several Spark job and DS jobs.
>
> BR,
> Zhankun
>
> On Wed, 18 Sep 2019 at 08:05, Naganarasimha Garla <
> naganarasimha...@apache.org> wrote:
>
> > Verified the source and the binary tar and the sha512 checksums
> > Installed and verified the basic hadoop operations (ran few MR tasks)
> >
> > +1.
> >
> > Thanks,
> > + Naga
> >
> > On Wed, Sep 18, 2019 at 1:32 AM Anil Sadineni 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Tue, Sep 17, 2019 at 9:55 AM Santosh Marella 
> > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > On Wed, Sep 11, 2019 at 12:26 AM Rohith Sharma K S <
> > > > rohithsharm...@apache.org> wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > I have put together a release candidate (RC0) for Apache Hadoop
> > 3.2.1.
> > > > >
> > > > > The RC is available at:
> > > > > http://home.apache.org/~rohithsharmaks/hadoop-3.2.1-RC0/
> > > > >
> > > > > The RC tag in git is release-3.2.1-RC0:
> > > > > https://github.com/apache/hadoop/tree/release-3.2.1-RC0
> > > > >
> > > > >
> > > > > The maven artifacts are staged at
> > > > >
> > > https://repository.apache.org/content/repositories/orgapachehadoop-1226/
> > > > >
> > > > > You can find my public key at:
> > > > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > > > >
> > > > > This vote will run for 7 days (5 weekdays), ending on 18th Sept at
> > > > > 11:59 pm PST.
> > > > >
> > > > > I have done testing with a pseudo cluster and distributed shell job.
> > My
> > > > +1
> > > > > to start.
> > > > >
> > > > > Thanks & Regards
> > > > > Rohith Sharma K S
> > > > >
> > > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Sadineni
> > > Solutions Architect, Optlin Inc
> > > Ph: 571-438-1974 | www.optlin.com
> > >
> >


Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree

2019-09-17 Thread Weiwei Yang
+1 (binding)

Thanks
Weiwei

On Wed, Sep 18, 2019 at 6:35 AM Wangda Tan  wrote:

> +1 (binding).
>
> From my experiences of Submarine project, I think moving to a separate repo
> helps.
>
> - Wangda
>
> On Tue, Sep 17, 2019 at 11:41 AM Subru Krishnan  wrote:
>
> > +1 (binding).
> >
> > IIUC, there will not be an Ozone module in trunk anymore as that was my
> > only concern from the original discussion thread? IMHO, this should be
> the
> > default approach for new modules.
> >
> > On Tue, Sep 17, 2019 at 9:58 AM Salvatore LaMendola (BLOOMBERG/ 731 LEX)
> <
> > slamendo...@bloomberg.net> wrote:
> >
> > > +1
> > >
> > > From: e...@apache.org At: 09/17/19 05:48:32To:
> > hdfs-dev@hadoop.apache.org,
> > > mapreduce-...@hadoop.apache.org,  common-...@hadoop.apache.org,
> > > yarn-...@hadoop.apache.org
> > > Subject: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk
> > > source tree
> > >
> > >
> > > TL;DR: I propose to move Ozone-related code out of Hadoop trunk and
> > > store it in a separate *Hadoop* git repository, apache/hadoop-ozone.git
> > >
> > >
> > > When Ozone was adopted as a new Hadoop subproject it was proposed [1] to
> > > be part of the source tree but with a separate release cadence, mainly
> > > because it had hadoop-trunk/SNAPSHOT as a compile-time dependency.
> > >
> > > During the last Ozone releases this dependency was removed to provide
> > > more stable releases. Instead of using the latest trunk/SNAPSHOT build
> > > from Hadoop, Ozone uses the latest stable Hadoop (3.2.0 as of now).
> > >
> > > As there is no longer a strict dependency between Hadoop trunk SNAPSHOT
> > > and Ozone trunk, I propose to separate the two code bases from each other
> > > by creating a new Hadoop git repository (apache/hadoop-ozone.git):
> > >
> > > By moving Ozone to a separate git repository:
> > >
> > >   * It would be easier to contribute and understand the build (as of
> > > now we always need `-f pom.ozone.xml` as a Maven parameter)
> > >   * It would be possible to adjust the build process without breaking
> > > Hadoop/Ozone builds.
> > >   * It would be possible to use different Readme/.asf.yaml/github
> > > templates for Hadoop Ozone and core Hadoop. (For example the current
> > > github template [2] has a link to the contribution guideline [3]. Ozone
> > > has an extended version [4] of this guideline with additional
> > > information.)
> > >   * Testing would be safer as it won't be possible to change core
> > > Hadoop and Hadoop Ozone in the same patch.
> > >   * It would be easier to cut branches for Hadoop releases (based on
> > > the original consensus, Ozone should be removed from all the release
> > > branches after creating release branches from trunk)
> > >
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Marton
> > >
> > > [1]:
> > >
> > >
> >
> https://lists.apache.org/thread.html/c85e5263dcc0ca1d13cbbe3bcfb53236784a39111b8
> > > c353f60582eb4@%3Chdfs-dev.hadoop.apache.org%3E
> > > [2]:
> > >
> > >
> >
> https://github.com/apache/hadoop/blob/trunk/.github/pull_request_template.md
> > > [3]:
> > https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
> > > [4]:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute+to+Ozone
> > >
> > > -
> > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> > >
> > >
> > >
> >
>


Re: [VOTE] Move Submarine source code, documentation, etc. to a separate Apache Git repo

2019-08-26 Thread Weiwei Yang
+1

Thanks
Weiwei
On Aug 27, 2019, 9:40 AM +0800, Xun Liu , wrote:
> +1 (non-binding)
> This is the best news.
>
> Peter Bacsko  于2019年8月27日周二 上午4:59写道:
>
> > +1 (non-binding)
> >
> > On Sat, Aug 24, 2019 at 4:06 AM Wangda Tan  wrote:
> >
> > > Hi devs,
> > >
> > > This is a voting thread to move the Submarine source code and
> > > documentation from the Hadoop repo to a separate Apache Git repo, based
> > > on the discussions at
> > >
> > >
> > https://lists.apache.org/thread.html/e49d60b2e0e021206e22bb2d430f4310019a8b29ee5020f3eea3bd95@%3Cyarn-dev.hadoop.apache.org%3E
> > >
> > > Contributors who have permissions to push to Hadoop Git repository will
> > > have permissions to push to the new Submarine repository.
> > >
> > > This voting thread will run for 7 days and will end on Aug 30th.
> > >
> > > Please let me know if you have any questions.
> > >
> > > Thanks,
> > > Wangda Tan
> > >
> >


Re: Aug Hadoop Community Meetup in China

2019-07-23 Thread Weiwei Yang
Hi Junping

Thanks. I would like to get a slot to talk about our new open source project: 
YuniKorn.

Thanks
Weiwei
On Jul 23, 2019, 5:08 PM +0800, 俊平堵 , wrote:
> Thanks for the positive feedback! The local community has voted for the date
> and location to be 8/10, Beijing. So please book your time ahead if you are
> interested in joining.
> I have gathered a few topics and some candidate places for hosting this
> meetup. If you would like to propose more topics, please nominate them here or
> ping me before this weekend (7/28, CST time).
> Will update here when I have more to share. Thanks!
>
>
> Thanks,
>
> Junping
>
> > 俊平堵  于2019年7月18日周四 下午3:28写道:
> > > Hi, all!
> > > I am glad to let you know that we are organizing Hadoop Contributors 
> > > Meetup in China on Aug.
> > >
> > > This could be the first hadoop community meetup in China, and many
> > > attendees are expected to come from big data pioneers such as Cloudera,
> > > Tencent, Alibaba, Xiaomi, Didi, JD, Meituan, Toutiao, Sina, etc.
> > >
> > > We're still working out the details, such as dates, contents and
> > > locations. Here is a quick survey: https://www.surveymonkey.com/r/Y99RT3W
> > > where you can vote for your preferred dates and locations if you would
> > > like to attend - the survey will end on July 21, 12 PM China Standard
> > > Time, and results will go public the next day.
> > >
> > > Also, please feel free to reach out to me if you have a topic to propose
> > > for the meetup. Will send out an update later with more details when I
> > > get more to share. Thanks!
> > >
> > > Cheers,
> > >
> > > Junping


Re: Aug Hadoop Community Meetup in China

2019-07-18 Thread Weiwei Yang
Hi Junping

Thanks for organizing this event!
Just finished the survey; looking forward to meeting folks at the coming meetup.

Thanks
Weiwei
On Jul 18, 2019, 3:28 PM +0800, 俊平堵 , wrote:
> Hi, all!
>
> I am glad to let you know that we are organizing
> Hadoop Contributors Meetup in China on Aug.
>
>
> This could be the first hadoop community meetup in China, and many
> attendees are expected to come from big data pioneers such as Cloudera,
> Tencent, Alibaba, Xiaomi, Didi, JD, Meituan, Toutiao, Sina, etc.
>
>
> We're still working out the details, such as dates, contents and locations.
> Here is a quick survey: https://www.surveymonkey.com/r/Y99RT3W where you
> can vote for your preferred dates and locations if you would like to attend -
> the survey will end on July 21, 12 PM China Standard Time, and results will
> go public the next day.
>
>
> Also, please feel free to reach out to me if you have a topic to propose
> for the meetup. Will send out an update later with more details when I get
> more to share. Thanks!
>
>
> Cheers,
>
>
> Junping


Re: [VOTE] Force "squash and merge" option for PR merge on github UI

2019-07-16 Thread Weiwei Yang
Thanks Marton, +1 on this.

Weiwei

On Jul 17, 2019, 2:07 PM +0800, Elek, Marton , wrote:
> Hi,
>
> Github UI (ui!) helps to merge Pull Requests to the proposed branch.
> There are three different ways to do it [1]:
>
> 1. Keep all the different commits from the PR branch and create one
> additional merge commit ("Create a merge commit")
>
> 2. Squash all the commits and commit the change as one patch ("Squash
> and merge")
>
> 3. Keep all the different commits from the PR branch but rebase, merge
> commit will be missing ("Rebase and merge")
>
>
>
> As only option 2 is compatible with the existing development
> practices of Hadoop (1 issue = 1 patch = 1 commit), I call for a lazy
> consensus vote: if there are no objections within 3 days, I will ask INFRA to
> disable options 1 and 3 to make the process less error-prone.
>
> Please let me know, what do you think,
>
> Thanks a lot
> Marton
>
> ps: Personally I prefer to merge from local, as it enables signing the
> commits and doing a final build before pushing. But this is a different
> story; this proposal is only about removing the options which are obviously
> risky...
>
> ps2: You can always do any kind of merge / commits from CLI, for example
> to merge a feature branch together with keeping the history.
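
A minimal sketch of such a local squash merge (the remote name `apache`, the
branch name `pr-123`, and the JIRA id/contributor in the commit message are
hypothetical placeholders; adjust to the actual PR and issue):

  # bring trunk up to date, then squash the PR branch onto it
  $ git checkout trunk && git pull apache trunk
  $ git merge --squash pr-123
  # one signed commit, keeping the 1 issue = 1 patch = 1 commit convention
  $ git commit -S -m "HADOOP-12345. Example summary. Contributed by Jane Doe."
  # optionally run a final build before pushing
  $ mvn clean install -DskipTests
  $ git push apache trunk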
>
> [1]:
> https://help.github.com/en/articles/merging-a-pull-request#merging-a-pull-request-on-github
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>


[ANNOUNCE] New Apache Hadoop Committer - Tao Yang

2019-07-15 Thread Weiwei Yang
Hi Dear Apache Hadoop Community

It's my pleasure to announce that Tao Yang has been elected as an Apache
Hadoop committer. This is to recognize his contributions to the Apache Hadoop
YARN project.

Congratulations and welcome on board!

Weiwei
(On behalf of the Apache Hadoop PMC)


[jira] [Reopened] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY

2019-07-08 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened HDFS-12748:


> NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
> 
>
> Key: HDFS-12748
> URL: https://issues.apache.org/jira/browse/HDFS-12748
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Jiandan Yang 
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12748-branch-3.1.01.patch, HDFS-12748.001.patch, 
> HDFS-12748.002.patch, HDFS-12748.003.patch, HDFS-12748.004.patch, 
> HDFS-12748.005.patch
>
>
> In our production environment, the standby NN often does full GC; through MAT
> we found the largest object is FileSystem$Cache, which contains 7,844,890
> DistributedFileSystem instances.
> By viewing the call hierarchy of FileSystem.get(), I found that only
> NamenodeWebHdfsMethods#get calls FileSystem.get(). I don't know why it creates
> a different DistributedFileSystem every time instead of getting a FileSystem
> from the cache.
> {code:java}
> case GETHOMEDIRECTORY: {
>   final String js = JsonUtil.toJsonString("Path",
>   FileSystem.get(conf != null ? conf : new Configuration())
>   .getHomeDirectory().toUri().getPath());
>   return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
> }
> {code}
> When we close the FileSystem in GETHOMEDIRECTORY, the NN doesn't do full GC.
> {code:java}
> case GETHOMEDIRECTORY: {
>   FileSystem fs = null;
>   try {
> fs = FileSystem.get(conf != null ? conf : new Configuration());
> final String js = JsonUtil.toJsonString("Path",
> fs.getHomeDirectory().toUri().getPath());
> return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
>   } finally {
> if (fs != null) {
>   fs.close();
> }
>   }
> }
> {code}
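
For reference: since FileSystem implements java.io.Closeable, the same fix can
also be written with try-with-resources. A minimal sketch, equivalent to the
finally-based patch above (assuming the target branch builds with Java 7+):
{code:java}
case GETHOMEDIRECTORY: {
  // The per-request FileSystem is closed automatically on exit, which also
  // removes it from FileSystem$Cache instead of letting it accumulate there.
  try (FileSystem fs =
      FileSystem.get(conf != null ? conf : new Configuration())) {
    final String js = JsonUtil.toJsonString("Path",
        fs.getHomeDirectory().toUri().getPath());
    return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
  }
}
{code}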



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Agenda & More Information about Hadoop Community Meetup @ Palo Alto, June 26

2019-06-25 Thread Weiwei Yang
Thanks Wangda.
Will this event be recorded? That would be extremely helpful for people who
are unable to join, so they can catch up later.

Thanks
Weiwei
On Jun 26, 2019, 4:12 AM +0800, Wangda Tan , wrote:
A friendly reminder,

The meetup will take place tomorrow at 9:00 AM PDT to 4:00 PM PDT.

The address is: 395 Page Mill Rd, Palo Alto, CA 94306
We’ll be in the Bigtop conference room on the 1st floor. Go left after
coming through the main entrance, and it will be on the right.

Zoom: https://cloudera.zoom.us/j/606607666

Please let me know if you have any questions. If you haven't RSVP yet,
please go ahead and RSVP so we can better prepare food, seat, etc.

Thanks,
Wangda

On Wed, Jun 19, 2019 at 4:49 PM Wangda Tan  wrote:

Hi All,

I want to let you know that we have confirmed most of the agenda for
Hadoop Community Meetup. It will be a whole day event.

Agenda & dial-in info: see below. *Please RSVP
at https://www.meetup.com/Hadoop-Contributors/events/262055924/*

Huge thanks to Daniel Templeton, Wei-Chiu Chuang, Christina Vu for helping
with organizing and logistics.

*Please help to promote meetup information on Twitter, LinkedIn, etc.
Appreciated! *

Best,
Wangda

AM:

9:00: Arrival and check-in

9:30 - 10:15: Talk: Hadoop storage in cloud-native environments

Abstract: Hadoop is a mature storage system but was designed years before the
cloud-native movement. Kubernetes and other cloud-native tools are emerging
solutions for containerized environments, but sometimes they require different
approaches. In this presentation we would like to share our experiences
running Apache Hadoop Ozone in Kubernetes and the connection points to other
cloud-native ecosystem elements. We will compare the benefits and drawbacks of
using Kubernetes and Hadoop storage together and show our current achievements
and future plans.

Speaker: Marton Elek (Cloudera)

10:20 - 11:00: Talk: Selective Wire Encryption in HDFS

Abstract: Wire data encryption is a key component of the Hadoop Distributed
File System (HDFS). However, such encryption enforcement comes in as an
all-or-nothing feature. In our use case at LinkedIn, we would like to
selectively expose fast unencrypted access to fully managed internal clients,
which can be trusted, while only exposing encrypted access to clients outside
of the trusted circle with higher security risks. That way we minimize
performance overhead for trusted internal clients while still securing data
from potential outside threats. Our design extends the HDFS NameNode to run on
multiple ports; connecting to different NameNode ports ends up with different
levels of encryption protection. This protection then gets enforced for both
NameNode RPC and the subsequent data transfers to/from DataNodes. This
approach comes with minimal operational and performance overhead.

Speaker: Konstantin Shvachko (LinkedIn), Chen Liang (LinkedIn)

11:10 - 11:55: Talk: YuniKorn: Next Generation Scheduling for YARN and K8s

Abstract: We will talk about our open source work - the YuniKorn scheduler
project (Y for YARN, K for K8s, uni- for Unified) brings long-wanted features
such as hierarchical queues, fairness between users/jobs/queues, and
preemption to Kubernetes; and it brings service scheduling enhancements to
YARN. Any improvements to this scheduler can benefit both the Kubernetes and
YARN communities.

Speaker: Wangda Tan (Cloudera)

PM:

12:00 - 12:55: Lunch break (provided by Cloudera)

1:00 - 1:25: Talk: YARN Efficiency at Uber

Abstract: We will present the work done at Uber to improve YARN cluster
utilization and job SOA with elastic resource management, low compute workload
on the passive datacenter, preemption, larger containers, etc. We will also go
through the YARN upgrade in order to adopt new features and talk about the
challenges.

Speaker: Aihua Xu (Uber), Prashant Golash (Uber)

1:30 - 2:10: One more talk

2:20 - 4:00: BoF sessions & breakout sessions & group discussions: talk about
items like JDK 11 support, next releases (2.10.0, 3.3.0, etc.), Hadoop on
Cloud, etc.

4:00: Reception provided by Cloudera.

==========

Join Zoom Meeting: https://cloudera.zoom.us/j/116816195



Re: [VOTE] Release Apache Hadoop Submarine 0.2.0 - RC0

2019-06-21 Thread Weiwei Yang
+1 (binding)

Thanks
Weiwei
On Jun 21, 2019, 5:33 AM +0800, Wangda Tan , wrote:
+1 Binding. Tested in local cluster and reviewed docs.

Thanks!

On Wed, Jun 19, 2019 at 3:20 AM Sunil Govindan  wrote:

+1 binding

- tested in local cluster.
- tried tony run time as well
- doc seems fine now.

- Sunil


On Thu, Jun 6, 2019 at 6:53 PM Zhankun Tang  wrote:

Hi folks,

Thanks to all of you who have contributed in this submarine 0.2.0
release.
We now have a release candidate (RC0) for Apache Hadoop Submarine 0.2.0.


The Artifacts for this Submarine-0.2.0 RC0 are available here:

https://home.apache.org/~ztang/submarine-0.2.0-rc0/


Its RC tag in git is "submarine-0.2.0-RC0".



The maven artifacts are available via repository.apache.org at
https://repository.apache.org/content/repositories/orgapachehadoop-1221/


This vote will run 7 days (5 weekdays), ending on 13th June at 11:59 pm
PST.



The highlights of this release.

1. Linkedin's TonY runtime support in Submarine

2. PyTorch enabled in Submarine with both YARN native service runtime
(single node) and TonY runtime

3. Support uber jar of Submarine to submit the job

4. The YAML file to describe a job

5. The Notebook support (by Apache Zeppelin Submarine interpreter)


Thanks to Sunil, Wangda, Xun, Zac, Keqiu, Szilard for helping me in
preparing the release.

I have done some testing with my pseudo cluster. My +1 (non-binding) to
start.



Regards,
Zhankun




Re: [VOTE] Propose to start new Hadoop sub project "submarine"

2019-02-04 Thread Weiwei Yang
+1

Weiwei

--
Weiwei
On Feb 5, 2019, 2:11 AM +0800, Steve Loughran , wrote:
> +1, binding
>
> > On 1 Feb 2019, at 22:15, Wangda Tan  wrote:
> >
> > Hi all,
> >
> > According to positive feedbacks from the thread [1]
> >
> > This is vote thread to start a new subproject named "hadoop-submarine"
> > which follows the release process already established for ozone.
> >
> > The vote runs for usual 7 days, which ends at Feb 8th 5 PM PDT.
> >
> > Thanks,
> > Wangda Tan
> >
> > [1]
> > https://lists.apache.org/thread.html/f864461eb188bd12859d51b0098ec38942c4429aae7e4d001a633d96@%3Cyarn-dev.hadoop.apache.org%3E
>
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>


Re: [DISCUSS] Making submarine to different release model like Ozone

2019-01-31 Thread Weiwei Yang
Thanks for proposing this Wangda, my +1 as well.
It is amazing to see the progress made in Submarine last year; the community
grows fast and is quite collaborative. I can see the reasons to release it
faster in its own cycle. And at the same time, the Ozone model works very well.

—
Weiwei
On Feb 1, 2019, 10:49 AM +0800, Xun Liu , wrote:
> +1
>
> Hello everyone,
>
> I am Xun Liu, the head of the machine learning team at Netease Research 
> Institute. I quite agree with Wangda.
>
> Our team is very grateful for getting the Submarine machine learning engine
> from the community.
> We are heavy users of Submarine.
> Because Submarine fits into the direction of our big data team's hadoop
> technology stack,
> it avoids the need to increase the manpower investment in learning other
> container scheduling systems.
> The important thing is that we can use a common YARN cluster to run machine
> learning,
> which makes the utilization of server resources more efficient, and has saved
> us a lot of human and material resources over the previous years.
>
> Our team has finished the testing and deployment of Submarine and will
> provide the service to our e-commerce department (http://www.kaola.com/)
> shortly.
>
> We also plan to provide the Submarine engine in our existing YARN cluster in
> the next six months.
> Because we have a lot of product departments that need to use machine
> learning services,
> for example:
> 1) Game department (http://game.163.com/) needs AI battle training,
> 2) News department (http://www.163.com) needs news recommendation,
> 3) Mailbox department (http://www.163.com) requires anti-spam and illegal 
> detection,
> 4) Music department (https://music.163.com/) requires music recommendation,
> 5) Education department (http://www.youdao.com) requires voice recognition,
> 6) Massive Open Online Courses (https://open.163.com/) requires multilingual 
> translation and so on.
>
> If Submarine can be released independently like Ozone, it will help us
> quickly get the latest features and improvements, and it will be greatly
> helpful to our team and users.
>
> Thanks hadoop Community!
>
>
> > 在 2019年2月1日,上午2:53,Wangda Tan  写道:
> >
> > Hi devs,
> >
> > Since we started the submarine-related effort last year, we have received a
> > lot of feedback; several companies (such as Netease, China Mobile, etc.) are
> > trying to deploy Submarine to their Hadoop clusters along with big data
> > workloads. LinkedIn also has a big interest in contributing a Submarine TonY (
> > https://github.com/linkedin/TonY) runtime to allow users to use the same
> > interface.
> >
> > From what I can see, there are several issues with putting Submarine under
> > the yarn-applications directory and keeping the same release cycle as Hadoop:
> >
> > 1) We started the 3.2.0 release in Sep 2018, but the release was done in Jan
> > 2019. Because of unpredictable blockers and security issues, it got
> > delayed a lot. We need to iterate Submarine fast at this point.
> >
> > 2) We also see a lot of requirements to use Submarine on older Hadoop
> > releases such as 2.x. Many companies may not upgrade Hadoop to 3.x in a
> > short time, but the requirement to run deep learning is urgent for them. We
> > should decouple Submarine from the Hadoop version.
> >
> > And why do we want to keep it within Hadoop? First, Submarine includes some
> > innovative parts, such as user-experience enhancements for YARN
> > services/containerization support, which we can add back to Hadoop later
> > to address common requirements. In addition to that, we have a big overlap
> > in the community developing and using it.
> >
> > There are several proposals we went through during the Ozone merge-to-trunk
> > discussion:
> > https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3ccahfhakh6_m3yldf5a2kq8+w-5fbvx5ahfgs-x1vajw8gmnz...@mail.gmail.com%3E
> >
> > I propose to adopt the Ozone model: the same master branch, a different
> > release cycle, and a different release branch. It is a great example of the
> > agile releases we can do (2 Ozone releases since Oct 2018) with less
> > overhead to set up CI, projects, etc.
> >
> > *Links:*
> > - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> > - Design doc
> > 
> > - User doc
> > 
> > (3.2.0
> > release)
> > - Blogposts, {Submarine} : Running deep learning workloads on Apache Hadoop
> > ,
> > (Chinese Translation: Link )
> > - Talks: Strata Data Conf NY
> > 
> >
> > Thoughts?
> >
> > Thanks,
> > Wangda Tan
>
>
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Re: [VOTE] Release Apache Hadoop 3.2.0 - RC1

2019-01-10 Thread Weiwei Yang
Hi Sunil

I tried to verify the site docs, but it seems
"hadoop-site/hadoop-project/index.html" still has the old content. Should
that be updated before creating the RCs?
And another minor thing: it looks like "python-dateutil" now needs to be
installed before building the docs, so we may need to update BUILD.txt
accordingly.
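
For reference, a minimal way to pull in that dependency before building the
docs (assuming pip is available; a distro package would work as well):

  $ pip install python-dateutil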

Thanks
--
Weiwei

On Thu, Jan 10, 2019 at 1:03 PM Wilfred Spiegelenburg
 wrote:

> +1 (non binding)
>
> - build from source on MacOSX 10.14.2, 1.8.0u181
> - successful native build on Ubuntu 16.04.3
> - confirmed the checksum and signature
> - deployed a single node cluster  (openjdk 1.8u191 / centos 7.5)
> - uploaded the MR framework
> - configured YARN with the FS
> - ran multiple MR jobs
>
> > On 8 Jan 2019, at 22:42, Sunil G  wrote:
> >
> > Hi folks,
> >
> >
> > Thanks to all of you who helped in this release [1] and for helping to
> vote
> > for RC0. I have created second release candidate (RC1) for Apache Hadoop
> > 3.2.0.
> >
> >
> > Artifacts for this RC are available here:
> >
> > http://home.apache.org/~sunilg/hadoop-3.2.0-RC1/
> >
> >
> > RC tag in git is release-3.2.0-RC1.
> >
> >
> >
> > The maven artifacts are available via repository.apache.org at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1178/
> >
> >
> > This vote will run 7 days (5 weekdays), ending on 14th Jan at 11:59 pm
> PST.
> >
> >
> >
> > 3.2.0 contains 1092 [2] fixed JIRA issues since 3.1.0. Below feature
> > additions
> >
> > are the highlights of this release.
> >
> > 1. Node Attributes Support in YARN
> >
> > 2. Hadoop Submarine project for running Deep Learning workloads on YARN
> >
> > 3. Support service upgrade via YARN Service API and CLI
> >
> > 4. HDFS Storage Policy Satisfier
> >
> > 5. Support Windows Azure Storage - Blob file system in Hadoop
> >
> > 6. Phase 3 improvements for S3Guard and Phase 5 improvements S3a
> >
> > 7. Improvements in Router-based HDFS federation
> >
> >
> >
> > Thanks to Wangda, Vinod, Marton for helping me in preparing the release.
> >
> > I have done few testing with my pseudo cluster. My +1 to start.
> >
> >
> >
> > Regards,
> >
> > Sunil
> >
> >
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E
> >
> > [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.2.0)
> > AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
> > ORDER BY fixVersion ASC
>
>
> Wilfred Spiegelenburg | Software Engineer
> cloudera.com <https://www.cloudera.com/>
>
>
>
>
>
>
>
>

-- 
Weiwei Yang


Re: [DISCUSS] Move to gitbox

2018-12-10 Thread Weiwei Yang
+1

On Tue, Dec 11, 2018 at 10:51 AM Anu Engineer 
wrote:

> +1
> --Anu
>
>
> On 12/10/18, 6:38 PM, "Vinayakumar B"  wrote:
>
> +1
>
> -Vinay
>
> On Mon, 10 Dec 2018, 1:22 pm Elek, Marton 
> >
> > Thanks Akira,
> >
> > +1 (non-binding)
> >
> > I think it's better to do it now at a planned date.
> >
> > If I understood correctly, the only bigger task here is to update all the
> > jenkins jobs. (I am happy to help/contribute where I can)
> >
> >
> > Marton
> >
> > On 12/8/18 6:25 AM, Akira Ajisaka wrote:
> > > Hi all,
> > >
> > > The Apache Hadoop git repository is on the git-wip-us server, which will
> > > be decommissioned.
> > > If there are no objections, I'll file a JIRA ticket with INFRA to
> > > migrate to https://gitbox.apache.org/ and update documentation.
> > >
> > > According to ASF infra team, the timeframe is as follows:
> > >
> > >> - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
> > relocation
> > >> - January 9th -> February 6th: Mandated (coordinated) relocation
> > >> - February 7th: All remaining repositories are mass migrated.
> > >> This timeline may change to accommodate various scenarios.
> > >
> > > If we reach consensus by January 9th, I can file a ticket with INFRA
> > > and migrate it.
> > > Even if we cannot reach consensus, the repository will be migrated by
> > > February 7th.
> > >
> > > Regards,
> > > Akira
> > >
> > >
> -
> > > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> > >
> >
> > ---------
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
> >
>
>
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>


-- 
Weiwei Yang


Re: [Vote] Merge discussion for Node attribute support feature YARN-3409

2018-09-09 Thread Weiwei Yang
+1 for the merge

On Mon, Sep 10, 2018 at 12:06 PM Rohith Sharma K S <
rohithsharm...@apache.org> wrote:

> +1 for merge
>
> -Rohith Sharma K S
>
> On Wed, 5 Sep 2018 at 18:01, Naganarasimha Garla <
> naganarasimha...@apache.org> wrote:
>
> > Hi All,
> >  Thanks for feedback folks, based on the positive response
> starting
> > a Vote thread for merging YARN-3409 to master.
> >
> > Regards,
> > + Naga & Sunil
> >
> > On Wed, 5 Sep 2018 2:51 am Wangda Tan,  wrote:
> >
> > > +1 for the merge; it is going to be a great addition to the 3.2.0
> > > release. Thanks to everybody for pushing this feature to completion.
> > >
> > > Best,
> > > Wangda
> > >
> > > On Tue, Sep 4, 2018 at 8:25 AM Bibinchundatt <
> bibin.chund...@huawei.com>
> > > wrote:
> > >
> > >> +1 for merge. The feature would be a good addition to the 3.2 release.
> > >>
> > >> --
> > >> Bibin A Chundatt
> > >> M: +91-9742095715
> > >> E: bibin.chund...@huawei.com<mailto:bibin.chund...@huawei.com>
> > >> 2012实验室-印研IT&Cloud BU分部
> > >> 2012 Laboratories-IT&Cloud BU Branch Dept.
> > >> From:Naganarasimha Garla
> > >> To:common-...@hadoop.apache.org,Hdfs-dev,yarn-...@hadoop.apache.org,
> > >> mapreduce-...@hadoop.apache.org,
> > >> Date:2018-08-29 20:00:44
> > >> Subject:[Discuss] Merge discussion for Node attribute support feature
> > >> YARN-3409
> > >>
> > >> Hi All,
> > >>
> > >> We would like to hear your thoughts on merging “Node Attributes
> Support
> > in
> > >> YARN” branch (YARN-3409) [2] into trunk in a few weeks. The goal is to
> > get
> > >> it in for HADOOP 3.2.
> > >>
> > >> *Major work happened in this branch*
> > >>
> > >> YARN-6858. Attribute Manager to store and provide node attributes in
> RM
> > >> YARN-7871. Support Node attributes reporting from NM to RM(
> distributed
> > >> node attributes)
> > >> YARN-7863. Modify placement constraints to support node attributes
> > >> YARN-7875. Node Attribute store for storing and recovering attributes
> > >>
> > >> *Detailed Design:*
> > >>
> > >> Please refer [1] for detailed design document.
> > >>
> > >> *Testing Efforts:*
> > >>
> > >> We did detailed tests for the feature in the last few weeks.
> > >> This feature will be enabled only when Node Attributes constraints are
> > >> specified through SchedulingRequest from AM.
> > >> Manager implementation will help to store and recover Node Attributes.
> > >> This
> > >> works with existing placement constraints.
> > >>
> > >> *Regarding to API stability:*
> > >>
> > >> All newly added @Public APIs are @Unstable.
> > >>
> > >> Documentation jira [3] could help to provide detailed configuration
> > >> details. This feature works from end-to-end and we tested this in our
> > >> local
> > >> cluster. Branch code is run against trunk and tracked via [4].
> > >>
> > >> We would love to get your thoughts before opening a voting thread.
> > >>
> > >> Special thanks to a team of folks who worked hard and contributed
> > towards
> > >> this efforts including design discussion / patch / reviews, etc.:
> Weiwei
> > >> Yang, Bibin Chundatt, Wangda Tan, Vinod Kumar Vavilappali,
> Konstantinos
> > >> Karanasos, Arun Suresh, Varun Saxena, Devaraj Kavali, Lei Guo, Chong
> > Chen.
> > >>
> > >> [1] :
> > >>
> > >>
> >
> https://issues.apache.org/jira/secure/attachment/12937633/Node-Attributes-Requirements-Design-doc_v2.pdf
> > >> [2] : https://issues.apache.org/jira/browse/YARN-3409
> > >> [3] : https://issues.apache.org/jira/browse/YARN-7865
> > >> [4] : https://issues.apache.org/jira/browse/YARN-8718
> > >>
> > >> Thanks,
> > >> + Naga & Sunil Govindan
> > >>
> > >
> >
>


-- 
Weiwei Yang


Re: [Discuss] Merge discussion for Node attribute support feature YARN-3409

2018-09-04 Thread Weiwei Yang
Hi Naga

+1 for the merge. We need it for 3.2.0.
This brings a lot of value when it works with the placement constraints APIs;
it will be very important for long-running jobs and service-type tasks.

Thanks!
--
Weiwei

On Wed, Aug 29, 2018 at 10:30 PM Naganarasimha Garla <
naganarasimha...@apache.org> wrote:

> Hi All,
>
> We would like to hear your thoughts on merging “Node Attributes Support in
> YARN” branch (YARN-3409) [2] into trunk in a few weeks. The goal is to get
> it in for HADOOP 3.2.
>
> *Major work happened in this branch*
>
> YARN-6858. Attribute Manager to store and provide node attributes in RM
> YARN-7871. Support Node attributes reporting from NM to RM (distributed
> node attributes)
> YARN-7863. Modify placement constraints to support node attributes
> YARN-7875. Node Attribute store for storing and recovering attributes
>
> *Detailed Design:*
>
> Please refer [1] for detailed design document.
>
> *Testing Efforts:*
>
> We did detailed tests for the feature in the last few weeks.
> This feature will be enabled only when Node Attributes constraints are
> specified through SchedulingRequest from AM.
> Manager implementation will help to store and recover Node Attributes. This
> works with existing placement constraints.
>
> *Regarding to API stability:*
>
> All newly added @Public APIs are @Unstable.
>
> Documentation jira [3] could help to provide detailed configuration
> details. This feature works from end-to-end and we tested this in our local
> cluster. Branch code is run against trunk and tracked via [4].
>
> We would love to get your thoughts before opening a voting thread.
>
> Special thanks to a team of folks who worked hard and contributed towards
> this efforts including design discussion / patch / reviews, etc.: Weiwei
> Yang, Bibin Chundatt, Wangda Tan, Vinod Kumar Vavilappali, Konstantinos
> Karanasos, Arun Suresh, Varun Saxena, Devaraj Kavali, Lei Guo, Chong Chen.
>
> [1] :
>
> https://issues.apache.org/jira/secure/attachment/12937633/Node-Attributes-Requirements-Design-doc_v2.pdf
> [2] : https://issues.apache.org/jira/browse/YARN-3409
> [3] : https://issues.apache.org/jira/browse/YARN-7865
> [4] : https://issues.apache.org/jira/browse/YARN-8718
>
> Thanks,
> + Naga & Sunil Govindan
>


-- 
Weiwei Yang


Re: HADOOP-14163 proposal for new hadoop.apache.org

2018-09-02 Thread Weiwei Yang
io/hadoop-site-proposal/
> > and
> > >>> https://issues.apache.org/jira/browse/HADOOP-14163
> > >>>
> > >>> Please let me know what you think about it.
> > >>>
> > >>>
> > >>> Longer version:
> > >>>
> > >>> This thread was started a long time ago about moving to a more modern hadoop site:
> site:
> > >>>
> > >>> Goals were:
> > >>>
> > >>> 1. To make it easier to manage it (the release entries could be
> > >>> created by a script as part of the release process)
> > >>> 2. To use a better look-and-feel
> > >>> 3. Move it out from svn to git
> > >>>
> > >>> I proposed to:
> > >>>
> > >>> 1. Move the existing site to git and generate it with hugo (which
> > is
> > >>> a single, standalone binary)
> > >>> 2. Move both the rendered and source branches to git.
> > >>> 3. (Create a jenkins job to generate the site automatically)
> > >>>
> > >>> NOTE: this is just about forrest based hadoop.apache.org, NOT
> > about
> > >>> the documentation which is generated by mvn-site (as before)
> > >>>
> > >>>
> > >>> I got a lot of valuable feedback and improved the proposed site
> > >>> according to the comments. Allen had some concerns about the
> > >>> technologies used (hugo vs. mvn-site) and I answered all the questions
> > >>> about why I think mvn-site is best for documentation and hugo is best
> > >>> for generating the site.
> > >>>
> > >>>
> > >>> I would like to finish this effort/jira: I would like to start a
> > >>> discussion about using this proposed version and approach as a
> new
> > >>> site of Apache Hadoop. Please let me know what you think.
> > >>>
> > >>>
> > >>> Thanks a lot,
> > >>> Marton
> > >>>
> > >>>
> > -
> > >>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > >>> For additional commands, e-mail:
> common-dev-h...@hadoop.apache.org
> > >>>
> > >>
> > >>
> > >>
> > -
> > >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > >> For additional commands, e-mail:
> common-dev-h...@hadoop.apache.org
> > >>
> > >
> > >
> -
> > > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >
> > -
> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >
> >
> >
> >
>


-- 
Weiwei Yang


Re: Apache Hadoop 3.1.1 release plan

2018-05-10 Thread Weiwei Yang
Hi Wangda

I would propose to have https://issues.apache.org/jira/browse/YARN-8015 
included in 3.1.1.

Once this is done, we get both intra- and inter-app placement constraints
covered, so users can start to explore this feature. Otherwise the
functionality is pretty limited. It has been Patch Available for a while; I
just promoted it to target 3.1.1. Hope that makes sense.

Thanks!

--
Weiwei

On 11 May 2018, 9:02 AM +0800, Wangda Tan , wrote:
Hi all,

As we previously proposed an RC time of May 1st, we want to release 3.1.1
sooner if possible. As of now, 3.1.1 has 187 fixes [1] on top of 3.1.0, and
there are 10 open blockers/criticals which target 3.1.1 [2]. I just posted
comments on these open criticals/blockers, asking the ticket owners about
their status.

If everybody agrees, I propose to start a code freeze of branch-3.1 from
Saturday PDT this week; only blockers/criticals can be committed to
branch-3.1. To reduce the burden on committers, I want to delay cutting
branch-3.1.1 as late as possible. If you have any major/minor tickets (for
severe issues please update priorities) that you want to go into 3.1.1,
please reply to this email thread and we can look at them and make a call
together.

Please feel free to share your comments and suggestions.

Thanks,
Wangda

[1] project in (YARN, "Hadoop HDFS", "Hadoop Common", "Hadoop Map/Reduce")
AND status = Resolved AND fixVersion = 3.1.1
[2] project in (YARN, HADOOP, MAPREDUCE, "Hadoop Development Tools") AND
priority in (Blocker, Critical) AND resolution = Unresolved AND "Target
Version/s" = 3.1.1 ORDER BY priority DESC


On Thu, May 10, 2018 at 5:48 PM, Wangda Tan  wrote:

Thanks Brahma/Sunil,

For YARN-8265, it is a too big change for 3.1.1, I just removed 3.1.1 from
target version.
For YARN-8236, it is a severe issue and I think it is close to finish.



On Thu, May 10, 2018 at 3:08 AM, Sunil G  wrote:


Thanks Brahma.
Yes, Billie is reviewing YARN-8265 and I am helping in YARN-8236.

- Sunil


On Thu, May 10, 2018 at 2:25 PM Brahma Reddy Battula <
brahmareddy.batt...@huawei.com> wrote:

Thanks Wangda Tan for driving the 3.1.1 release. Yes, this can be a good
addition to the 3.1 release line for improving quality.

Looks like only the following two are pending, both in review state. Hope you
are monitoring these two.

https://issues.apache.org/jira/browse/YARN-8265
https://issues.apache.org/jira/browse/YARN-8236



Note: https://issues.apache.org/jira/browse/YARN-8247 ==> committed to
branch-3.1


-Original Message-
From: Wangda Tan [mailto:wheele...@gmail.com]
Sent: 19 April 2018 17:49
To: Hadoop Common ;
mapreduce-...@hadoop.apache.org; Hdfs-dev ;
yarn-...@hadoop.apache.org
Subject: Apache Hadoop 3.1.1 release plan

Hi, All

We have released Apache Hadoop 3.1.0 on Apr 06. To further improve the
quality of the release, we plan to release 3.1.1 at May 06. The focus of
3.1.1 will be fixing blockers / critical bugs and other enhancements. So
far there are 100 JIRAs [1] have fix version marked to 3.1.1.

We plan to cut branch-3.1.1 on May 01 and vote for RC on the same day.

Please feel free to share your insights.

Thanks,
Wangda Tan

[1] project in (YARN, "Hadoop HDFS", "Hadoop Common", "Hadoop
Map/Reduce") AND fixVersion = 3.1.1





Re: [VOTE] Merge HDDS (HDFS-7240) *code* into trunk

2018-04-16 Thread Weiwei Yang
+1

--
Weiwei

On 17 Apr 2018, 8:59 AM +0800, Jitendra Pandey , 
wrote:

dopt
HDDS/Ozone as a sub-project of Hadoop, here is the formal vote for code merge.


Re: [VOTE] Release Apache Hadoop 3.1.0 (RC1)

2018-04-01 Thread Weiwei Yang
Hi Wangda

+1 (non-binding)

- Smoke tests with teragen/terasort/DS jobs
- Validated various metrics and UI displays, ran compatibility tests
- Tested GPU resource-type
- Verified RM fail-over, app-recovery
- Verified async-scheduling with 2 threads
- Enabled placement constraints, tested affinity/anti-affinity allocations
- SLS tests

--
Weiwei

On 2 Apr 2018, 1:13 PM +0800, Brahma Reddy Battula , wrote:
Wangda thanks for driving this.

+1(binding)

--Built from source
--Installed HA cluster
--Verified Basic Shell commands
--Ran Sample Jobs
--Browsed the UI's.


On Fri, Mar 30, 2018 at 9:45 AM, Wangda Tan  wrote:

Hi folks,

Thanks to the many who helped with this release since Dec 2017 [1]. We've
created RC1 for Apache Hadoop 3.1.0. The artifacts are available here:

http://people.apache.org/~wangda/hadoop-3.1.0-RC1

The RC tag in git is release-3.1.0-RC1. Last git commit SHA is
16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d

The maven artifacts are available via repository.apache.org at
https://repository.apache.org/content/repositories/orgapachehadoop-1090/
This vote will run 5 days, ending on Apr 3 at 11:59 pm Pacific.

3.1.0 contains 766 [2] fixed JIRA issues since 3.0.0. Notable additions
include the first class GPU/FPGA support on YARN, Native services, Support
rich placement constraints in YARN, S3-related enhancements, allow HDFS
block replicas to be provided by an external storage system, etc.

For 3.1.0 RC0 vote discussion, please see [3].

We’d like to use this as a starting release for 3.1.x [1], depending on how
it goes, get it stabilized and potentially use a 3.1.1 in several weeks as
the stable release.

We have done testing with a pseudo cluster:
- Ran distributed job.
- GPU scheduling/isolation.
- Placement constraints (intra-application anti-affinity) by using
distributed shell.

My +1 to start.

Best,
Wangda/Vinod

[1]
https://lists.apache.org/thread.html/b3fb3b6da8b6357a68513a6dfd104b
c9e19e559aedc5ebedb4ca08c8@%3Cyarn-dev.hadoop.apache.org%3E
[2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.1.0)
AND fixVersion not in (3.0.0, 3.0.0-beta1) AND status = Resolved ORDER BY
fixVersion ASC
[3]
https://lists.apache.org/thread.html/b3a7dc075b7329fd660f65b48237d7
2d4061f26f83547e41d0983ea6@%3Cyarn-dev.hadoop.apache.org%3E



Re: [VOTE] Release Apache Hadoop 3.1.0 (RC0)

2018-03-29 Thread Weiwei Yang
Hi Wangda

While testing the build, we found a bug
(https://issues.apache.org/jira/browse/YARN-8085) that might cause an NPE
during RM fail-over. I think the fix needs to be included in 3.1.0.

We have run some other tests, and they look good so far. This includes
- Some basic teragen/terasort/DS jobs
- Validated various metrics and UI displays, ran compatibility tests
- Creating queues, different queue configurations
- Verified RM fail-over, app-recovery
- Enabled async scheduling
- Enabled placement constraints
- SLS tests

Hope it helps.
Thanks

--
Weiwei

On 29 Mar 2018, 12:34 PM +0800, Ajay Kumar , wrote:
+1 (non-binding)

- verified binary checksum
- built from source and setup 4 node cluster
- run basic hdfs command
- run wordcount, pi & TestDFSIO (read/write)

Ajay

On 3/28/18, 5:45 PM, "Jonathan Hung"  wrote:

Hi Wangda, thanks for handling this release.

+1 (non-binding)

- verified binary checksum
- launched single node RM
- verified refreshQueues functionality
- verified capacity scheduler conf mutation disabled in this case
- verified capacity scheduler conf mutation with leveldb storage
- verified refreshQueues mutation is disabled in this case


Jonathan Hung

On Thu, Mar 22, 2018 at 9:10 AM, Wangda Tan  wrote:

Thanks @Bharat for the quick check; the previously staged repository had
some issues. I re-deployed the jars to Nexus.

Here's the new repo (1087)

https://repository.apache.org/content/repositories/orgapachehadoop-1087/

Other artifacts remain same, no additional code changes.

On Wed, Mar 21, 2018 at 11:54 PM, Bharat Viswanadham <
bviswanad...@hortonworks.com> wrote:

Hi Wangda,
The Maven artifact repository does not have all the Hadoop jars. (It is missing
many, like hadoop-hdfs, hadoop-client, etc.)
https://repository.apache.org/content/repositories/orgapachehadoop-1086/


Thanks,
Bharat


On 3/21/18, 11:44 PM, "Wangda Tan"  wrote:

Hi folks,

Thanks to the many who helped with this release since Dec 2017 [1].
We've
created RC0 for Apache Hadoop 3.1.0. The artifacts are available
here:

http://people.apache.org/~wangda/hadoop-3.1.0-RC0/

The RC tag in git is release-3.1.0-RC0.

The maven artifacts are available via repository.apache.org at
https://repository.apache.org/content/repositories/
orgapachehadoop-1086/

This vote will run 7 days (5 weekdays), ending on Mar 28 at 11:59 pm
Pacific.

3.1.0 contains 727 [2] fixed JIRA issues since 3.0.0. Notable additions
include first-class GPU/FPGA support on YARN, Native Services, support for
rich placement constraints in YARN, S3-related enhancements, allowing HDFS
block replicas to be provided by an external storage system, etc.

We’d like to use this as a starting release for 3.1.x [1]; depending on how
it goes, we will get it stabilized and potentially use a 3.1.1 in several
weeks as the stable release.

We have done testing with a pseudo cluster and distributed shell job.
My +1
to start.

Best,
Wangda/Vinod

[1]
https://lists.apache.org/thread.html/b3fb3b6da8b6357a68513a6dfd104b
c9e19e559aedc5ebedb4ca08c8@%3Cyarn-dev.hadoop.apache.org%3E
[2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in
(3.1.0)
AND fixVersion not in (3.0.0, 3.0.0-beta1) AND status = Resolved
ORDER
BY
fixVersion ASC








[jira] [Created] (HDFS-13351) Revert HDFS-11156 from branch-2

2018-03-26 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-13351:
--

 Summary: Revert HDFS-11156 from branch-2
 Key: HDFS-13351
 URL: https://issues.apache.org/jira/browse/HDFS-13351
 Project: Hadoop HDFS
  Issue Type: Task
  Components: webhdfs
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Per discussion in HDFS-11156, let's revert the change from branch-2. The new patch 
can be tracked in HDFS-12459.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Adopt HDSL as a new Hadoop subproject

2018-03-20 Thread Weiwei Yang
+1 (non-binding)
I really like this proposal and thanks for all the discussions.

--
Weiwei

On 21 Mar 2018, 8:39 AM +0800, Arpit Agarwal , wrote:
+1 (binding)

Arpit

On 3/20/18, 11:21 AM, "Owen O'Malley"  wrote:

All,

Following our discussions on the previous thread (Merging branch HDFS-7240
to trunk), I'd like to propose the following:

* HDSL become a subproject of Hadoop.
* HDSL will release separately from Hadoop. Hadoop releases will not
contain HDSL and vice versa.
* HDSL will get its own jira instance so that the release tags stay
separate.
* On trunk (as opposed to release branches) HDSL will be a separate module
in Hadoop's source tree. This will enable the HDSL to work on their trunk
and the Hadoop trunk without making releases for every change.
* Hadoop's trunk will only build HDSL if a non-default profile is enabled.
* When Hadoop creates a release branch, the RM will delete the HDSL module
from the branch.
* HDSL will have their own Yetus checks and won't cause failures in the
Hadoop patch check.

I think this accomplishes most of the goals of encouraging HDSL development
while minimizing the potential for disruption of HDFS development.

The vote will run the standard 7 days and requires a lazy 2/3 vote. PMC
votes are binding, but everyone is encouraged to vote.

+1 (binding)

.. Owen




Re: [NOTICE] branch-3.1 created. 3.1.0 code freeze started.

2018-02-10 Thread Weiwei Yang
Hi Wangda

I am a bit confused about the branches; I was expecting something like the following:

 branch-3.1.0: 3.1.0
 branch-3.1: 3.1.x
 trunk: 3.2.x

Code freeze happens in branch-3.1.0, which only allows blockers.
However, did you mean you created branch-3.1 for 3.1.0? In that case, what 
branch should be used for 3.1.x?

--
Weiwei

On 11 Feb 2018, 11:11 AM +0800, Wangda Tan , wrote:
Hi all,

As proposed in [1], code freeze of 3.1.0 is already started. I created
branch-3.1 from trunk and will set target version of trunk to 3.2.0.

Please note that only blockers/criticals can be committed to 3.1.0, and all
such commits should be cherry-picked to branch-3.1.

[1]
https://lists.apache.org/thread.html/0d8050ecef4181f8355b3fed732018ad970b6cd11e9436db322b0dd9@%3Cyarn-dev.hadoop.apache.org%3E

Thanks,
Wangda


Re: [VOTE] Merge YARN-6592 feature branch to trunk

2018-01-26 Thread Weiwei Yang
+1
We are also looking forward to this : )

--
Weiwei

On 27 Jan 2018, 9:16 AM +0800, Wangda Tan , wrote:
+1

On Sat, Jan 27, 2018 at 2:05 AM, Chris Douglas <cdoug...@apache.org> wrote:
+1 Looking forward to this. -C

On Fri, Jan 26, 2018 at 7:28 AM, Arun Suresh <asur...@apache.org> wrote:
> Hello yarn-dev@
>
> Based on the positive feedback from the DISCUSS thread [1], I'd like to
> start a formal vote to merge YARN-6592 [2] to trunk. The vote will run for 5
> days, and will end Jan 31 7:30AM PDT.
>
> This feature adds support for placing containers in YARN using rich
> placement constraints. For example, this can be used by applications to
> co-locate containers on a node or rack (affinity constraint), spread
> containers across nodes or racks (anti-affinity constraint), or even specify
> the maximum number of containers on a node/rack (cardinality constraint).
>
> We have integrated this feature into the Distributed-Shell application for
> feature testing. We have performed end-to-end testing on moderately-sized
> clusters to verify that constraints work fine. Performance tests have been
> done via both SLS tests and Capacity Scheduler performance unit tests, and
> no regression was found. We have opened a JIRA to track Jenkins acceptance
> of the aggregated patch [3]. Documentation is in the process of being
> completed [4]. You can also check our design document for more details [5].
>
> Config flags are needed to enable this feature and it should not have any
> effect on YARN when turned off. Once merged, we plan to work on further
> improvements, which can be tracked in the umbrella YARN-7812 [6].
>
> Kindly do take a look at the branch and raise issues/concerns that need to
> be addressed before the merge.
>
> Many thanks to Konstantinos, Wangda, Panagiotis, Weiwei, and Sunil for their
> contributions to this effort, as well as Subru, Chris, Carlo, and Vinod for
> their inputs and discussions.
>
>
> Cheers
> -Arun
>
> [1]
> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201801.mbox/%3CCAMreUaz%3DGnsjOLZ%3Dem2x%3DQS7qh27euCWNw6Bo_4Cu%2BfXnXhyNA%40mail.gmail.com%3E
> [2] https://issues.apache.org/jira/browse/YARN-6592
> [3] https://issues.apache.org/jira/browse/YARN-7792
> [4] https://issues.apache.org/jira/browse/YARN-7780
> [5]
> https://issues.apache.org/jira/secure/attachment/12867869/YARN-6592-Rich-Placement-Constraints-Design-V1.pdf
> [6] https://issues.apache.org/jira/browse/YARN-7812

-
To unsubscribe, e-mail: 
yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 
yarn-dev-h...@hadoop.apache.org




Re: [DISCUSS] Merge YARN-6592 to trunk

2018-01-25 Thread Weiwei Yang
+1, thanks for getting to this milestone Arun.
I’ve done some basic validation on a 4-node cluster with some general 
affinity/anti-affinity/cardinality constraints, and it worked. I’ve also reviewed 
the doc; it’s in good shape and very illustrative.

Thanks.

--
Weiwei

On 26 Jan 2018, 10:44 AM +0800, Sunil G , wrote:
+1.

Thanks Arun.

I did manual testing to check the affinity and anti-affinity features with the 
placement allocator. Also checked SLS for any performance regression, and 
there is not much difference, as Arun mentioned.

Thanks all the folks for working on this. Kudos!

- Sunil


On Fri, Jan 26, 2018 at 5:16 AM Arun Suresh <asur...@apache.org> wrote:
Hello yarn-dev@

We feel that the YARN-6592 dev branch mostly in shape to be merged into trunk. 
This branch adds support for placing containers in YARN using rich placement 
constraints. For example, this can be used by applications to co-locate 
containers on a node or rack (affinity constraint), spread containers across 
nodes or racks (anti-affinity constraint), or even specify the maximum number 
of containers on a node/rack (cardinality constraint).

We have integrated this feature into the Distributed-Shell application for 
feature testing. We have performed end-to-end testing on moderately-sized 
clusters to verify that constraints work fine. Performance tests have been done 
via both SLS tests and Capacity Scheduler performance unit tests, and no 
regression was found. We have opened a JIRA to track Jenkins acceptance of the 
aggregated patch [2]. Documentation is in the process of being completed [3]. 
You can also check our design document for more details [4].

Config flags are needed to enable this feature and it should not have any 
effect on YARN when turned off. Once merged, we plan to work on further 
improvements, which can be tracked in the umbrella YARN-7812 [5].

Kindly do take a look at the branch and raise issues/concerns that need to be 
addressed before the merge.

Many thanks to Konstantinos, Wangda, Panagiotis, Weiwei, and Sunil for their 
contributions to this effort, as well as Subru, Chris, Carlo, and Vinod for 
their inputs and discussions.

Cheers
-Arun


[1] https://issues.apache.org/jira/browse/YARN-6592
[2] https://issues.apache.org/jira/browse/YARN-7792
[3] https://issues.apache.org/jira/browse/YARN-7780
[4] 
https://issues.apache.org/jira/secure/attachment/12867869/YARN-6592-Rich-Placement-Constraints-Design-V1.pdf
[5] https://issues.apache.org/jira/browse/YARN-7812



[jira] [Resolved] (HDFS-12936) java.lang.OutOfMemoryError: unable to create new native thread

2017-12-18 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12936.

Resolution: Not A Bug

> java.lang.OutOfMemoryError: unable to create new native thread
> --
>
> Key: HDFS-12936
> URL: https://issues.apache.org/jira/browse/HDFS-12936
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
> Environment: CDH5.12
> hadoop2.6
>Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> I configured the max user processes to 65535 for every user, and the datanode 
> memory is 8G.
> When a lot of data was being written, the datanode was shut down.
> But I can see the memory usage is only < 1000M.
> Please see https://pan.baidu.com/s/1o7BE0cy
> *DataNode shutdown error log:*  
> {code:java}
> 2017-12-17 23:58:14,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder: 
> BP-1437036909-192.168.17.36-1509097205664:blk_1074725940_987917, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2017-12-17 23:58:31,425 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. 
> Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:01,426 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. 
> Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:05,520 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. 
> Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:31,429 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving BP-1437036909-192.168.17.36-1509097205664:blk_1074725951_987928 
> src: /192.168.17.54:40478 dest: /192.168.17.48:50010
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [ANNOUNCE] Apache Hadoop 3.0.0 GA is released

2017-12-14 Thread Weiwei Yang
Hi Andrew

Congratulations on the 3.0 GA, another milestone in Hadoop history. Thanks for 
driving this release, excellent work!
Took a look at the release notes; they also look nice. One question though: who is 
maintaining https://wiki.apache.org/hadoop/PoweredBy? Some of the info is quite out 
of date, and I would like to know how to update it.

Thanks

--
Weiwei

On 15 Dec 2017, 2:45 AM +0800, Andrew Wang , wrote:
Hi all,

I'm pleased to announce that Apache Hadoop 3.0.0 is generally available
(GA).

3.0.0 GA consists of 302 bug fixes, improvements, and other enhancements
since 3.0.0-beta1. This release marks a point of quality and stability for
the 3.0.0 release line, and users of earlier 3.0.0-alpha and -beta releases
are encouraged to upgrade.

Looking back, 3.0.0 GA is the culmination of over a year of work on the
3.0.0 line, starting with 3.0.0-alpha1 which was released in September
2016. Altogether, 3.0.0 incorporates 6,242 changes since 2.7.0.

Users are encouraged to read the overview of major changes in 3.0.0. The GA
release notes


Re: [VOTE] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-12-04 Thread Weiwei Yang
+1 (non-binding)
Thanks for getting this done Sunil.

--
Weiwei

On 5 Dec 2017, 4:06 AM +0800, Eric Payne , 
wrote:
+1. Thanks Sunil for the work on this branch.
Eric

From: Sunil G
To: Hdfs-dev; Hadoop Common; mapreduce-...@hadoop.apache.org

[1] : 
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201711.mbox/%3CCACYiTuhKhF1JCtR7ZFuZSEKQ4sBvN_n_tV5GHsbJ3YeyJP%2BP4Q%40mail.gmail.com%3E

[2] : https://issues.apache.org/jira/browse/YARN-5881

[3] : https://issues.apache.org/jira/browse/YARN-7510

[4] : https://issues.apache.org/jira/browse/YARN-7533


Regards

Sunil and Wangda



Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Weiwei Yang
Hi Sunil

+1 from my side.
Actually, we have applied some of these patches to our production cluster since 
Sep this year, on over 2000 nodes, and it works nicely. +1 for the merge. I am 
pretty sure this feature will help a lot of users, especially those on the cloud. 
Thanks for getting this done, great job!

--
Weiwei

On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S , 
wrote:
+1, thanks Sunil for working on this feature!

-Rohith Sharma K S

On 24 November 2017 at 23:19, Sunil G  wrote:

Hi All,

We would like to bring up the discussion of merging “absolute min/max
resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
in a few weeks. The goal is to get it in for Hadoop 3.1.

*Major work happened in this branch*

- YARN-6471. Support to add min/max resource configuration for a queue
- YARN-7332. Compute effectiveCapacity per each resource vector
- YARN-7411. Inter-Queue preemption's computeFixpointAllocation need to
handle absolute resources.

*Regarding design details*

Please refer [1] for detailed design document.

*Regarding to testing:*

We did extensive tests for the feature in the last couple of months,
comparing to the latest trunk.

- For the SLS benchmark: We didn't see an observable performance gap from the
simulated test based on 8K-node SLS traces (1 PB memory). We got 3k+
containers allocated per second.

- For the microbenchmark: We used the performance test cases added by YARN-6775;
they did not show much performance regression compared to trunk.

*YARN-5881* 

Re: Apache Hadoop 2.8.3 Release Plan

2017-11-21 Thread Weiwei Yang
Agree with Konstantin. These two issues have been open for a while without 
reaching consensus on a fix; hope this gets enough attention from the 
community to get them resolved.

Thanks

--
Weiwei

On 22 Nov 2017, 11:18 AM +0800, Konstantin Shvachko , 
wrote:
I would consider these two blockers for 2.8.3 as they crash NN:

https://issues.apache.org/jira/browse/HDFS-12638
https://issues.apache.org/jira/browse/HDFS-12832

Thanks,
--Konstantin

On Tue, Nov 21, 2017 at 11:16 AM, Junping Du  wrote:

Thanks Andrew and Wangda for comments!

To me, an improvement with 17 patches is not a big problem, as this is
self-contained and I didn't see a single line of deletes/updates to existing
code - well, arguably, patches that only add code can also have a big
impact, but that is not likely the case here.

While the dependency discussions on HADOOP-14964 are still going on, I
will leave the decision to the JIRA discussion, based on which approach we
choose (shaded?) and its impact. If we cannot reach consensus in the short term,
we probably have to miss this in the 2.8.3 release.


Okay. Last call for blocker/critical fixes landing on branch-2.8.3. RC0
will get cut shortly.



Thanks,


Junping



From: Wangda Tan

https://issues.apache.org/jira/browse/YARN-2113 is a
big change in terms of patch size, but since it fixes a broken use case
(balancing user usage under a Capacity Scheduler leaf queue), we backported it
to 2.8.2 after thorough tests and validation by Yahoo.

I'm not quite familiar with HADOOP-14964, I will leave the decision to
committers who know more about the field.

Just my 2 cents.

Regards,
Wangda


On Tue, Nov 21, 2017 at 10:21 AM, Andrew Wang <andrew.w...@cloudera.com> wrote:
The Aliyun OSS code isn't a small improvement. If you look at Sammi's
comment, it's a 17-patch series that is being backported in one shot. What we're talking
about is equivalent to merging a feature branch in a maintenance release. I
see that Kai and Chris are having a discussion about the dependency
changes, which indicates this is not a zero-risk change either. We really
should not be changing dependency versions in a maintenance unless it's
because of a bug.

It's unfortunate from a timing perspective that this missed 2.9.0, but I
still think it should wait for the next minor. Merging a feature into a
maintenance release sets the wrong precedent.

Best,
Andrew

On Tue, Nov 21, 2017 at 1:08 AM, Junping Du <jdu...@hortonworks.com> wrote:

Thanks Kai for calling out this feature/improvement for attention and
Andrew for comments.


While I agree that a maintenance release should focus on important bug fixes
only, I doubt we have strict rules that disallow any features/improvements
from landing on a maintenance release, especially when they have a small
footprint or low impact on existing code/features. In practice, we indeed had
77 new features/improvements in the latest 2.7.3 and 2.7.4 releases.


Back to HADOOP-14964, I did a quick check and it looks like the case here
is a self-contained improvement that has very low impact on the existing
code base, so I am OK with the improvement landing on branch-2.8, provided
it is well reviewed and tested.


However, as RM of branch-2.8, I have two concerns to accept it in our
2.8.3 release:

1. Timing - as I mentioned at the beginning, the main purpose of 2.8.3 is
several critical bug fixes, and we should target releasing it very soon -
my current plan is to cut an RC within this week, in line with the 3.0.0
vote closing. Can this improvement be well tested against
branch-2.8.3 within this strict timeline? It seems a bit rushed unless we
have a strong commitment on the test plan and activities in such a tight time.


2. Upgrading - I haven't heard that we have settled on a plan for releasing this
feature in a 2.9.1 release - though I saw some discussions going on
at HADOOP-14964. Assuming 2.8.3 is released ahead of 2.9.1 and includes
this improvement, users consuming this feature/improvement would have no 2.9
release to upgrade to, or would be forced to upgrade with a regression. We may
need a better upgrade story here.


Pls let me know what you think. Thanks!



Thanks,


Junping


--
*From:* Andrew Wang
*To:* common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
yarn-...@hadoop.apache.org

[jira] [Created] (HDFS-12770) Add doc about how to disable client socket cache

2017-11-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12770:
--

 Summary: Add doc about how to disable client socket cache
 Key: HDFS-12770
 URL: https://issues.apache.org/jira/browse/HDFS-12770
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


After HDFS-3365, the client socket cache (PeerCache) can be disabled, but there is 
no doc about this. We should add some doc in hdfs-default.xml to instruct users 
on how to disable it.
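
For illustration, a minimal sketch of disabling the cache from client code; this assumes the relevant property is {{dfs.client.socketcache.capacity}} and that a capacity of 0 disables the PeerCache, which is exactly what the doc should confirm:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DisableSocketCache {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumption to verify: a capacity of 0 disables the client socket
    // cache (PeerCache), so TCP connections to DNs are not reused.
    conf.setInt("dfs.client.socketcache.capacity", 0);
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Connected to " + fs.getUri() + " with socket cache off");
  }
}
{code}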



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Improvement on DN HB lock

2017-11-02 Thread Weiwei Yang
Hi everyone

Here is a JIRA discussing improving the DN HB lock: 
https://issues.apache.org/jira/browse/HDFS-7060. It seems the folks who 
participated in the discussion agreed on reducing the use of the dataset lock, but 
the ticket has gone stale for a while. Recently we hit the issue in a 450-node 
cluster with a very heavy write load; after applying the patch, it improved 
the HB latency a lot. So I would appreciate it if someone could help review and 
move this forward.

Appreciate your comments on this.
Thanks!

--
Weiwei



[jira] [Resolved] (HDFS-12757) DeadLock Happened Between DFSOutputStream and LeaseRenewer when LeaseRenewer#renew SocketTimeException

2017-11-02 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12757.

Resolution: Duplicate

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when 
> LeaseRenewer#renew SocketTimeException
> --
>
> Key: HDFS-12757
> URL: https://issues.apache.org/jira/browse/HDFS-12757
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: HDFS-12757.patch
>
>
> Java stack is :
> {code:java}
> Found one Java-level deadlock:
> =
> "Topology-2 (735/2000)":
>   waiting to lock monitor 0x7fff4523e6e8 (object 0x0005d3521078, a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer),
>   which is held by "LeaseRenewer:admin@na61storage"
> "LeaseRenewer:admin@na61storage":
>   waiting to lock monitor 0x7fff5d41e838 (object 0x0005ec0dfa88, a 
> org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "Topology-2 (735/2000)"
> Java stack information for the threads listed above:
> ===
> "Topology-2 (735/2000)":
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:227)
> - waiting to lock <0x0005d3521078> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:86)
> at 
> org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:467)
> at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:479)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.setClosed(DFSOutputStream.java:776)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeThreads(DFSOutputStream.java:791)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:848)
> - locked <0x0005ec0dfa88> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:805)
> - locked <0x0005ec0dfa88> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> ..
> "LeaseRenewer:admin@na61storage":
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:750)
> - waiting to lock <0x0005ec0dfa88> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:586)
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:453)
> - locked <0x0005d3521078> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:76)
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:310)
> at java.lang.Thread.run(Thread.java:834)
> Found 1 deadlock.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12744) More logs when short-circuit read is failed and disabled

2017-10-29 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12744:
--

 Summary: More logs when short-circuit read is failed and disabled
 Key: HDFS-12744
 URL: https://issues.apache.org/jira/browse/HDFS-12744
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Short-circuit read (SCR) failed with following error

{noformat}
2017-10-21 16:42:28,024 WARN  [B.defaultRpcServer.handler=7,queue=7,port=16020] 
impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR
while attempting to set up short-circuit access. Block xxx is not valid
{noformat}

then short-circuit read is disabled for *10 minutes* without any warning 
message in the log. This caused us to spend more time figuring out why 
we had a long time window in which SCR was not working. Propose to add a warning 
log (as other places already do) to indicate SCR is disabled, and some more 
logging in the DN to show what happened.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12701) More fine-grained locks in ShortCircuitCache

2017-10-24 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12701:
--

 Summary: More fine-grained locks in ShortCircuitCache
 Key: HDFS-12701
 URL: https://issues.apache.org/jira/browse/HDFS-12701
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.8.1
Reporter: Weiwei Yang


When the cluster is heavily loaded, we found HBase regionserver handlers are often 
blocked by {{ShortCircuitCache}}. Dumped jstack and found lots of threads 
waiting to obtain the cache lock. This should be improvable by using 
more fine-grained locks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12684:
--

 Summary: Ozone: SCM metrics NodeCount is overlapping with node 
manager metrics
 Key: HDFS-12684
 URL: https://issues.apache.org/jira/browse/HDFS-12684
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Priority: Minor


I found this issue while reviewing HDFS-11468. From http://scm_host:9876/jmx, 
both SCM and SCMNodeManager have {{NodeCount}} metrics:

{noformat}
 {
"name" : 
"Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
"modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
"ClientRpcPort" : "9860",
"DatanodeRpcPort" : "9861",
"NodeCount" : [ {
  "key" : "STALE",
  "value" : 0
}, {
  "key" : "DECOMMISSIONING",
  "value" : 0
}, {
  "key" : "DECOMMISSIONED",
  "value" : 0
}, {
  "key" : "FREE_NODE",
  "value" : 0
}, {
  "key" : "RAFT_MEMBER",
  "value" : 0
}, {
  "key" : "HEALTHY",
  "value" : 0
}, {
  "key" : "DEAD",
  "value" : 0
}, {
  "key" : "UNKNOWN",
  "value" : 0
} ],
"CompileInfo" : "2017-10-17T06:47Z xxx",
"Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
"SoftwareVersion" : "3.1.0-SNAPSHOT",
"StartedTimeInMillis" : 1508393551065
  }, {
"name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
"modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
"NodeCount" : [ {
  "key" : "STALE",
  "value" : 0
}, {
  "key" : "DECOMMISSIONING",
  "value" : 0
}, {
  "key" : "DECOMMISSIONED",
  "value" : 0
}, {
  "key" : "FREE_NODE",
  "value" : 0
}, {
  "key" : "RAFT_MEMBER",
  "value" : 0
}, {
  "key" : "HEALTHY",
  "value" : 0
}, {
  "key" : "DEAD",
  "value" : 0
}, {
  "key" : "UNKNOWN",
  "value" : 0
} ],
"OutOfChillMode" : false,
"MinimumChillModeNodes" : 1,
"ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 0 
nodes reported, minimal 1 nodes required."
  }
{noformat}

Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12401) Ozone: TestBlockDeletingService#testBlockDeletionTimeout sometimes timeout

2017-10-03 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12401.

Resolution: Cannot Reproduce

> Ozone: TestBlockDeletingService#testBlockDeletionTimeout sometimes timeout
> --
>
> Key: HDFS-12401
> URL: https://issues.apache.org/jira/browse/HDFS-12401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Weiwei Yang
>Priority: Minor
>
> {code}
> testBlockDeletionTimeout(org.apache.hadoop.ozone.container.common.TestBlockDeletingService)
>   Time elapsed: 100.383 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics:
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12546) Ozone: DB listing operation performance improvement

2017-09-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12546:
--

 Summary: Ozone: DB listing operation performance improvement
 Key: HDFS-12546
 URL: https://issues.apache.org/jira/browse/HDFS-12546
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


While investigating HDFS-12506, I found several {{getRangeKVs}} calls that can 
be replaced by {{getSequentialRangeKVs}} to improve performance. This JIRA 
tracks these improvements, with sufficient tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12540) Ozone: node status text reported by SCM is confusing

2017-09-24 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12540:
--

 Summary: Ozone: node status text reported by SCM is confusing
 Key: HDFS-12540
 URL: https://issues.apache.org/jira/browse/HDFS-12540
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Priority: Trivial


At present the SCM UI displays the node status like the following:

{noformat}
Node Manager: Chill mode status: Out of chill mode. 15 of out of total 1 nodes 
have reported in.
{noformat}

this text is a bit confusing. The UI retrieves the status from 
{{SCMNodeManager#getNodeStatus}}; the related call is {{#getChillModeStatus}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12539) Ozone: refactor some functions in KSMMetadataManagerImpl to be more readable and reusable

2017-09-24 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12539:
--

 Summary: Ozone: refactor some functions in KSMMetadataManagerImpl 
to be more readable and reusable
 Key: HDFS-12539
 URL: https://issues.apache.org/jira/browse/HDFS-12539
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Priority: Minor


This is from [~anu]'s review comment in HDFS-12506, 
[https://issues.apache.org/jira/browse/HDFS-12506?focusedCommentId=16178356&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16178356].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-12415) Ozone: TestXceiverClientManager and TestAllocateContainer occasionally fails

2017-09-23 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened HDFS-12415:


> Ozone: TestXceiverClientManager and TestAllocateContainer occasionally fails
> 
>
> Key: HDFS-12415
> URL: https://issues.apache.org/jira/browse/HDFS-12415
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>    Reporter: Weiwei Yang
>    Assignee: Weiwei Yang
> Attachments: HDFS-12415-HDFS-7240.001.patch, 
> HDFS-12415-HDFS-7240.002.patch, HDFS-12415-HDFS-7240.003.patch
>
>
> TestXceiverClientManager seems to be occasionally failing in some jenkins 
> jobs,
> {noformat}
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828)
>  at 
> org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147)
>  at 
> org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125)
> {noformat}
> see more from [this 
> report|https://builds.apache.org/job/PreCommit-HDFS-Build/21065/testReport/]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12524) Ozone: Record number of keys scanned and hinted for getRangeKVs call

2017-09-21 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12524:
--

 Summary: Ozone: Record number of keys scanned and hinted for 
getRangeKVs call
 Key: HDFS-12524
 URL: https://issues.apache.org/jira/browse/HDFS-12524
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: logging, ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


Add debug logging to record the number of keys scanned and hinted for 
{{getRangeKVs}} calls; this will be helpful for debugging performance issues since 
{{getRangeKVs}} is often the place where we see the lag.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12506) Ozone: ListBucket is too slow

2017-09-20 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12506:
--

 Summary: Ozone: ListBucket is too slow
 Key: HDFS-12506
 URL: https://issues.apache.org/jira/browse/HDFS-12506
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Blocker


Generated 3 million keys in Ozone and ran the {{listBucket}} command to get a list 
of buckets under a volume,

{code}
bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-15143 -user wwei
{code}

this call took over *15 seconds* to finish. The problem is caused by the 
inflexible structure of the KSM DB. Right now {{ksm.db}} stores keys like the following:

{code}
/v1/b1
/v1/b1/k1
/v1/b1/k2
/v1/b1/k3
/v1/b2
/v1/b2/k1
/v1/b2/k2
/v1/b2/k3
/v1/b3
/v1/b4
{code}

keys are sorted in natural order, so when we list buckets under a volume, e.g. 
/v1, we need to seek to the /v1 point and start to iterate and filter keys; this 
ends up scanning all keys under volume /v1. The problem with this design 
is that we don't have an efficient approach to locate all buckets without scanning 
the keys.
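
To make the cost concrete, here is a sketch of the seek-and-filter loop this layout forces, with a plain {{SortedMap}} standing in for the DB iterator:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

class ListBucketSketch {
  // Bucket entries are interleaved with every key entry, so listing
  // buckets must visit all keys stored under the volume.
  static List<String> listBuckets(SortedMap<String, byte[]> db, String volume) {
    List<String> buckets = new ArrayList<>();
    for (String key : db.tailMap(volume + "/").keySet()) {
      if (!key.startsWith(volume + "/")) {
        break; // past this volume's range
      }
      if (key.indexOf('/', volume.length() + 1) < 0) {
        buckets.add(key); // no further '/', so this is a bucket entry
      }
    }
    return buckets;
  }
}
{code}

With millions of keys per volume, the loop body runs millions of times to return a handful of buckets.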



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12504) Ozone: Improve SQLCLI performance

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12504:
--

 Summary: Ozone: Improve SQLCLI performance
 Key: HDFS-12504
 URL: https://issues.apache.org/jira/browse/HDFS-12504
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


In my test, my {{ksm.db}} has *3017660* entries with a total size of *128mb*; the 
SQLCLI tool ran for over *2 hours* but still did not finish exporting the DB. This is 
because it iterates over each entry and inserts it into another SQLite DB file, 
which is not efficient. We need to improve this to run more efficiently 
on large DB files.
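
One likely direction is batching. A generic JDBC sketch (not SQLCLI's actual code; table and column names are illustrative) that wraps the export in one transaction and flushes inserts in batches, rather than one autocommitted INSERT per entry:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class BatchedExportSketch {
  static void bulkInsert(Connection conn, Iterable<byte[][]> entries)
      throws SQLException {
    conn.setAutoCommit(false); // one transaction instead of one per row
    try (PreparedStatement ps =
             conn.prepareStatement("INSERT INTO kv (k, v) VALUES (?, ?)")) {
      int pending = 0;
      for (byte[][] kv : entries) {
        ps.setBytes(1, kv[0]);
        ps.setBytes(2, kv[1]);
        ps.addBatch();
        if (++pending % 10000 == 0) {
          ps.executeBatch(); // flush periodically to bound memory
        }
      }
      ps.executeBatch();
    }
    conn.commit();
  }
}
{code}

For SQLite in particular, per-row autocommit forces a sync per INSERT, which is usually the dominant cost.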



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12503) Ozone: some UX improvements to oz_debug

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12503:
--

 Summary: Ozone: some UX improvements to oz_debug
 Key: HDFS-12503
 URL: https://issues.apache.org/jira/browse/HDFS-12503
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


I tried to use {{oz_debug}} to dump the KSM DB for offline analysis and found a few 
problems that need to be fixed in order to make this tool easier to use. I know this 
is a debug tool for admins, but it's still necessary to improve the UX so new 
users (like me) can figure out how to use it without reading more docs.

# Support the *--help* argument. --help is the general arg for all hdfs scripts to 
print usage.
# When specifying the output path {{-o}}, we need to add a description to let the 
user know the path needs to be a file (instead of a dir). If the path is specified 
as a dir, it ends up with a funny error, {{unable to open the database file 
(out of memory)}}, which is pretty misleading. It would also be helpful to add a 
check to make sure the specified path is not an existing dir (see the sketch 
after this list).
# SQLCLI currently swallows exceptions.
# We should remove the {{levelDB}} wording from the command output, as we are by 
default using rocksDB.
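
A sketch of the dir check from item 2 (argument handling simplified):

{code:java}
import java.io.File;

class OutputPathCheckSketch {
  // Fail fast with a clear message instead of SQLite's misleading
  // "unable to open the database file (out of memory)".
  static void validateOutputFile(String path) {
    File out = new File(path);
    if (out.isDirectory()) {
      throw new IllegalArgumentException(
          "-o must point to a file, but " + path + " is an existing directory");
    }
  }
}
{code}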



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12500) Ozone: add logger for oz shell commands and move error stack traces to DEBUG level

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12500:
--

 Summary: Ozone: add logger for oz shell commands and move error 
stack traces to DEBUG level
 Key: HDFS-12500
 URL: https://issues.apache.org/jira/browse/HDFS-12500
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Priority: Minor


Per discussion in HDFS-12489 to reduce the verbosity of logs when exception 
happens, lets add logger to {{Shell.java}} and move error stack traces to DEBUG 
level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12492) Ozone: ListVolume output misses some attributes

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12492:
--

 Summary: Ozone: ListVolume output misses some attributes
 Key: HDFS-12492
 URL: https://issues.apache.org/jira/browse/HDFS-12492
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


When doing a listVolume call, we get output like the following:

{noformat}
[ {
  "owner" : {
    "name" : "wwei"
  },
  "quota" : {
    "unit" : "TB",
    "size" : 1048576
  },
  "volumeName" : "vol-0-84022",
  "createdOn" : "Mon, 18 Sep 2017 03:09:46 GMT",
  "createdBy" : null,
  "bytesUsed" : 0,
  "bucketCount" : 0
{noformat}

Values for *createdOn*, *createdBy*, *bytesUsed* and *bucketCount* are all 
missing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12489) Ozone: OzoneRestClientException swallows exceptions which makes client hard to debug failures

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12489:
--

 Summary: Ozone: OzoneRestClientException swallows exceptions which 
makes client hard to debug failures 
 Key: HDFS-12489
 URL: https://issues.apache.org/jira/browse/HDFS-12489
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


There are multiple try-catch places that swallow exceptions when transforming some 
other exception into an {{OzoneRestClientException}}. As a result, when clients run 
into such code paths, they lose track of what was going on, which makes 
debugging extremely difficult. See the example below:

{code}
bin/hdfs oz -listBucket  http://15oz1.fyre.ibm.com:9864/vol-0-84022 -user wwei
Command Failed : {"httpCode":0,"shortMessage":"Read timed 
out","resource":null,"message":"Read timed 
out","requestID":null,"hostName":null}
{code}

the returned message doesn't help much in debugging where and how the read 
timed out.
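
A sketch of the fix: keep the original exception as the cause instead of discarding it. {{executeHttpRequest}} is a hypothetical stand-in, and the message-only constructor shape is assumed:

{code:java}
import java.net.SocketTimeoutException;

class CausePreservingSketch {
  void request() throws OzoneRestClientException {
    try {
      executeHttpRequest(); // hypothetical stand-in for the real call
    } catch (SocketTimeoutException e) {
      // Before: throw new OzoneRestClientException(e.getMessage()); // cause lost
      OzoneRestClientException wrapped =
          new OzoneRestClientException(e.getMessage());
      wrapped.initCause(e); // keep the original stack trace for debugging
      throw wrapped;
    }
  }

  private void executeHttpRequest() throws SocketTimeoutException {
    // placeholder for the actual HTTP call
  }
}
{code}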



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12488) Ozone: OzoneRestClient has no notion of configuration

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12488:
--

 Summary: Ozone: OzoneRestClient has no notion of configuration
 Key: HDFS-12488
 URL: https://issues.apache.org/jira/browse/HDFS-12488
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang


When I tested Ozone on a 15-node cluster with millions of keys, responses of 
the REST client became slower. The following call times out after the default 5s:

{code}
bin/hdfs oz -listBucket  http://15oz1.fyre.ibm.com:9864/vol-0-84022 -user wwei
Command Failed : {"httpCode":0,"shortMessage":"Read timed 
out","resource":null,"message":"Read timed 
out","requestID":null,"hostName":null}
{code}

Then I increased the timeout by explicitly setting the following property in 
{{ozone-site.xml}}:

{code}
<property>
  <name>ozone.client.socket.timeout.ms</name>
  <value>1</value>
</property>
{code}

but this doesn't work and REST clients are still created with the default *5s* 
timeout. This needs to be fixed. Just like {{DFSClient}}, we should make 
{{OzoneRestClient}} configuration aware, so that clients can adjust the 
client configuration on demand.
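
A minimal sketch of the configuration-aware shape (the constructor and field names are illustrative, not the actual {{OzoneRestClient}} API):

{code:java}
import org.apache.hadoop.conf.Configuration;

class ConfiguredRestClientSketch {
  static final String TIMEOUT_KEY = "ozone.client.socket.timeout.ms";
  static final int TIMEOUT_DEFAULT_MS = 5000;

  private final int socketTimeoutMs;

  ConfiguredRestClientSketch(Configuration conf) {
    // Read the timeout from ozone-site.xml instead of hard-coding 5s.
    this.socketTimeoutMs = conf.getInt(TIMEOUT_KEY, TIMEOUT_DEFAULT_MS);
  }

  int socketTimeoutMs() {
    return socketTimeoutMs;
  }
}
{code}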



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12477) Ozone: Some minor text improvement in SCM web UI

2017-09-17 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12477:
--

 Summary: Ozone: Some minor text improvement in SCM web UI
 Key: HDFS-12477
 URL: https://issues.apache.org/jira/browse/HDFS-12477
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: scm, ui
Reporter: Weiwei Yang
Priority: Trivial


While trying out the SCM UI, there seem to be some small text problems:

bq. Node Manager: Minimum chill mode nodes)

It has an extra ).

bq. $$hashKey   object:9

I am not really sure what this means. Would this help?

bq. Node counts

Can we place the HEALTHY ones at the top of the table?

bq. Node Manager: Chill mode status: Out of chill mode. 15 of out of total 1 
nodes have reported in.

Can we refine this text a bit?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12461) Ozone: Ozone data placement is not even

2017-09-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12461.

   Resolution: Not A Problem
 Assignee: Weiwei Yang
Fix Version/s: HDFS-7240

> Ozone: Ozone data placement is not even
> ---
>
> Key: HDFS-12461
> URL: https://issues.apache.org/jira/browse/HDFS-12461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Weiwei Yang
>Priority: Blocker
>  Labels: ozoneMerge
> Fix For: HDFS-7240
>
>
> On a machine with 3 data disks, Ozone keeps on picking the same disk to place 
> all containers. Looks like we have a bug in the round robin selection of 
> disks.
> Steps to Reproduce:
> 1. Install an Ozone cluster.
> 2. Make sure that datanodes have more than one disk.
> 3. Run corona few times, each run creates more containers.
> 4. Login into the data node.
> 5. Run a command like tree or ls -R /data or independently verify each 
> location.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12463) Ozone: Fix TestXceiverClientMetrics#testMetrics

2017-09-14 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12463:
--

 Summary: Ozone: Fix TestXceiverClientMetrics#testMetrics 
 Key: HDFS-12463
 URL: https://issues.apache.org/jira/browse/HDFS-12463
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Priority: Minor


{{TestXceiverClientMetrics#testMetrics}} is failing with the following error in a 
recent jenkins job:

{noformat}
java.util.ConcurrentModificationException: null
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at 
org.apache.hadoop.ozone.scm.TestXceiverClientMetrics.lambda$testMetrics$2(TestXceiverClientMetrics.java:134)
{noformat}

Looks like a non-thread-safe list caused this race condition in the test case.
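
A generic sketch of the usual fix (not the test's actual code): share a thread-safe list so concurrent iteration and mutation don't throw.

{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ThreadSafeListSketch {
  // CopyOnWriteArrayList iterates over a snapshot, so a concurrent add()
  // no longer triggers ConcurrentModificationException.
  private final List<String> events = new CopyOnWriteArrayList<>();

  void record(String event) {
    events.add(event);
  }

  int count() {
    int n = 0;
    for (String e : events) { // safe even while record() runs concurrently
      n++;
    }
    return n;
  }
}
{code}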



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12459) Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API

2017-09-14 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12459:
--

 Summary: Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS 
REST API
 Key: HDFS-12459
 URL: https://issues.apache.org/jira/browse/HDFS-12459
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-11156 was reverted because the implementation was non-optimal. Based on 
the suggestion from [~shahrs87], we should avoid creating a DFS client to get 
block locations because that creates an extra RPC call. Instead we should use 
{{NamenodeProtocols#getBlockLocations}} and then convert {{LocatedBlocks}} to 
{{BlockLocation[]}}.
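
A sketch of that path; the conversion helper name is my recollection and should be treated as an assumption:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.hdfs.DFSUtilClient;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
import org.apache.hadoop.hdfs.server.protocol.NamenodeProtocols;

class WebHdfsBlockLocationsSketch {
  // Ask the NN directly and convert, avoiding a second RPC through DFSClient.
  static BlockLocation[] getFileBlockLocations(NamenodeProtocols nn,
      String path, long offset, long length) throws IOException {
    LocatedBlocks blocks = nn.getBlockLocations(path, offset, length);
    return DFSUtilClient.locatedBlocks2Locations(blocks);
  }
}
{code}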



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API

2017-09-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened HDFS-11156:


> Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
> 
>
> Key: HDFS-11156
> URL: https://issues.apache.org/jira/browse/HDFS-11156
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.7.3
>    Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Fix For: 3.0.0-alpha2
>
> Attachments: BlockLocationProperties_JSON_Schema.jpg, 
> BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, 
> HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, 
> HDFS-11156.04.patch, HDFS-11156.05.patch, HDFS-11156.06.patch, 
> HDFS-11156.07.patch, HDFS-11156.08.patch, HDFS-11156.09.patch, 
> HDFS-11156.10.patch, HDFS-11156.11.patch, HDFS-11156.12.patch, 
> HDFS-11156.13.patch, HDFS-11156.14.patch, HDFS-11156.15.patch, 
> HDFS-11156.16.patch, HDFS-11156-branch-2.01.patch, 
> Output_JSON_format_v10.jpg, SampleResponse_JSON.jpg
>
>
> Following webhdfs REST API
> {code}
> http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS&offset=0&length=1
> {code}
> will get a response like
> {code}
> {
>   "LocatedBlocks" : {
> "fileLength" : 1073741824,
> "isLastBlockComplete" : true,
> "isUnderConstruction" : false,
> "lastLocatedBlock" : { ... },
> "locatedBlocks" : [ {...} ]
>   }
> }
> {code}
> This represents *o.a.h.h.p.LocatedBlocks*. However, according to the 
> *FileSystem* API, 
> {code}
> public BlockLocation[] getFileBlockLocations(Path p, long start, long len)
> {code}
> clients would expect an array of BlockLocation. This mismatch should be 
> fixed. Marked as Incompatible change as this will change the output of the 
> GET_BLOCK_LOCATIONS API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm

2017-09-13 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12443:
--

 Summary: Ozone: Improve SCM block deletion throttling algorithm 
 Key: HDFS-12443
 URL: https://issues.apache.org/jira/browse/HDFS-12443
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Currently SCM periodically scans the delLog to send deletion transactions to 
datanodes. The throttling algorithm is simple: it scans at most 
{{BLOCK_DELETE_TX_PER_REQUEST_LIMIT}} (by default 50) transactions at a time. This is 
non-optimal; in the worst case it might cache 50 TXs for 50 different DNs, so each DN 
will only get 1 TX to process in an interval, which will make deletion slow. 
An improvement to this is to make the throttling per datanode, e.g. 50 TXs per 
datanode per interval.
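
A sketch of the per-datanode variant ({{Tx}} is a stand-in type for a deletion transaction):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerDatanodeThrottleSketch {
  interface Tx { String datanodeId(); } // stand-in for the real TX type

  // Group pending transactions by datanode and cap each DN's share per
  // interval, instead of one global cap of 50 across all DNs.
  static Map<String, List<Tx>> throttle(List<Tx> delLog, int perDnLimit) {
    Map<String, List<Tx>> byDn = new HashMap<>();
    for (Tx tx : delLog) {
      List<Tx> queue =
          byDn.computeIfAbsent(tx.datanodeId(), k -> new ArrayList<>());
      if (queue.size() < perDnLimit) {
        queue.add(tx); // each DN gets up to perDnLimit TXs per interval
      }
    }
    return byDn;
  }
}
{code}

This keeps every DN busy each interval instead of starving most of them behind a single global limit.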



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12440) Ozone: TestAllocateContainer fails on jenkins

2017-09-13 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12440.

   Resolution: Duplicate
Fix Version/s: HDFS-7240

> Ozone: TestAllocateContainer fails on jenkins
> -
>
> Key: HDFS-12440
> URL: https://issues.apache.org/jira/browse/HDFS-12440
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>    Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: HDFS-7240
>
>
> I am seeing this failure in [this jenkins 
> report|https://builds.apache.org/job/PreCommit-HDFS-Build/21089/testReport/org.apache.hadoop.ozone.scm/TestAllocateContainer/testAllocate/],
>  with the following error:
> {noformat}
> Stacktrace
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828)
>  at 
> org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147)
>  at 
> org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125)
>  at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>  at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12440) Ozone: TestAllocateContainer fails on jenkins

2017-09-12 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12440:
--

 Summary: Ozone: TestAllocateContainer fails on jenkins
 Key: HDFS-12440
 URL: https://issues.apache.org/jira/browse/HDFS-12440
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


I am seeing this failure in [this jenkins 
report|https://builds.apache.org/job/PreCommit-HDFS-Build/21067/testReport/org.apache.hadoop.ozone.scm/TestAllocateContainer/org_apache_hadoop_ozone_scm_TestAllocateContainer/],
 with the following error:

{noformat}
Stacktrace

java.lang.NullPointerException: null
at 
org.apache.hadoop.hdfs.MiniDFSCluster.setDataNodeStorageCapacities(MiniDFSCluster.java:1715)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.setDataNodeStorageCapacities(MiniDFSCluster.java:1694)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1674)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:882)
at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:494)
at 
org.apache.hadoop.ozone.MiniOzoneCluster.(MiniOzoneCluster.java:98)
at 
org.apache.hadoop.ozone.MiniOzoneCluster.(MiniOzoneCluster.java:77)
at 
org.apache.hadoop.ozone.MiniOzoneCluster$Builder.build(MiniOzoneCluster.java:441)
at 
org.apache.hadoop.ozone.scm.TestAllocateContainer.init(TestAllocateContainer.java:56)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12415) Ozone: TestXceiverClientManager occasionally fails

2017-09-11 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12415:
--

 Summary: Ozone: TestXceiverClientManager occasionally fails
 Key: HDFS-12415
 URL: https://issues.apache.org/jira/browse/HDFS-12415
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


TestXceiverClientManager seems to be occasionally failing in some jenkins jobs,

{noformat}
java.lang.NullPointerException
 at 
org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828)
 at 
org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147)
 at 
org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125)
{noformat}

see more from [this 
report|https://builds.apache.org/job/PreCommit-HDFS-Build/21065/testReport/]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11922) Ozone: KSM: Garbage collect deleted blocks

2017-09-11 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-11922.

  Resolution: Done
   Fix Version/s: HDFS-7240
Target Version/s: HDFS-7240

> Ozone: KSM: Garbage collect deleted blocks
> --
>
> Key: HDFS-11922
> URL: https://issues.apache.org/jira/browse/HDFS-11922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>    Assignee: Weiwei Yang
>Priority: Critical
> Fix For: HDFS-7240
>
> Attachments: Asynchronous key delete .pdf
>
>
> We need to garbage collect deleted blocks from the Datanodes. There are two 
> cases where we will have orphaned blocks. One is like the classical HDFS, 
> where someone deletes a key and we need to delete the corresponding blocks.
> Another case is when someone overwrites a key -- an overwrite can be treated 
> as a delete and a new put -- which means that older blocks need to be GC-ed at 
> some point in time. 
> A couple of JIRAs have discussed this in one form or another -- so consolidating 
> all those discussions in this JIRA. 
> HDFS-11796 -- needs to fix this issue for some tests to pass 
> HDFS-11780 -- changed the old overwriting behavior to not supporting this 
> feature for time being.
> HDFS-11920 - Once again runs into this issue when user tries to put an 
> existing key.
> HDFS-11781 - delete key API in KSM only deletes the metadata -- and relies on 
> GC for Datanodes. 
> When we solve this issue, we should also consider 2 more aspects. 
> One, we support versioning in the buckets; tracking which blocks are really 
> orphaned is something that KSM will do. So delete and overwrite at some point 
> need to decide how to handle versioning of buckets.
> Two, If a key exists in a closed container, then it is immutable, hence the 
> strategy of removing the key might be more complex than just talking to an 
> open container.
> cc : [~xyao], [~cheersyang], [~vagarychen], [~msingh], [~yuanbo], 
> [~szetszwo], [~nandakumar131]
>  






[jira] [Created] (HDFS-12391) Ozone: TestKSMSQLCli is not working as expected

2017-09-04 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12391:
--

 Summary: Ozone: TestKSMSQLCli is not working as expected
 Key: HDFS-12391
 URL: https://issues.apache.org/jira/browse/HDFS-12391
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


I found this issue while investigating the {{TestKSMSQLCli}} failure in [this 
jenkins 
report|https://builds.apache.org/job/PreCommit-HDFS-Build/20984/testReport/], 
the test is supposed to use a parameterized class to test both the {{LevelDB}} 
and {{RocksDB}} implementations of the metadata store, however it only tests 
the default {{RocksDB}} case twice.






[jira] [Resolved] (HDFS-12367) Ozone: Too many open files error while running corona

2017-09-04 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12367.

Resolution: Duplicate

I think this issue no longer happens to me, closing it as a dup of HDFS-12382 
as it looks to be fixed there, thanks [~nandakumar131]. [~msingh] feel 
free to create another lower severity JIRA to track the resource leaks you found 
at the code level. I will close this one as it is no longer a blocker for tests.

> Ozone: Too many open files error while running corona
> -
>
> Key: HDFS-12367
> URL: https://issues.apache.org/jira/browse/HDFS-12367
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>    Reporter: Weiwei Yang
>Assignee: Mukul Kumar Singh
>
> Too many open files error keeps happening to me while using corona, I have 
> simply set up a single node cluster and run corona to generate 1000 keys, but 
> I keep getting the following error
> {noformat}
> ./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys 
> 1000
> 17/08/28 00:47:42 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 17/08/28 00:47:42 INFO tools.Corona: Number of Threads: 1
> 17/08/28 00:47:42 INFO tools.Corona: Mode: offline
> 17/08/28 00:47:42 INFO tools.Corona: Number of Volumes: 1.
> 17/08/28 00:47:42 INFO tools.Corona: Number of Buckets per Volume: 1.
> 17/08/28 00:47:42 INFO tools.Corona: Number of Keys per Bucket: 1000.
> 17/08/28 00:47:42 INFO rpc.OzoneRpcClient: Creating Volume: vol-0-05000, with 
> wwei as owner and quota set to 1152921504606846976 bytes.
> 17/08/28 00:47:42 INFO tools.Corona: Starting progress bar Thread.
> ...
> ERROR tools.Corona: Exception while adding key: key-251-19293 in bucket: 
> bucket-0-34960 of volume: vol-0-05000.
> java.io.IOException: Exception getting XceiverClient.
>   at 
> org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:156)
>   at 
> org.apache.hadoop.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122)
>   at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.getFromKsmKeyInfo(ChunkGroupOutputStream.java:289)
>   at 
> org.apache.hadoop.ozone.client.rpc.OzoneRpcClient.createKey(OzoneRpcClient.java:487)
>   at 
> org.apache.hadoop.ozone.tools.Corona$OfflineProcessor.run(Corona.java:352)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: com.google.common.util.concurrent.UncheckedExecutionException: 
> java.lang.IllegalStateException: failed to create a child event loop
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>   at 
> org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:144)
>   ... 9 more
> Caused by: java.lang.IllegalStateException: failed to create a child event 
> loop
>   at 
> io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:68)
>   at 
> io.netty.channel.MultithreadEventLoopGroup.(MultithreadEventLoopGroup.java:49)
>   at 
> io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:61)
>   at 
> io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:52)
>   at 
> io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:44)
>   at 
> io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:36)
>   at org.apache.hadoop.scm.XceiverClient.connect(XceiverClient.java:76)
>   at 
> org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:151)
>   at 
> org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:145)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Lo

[jira] [Created] (HDFS-12389) Ozone: oz commandline list calls should return valid JSON format output

2017-09-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12389:
--

 Summary: Ozone: oz commandline list calls should return valid JSON 
format output
 Key: HDFS-12389
 URL: https://issues.apache.org/jira/browse/HDFS-12389
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


At present the outputs of {{listVolume}}, {{listBucket}} and {{listKey}} are 
hard to parse, for example the following call

{code}
./bin/hdfs oz -listVolume http://localhost:9864 -user wwei
{code}

lists all volumes in my cluster and it returns

{noformat}
{
"version" : 0,
"md5hash" : null,
"createdOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
"modifiedOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
"size" : 10240,
"keyName" : "key-0-22381",
"dataFileName" : null
  }
 {  
"version" : 0,
"md5hash" : null,
"createdOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
"modifiedOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
"size" : 10240,
"keyName" : "key-0-22381",
"dataFileName" : null
  }
  ...
{noformat}

this is not valid JSON output, hence it is hard to parse in clients' 
scripts for further interactions. Propose to reformat these outputs as valid 
JSON data; a sketch of what that could look like follows.
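
For illustration only, a hedged sketch of what the reformatted listing could 
look like -- the records are taken from the sample above, wrapped in a JSON 
array; the exact envelope is an assumption, not a final format:

{code}
[ {
  "version" : 0,
  "md5hash" : null,
  "createdOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
  "modifiedOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
  "size" : 10240,
  "keyName" : "key-0-22381",
  "dataFileName" : null
}, {
  "version" : 0,
  "md5hash" : null,
  "createdOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
  "modifiedOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
  "size" : 10240,
  "keyName" : "key-0-22381",
  "dataFileName" : null
} ]
{code}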






[jira] [Created] (HDFS-12367) Ozone: Too many open files error while running corona

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12367:
--

 Summary: Ozone: Too many open files error while running corona
 Key: HDFS-12367
 URL: https://issues.apache.org/jira/browse/HDFS-12367
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, tools
Reporter: Weiwei Yang


Too many open files error keeps happening to me while using corona, I have 
simply set up a single node cluster and run corona to generate 1000 keys, but I 
keep getting the following error

{noformat}
./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys 
1000
17/08/28 00:47:42 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
17/08/28 00:47:42 INFO tools.Corona: Number of Threads: 1
17/08/28 00:47:42 INFO tools.Corona: Mode: offline
17/08/28 00:47:42 INFO tools.Corona: Number of Volumes: 1.
17/08/28 00:47:42 INFO tools.Corona: Number of Buckets per Volume: 1.
17/08/28 00:47:42 INFO tools.Corona: Number of Keys per Bucket: 1000.
17/08/28 00:47:42 INFO rpc.OzoneRpcClient: Creating Volume: vol-0-05000, with 
wwei as owner and quota set to 1152921504606846976 bytes.
17/08/28 00:47:42 INFO tools.Corona: Starting progress bar Thread.
...
ERROR tools.Corona: Exception while adding key: key-251-19293 in bucket: 
bucket-0-34960 of volume: vol-0-05000.
java.io.IOException: Exception getting XceiverClient.
at 
org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:156)
at 
org.apache.hadoop.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.getFromKsmKeyInfo(ChunkGroupOutputStream.java:289)
at 
org.apache.hadoop.ozone.client.rpc.OzoneRpcClient.createKey(OzoneRpcClient.java:487)
at 
org.apache.hadoop.ozone.tools.Corona$OfflineProcessor.run(Corona.java:352)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.IllegalStateException: failed to create a child event loop
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at 
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at 
org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:144)
... 9 more
Caused by: java.lang.IllegalStateException: failed to create a child event loop
at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:68)
at 
io.netty.channel.MultithreadEventLoopGroup.(MultithreadEventLoopGroup.java:49)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:61)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:52)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:44)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:36)
at org.apache.hadoop.scm.XceiverClient.connect(XceiverClient.java:76)
at 
org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:151)
at 
org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:145)
at 
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
... 12 more
Caused by: io.netty.channel.ChannelException: failed to open a new selector
at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:128)
at io.netty.channel.nio.NioEventLoop.(NioEventLoop.java:120)
at 
io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:64)
... 25 more
Caused by: java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
at sun.nio.ch.EPollArrayWrapper.(EPollArrayWrapper.java:130)
at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:69)
at 
sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36

[jira] [Created] (HDFS-12366) Ozone: Refactor KSM metadata class names to avoid confusion

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12366:
--

 Summary: Ozone: Refactor KSM metadata class names to avoid 
confusion
 Key: HDFS-12366
 URL: https://issues.apache.org/jira/browse/HDFS-12366
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Trivial


Propose to rename 2 classes in package {{org.apache.hadoop.ozone.ksm}}

* MetadataManager -> KsmMetadataManager
* MetadataManagerImpl -> KsmMetadataManagerImpl

this is to avoid confusion with ozone metadata store classes, such as 
{{MetadataKeyFilters}}, {{MetadataStore}} and {{MetadataStoreBuilder}}.






[jira] [Created] (HDFS-12365) Ozone: ListVolume displays incorrect createdOn time when the volume was created by OzoneRpcClient

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12365:
--

 Summary: Ozone: ListVolume displays incorrect createdOn time when 
the volume was created by OzoneRpcClient
 Key: HDFS-12365
 URL: https://issues.apache.org/jira/browse/HDFS-12365
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Reproducing steps

1. Create a key in ozone with corona (this delegates the call to 
OzoneRpcClient), e.g

{code}
[wwei@ozone1 hadoop-3.0.0-beta1-SNAPSHOT]$ ./bin/hdfs corona -numOfThreads 1 
-numOfVolumes 1 -numOfBuckets 1 -numOfKeys 1
{code}

2. Run listVolume

{code}
[wwei@ozone1 hadoop-3.0.0-beta1-SNAPSHOT]$ ./bin/hdfs oz -listVolume 
http://localhost:9864 -user wwei
{
  "owner" : {
"name" : "wwei"
  },
  "quota" : {
"unit" : "TB",
"size" : 1048576
  },
  "volumeName" : "vol-0-31437",
  "createdOn" : "Thu, 01 Jan 1970 00:00:00 GMT",
  "createdBy" : null
}
{
  "owner" : {
"name" : "wwei"
  },
  "quota" : {
"unit" : "TB",
"size" : 1048576
  },
  "volumeName" : "vol-0-38900",
  "createdOn" : "Thu, 01 Jan 1970 00:00:00 GMT",
  "createdBy" : null
}
{code}

Note, the times displayed in {{createdOn}} are both the incorrect {{Thu, 01 Jan 
1970 00:00:00 GMT}}, i.e. the Unix epoch, which suggests the creation time is 
never set.







[jira] [Created] (HDFS-12362) Ozone: write deleted block to RAFT log for consensus on datanodes

2017-08-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12362:
--

 Summary: Ozone: write deleted block to RAFT log for consensus on 
datanodes
 Key: HDFS-12362
 URL: https://issues.apache.org/jira/browse/HDFS-12362
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang


Per discussion in HDFS-12282, we need to write deleted block info to the RAFT 
log when that is ready, see more in [comment from Anu | 
https://issues.apache.org/jira/browse/HDFS-12282?focusedCommentId=16136022&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16136022].






[jira] [Created] (HDFS-12361) Ozone: SCM failed to start when a container metadata is empty

2017-08-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12361:
--

 Summary: Ozone: SCM failed to start when a container metadata is 
empty
 Key: HDFS-12361
 URL: https://issues.apache.org/jira/browse/HDFS-12361
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


When I run tests to create keys via corona, sometimes it leaves some containers 
with empty metadata. This might also happen when SCM stops at a point where the 
metadata has not yet been written. When this happens, we get the following error 
and SCM cannot be started

{noformat}
17/08/27 20:10:57 WARN datanode.DataNode: Unexpected exception in block pool 
Block pool BP-821804790-172.16.165.133-1503887277256 (Datanode Uuid 
7ee16a59-9604-406e-a0f8-6f44650a725b) service to 
ozone1.fyre.ibm.com/172.16.165.133:8111
java.lang.NullPointerException
at 
org.apache.hadoop.ozone.container.common.helpers.ContainerData.getFromProtBuf(ContainerData.java:66)
at 
org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainerInfo(ContainerManagerImpl.java:210)
at 
org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:158)
at 
org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.(OzoneContainer.java:99)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.(DatanodeStateMachine.java:77)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1592)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:409)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:783)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:286)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
at java.lang.Thread.run(Thread.java:745)
{noformat}

We should add an NPE check and mark such containers as inactive without failing 
the SCM; a sketch of the idea follows.
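
A minimal sketch of the proposed guard as standalone Java -- the class, field 
and method names here are illustrative assumptions, not the actual patch:

{code}
import java.io.File;
import java.util.HashSet;
import java.util.Set;

/** Sketch of the proposed guard against empty container metadata. */
public class ContainerLoaderSketch {
  private final Set<String> inactiveContainers = new HashSet<>();

  /** Returns true if the container was loaded, false if it was quarantined. */
  boolean readContainerInfo(File metadataFile) {
    if (metadataFile.length() == 0) {
      // Empty metadata, e.g. SCM stopped mid-write: quarantine the container
      // instead of letting ContainerData.getFromProtBuf NPE and abort startup.
      inactiveContainers.add(metadataFile.getName());
      return false;
    }
    // ... parse the protobuf and register the container as before ...
    return true;
  }
}
{code}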






[jira] [Created] (HDFS-12354) Improve the throttle algorithm in Datanode BlockDeletingService

2017-08-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12354:
--

 Summary: Improve the throttle algorithm in Datanode 
BlockDeletingService 
 Key: HDFS-12354
 URL: https://issues.apache.org/jira/browse/HDFS-12354
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


{{BlockDeletingService}} is a per-datanode container block deleting service that 
takes charge of the "real" deletion of ozone blocks. It spawns a worker 
thread per container and deletes blocks/chunks from disk in the background. 
The number of threads is currently throttled by 
{{ozone.block.deleting.container.limit.per.interval}}, but there is a potential 
problem: containers are sorted, so each interval always fetches the same set of 
containers. We need to fix this by creating an API in {{ContainerManagerImpl}} to 
get a shuffled list of containers, as sketched below.
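
A hedged sketch of the shuffling idea as standalone Java (the method name and 
container representation are assumptions for illustration):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: pick a random subset of containers per interval, not a sorted prefix. */
public class ContainerChooserSketch {
  /** containerLimit maps to ozone.block.deleting.container.limit.per.interval. */
  static List<String> chooseContainers(List<String> allContainers, int containerLimit) {
    List<String> shuffled = new ArrayList<>(allContainers);
    Collections.shuffle(shuffled);  // randomize instead of iterating sorted order
    return shuffled.subList(0, Math.min(containerLimit, shuffled.size()));
  }
}
{code}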






[jira] [Resolved] (HDFS-12039) Ozone: Implement update volume owner in ozone shell

2017-08-17 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12039.

   Resolution: Fixed
Fix Version/s: HDFS-7240

> Ozone: Implement update volume owner in ozone shell
> ---
>
> Key: HDFS-12039
> URL: https://issues.apache.org/jira/browse/HDFS-12039
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>    Reporter: Weiwei Yang
>Assignee: Lokesh Jain
> Fix For: HDFS-7240
>
>
> Ozone shell command {{updateVolume}} should support updating the owner of a 
> volume, using the following syntax
> {code}
> hdfs oz -updateVolume http://ozone1.fyre.ibm.com:9864/volume-wwei-0 -owner 
> xyz -root
> {code}
> this could work via the REST API; the following command could change the volume 
> owner to {{www}}
> {code}
> curl -X PUT -H "Date: Mon, 26 Jun 2017 04:23:30 GMT" -H "x-ozone-version: v1" 
> -H "x-ozone-user:www" -H "Authorization:OZONE root" 
> http://ozone1.fyre.ibm.com:9864/volume-wwei-0
> {code}






[jira] [Resolved] (HDFS-12307) Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails

2017-08-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12307.

Resolution: Duplicate
  Assignee: Weiwei Yang

> Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails
> ---
>
> Key: HDFS-12307
> URL: https://issues.apache.org/jira/browse/HDFS-12307
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>    Reporter: Weiwei Yang
>    Assignee: Weiwei Yang
>
> It seems this UT constantly fails with the following error
> {noformat}
> org.apache.hadoop.ozone.web.exceptions.OzoneException: Exception getting 
> XceiverClient.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:119)
>   at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createUsingDefault(StdValueInstantiator.java:243)
>   at 
> com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:146)
>   at 
> com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:133)
>   at 
> com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1579)
>   at 
> com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1200)
>   at 
> org.apache.hadoop.ozone.web.exceptions.OzoneException.parse(OzoneException.java:248)
>   at 
> org.apache.hadoop.ozone.web.client.OzoneBucket.executeGetKey(OzoneBucket.java:395)
>   at 
> org.apache.hadoop.ozone.web.client.OzoneBucket.getKey(OzoneBucket.java:321)
>   at 
> org.apache.hadoop.ozone.web.client.TestKeys.runTestPutAndGetKeyWithDnRestart(TestKeys.java:288)
>   at 
> org.apache.hadoop.ozone.web.client.TestKeys.testPutAndGetKeyWithDnRestart(TestKeys.java:265)
> {noformat}






[jira] [Created] (HDFS-12307) Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails

2017-08-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12307:
--

 Summary: Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails
 Key: HDFS-12307
 URL: https://issues.apache.org/jira/browse/HDFS-12307
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


It seems this UT constantly fails with the following error

{noformat}
org.apache.hadoop.ozone.web.exceptions.OzoneException: Exception getting 
XceiverClient.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:119)
at 
com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createUsingDefault(StdValueInstantiator.java:243)
at 
com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:146)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:133)
at 
com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1579)
at 
com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1200)
at 
org.apache.hadoop.ozone.web.exceptions.OzoneException.parse(OzoneException.java:248)
at 
org.apache.hadoop.ozone.web.client.OzoneBucket.executeGetKey(OzoneBucket.java:395)
at 
org.apache.hadoop.ozone.web.client.OzoneBucket.getKey(OzoneBucket.java:321)
at 
org.apache.hadoop.ozone.web.client.TestKeys.runTestPutAndGetKeyWithDnRestart(TestKeys.java:288)
at 
org.apache.hadoop.ozone.web.client.TestKeys.testPutAndGetKeyWithDnRestart(TestKeys.java:265)
{noformat}






[jira] [Created] (HDFS-12283) Ozone: DeleteKey-5: Implement SCM DeletedBlockLog

2017-08-09 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12283:
--

 Summary: Ozone: DeleteKey-5: Implement SCM DeletedBlockLog
 Key: HDFS-12283
 URL: https://issues.apache.org/jira/browse/HDFS-12283
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang


The DeletedBlockLog is a persisted log in SCM to keep track of container blocks 
which are under deletion. It maintains info about under-deletion container 
blocks notified by KSM, and the state of how each is processed. We can use 
RocksDB to implement the 1st version of the log; the schema looks like

||TxID||ContainerName||Block List||ProcessedCount||
|0|c1|b1,b2,b3|0|
|1|c2|b1|3|
|2|c2|b2, b3|-1|

Some explanations

# TxID is an incremental long transaction ID for ONE container and 
multiple blocks
# Container name is the name of the container
# Block list is a list of block IDs
# ProcessedCount is the number of times SCM has sent this record to the datanode; 
it represents the "state" of the transaction. It is in the range of \[-1, 5\]: -1 
means the transaction eventually failed after some retries, and 5 is the max 
number of retries.

We need to define {{DeletedBlockLog}} as an interface and implement it with the 
RocksDB {{MetadataStore}} as the first version; a rough sketch follows.
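
A hedged sketch of what the interface could look like -- method names and 
signatures here are assumptions for illustration, not the committed API:

{code}
import java.io.IOException;
import java.util.List;

/** Sketch of a possible DeletedBlockLog interface; all names are illustrative. */
public interface DeletedBlockLog {

  /** One row of the log: (TxID, ContainerName, Block List, ProcessedCount). */
  class Transaction {
    public long txID;
    public String containerName;
    public List<String> blocks;
    public int processedCount;   // -1 = failed, 0..5 = send attempts
  }

  /** Append a new transaction for blocks of one container; returns the TxID. */
  long addTransaction(String containerName, List<String> blocks) throws IOException;

  /** Fetch up to count transactions that still need to be sent to datanodes. */
  List<Transaction> getTransactions(int count) throws IOException;

  /** Bump ProcessedCount after a send attempt; set to -1 after max retries. */
  void incrementCount(List<Long> txIDs) throws IOException;

  /** Remove transactions once datanodes have acknowledged the deletions. */
  void commitTransactions(List<Long> txIDs) throws IOException;
}
{code}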






[jira] [Created] (HDFS-12282) Ozone: DeleteKey-4: SCM periodically sends block deletion message to datanode via HB and handles response

2017-08-08 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12282:
--

 Summary: Ozone: DeleteKey-4: SCM periodically sends block deletion 
message to datanode via HB and handles response
 Key: HDFS-12282
 URL: https://issues.apache.org/jira/browse/HDFS-12282
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang


This is task 3 in the design doc; it implements the SCM to datanode 
interactions, including:

# SCM sends block deletion messages via HB to the datanode
# the datanode changes block state to deleting when it processes the HB response
# the datanode sends deletion ACKs back to SCM
# SCM handles the ACKs and removes the blocks from its DB

A sketch of the datanode-side steps follows.
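
A minimal sketch of the datanode-side steps (2 and 3) as standalone Java -- none 
of these types are the real protocol classes, they are assumptions for 
illustration:

{code}
import java.util.ArrayList;
import java.util.List;

/** Sketch: mark blocks as deleting on HB response, then ACK back to SCM. */
public class HeartbeatDeletionHandlerSketch {

  /** Illustrative command shape: which blocks of which container to delete. */
  static class DeleteBlocksCommand {
    long txID;
    String containerName;
    List<String> blocks;
  }

  /** Returns the TxIDs to ACK on the next heartbeat. */
  List<Long> handle(List<DeleteBlocksCommand> commands) {
    List<Long> acks = new ArrayList<>();
    for (DeleteBlocksCommand cmd : commands) {
      for (String block : cmd.blocks) {
        markBlockDeleting(cmd.containerName, block);  // state change only;
      }                                               // real deletion happens
      acks.add(cmd.txID);                             // in a background service
    }
    return acks;
  }

  private void markBlockDeleting(String container, String block) {
    // hypothetical: flip the block's metadata state to DELETING
  }
}
{code}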






[jira] [Created] (HDFS-12246) Ozone: potential thread leaks

2017-08-02 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12246:
--

 Summary: Ozone: potential thread leaks
 Key: HDFS-12246
 URL: https://issues.apache.org/jira/browse/HDFS-12246
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


Per discussion in HDFS-12163, there might be some places that potentially leak 
threads; we will use this JIRA to track the work to fix those leaks.






[jira] [Created] (HDFS-12235) Ozone: DeleteKey-3: KSM SCM block deletion message and ACK interactions

2017-07-31 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12235:
--

 Summary: Ozone: DeleteKey-3: KSM SCM block deletion message and 
ACK interactions
 Key: HDFS-12235
 URL: https://issues.apache.org/jira/browse/HDFS-12235
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


KSM and SCM interaction for the delete key operation: both KSM and SCM store key 
state info in a backlog. KSM needs to scan this log and send block-deletion 
commands to SCM; once SCM is fully aware of the message, KSM removes the key 
completely from the namespace. A sketch of the KSM-side scan loop follows.
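
A hedged sketch of the KSM-side scan-and-remove loop -- the service and its 
collaborators are assumptions for illustration, not the real classes:

{code}
import java.util.List;

/** Sketch of the KSM-side deletion backlog scan; all names are illustrative. */
public class KeyDeletingServiceSketch implements Runnable {
  private final KeyBacklog backlog;  // hypothetical KSM-side deletion backlog
  private final ScmClient scm;       // hypothetical KSM-to-SCM client

  KeyDeletingServiceSketch(KeyBacklog backlog, ScmClient scm) {
    this.backlog = backlog;
    this.scm = scm;
  }

  @Override
  public void run() {
    for (String key : backlog.listPendingDeletes()) {
      // Only after SCM confirms it owns the block deletion do we drop the key.
      if (scm.deleteBlocksOf(key)) {
        backlog.removeKey(key);      // key now fully gone from the namespace
      }                              // else: retry on the next scan interval
    }
  }

  interface KeyBacklog { List<String> listPendingDeletes(); void removeKey(String key); }
  interface ScmClient  { boolean deleteBlocksOf(String key); }
}
{code}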






[jira] [Created] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background

2017-07-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12196:
--

 Summary: Ozone: DeleteKey-2: Implement container recycling service 
to delete stale blocks at background
 Key: HDFS-12196
 URL: https://issues.apache.org/jira/browse/HDFS-12196
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Implement a recycling service running on datanodes to delete stale blocks 
periodically. 






[jira] [Created] (HDFS-12195) Ozone: DeleteKey-1: KSM replies delete key request asynchronously

2017-07-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12195:
--

 Summary: Ozone: DeleteKey-1: KSM replies delete key request 
asynchronously
 Key: HDFS-12195
 URL: https://issues.apache.org/jira/browse/HDFS-12195
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Yuanbo Liu


We will implement delete key in ozone in multiple child tasks; this is one of the 
child tasks, implementing the client to SCM communication. We need to do it in an 
async manner: once the key state is changed in KSM metadata, KSM is ready to reply 
to the client with a successful message. The actual deletes on the other layers 
will happen some time later.






[jira] [Created] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey

2017-07-20 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12167:
--

 Summary: Ozone: Intermittent failure 
TestContainerPersistence#testListKey
 Key: HDFS-12167
 URL: https://issues.apache.org/jira/browse/HDFS-12167
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Reporter: Weiwei Yang
Priority: Minor









[jira] [Created] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12149:
--

 Summary: Ozone: RocksDB implementation of ozone metadata store
 Key: HDFS-12149
 URL: https://issues.apache.org/jira/browse/HDFS-12149
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-12069 added a general interface for the ozone metadata store; we already 
have a LevelDB implementation. This JIRA is to track the work on the RocksDB 
implementation.






[jira] [Created] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties

2017-07-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12148:
--

 Summary: Ozone: TestOzoneConfigurationFields is failing because 
ozone-default.xml has some missing properties
 Key: HDFS-12148
 URL: https://issues.apache.org/jira/browse/HDFS-12148
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


The following properties added by HDFS-11493 are missing in ozone-default.xml

{noformat}
ozone.scm.max.container.report.threads
ozone.scm.container.report.processing.interval.seconds
ozone.scm.container.reports.wait.timeout.seconds
{noformat}






[jira] [Created] (HDFS-12129) Ozone

2017-07-12 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12129:
--

 Summary: Ozone
 Key: HDFS-12129
 URL: https://issues.apache.org/jira/browse/HDFS-12129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Weiwei Yang









[jira] [Created] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-07 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12098:
--

 Summary: Ozone: Datanode is unable to register with scm if scm 
starts later
 Key: HDFS-12098
 URL: https://issues.apache.org/jira/browse/HDFS-12098
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Critical


Reproducing steps
# Start the datanode
# Wait and check the datanode state; it has connection issues, which is expected
# Start SCM, expecting the datanode to connect to the scm and the state machine 
to transit to RUNNING. However, in actuality its state transits to SHUTDOWN and 
the datanode enters chill mode.






[jira] [Resolved] (HDFS-12096) Ozone: Bucket versioning design document

2017-07-06 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12096.

Resolution: Duplicate

> Ozone: Bucket versioning design document
> 
>
> Key: HDFS-12096
> URL: https://issues.apache.org/jira/browse/HDFS-12096
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>    Reporter: Weiwei Yang
>    Assignee: Weiwei Yang
> Attachments: Ozone Bucket Versioning v1.pdf
>
>
> This JIRA is opened for the discussion of the bucket versioning design. 
> Bucket versioning is the ability to hold multiple versions of objects for a 
> key in a bucket.






[jira] [Created] (HDFS-12096) Ozone: Bucket versioning design document

2017-07-06 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12096:
--

 Summary: Ozone: Bucket versioning design document
 Key: HDFS-12096
 URL: https://issues.apache.org/jira/browse/HDFS-12096
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


This JIRA is opened for the discussion of the bucket versioning design. 
Bucket versioning is the ability to hold multiple versions of objects for a key 
in a bucket.






[jira] [Created] (HDFS-12085) Reconfigure namenode interval fails if the interval was set with time unit

2017-07-04 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12085:
--

 Summary: Reconfigure namenode interval fails if the interval was 
set with time unit
 Key: HDFS-12085
 URL: https://issues.apache.org/jira/browse/HDFS-12085
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, tools
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Critical


It fails when I set the duration with a time unit, e.g. 5s; the error is

{noformat}
Reconfiguring status for node [localhost:8111]: started at Tue Jul 04 08:14:18 
PDT 2017 and finished at Tue Jul 04 08:14:18 PDT 2017.
FAILED: Change property dfs.heartbeat.interval
From: "3s"
To: "5s"
Error: For input string: "5s".
{noformat}

Time unit support was added via HDFS-9847; a sketch of a time-unit-aware parse 
follows.
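
A plausible root cause is that the reconfiguration path parses the raw string as 
a plain number. A hedged sketch of parsing it with Hadoop's time-unit-aware API 
instead (the surrounding class is illustrative; {{Configuration#getTimeDuration}} 
is the standard helper):

{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class HeartbeatReconfigSketch {
  static long parseHeartbeatInterval(String newVal) {
    // Long.parseLong("5s") throws NumberFormatException; the time-duration
    // parser accepts "5s", "5000ms" and plain "5" alike.
    Configuration conf = new Configuration(false);
    conf.set("dfs.heartbeat.interval", newVal);
    return conf.getTimeDuration("dfs.heartbeat.interval", 3, TimeUnit.SECONDS);
  }
}
{code}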






[jira] [Created] (HDFS-12082) BlockInvalidateLimit value is incorrectly set after namenode heartbeat interval reconfigured

2017-07-04 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12082:
--

 Summary: BlockInvalidateLimit value is incorrectly set after 
namenode heartbeat interval reconfigured 
 Key: HDFS-12082
 URL: https://issues.apache.org/jira/browse/HDFS-12082
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, namenode
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-1477 provides an option to reconfigure the namenode heartbeat interval 
without restarting the namenode. When the heartbeat interval is reconfigured, 
{{blockInvalidateLimit}} gets recalculated

{code}
 this.blockInvalidateLimit = Math.max(20 * (int) (intervalSeconds),
DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_DEFAULT);
{code}

this doesn't honor the existing value set by {{dfs.block.invalidate.limit}}; a 
possible fix is sketched below.
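
A hedged sketch of one way to honor the configured value, mirroring the snippet 
above (a sketch under assumed surrounding context, not the committed patch):

{code}
// Prefer an explicitly configured dfs.block.invalidate.limit over the
// heartbeat-derived floor when the interval is reconfigured.
final int configured = conf.getInt(
    DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_KEY,
    DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_DEFAULT);
this.blockInvalidateLimit =
    Math.max(20 * (int) intervalSeconds, configured);
{code}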






[jira] [Created] (HDFS-12081) Ozone: Add infoKey REST API document

2017-07-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12081:
--

 Summary: Ozone: Add infoKey REST API document
 Key: HDFS-12081
 URL: https://issues.apache.org/jira/browse/HDFS-12081
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-12030 has implemented {{infoKey}}; we need to add the appropriate 
documentation to {{OzoneRest.md}}.






[jira] [Created] (HDFS-12080) Ozone: Fix UT failure in TestOzoneConfigurationFields

2017-07-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12080:
--

 Summary: Ozone: Fix UT failure in TestOzoneConfigurationFields
 Key: HDFS-12080
 URL: https://issues.apache.org/jira/browse/HDFS-12080
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Priority: Minor


HDFS-12023 added a test case {{TestOzoneConfigurationFields}} to make sure ozone 
configuration properties are fully documented in ozone-default.xml. This is 
currently failing because

1. ozone-default.xml has 1 property not used anywhere

{code}
ozone.scm.internal.bind.host
{code}

2. Some cblock properties are missing in ozone-default.xml

{code}
  dfs.cblock.scm.ipaddress
  dfs.cblock.scm.port
  dfs.cblock.jscsi-address
  dfs.cblock.service.rpc-bind-host
  dfs.cblock.jscsi.rpc-bind-host
{code}

this needs to be fixed.






[jira] [Created] (HDFS-12079) Description of dfs.block.invalidate.limit is incorrect in hdfs-default.xml

2017-07-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12079:
--

 Summary: Description of dfs.block.invalidate.limit is incorrect in 
hdfs-default.xml
 Key: HDFS-12079
 URL: https://issues.apache.org/jira/browse/HDFS-12079
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Weiwei Yang
Assignee: Weiwei Yang


The description of property {{dfs.block.invalidate.limit}} in hdfs-default.xml 
is

{noformat}
Limit on the list of invalidated block list kept by the Namenode.
{noformat}

this does not seem correct and would confuse users.






[jira] [Created] (HDFS-12078) Add time unit to the description of property dfs.namenode.stale.datanode.interval in hdfs-default.xml

2017-07-02 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12078:
--

 Summary: Add time unit to the description of property 
dfs.namenode.stale.datanode.interval in hdfs-default.xml
 Key: HDFS-12078
 URL: https://issues.apache.org/jira/browse/HDFS-12078
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Trivial


The description of property {{dfs.namenode.stale.datanode.interval}} in 
hdfs-default.xml doesn't mention the time unit; we should add that to 
avoid confusing users.

I have reviewed all properties in hdfs-default.xml and this is the only 
property that causes such confusion; users should be able to easily figure out 
the appropriate time unit for a property when it is either
* Specified by the property name, e.g dfs.namenode.full.block.report.lease.length.ms
* Specified by the property value with a time unit suffix, e.g 
dfs.blockreport.initialDelay=0s
* Explained by the description of the property, e.g 
dfs.namenode.safemode.extension=3000, description: Determines extension of safe 
mode in milliseconds ...

A change to the property name and value would be an incompatible change; to 
minimize the impact, I propose to simply add the time unit in the description. 
This should be the only property that needs the fix in hdfs-default.xml.






[jira] [Created] (HDFS-12069) Ozone: Create a general abstraction for metadata store

2017-06-29 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12069:
--

 Summary: Ozone: Create a general abstraction for metadata store
 Key: HDFS-12069
 URL: https://issues.apache.org/jira/browse/HDFS-12069
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Create a general abstraction for the metadata store so that we can plug in other 
key-value stores to host ozone metadata. Currently only LevelDB is implemented; 
we want to support RocksDB as it provides more production-ready features. A 
rough sketch of such an abstraction follows.
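
A hedged sketch of what the abstraction could look like -- method names are 
assumptions for illustration:

{code}
import java.io.IOException;

/** Sketch of a pluggable key-value metadata store; names are illustrative. */
public interface MetadataStore extends AutoCloseable {
  void put(byte[] key, byte[] value) throws IOException;

  /** Returns null when the key is absent. */
  byte[] get(byte[] key) throws IOException;

  void delete(byte[] key) throws IOException;

  boolean isEmpty() throws IOException;

  @Override
  void close() throws IOException;
}
{code}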






[jira] [Created] (HDFS-12053) Ozone: ozone server should create missing metadata directory if it has permission to

2017-06-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12053:
--

 Summary: Ozone: ozone server should create missing metadata 
directory if it has permission to
 Key: HDFS-12053
 URL: https://issues.apache.org/jira/browse/HDFS-12053
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


The datanode state machine right now simply fails if the container metadata 
directory is missing; it is better to create the directory if it has permission 
to. This is extremely useful on a fresh setup, where usually we set 
{{ozone.container.metadata.dirs}} to be under the same parent as 
{{dfs.datanode.data.dir}}. E.g

* /hadoop/hdfs/data
* /hadoop/hdfs/scm

if I don't pre-create /hadoop/hdfs/scm/repository, ozone cannot be started; a 
sketch of the proposed behavior follows.
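
A minimal sketch of the proposed startup behavior, assuming plain 
{{java.io.File}} semantics (names are illustrative):

{code}
import java.io.File;
import java.io.IOException;

public class MetadataDirSketch {
  /** Create the metadata dir when missing and permitted; otherwise fail fast. */
  static void ensureMetadataDir(String path) throws IOException {
    File dir = new File(path);
    if (dir.isDirectory()) {
      return;                            // already there, nothing to do
    }
    if (!dir.mkdirs()) {                 // false: exists as a file or no permission
      throw new IOException("Cannot create container metadata dir: " + path);
    }
  }
}
{code}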






[jira] [Created] (HDFS-12047) Ozone: Add REST API documentation

2017-06-26 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12047:
--

 Summary: Ozone: Add REST API documentation
 Key: HDFS-12047
 URL: https://issues.apache.org/jira/browse/HDFS-12047
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Add ozone REST API documentation.






[jira] [Created] (HDFS-12039) Ozone: Implement update volume owner in ozone shell

2017-06-26 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12039:
--

 Summary: Ozone: Implement update volume owner in ozone shell
 Key: HDFS-12039
 URL: https://issues.apache.org/jira/browse/HDFS-12039
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


Ozone shell command {{updateVolume}} should support updating the owner of a 
volume, using the following syntax

{code}
hdfs oz -updateVolume http://ozone1.fyre.ibm.com:9864/volume-wwei-0 -owner xyz 
-root
{code}

this could work via the REST API; the following command could change the volume 
owner to {{www}}

{code}
curl -X PUT -H "Date: Mon, 26 Jun 2017 04:23:30 GMT" -H "x-ozone-version: v1" 
-H "x-ozone-user:www" -H "Authorization:OZONE root" 
http://ozone1.fyre.ibm.com:9864/volume-wwei-0
{code}





