Submarine Sync Up Schedule Update

2019-10-07 Thread Wangda Tan
Hi all,
As the Submarine community is growing fast, many contributors outside the
Mandarin-speaking community are interested in participating, so we're
thinking of bringing the sync up for English speakers back.

What I've just updated is to hold the community sync up in Mandarin and in
English in alternating weeks.

Next week will be our first English community sync up since its return, so I
recommend that everybody who has missed the community sync ups join.
Submarine community members will give some demos and introductions, with Q&A
after that.

Please see the schedule below:
https://calendar.google.com/calendar/b/3?cid=aGFkb29wLmNvbW11bml0eS5zeW5jLnVwQGdtYWlsLmNvbQ

Any thoughts about the proposal? Either way, please share them with us.

If you want to know more about Submarine, you can go to
https://github.com/apache/hadoop-submarine and check the latest
documentation. It includes many things, such as the contributor doc, design,
architecture, etc.

Thanks,
Wangda Tan


[NEED HELP] Which Hadoop 3.x Production Deployments Can Be Publicly Talked About?

2019-09-24 Thread Wangda Tan
Hi devs and users,

Tomorrow (sorry for the short notice) we will give a presentation at Strata
Data Conf @ NY as a community update on Hadoop 3.x. I'm thinking of creating
a slide about existing production deployments of Hadoop 3.x. Basically, I
want to put up a logo wall with a list of big names so we can encourage more
users to upgrade to 3.x.

I know there are tons of users on 3.x already, but only a few of them have
public slides, and I don't have permission to put non-public use cases on
the slide. So if you are:
- Using 3.x in production (ideally at large scale, using some new features,
or running in a new environment, e.g., on cloud or with GPUs).
- (When I say 3.x, it can be Apache Hadoop 3.x, HDP 3.x, CDH 6.x, or any
distribution with Apache Hadoop 3.x as the base.)

Please reply with your company name and logo (if it is not too obvious) to
this email (either publicly, or privately to me). If you could include a
short, publicly shareable summary of your use cases, that would be very
helpful.

If too few companies respond, I may skip the logo wall.

I hope to get your feedback soon.

Thanks,
Wangda Tan


Re: Hadoop Community Sync Up Schedule

2019-08-23 Thread Wangda Tan
Hi all,

I just updated the schedule, which takes effect next week. Now it looks like:

[image: image.png]

I made sure that no two YARN sync ups, and no two HDFS sync ups, fall on the same day.

Please check the calendar, and let me know if you have any questions:
https://calendar.google.com/calendar/b/3?cid=aGFkb29wLmNvbW11bml0eS5zeW5jLnVwQGdtYWlsLmNvbQ

Thanks,
Wangda



On Fri, Aug 23, 2019 at 1:50 AM Matt Foley  wrote:

> Wangda and Eric,
> We can express the intent, I think, by scheduling two recurring meetings:
> - monthly, on the 2nd Wednesday, and
> - monthly, on the 4th Wednesday.
>
> This is pretty easy to understand, and not too onerous to maintain.
> But I’m okay with simple bi-weekly too.
>
> I’m neutral on 10 vs 11am, PDT.  But do all participants presently use
> Daylight Savings and change on the same date?  Because I can’t conveniently
> do 9am PST, so if it needs to stay fixed in, say, India Timezone, then 11am
> PDT / 10am PST would be best.
> —Matt
>
> On Aug 21, 2019, at 7:33 AM, epa...@apache.org wrote:
>
> Let's go with bi-weekly (every 2 weeks). Sometimes this gives us 3
> sync-ups in one month, which I think is fine.
> -Eric Payne
>
> On Wednesday, August 21, 2019, 5:01:52 AM CDT, Wangda Tan <
> wheele...@gmail.com> wrote:
> >
> > For folks in other US time zones: how about 11am PDT, is it better or
> 10am
> > PDT will be better? I will be fine with both.
> >
> > Hi Matt,
> >
> > Thanks for mentioning this issue; this is exactly the issue I saw.
> >
> > Basically there’re two options:
> >
> > - a. weekly, bi-weekly (for odd/even week) and every four months.
> > - b. weekly, 1st/3rd week or 2nd/4th week, x-th week monthly.
> >
> > I’m not sure which one is easier for people to understand as the issue
> you
> > mentioned.
> >
> > After thinking about it. I prefer a. since it is more consistent for
> > audience and not disrupted because of calendar.
> >
> > If we choose a. I will redo the proposal and make it aligns with a.
> >
> > Thoughts?
> >
> > Thanks,
> > Wangda
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>


Re: Hadoop Community Sync Up Schedule

2019-08-23 Thread Wangda Tan
Sounds good; let me make the changes to go with simple bi-weekly then.

I will update it tonight if possible.

Best,
Wangda

On Fri, Aug 23, 2019 at 1:50 AM Matt Foley  wrote:

> Wangda and Eric,
> We can express the intent, I think, by scheduling two recurring meetings:
> - monthly, on the 2nd Wednesday, and
> - monthly, on the 4th Wednesday.
>
> This is pretty easy to understand, and not too onerous to maintain.
> But I’m okay with simple bi-weekly too.
>
> I’m neutral on 10 vs 11am, PDT.  But do all participants presently use
> Daylight Savings and change on the same date?  Because I can’t conveniently
> do 9am PST, so if it needs to stay fixed in, say, India Timezone, then 11am
> PDT / 10am PST would be best.
> —Matt
>
> On Aug 21, 2019, at 7:33 AM, epa...@apache.org wrote:
>
> Let's go with bi-weekly (every 2 weeks). Sometimes this gives us 3
> sync-ups in one month, which I think is fine.
> -Eric Payne
>
> On Wednesday, August 21, 2019, 5:01:52 AM CDT, Wangda Tan <
> wheele...@gmail.com> wrote:
> >
> > For folks in other US time zones: how about 11am PDT, is it better or
> 10am
> > PDT will be better? I will be fine with both.
> >
> > Hi Matt,
> >
> > Thanks for mentioning this issue; this is exactly the issue I saw.
> >
> > Basically there’re two options:
> >
> > - a. weekly, bi-weekly (for odd/even week) and every four months.
> > - b. weekly, 1st/3rd week or 2nd/4th week, x-th week monthly.
> >
> > I’m not sure which one is easier for people to understand as the issue
> you
> > mentioned.
> >
> > After thinking about it. I prefer a. since it is more consistent for
> > audience and not disrupted because of calendar.
> >
> > If we choose a. I will redo the proposal and make it aligns with a.
> >
> > Thoughts?
> >
> > Thanks,
> > Wangda
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>


Re: Hadoop Community Sync Up Schedule

2019-08-21 Thread Wangda Tan
For folks in other US time zones: would 11am PDT or 10am PDT be better? I'm
fine with both.

Hi Matt,

Thanks for mentioning this issue; this is exactly the issue I saw.

Basically, there are two options:

- a. weekly, bi-weekly (for odd/even week) and every four months.
- b. weekly, 1st/3rd week or 2nd/4th week, x-th week monthly.
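To make the difference concrete, here is a small stdlib Python sketch (the anchor date and month below are illustrative, not from the thread) showing that option (a)'s fixed two-week cadence and option (b)'s 2nd/4th-Wednesday rule can pick different days in the same month:

```python
from calendar import Calendar
from datetime import date, timedelta

def biweekly(start, end):
    """Option (a): an every-two-weeks series anchored at a start date."""
    out, d = [], start
    while d <= end:
        out.append(d)
        d += timedelta(weeks=2)
    return out

def second_and_fourth_wednesdays(year, month):
    """Option (b): the 2nd and 4th Wednesday of a given month."""
    weds = [d for d in Calendar().itermonthdates(year, month)
            if d.month == month and d.weekday() == 2]  # weekday 2 == Wednesday
    return [weds[1], weds[3]]

# Anchor option (a) on Wednesday 2019-08-28 and look at November 2019.
# October 2019 has five Wednesdays, which shifts the bi-weekly phase.
a = [d for d in biweekly(date(2019, 8, 28), date(2019, 11, 30)) if d.month == 11]
b = second_and_fourth_wednesdays(2019, 11)
print(a)  # option (a) lands on Nov 6 and Nov 20
print(b)  # option (b) lands on Nov 13 and Nov 27
```

In other words, a calendar entry saved as plain bi-weekly silently drifts off the 2nd/4th-Wednesday slots after any month with five Wednesdays.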

I’m not sure which one is easier for people to understand, given the issue
you mentioned.

After thinking about it, I prefer (a), since it is more consistent for the
audience and not disrupted by the calendar.

If we choose (a), I will redo the proposal and make it align with that.

Thoughts?

Thanks,
Wangda



On Wed, Aug 21, 2019 at 4:11 AM Matt Foley  wrote:

> Hi Wangda, thanks for this. A question about the schedule correction:
>
> > > 1) In the proposal, repeats are not properly. (I used bi-weekly
> instead of
> > 2nd/4th week as repeat frequency). I'd like to fix the frequency on Thu
> and
> > it will take effect starting next week.
>
> I understand that “bi-weekly” may cause 3 meetings in some months, and
> that’s not the intent.
> Is it your intent to schedule “Wednesday of the 2nd and 4th weeks of each
> month” or “2nd and 4th Wednesday of each month”? — which of course are not
> the same, for months that start on Thu-Sat...
>
> As a side note, I find that my calendar program cannot express “Wednesday
> of the second week of the month”, but does know how to do “second Wednesday
> of the month”.
>
> But I’ll schedule them one-by-one if I have to, I just wanted clarity. :-)
>
> Thanks,
> —Matt
>
>
> On Aug 19, 2019, at 8:31 PM, Wangda Tan  wrote:
>
> Hi folks,
>
> We have run community sync up for 1.5 months. I spoke to folks offline and
> got some feedback. Here's a summary of what I've observed from sync ups and
> talked to organizers.
>
> Following sync ups have very good participants (sometimes 10+ folks
> joined):
> - YARN/MR monthly sync up in APAC (Mandarin)
> - HDFS monthly sync up in APAC (Mandarin).
> - Submarine weekly sync up in APAC (Mandarin).
>
> Following sync up have OK-ish participants: (3-5 folks joined).
> - Storage monthly sync up in APAC (English)
> - Storage bi-weekly sync up in US (English)
> - YARN bi-weekly sync up in US (English).
>
> Following sync ups don't have good participants: (Skipped a couple of
> times).
> - YARN monthly sync up in APAC (English).
> - Submarine bi-weekly sync up in US (English).
>
> *So I'd like to propose the following changes and fixes of the schedule: *
> 1) Cancel the YARN/MR monthly sync up in APAC (English). Folks from APAC
> who speak English can choose to join the US session.
> 2) Cancel the Submarine bi-weekly sync up in US (English). Now Submarine
> developers and users are fast-growing in Mandarin-speaking areas. We can
> resume the sync if we do see demands from English-speaking areas.
> 3) Update the US sync up time from 9AM to 10AM PDT. 9AM is too early for
> most of the west-coast folks.
>
> *Following are fixes for the schedule:  *
> 1) In the proposal, the repeat rules are not set properly (I used bi-weekly
> instead of 2nd/4th week as the repeat frequency). I'd like to fix the
> frequency on Thu and it will take effect starting next week.
>
> Overall, thanks for everybody who participated in the sync ups. I do see
> community contributions grow in the last one month!
>
> Any thoughts about the proposal?
>
> Thanks,
> Wangda
>
>
>
>
> On Thu, Jul 25, 2019 at 11:53 AM 俊平堵  wrote:
>
> > Hi Folks,
> >
> > A kind reminder that we have the YARN+MR APAC sync today, and you are
> > welcome to join:
> >
> >
> > Time and date: 07/25, 1:00 pm (CST)
> >
> > Zoom link: https://cloudera.zoom.us/j/880548968
> >
> > Summary:
> >
> >
> https://docs.google.com/document/d/1GY55sXrekVd-aDyRY7uzaX0hMDPyh3T-AL1kUY2TI5M
> >
> >
> > Thanks,
> >
> >
> > Junping
> >
> >
> >
> > Wangda Tan wrote on Fri, Jun 28, 2019 at 2:57 AM:
> >
> >> Hi folks,
> >>
> >> Here's the Hadoop Community Sync Up proposal/schedule:
> >>
> >
> https://docs.google.com/document/d/1GfNpYKhNUERAEH7m3yx6OfleoF3MqoQk3nJ7xqHD9nY/edit#heading=h.xh4zfwj8ppmn
> >>
> >> And here's calendar file:
> >>
> >>
> >>
> >
> https://calendar.google.com/calendar/ical/hadoop.community.sync.up%40gmail.com/public/basic.ics
> >>
> >> We gave it a try this week for YARN+MR and Submarine sync, feedbacks
> from
> >> participants seems pretty good, lots of new information shared during
> > sync
> >> up, and companies are using/developing Hadoop can better know each
> other.
> >>
> >> Next week there're 4 community sync-ups (Two Submarine for different
> >> timezones, one YARN+MR, one storage), please join to whichever you're
> >> interested:
> >>
> >> [image: image.png]
> >>
> >> Zoom info and notes can be found in the Google calendar invitation.
> >>
> >> Thanks,
> >> Wangda
> >>
> >
>
> --
Sent from Gmail Mobile


Re: Hadoop Community Sync Up Schedule

2019-08-19 Thread Wangda Tan
Hi folks,

We have run community sync ups for 1.5 months. I spoke to folks offline and
got some feedback. Here's a summary of what I've observed from the sync ups
and heard from organizers.

The following sync ups have very good participation (sometimes 10+ folks
joined):
- YARN/MR monthly sync up in APAC (Mandarin)
- HDFS monthly sync up in APAC (Mandarin).
- Submarine weekly sync up in APAC (Mandarin).

The following sync ups have OK-ish participation (3-5 folks joined):
- Storage monthly sync up in APAC (English)
- Storage bi-weekly sync up in US (English)
- YARN bi-weekly sync up in US (English).

The following sync ups don't have good participation (skipped a couple of
times):
- YARN monthly sync up in APAC (English).
- Submarine bi-weekly sync up in US (English).

*So I'd like to propose the following changes and fixes to the schedule:*
1) Cancel the YARN/MR monthly sync up in APAC (English). Folks from APAC
who speak English can choose to join the US session.
2) Cancel the Submarine bi-weekly sync up in the US (English). Submarine
developers and users are now growing fast in Mandarin-speaking areas. We can
resume the sync if we see demand from English-speaking areas.
3) Move the US sync up time from 9AM to 10AM PDT. 9AM is too early for
most of the west-coast folks.

*Following is a fix for the schedule:*
1) In the proposal, the repeat rules are not set properly (I used bi-weekly
instead of 2nd/4th week as the repeat frequency). I'd like to fix the
frequency on Thursday, and it will take effect starting next week.
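As a side note on why the repeat frequency matters: a plain bi-weekly rule occasionally produces three occurrences in a single month, which a 2nd/4th-week rule never does. A quick stdlib sketch (the anchor date is illustrative, not from the thread):

```python
from collections import Counter
from datetime import date, timedelta

def biweekly(start, end):
    """All occurrences of an every-two-weeks series, inclusive of start."""
    out, d = [], start
    while d <= end:
        out.append(d)
        d += timedelta(weeks=2)
    return out

# A Wednesday series anchored 2019-07-03, tallied per month through September.
per_month = Counter(d.month for d in biweekly(date(2019, 7, 3), date(2019, 9, 30)))
print(dict(per_month))  # July gets three occurrences (Jul 3, 17, 31); Aug and Sep get two each
```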

Overall, thanks to everybody who participated in the sync ups. I do see
community contributions growing over the last month!

Any thoughts about the proposal?

Thanks,
Wangda




On Thu, Jul 25, 2019 at 11:53 AM 俊平堵  wrote:

> Hi Folks,
>
> A kind reminder that we have the YARN+MR APAC sync today, and you are
> welcome to join:
>
>
> Time and date: 07/25, 1:00 pm (CST)
>
> Zoom link: https://cloudera.zoom.us/j/880548968
>
> Summary:
>
> https://docs.google.com/document/d/1GY55sXrekVd-aDyRY7uzaX0hMDPyh3T-AL1kUY2TI5M
>
>
> Thanks,
>
>
> Junping
>
>
>
> Wangda Tan wrote on Fri, Jun 28, 2019 at 2:57 AM:
>
> > Hi folks,
> >
> > Here's the Hadoop Community Sync Up proposal/schedule:
> >
> https://docs.google.com/document/d/1GfNpYKhNUERAEH7m3yx6OfleoF3MqoQk3nJ7xqHD9nY/edit#heading=h.xh4zfwj8ppmn
> >
> > And here's calendar file:
> >
> >
> >
> https://calendar.google.com/calendar/ical/hadoop.community.sync.up%40gmail.com/public/basic.ics
> >
> > We gave it a try this week for YARN+MR and Submarine sync, feedbacks from
> > participants seems pretty good, lots of new information shared during
> sync
> > up, and companies are using/developing Hadoop can better know each other.
> >
> > Next week there're 4 community sync-ups (Two Submarine for different
> > timezones, one YARN+MR, one storage), please join to whichever you're
> > interested:
> >
> > [image: image.png]
> >
> > Zoom info and notes can be found in the Google calendar invitation.
> >
> > Thanks,
> > Wangda
> >
>


Re: Any thoughts making Submarine a separate Apache project?

2019-08-13 Thread Wangda Tan
Hi folks,

I just drafted a proposal targeted at the PMC list and the board for
thoughts. Thanks to Xun Liu for providing thoughts about future
directions/architecture, and to Keqiu Hu for reviews.

Title: "Apache Submarine for Apache Top-Level Project"

https://docs.google.com/document/d/1kE_f-r-ANh9qOeapdPwQPHhaJTS7IMiqDQAS8ESi4TA/edit

I plan to send it to PMC list/board next Monday, so any
comments/suggestions are welcome.

Thanks,
Wangda


On Tue, Jul 30, 2019 at 6:01 PM 俊平堵  wrote:

> Thanks Vinod for these great suggestions. I agree with most of your
> comments above.
> "For the Apache Hadoop community, this will be treated simply as
> code-change and so need a committer +1?". IIUC, this should be treated as a
> feature branch merge, so maybe 3 committer +1s are needed here, according to
> https://hadoop.apache.org/bylaws.html?
>
> bq. Can somebody who have cycles and been on the ASF lists for a while
> look into the process here?
> I can check with ASF members who have experience with this, if no one has
> yet.
>
> Thanks,
>
> Junping
>
> Vinod Kumar Vavilapalli wrote on Mon, Jul 29, 2019 at 9:46 PM:
>
>> Looks like there's a meaningful push behind this.
>>
>> Given the desire is to fork off Apache Hadoop, you'd want to make sure
>> this enthusiasm turns into building a real, independent but more
>> importantly a sustainable community.
>>
>> Given that there were two official releases off the Apache Hadoop
>> project, I doubt if you'd need to go through the incubator process. Instead
>> you can directly propose a new TLP at ASF board. The last few times this
>> happened was with ORC, and long before that with Hive, HBase etc. Can
>> somebody who has cycles and has been on the ASF lists for a while look into
>> the process here?
>>
>> For the Apache Hadoop community, this will be treated simply as
>> code-change and so need a committer +1? You can be more gentle by formally
>> doing a vote once a process doc is written down.
>>
>> Back to the sustainable community point, as part of drafting this
>> proposal, you'd definitely want to make sure all of the Apache Hadoop
>> PMC/Committers can exercise their will to join this new project as
>> PMC/Committers respectively without any additional constraints.
>>
>> Thanks
>> +Vinod
>>
>> > On Jul 25, 2019, at 1:31 PM, Wangda Tan  wrote:
>> >
>> > Thanks everybody for sharing your thoughts. I saw positive feedbacks
>> from
>> > 20+ contributors!
>> >
>> > So I think we should move it forward, any suggestions about what we
>> should
>> > do?
>> >
>> > Best,
>> > Wangda
>> >
>> > On Mon, Jul 22, 2019 at 5:36 PM neo  wrote:
>> >
>> >> +1, this is neo from the TiDB & TiKV community.
>> >> Thanks Xun for bringing this up.
>> >>
>> >> In TiKV, our CNCF project's open source distributed KV storage system,
>> >> Hadoop Submarine's machine learning engine helps us optimize data
>> >> storage, solving some problems with data hotspots and data shuffles.
>> >>
>> >> We are also preparing to use the Hadoop Submarine machine learning
>> >> engine to improve the performance of TiDB, our open source distributed
>> >> relational database.
>> >>
>> >> I think if submarine can be independent, it will develop faster and
>> better.
>> >> Thanks to the hadoop community for developing submarine!
>> >>
>> >> Best Regards,
>> >> neo
>> >> www.pingcap.com / https://github.com/pingcap/tidb /
>> >> https://github.com/tikv
>> >>
>> >> Xun Liu wrote on Mon, Jul 22, 2019 at 4:07 PM:
>> >>
>> >>> @adam.antal
>> >>>
>> >>> The submarine development team has completed the following
>> preparations:
>> >>> 1. Established a temporary test repository on Github.
>> >>> 2. Change the package name of hadoop submarine from
>> org.hadoop.submarine
>> >> to
>> >>> org.submarine
>> >>> 3. Combine the Linkedin/TonY code into the Hadoop submarine module;
>> >>> 4. Connected the GitHub repository to the travis-ci system; all test
>> >>> cases have been tested;
>> >>> 5. Several Hadoop submarine users completed the system test using the
>> >> code
>> >>> in this repository.
>> >>>
>> >>> Zhao Xin (赵欣) wrote on Mon, Jul 22, 2019 at 9:38 AM:
>> >>>
>> >>>> Hi
>> >>>>
>

Re: Any thoughts making Submarine a separate Apache project?

2019-07-29 Thread Wangda Tan
Thanks Vinod, the proposal to make it a TLP is definitely a great
suggestion. I will draft a proposal and keep the thread posted.

Best,
Wangda

On Mon, Jul 29, 2019 at 3:46 PM Vinod Kumar Vavilapalli 
wrote:

> Looks like there's a meaningful push behind this.
>
> Given the desire is to fork off Apache Hadoop, you'd want to make sure
> this enthusiasm turns into building a real, independent but more
> importantly a sustainable community.
>
> Given that there were two official releases off the Apache Hadoop project,
> I doubt if you'd need to go through the incubator process. Instead you can
> directly propose a new TLP at ASF board. The last few times this happened
> was with ORC, and long before that with Hive, HBase etc. Can somebody who
> has cycles and has been on the ASF lists for a while look into the process
> here?
>
> For the Apache Hadoop community, this will be treated simply as
> code-change and so need a committer +1? You can be more gentle by formally
> doing a vote once a process doc is written down.
>
> Back to the sustainable community point, as part of drafting this
> proposal, you'd definitely want to make sure all of the Apache Hadoop
> PMC/Committers can exercise their will to join this new project as
> PMC/Committers respectively without any additional constraints.
>
> Thanks
> +Vinod
>
> > On Jul 25, 2019, at 1:31 PM, Wangda Tan  wrote:
> >
> > Thanks everybody for sharing your thoughts. I saw positive feedbacks from
> > 20+ contributors!
> >
> > So I think we should move it forward, any suggestions about what we
> should
> > do?
> >
> > Best,
> > Wangda
> >
> > On Mon, Jul 22, 2019 at 5:36 PM neo  wrote:
> >
> >> +1, this is neo from the TiDB & TiKV community.
> >> Thanks Xun for bringing this up.
> >>
> >> In TiKV, our CNCF project's open source distributed KV storage system,
> >> Hadoop Submarine's machine learning engine helps us optimize data
> >> storage, solving some problems with data hotspots and data shuffles.
> >>
> >> We are also preparing to use the Hadoop Submarine machine learning
> >> engine to improve the performance of TiDB, our open source distributed
> >> relational database.
> >>
> >> I think if submarine can be independent, it will develop faster and
> better.
> >> Thanks to the hadoop community for developing submarine!
> >>
> >> Best Regards,
> >> neo
> >> www.pingcap.com / https://github.com/pingcap/tidb /
> >> https://github.com/tikv
> >>
> >> Xun Liu wrote on Mon, Jul 22, 2019 at 4:07 PM:
> >>
> >>> @adam.antal
> >>>
> >>> The submarine development team has completed the following
> preparations:
> >>> 1. Established a temporary test repository on Github.
> >>> 2. Change the package name of hadoop submarine from
> org.hadoop.submarine
> >> to
> >>> org.submarine
> >>> 3. Combine the Linkedin/TonY code into the Hadoop submarine module;
> >>> 4. Connected the GitHub repository to the travis-ci system; all test
> >>> cases have been tested;
> >>> 5. Several Hadoop submarine users completed the system test using the
> >> code
> >>> in this repository.
> >>>
> >>> Zhao Xin (赵欣) wrote on Mon, Jul 22, 2019 at 9:38 AM:
> >>>
> >>>> Hi
> >>>>
> >>>> I am a teacher at Southeast University (https://www.seu.edu.cn/), in
> >>>> the electrical engineering department. Our teaching teams and students
> >>>> use Hadoop Submarine for big data analysis and automation control of
> >>>> electrical equipment.
> >>>>
> >>>> Many thanks to the hadoop community for providing us with machine
> >>> learning
> >>>> tools like submarine.
> >>>>
> >>>> I wish hadoop submarine is getting better and better.
> >>>>
> >>>>
> >>>> ==
> >>>> 赵欣
> >>>> 东南大学电气工程学院
> >>>>
> >>>> -
> >>>>
> >>>> Zhao XIN
> >>>>
> >>>> School of Electrical Engineering
> >>>>
> >>>> ==
> >>>> 2019-07-18
> >>>>
> >>>>
> >>>> *From:* Xun Liu 
> >>>> *Date:* 2019-07-18 09:46
> >>>> *To:* xinzhao 
> >>>> *Subject:* Fwd: 

Re: Any thoughts making Submarine a separate Apache project?

2019-07-25 Thread Wangda Tan
Thanks everybody for sharing your thoughts. I saw positive feedback from
20+ contributors!

So I think we should move this forward; any suggestions about what we
should do?

Best,
Wangda

On Mon, Jul 22, 2019 at 5:36 PM neo  wrote:

> +1, this is neo from the TiDB & TiKV community.
> Thanks Xun for bringing this up.
>
> In TiKV, our CNCF project's open source distributed KV storage system,
> Hadoop Submarine's machine learning engine helps us optimize data
> storage, solving some problems with data hotspots and data shuffles.
>
> We are also preparing to use the Hadoop Submarine machine learning
> engine to improve the performance of TiDB, our open source distributed
> relational database.
>
> I think if submarine can be independent, it will develop faster and better.
> Thanks to the hadoop community for developing submarine!
>
> Best Regards,
> neo
> www.pingcap.com / https://github.com/pingcap/tidb /
> https://github.com/tikv
>
> Xun Liu wrote on Mon, Jul 22, 2019 at 4:07 PM:
>
> > @adam.antal
> >
> > The submarine development team has completed the following preparations:
> > 1. Established a temporary test repository on Github.
> > 2. Change the package name of hadoop submarine from org.hadoop.submarine
> to
> > org.submarine
> > 3. Combine the Linkedin/TonY code into the Hadoop submarine module;
> > 4. Connected the GitHub repository to the travis-ci system; all test
> > cases have been tested;
> > 5. Several Hadoop submarine users completed the system test using the
> code
> > in this repository.
> >
> > Zhao Xin (赵欣) wrote on Mon, Jul 22, 2019 at 9:38 AM:
> >
> > > Hi
> > >
> > > I am a teacher at Southeast University (https://www.seu.edu.cn/), in
> > > the electrical engineering department. Our teaching teams and students
> > > use Hadoop Submarine for big data analysis and automation control of
> > > electrical equipment.
> > >
> > > Many thanks to the hadoop community for providing us with machine
> > learning
> > > tools like submarine.
> > >
> > > I wish hadoop submarine is getting better and better.
> > >
> > >
> > > ==
> > > 赵欣
> > > 东南大学电气工程学院
> > >
> > > -
> > >
> > > Zhao XIN
> > >
> > > School of Electrical Engineering
> > >
> > > ==
> > > 2019-07-18
> > >
> > >
> > > *From:* Xun Liu 
> > > *Date:* 2019-07-18 09:46
> > > *To:* xinzhao 
> > > *Subject:* Fwd: Re: Any thoughts making Submarine a separate Apache
> > > project?
> > >
> > >
> > > -- Forwarded message -
> > > From: dashuiguailu...@gmail.com 
> > > Date: Wed, Jul 17, 2019, 3:17 PM
> > > Subject: Re: Re: Any thoughts making Submarine a separate Apache
> project?
> > > To: Szilard Nemeth , runlin zhang <
> > > runlin...@gmail.com>
> > > Cc: Xun Liu , common-dev <
> > common-...@hadoop.apache.org>,
> > > yarn-dev , hdfs-dev <
> > > hdfs-...@hadoop.apache.org>, mapreduce-dev <
> > > mapreduce-...@hadoop.apache.org>, submarine-dev <
> > > submarine-...@hadoop.apache.org>
> > >
> > >
> > > +1 ,Good idea, we are very much looking forward to it.
> > >
> > > --
> > > dashuiguailu...@gmail.com
> > >
> > >
> > > *From:* Szilard Nemeth 
> > > *Date:* 2019-07-17 14:55
> > > *To:* runlin zhang 
> > > *CC:* Xun Liu ; Hadoop Common
> > > ; yarn-dev ;
> > > Hdfs-dev ; mapreduce-dev
> > > ; submarine-dev
> > > 
> > > *Subject:* Re: Any thoughts making Submarine a separate Apache project?
> > > +1, this is a very great idea.
> > > As Hadoop repository has already grown huge and contains many
> projects, I
> > > think in general it's a good idea to separate projects in the early
> > phase.
> > >
> > >
> > > On Wed, Jul 17, 2019, 08:50 runlin zhang  wrote:
> > >
> > > > +1 ,That will be great !
> > > >
> > > > > On Jul 10, 2019, at 3:34 PM, Xun Liu wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This is Xun Liu contributing to the Submarine project for deep
> > learning
> > > > > workloads running with big data workloads together on Hadoop
> > clusters.
> > > > >
> > > > > There are a bunch of integrations of Submarine with other projects
> > > > > finished or in progress, such as Apache Zeppelin, TonY, Azkaban. The
> > > > > next step of Submarine is going to integrate with more projects like
> > > > > Apache Arrow, Redis, MLflow, etc., be able to handle end-to-end
> > > > > machine learning use cases like model serving, notebook management,
> > > > > advanced training optimizations (like auto parameter tuning, memory
> > > > > cache optimizations for large datasets for training, etc.), and make
> > > > > it run on other platforms like Kubernetes or natively on Cloud.
> > > > > LinkedIn also wants to donate the TonY project to Apache so we can
> > > > > put Submarine and TonY together in the same codebase (Page #30:
> > > > > https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-tony-tensorflow-on-yarn-and-beyond#30
> > > > > ).
> > > > >
> > > > > This expands the scope of the 

Hadoop Community Sync Up Schedule

2019-06-27 Thread Wangda Tan
Hi folks,

Here's the Hadoop Community Sync Up proposal/schedule:
https://docs.google.com/document/d/1GfNpYKhNUERAEH7m3yx6OfleoF3MqoQk3nJ7xqHD9nY/edit#heading=h.xh4zfwj8ppmn

And here's calendar file:

https://calendar.google.com/calendar/ical/hadoop.community.sync.up%40gmail.com/public/basic.ics

We gave it a try this week for the YARN+MR and Submarine syncs. Feedback
from participants seems pretty good: lots of new information was shared
during the sync-ups, and companies using/developing Hadoop can get to know
each other better.

Next week there are 4 community sync-ups (two Submarine for different
timezones, one YARN+MR, one storage); please join whichever you're
interested in:

[image: image.png]

Zoom info and notes can be found in the Google calendar invitation.

Thanks,
Wangda


Re: Agenda & More Information about Hadoop Community Meetup @ Palo Alto, June 26

2019-06-25 Thread Wangda Tan
A friendly reminder,

The meetup will take place tomorrow at 9:00 AM PDT to 4:00 PM PDT.

The address is: 395 Page Mill Rd, Palo Alto, CA 94306
We’ll be in the Bigtop conference room on the 1st floor. Go left after
coming through the main entrance, and it will be on the right.

Zoom: https://cloudera.zoom.us/j/606607666

Please let me know if you have any questions. If you haven't RSVPed yet,
please go ahead and RSVP so we can better prepare food, seating, etc.

Thanks,
Wangda

On Wed, Jun 19, 2019 at 4:49 PM Wangda Tan  wrote:

> Hi All,
>
> I want to let you know that we have confirmed most of the agenda for
> Hadoop Community Meetup. It will be a whole day event.
>
> Agenda & Dial-In info can be found below. *Please RSVP
> at https://www.meetup.com/Hadoop-Contributors/events/262055924/*
>
> Huge thanks to Daniel Templeton, Wei-Chiu Chuang, Christina Vu for helping
> with organizing and logistics.
>
> *Please help to promote meetup information on Twitter, LinkedIn, etc.
> Appreciated! *
>
> Best,
> Wangda
>
> AM:
>
> 9:00: Arrival and check-in
>
> 9:30 - 10:15: Talk: Hadoop storage in cloud-native environments
> Abstract: Hadoop is a mature storage system but designed years before the
> cloud-native movement. Kubernetes and other cloud-native tools are
> emerging solutions for containerized environments but sometimes they
> require different approaches. In this presentation we would like to share
> our experiences running Apache Hadoop Ozone in Kubernetes and the
> connection points to other cloud-native ecosystem elements. We will
> compare the benefits and drawbacks of using Kubernetes and Hadoop storage
> together and show our current achievements and future plans.
> Speaker: Marton Elek (Cloudera)
>
> 10:20 - 11:00: Talk: Selective Wire Encryption In HDFS
> Abstract: Wire data encryption is a key component of the Hadoop
> Distributed File System (HDFS). However, such encryption enforcement
> comes in as an all-or-nothing feature. In our use case at LinkedIn, we
> would like to selectively expose fast unencrypted access to fully managed
> internal clients, which can be trusted, while only exposing encrypted
> access to clients outside of the trusted circle with higher security
> risks. That way we minimize performance overhead for trusted internal
> clients while still securing data from potential outside threats. Our
> design extends the HDFS NameNode to run on multiple ports; connecting to
> different NameNode ports ends up with different levels of encryption
> protection. This protection then gets enforced for both NameNode RPC and
> the subsequent data transfers to/from DataNodes. This approach comes with
> minimal operational and performance overhead.
> Speaker: Konstantin Shvachko (LinkedIn), Chen Liang (LinkedIn)
>
> 11:10 - 11:55: Talk: YuniKorn: Next Generation Scheduling for YARN and K8s
> Abstract: We will talk about our open source work - the YuniKorn
> scheduler project (Y for YARN, K for K8s, uni- for Unified). It brings
> long-wanted features such as hierarchical queues, fairness between
> users/jobs/queues, and preemption to Kubernetes, and it brings service
> scheduling enhancements to YARN. Any improvements to this scheduler can
> benefit both the Kubernetes and YARN communities.
> Speaker: Wangda Tan (Cloudera)
>
> PM:
>
> 12:00 - 12:55: Lunch Break (Provided by Cloudera)
>
> 1:00 - 1:25: Talk: Yarn Efficiency at Uber
> Abstract: We will present the work done at Uber to improve YARN cluster
> utilization and job SOA with elastic resource management, low compute
> workload on passive datacenter, preemption, larger containers, etc. We
> will also go through the YARN upgrade in order to adopt new features and
> talk about the challenges.
> Speaker: Aihua Xu (Uber), Prashant Golash (Uber)
>
> 1:30 - 2:10: One more talk
>
> 2:20 - 4:00: BoF sessions & Breakout Sessions & Group discussions: Talk
> about items like JDK 11 support, next releases (2.10.0, 3.3.0, etc.),
> Hadoop on Cloud, etc.
>
> 4:00: Reception provided by Cloudera.
>
> Join Zoom Meeting: https://cloudera.zoom.us/j/116816195
>


Agenda & More Information about Hadoop Community Meetup @ Palo Alto, June 26

2019-06-19 Thread Wangda Tan
Hi All,

I want to let you know that we have confirmed most of the agenda for Hadoop
Community Meetup. It will be a whole day event.

Agenda & Dial-In info can be found below. *Please RSVP
at https://www.meetup.com/Hadoop-Contributors/events/262055924/*

Huge thanks to Daniel Templeton, Wei-Chiu Chuang, Christina Vu for helping
with organizing and logistics.

*Please help to promote meetup information on Twitter, LinkedIn, etc.
Appreciated! *

Best,
Wangda



AM:

9:00: Arrival and check-in

9:30 - 10:15: Talk: Hadoop storage in cloud-native environments
Abstract: Hadoop is a mature storage system but designed years before the
cloud-native movement. Kubernetes and other cloud-native tools are
emerging solutions for containerized environments but sometimes they
require different approaches. In this presentation we would like to share
our experiences running Apache Hadoop Ozone in Kubernetes and the
connection points to other cloud-native ecosystem elements. We will
compare the benefits and drawbacks of using Kubernetes and Hadoop storage
together and show our current achievements and future plans.
Speaker: Marton Elek (Cloudera)

10:20 - 11:00: Talk: Selective Wire Encryption In HDFS
Abstract: Wire data encryption is a key component of the Hadoop
Distributed File System (HDFS). However, such encryption enforcement
comes in as an all-or-nothing feature. In our use case at LinkedIn, we
would like to selectively expose fast unencrypted access to fully managed
internal clients, which can be trusted, while only exposing encrypted
access to clients outside of the trusted circle with higher security
risks. That way we minimize performance overhead for trusted internal
clients while still securing data from potential outside threats. Our
design extends the HDFS NameNode to run on multiple ports; connecting to
different NameNode ports ends up with different levels of encryption
protection. This protection then gets enforced for both NameNode RPC and
the subsequent data transfers to/from DataNodes. This approach comes with
minimal operational and performance overhead.
Speaker: Konstantin Shvachko (LinkedIn), Chen Liang (LinkedIn)

11:10 - 11:55: Talk: YuniKorn: Next Generation Scheduling for YARN and K8s
Abstract: We will talk about our open source work - the YuniKorn
scheduler project (Y for YARN, K for K8s, uni- for Unified). It brings
long-wanted features such as hierarchical queues, fairness between
users/jobs/queues, and preemption to Kubernetes, and it brings service
scheduling enhancements to YARN. Any improvements to this scheduler can
benefit both the Kubernetes and YARN communities.
Speaker: Wangda Tan (Cloudera)

PM:

12:00 - 12:55: Lunch Break (Provided by Cloudera)

1:00 - 1:25: Talk: Yarn Efficiency at Uber
Abstract: We will present the work done at Uber to improve YARN cluster
utilization and job SOA with elastic resource management, low compute
workload on passive datacenter, preemption, larger containers, etc. We
will also go through the YARN upgrade in order to adopt new features and
talk about the challenges.
Speaker: Aihua Xu (Uber), Prashant Golash (Uber)

1:30 - 2:10: One more talk

2:20 - 4:00: BoF sessions & Breakout Sessions & Group discussions: Talk
about items like JDK 11 support, next releases (2.10.0, 3.3.0, etc.),
Hadoop on Cloud, etc.

4:00: Reception provided by Cloudera.

Join Zoom Meeting: https://cloudera.zoom.us/j/116816195


[ANNOUNCE] Apache Hadoop 3.1.2 release

2019-02-07 Thread Wangda Tan
It gives us great pleasure to announce that the Apache Hadoop community has
voted to release Apache Hadoop 3.1.2.

IMPORTANT NOTES

3.1.2 is the second stable release of the 3.1 line, which is considered
production-ready.

Hadoop 3.1.2 brings a number of enhancements.

The Hadoop community fixed 325 JIRAs [1] in total as part of the 3.1.2
release.

Apache Hadoop 3.1.2 contains a number of significant features and
enhancements. A few of them are noted below.

- Nvidia-docker-plugin v2 support for GPU support on YARN.
- YARN service upgrade improvements and bug fixes.
- YARN UIv2 improvements and bug fixes.
- AliyunOSS related improvements and bug fixes.
- Docker on YARN support related improvements and bug fixes.

Please see the Hadoop 3.1.2 CHANGES for the detailed list of issues
resolved. The release news is posted on the Apache Hadoop website too, you
can go to the downloads section.

Many thanks to everyone who contributed to the release, and everyone in the
Apache Hadoop community! The release is a result of direct and indirect
efforts from many contributors; listed below are those who contributed
directly by submitting patches and/or reporting issues (148 contributors,
sorted by ID).

BilwaST, Charo Zhang, GeLiXin, Harsha1206, Huachao, Jim_Brennan, LiJinglun,
Naganarasimha, OrDTesters, RANith, Rakesh_Shah, Ray Burgemeestre, Sen Zhao,
SoumyaPN, SouryakantaDwivedy, Tao Yang, Zian Chen, abmodi, adam.antal,
ajayydv, ajisakaa, akhilpb, akhilsnaik, amihalyi, arpitagarwal, aw,
ayushtkn, banditka, belugabehr, benlau, bibinchundatt, billie.rinaldi,
boky01, bolke, borisvu, botong, brahmareddy, briandburton, bsteinbach,
candychencan, ccondit-target, charanh, cheersyang, cltlfcjin, collinma,
crh, csingh, csun, daisuke.kobayashi, daryn, dibyendu_hadoop,
dineshchitlangia, ebadger, eepayne, elgoiri, erwaman, eyang, fengchuang,
ferhui, fly_in_gis, gabor.bota, gezapeti, gsaha, haibochen, hexiaoqiao,
hfyang20071, hgadre, jeagles, jhung, jiangjianfei, jianliang.wu,
jira.shegalov, jiwq, jlowe, jojochuang, jonBoone, kanwaljeets, karams,
kennethlnnn, kgyrtkirk, kihwal, knanasi, kshukla, laszlok, leftnoteasy,
leiqiang, liaoyuxiangqin, linyiqun, ljain, lukmajercak, maniraj...@gmail.com,
masatana, nandakumar131, oliverhuh...@gmail.com, oshevchenko, pbacsko,
peruguusha, photogamrun, pj.fanning, prabham, pradeepambati, pranay_singh,
revans2, rkanter, rohithsharma, shaneku...@gmail.com, shubham.dewan,
shuzirra, shv, simonprewo, sinago, smeng, snemeth, sodonnell,
sreenivasulureddy, ssath...@hortonworks.com, ssulav, ste...@apache.org,
study, suma.shivaprasad, sunilg, surendrasingh, tangzhankun, tarunparimi,
tasanuma0829, templedf, thinktaocs, tlipcon, tmarquardt, trjianjianjiao,
uranus, varun_saxena, vinayrpet, vrushalic, wilfreds,
write2kish...@gmail.com, wujinhu, xiaochen, xiaoheipangzi, xkrogen,
yangjiandan, yeshavora, yiran, yoelee, yuzhih...@gmail.com, zichensun,
zvenczel

Wangda Tan and Sunil Govind

[1] JIRA query: project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution =
Fixed AND fixVersion = 3.1.2 ORDER BY key ASC, updated ASC, created DESC,
priority DESC


Re: YARN SLS improving ideas

2018-08-14 Thread Wangda Tan
+1, I think you should go ahead, file a JIRA, and work on it if you're
interested :)

Thanks,
Wangda

On Thu, Aug 9, 2018 at 6:22 AM Sichen Zhao  wrote:

> Hi,
> I am a developer from Alibaba, China. I recently used SLS for scheduling
> simulation. SLS currently supports multidimensional resource input (CPU,
> memory, and other resources such as disk), but SLS can't take scheduling
> requests, which are currently widely used in YARN, as input, so placement
> constraints and attributes are not supported.
>
> So what I want to improve in SLS: add scheduling emulation for the
> scheduling-request resource format.
>
> The specific work is as follows:
> 1. Add input support for the scheduling request format.
> 2. Add support for the scheduling request resource format in NMSim.
> 3. Add scheduling request support for the Capacity Scheduler (maybe it
> is already done in the current version).
>
> What do you think about my ideas?
>
>
> Best Regards
> Sichen Zhao
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>


[ANNOUNCE] Apache Hadoop 3.1.1 release

2018-08-09 Thread Wangda Tan
It gives me great pleasure to announce that the Apache Hadoop community has
voted to release Apache Hadoop 3.1.1.

Hadoop 3.1.1 is the first stable maintenance release for the year 2018 in
the Hadoop-3.1 line and brings a number of enhancements.

IMPORTANT NOTES

3.1.1 is the first stable release of the 3.1 line and is production-ready.

The Hadoop community fixed 435 JIRAs [1] in total as part of the 3.1.1
release. Of these fixes:

   - 60 in Hadoop Common
   - 139 in HDFS
   - 223 in YARN
   - 13 in MapReduce

--

Apache Hadoop 3.1.1 contains a number of significant features and
enhancements. A few of them are noted below.


   - ENTRY_POINT support for Docker containers.
   - Restart policy support for YARN native services.
   - Capacity Scheduler: Intra-queue preemption for fairness ordering policy.
   - Stabilization works for schedulers, YARN service, docker support, etc.


Please see the Hadoop 3.1.1 CHANGES
<http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/release/3.1.1/CHANGES.3.1.1.html>
for the detailed list of issues resolved. The release news is posted on the
Apache Hadoop website too, you can go to the downloads section.
<http://hadoop.apache.org/releases.html#Download>

--

Many thanks to everyone who contributed to the release, and everyone in the
Apache Hadoop community! The release is a result of direct and indirect
efforts from many contributors; listed below are those who contributed
directly by submitting patches and reporting issues.

Abhishek Modi, Ajay Kumar, Akhil PB, Akira Ajisaka, Allen Wittenauer,
Anbang Hu, Andrew Wang, Arpit Agarwal, Atul Sikaria, BELUGA BEHR, Bharat
Viswanadham, Bibin A Chundatt, Billie Rinaldi, Bilwa S T, Botong Huang,
Brahma Reddy Battula, Brook Zhou, CR Hota, Chandni Singh, Chao Sun, Charan
Hebri, Chen Liang, Chetna Chaudhari, Chun Chen, Daniel Templeton, Davide
 Vergari, Dennis Huo, Dibyendu Karmakar, Ekanth Sethuramalingam, Eric
Badger, Eric Yang, Erik Krogen, Esfandiar Manii, Ewan Higgs, Gabor Bota,
Gang Li, Gang Xie, Genmao Yu, Gergely Novák, Gergo Repas, Giovanni Matteo
Fumarola, Gour Saha, Greg Senia, Haibo Yan, Hanisha Koneru, Hsin-Liang
Huang, Hu Ziqian, Istvan Fajth, Jack Bearden, Jason Lowe, Jeff Zhang, Jian
He, Jianchao Jia, Jiandan Yang , Jim Brennan, Jinglun, John Zhuge, Joseph
Fourny, K G Bakthavachalam, Karthik Palanisamy, Kihwal Lee, Kitti Nanasi,
Konstantin Shvachko, Lei (Eddy) Xu, LiXin Ge, Lokesh Jain, Lukas Majercak,
Miklos Szegedi, Mukul Kumar Singh, Namit Maheshwari, Nanda kumar, Nilotpal
Nandi, Pavel Avgustinov, Prabhu Joseph, Prasanth Jayachandran, Robert
Kanter, Rohith Sharma K S, Rushabh S Shah, Sailesh Patel, Sammi Chen, Sean
Mackrory, Sergey Shelukhin, Shane Kumpf, Shashikant Banerjee, Siyao Meng,
Sreenath Somarajapuram, Steve Loughran, Suma Shivaprasad, Sumana Sathish,
Sunil Govindan, Surendra Singh Lilhore, Szilard Nemeth, Takanobu Asanuma,
Tao Jie, Tao Yang, Ted Yu, Thomas Graves, Thomas Marquardt, Todd Lipcon,
Vinod Kumar Vavilapalli, Wangda Tan, Wei Yan, Wei-Chiu Chuang, Weiwei Yang,
Wilfred Spiegelenburg, Xiao Chen, Xiao Liang, Xintong Song, Xuan Gong, Yang
Wang, Yesha Vora, Yiqun Lin, Yiran Wu, Yongjun Zhang, Yuanbo Liu, Zian
Chen, Zoltan Haindrich, Zsolt Venczel, Zuoming Zhang, fang zhenyi, john
lilley, jwhitter, kyungwan nam, liaoyuxiangqin, liuhongtong, lujie, skrho,
yanghuafeng, yimeng, Íñigo Goiri.

Wangda Tan

[1] JIRA query: project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution =
Fixed AND fixVersion = 3.1.1 ORDER BY key ASC, updated ASC, created DESC,
priority DESC


Re: CapacityScheduler vs. FairScheduler

2016-07-07 Thread Wangda Tan
Hi folks,

I've a short write-up about feature-wise comparison between fair scheduler
and capacity scheduler for latest Hadoop.

https://wangda.live/2016/07/07/an-updated-feature-comparison-between-capacity-scheduler-and-fair-scheduler/

HTH,

- Wangda

On Sun, Jun 19, 2016 at 11:21 PM, sandeep vura 
wrote:

> Hi,
>
> I too have same doubt !! Please clarify.
>
> Regards,
> sandeep.v
>
> On Fri, Jun 10, 2016 at 6:08 AM, Alvin Chyan  wrote:
>
>> I have the same question.
>>
>> Thanks!
>>
>>
>> *Alvin Chyan*Lead Software Engineer, Data
>> 901 Marshall St, Suite 200, Redwood City, CA 94063
>>
>>
>> turn.com    |   @TurnPlatform
>> 
>>
>> This message is Turn Confidential, except for information included that
>> is already available to the public. If this message was sent to you
>> accidentally, please delete it.
>>
>> On Fri, Jun 3, 2016 at 11:04 AM, Lars Francke 
>> wrote:
>>
>>> Hi,
>>>
>>> I've been using Hadoop for years and have always just taken for granted
>>> that FairScheduler = Cloudera and CapacityScheduler = Hortonworks/Yahoo.
>>> There are some comparisons but all of them are years old and somewhat (if
>>> not entirely) outdated.
>>>
>>> The documentation doesn't really help and neither does the Javadoc. The
>>> code of both is fairly complex.
>>>
>>> So my question is: How do these two Schedulers really differ today? What
>>> are some features that one has that the other doesn't? Are there any
>>> fundamental differences (anymore)?
>>>
>>> Any insight is welcome.
>>>
>>> Thank you!
>>>
>>> Cheers,
>>> Lars
>>>
>>
>>
>


Re: FW: Is it valid to use userLimit in calculating maxApplicationsPerUser ?

2015-07-16 Thread Wangda Tan
I think it's not valid. Multiplying it by ULF seems unreasonable; I think
it should be:

max(1, maxApplications * max(userlimit/100, 1/#activeUsers))

If an admin sets up a very large ULF (e.g. 100), maxApplicationsPerUser
can be much larger than the queue's maxApplications.

Also, multiplying by ULF to compute userAMResourceLimit seems invalid too;
it can let a single user run many more applications than expected, which we
should avoid.
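To make the arithmetic concrete, here is a small sketch in plain Python
(illustrative numbers only, not the scheduler's actual code) comparing the
current formula with the proposed one:

```python
# Hypothetical queue settings for illustration; not from a real cluster.

def current_limit(max_apps, user_limit_pct, user_limit_factor):
    # Current: maxApplicationsPerUser = (int)(maxApplications * (userLimit/100) * ULF)
    return int(max_apps * (user_limit_pct / 100.0) * user_limit_factor)

def proposed_limit(max_apps, user_limit_pct, active_users):
    # Proposed: max(1, maxApplications * max(userlimit/100, 1/#activeUsers))
    return max(1, int(max_apps * max(user_limit_pct / 100.0, 1.0 / active_users)))

# A very large ULF blows past the queue-wide maxApplications cap:
print(current_limit(10, 25, 100))  # 250, far above maxApplications = 10
# ...while a small queue-level maxApplications rounds down to zero:
print(current_limit(2, 20, 1))     # 0, so a user cannot submit at all
# The proposed formula stays within [1, maxApplications]:
print(proposed_limit(10, 25, 2))   # 5
print(proposed_limit(2, 20, 5))    # 1
```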

Thoughts?

Thanks,
Wangda

On Thu, Jul 16, 2015 at 12:53 AM, Naganarasimha G R (Naga) 
garlanaganarasi...@huawei.com wrote:

   Hi folks,
  Came across one scenario wherein maxApplications at the cluster level (2
 nodes) was set to a low value like 10, and based on the capacity
 configuration for a particular queue it came to 2; but further, while
 calculating maxApplicationsPerUser, the formula used is:

  maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) *
 userLimitFactor);

  but the definition of userLimit in the documentation is:
 "Each queue enforces a limit on the percentage of resources allocated to a
 user at any given time, if there is demand for resources. The user limit
 can vary between a minimum and maximum value. The former (the minimum
 value) is set to this property value and the latter (the maximum value)
 depends on the number of users who have submitted applications. For e.g.,
 suppose the value of this property is 25. If two users have submitted
 applications to a queue, no single user can use more than 50% of the queue
 resources. If a third user submits an application, no single user can use
 more than 33% of the queue resources. With 4 or more users, no user can
 use more than 25% of the queue's resources. A value of 100 implies no user
 limits are imposed. The default is 100. Value is specified as an integer."

  So I was wondering how a minimum limit is used in a formula to calculate
 the maximum applications for a user. Suppose I set
 yarn.scheduler.capacity.queue-path.minimum-user-limit-percent to 20,
 assuming at least 20% of the queue at the minimum is available per user;
 but based on the formula, maxApplicationsPerUser gets set to zero.
 According to the definition of the property, the maximum is based on the
 current number of active users, so I feel this formula is wrong.
 P.S. userLimitFactor was configured as the default 1, but what I am
 wondering is whether it's valid to use it in combination with userLimit
 to find the max apps per user.

  Please correct me if my understanding is wrong. If it's a bug I would
 like to raise and solve it.

  + Naga



Re: Capacity scheduler properties

2015-01-15 Thread Wangda Tan
You can check HDP 2.2's document:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/capacity_scheduler/index.html

HTH,
Wangda

On Thu, Jan 15, 2015 at 4:22 AM, Jakub Stransky stransky...@gmail.com
wrote:

 Hello,

 I am configuring the capacity scheduler; all seems OK, but I cannot find
 the meaning of the following property:

 yarn.scheduler.capacity.root.unfunded.capacity

 I just found that it is set to 50 everywhere and the description says "No
 description".

 Can anybody clarify or point to where to find relevant documentation?

 Thx
 Jakub




Re: Questions about Capacity scheduler behavior

2014-11-04 Thread Wangda Tan
Hi Fabio,
To answer your questions:
1)
CS (CapacityScheduler) does allocation from root to leaf, level by level;
for queues with the same parent, it allocates in the following way:
it first allocates to the queue with the least used resource.
If queues have the same used resource, say root.qA used 1G and root.qB
used 1G as well, it allocates by queue name, so qA goes first.

2)
There are two parameters about capacity: one is capacity, the other is
maximum-capacity. A queue can only allocate resources in multiples of the
cluster's minimum-allocation, and its usage will always be <= the
maximum-capacity of the queue.

3)
I think 1) already answers your question: queues are ordered by used
resource, not by percentage of used resource, so in your example queue A
goes first.
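A minimal sketch of that ordering rule (an assumed simplification in plain
Python, not the scheduler's code): siblings are sorted by least used
resource, with ties broken by queue name.

```python
# Hypothetical sibling queues as (name, used resource in MB).
queues = [("qB", 1024), ("qA", 1024), ("qC", 512)]

# Least used resource first; ties broken lexicographically by name,
# so qA is tried before qB when both have used 1G.
order = [name for name, used in sorted(queues, key=lambda q: (q[1], q[0]))]
print(order)  # ['qC', 'qA', 'qB']
```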

4)
Sorry, I don't quite understand this question; could you explain it a bit
more?

5)
You did not misunderstand the example; this is how capacity and
maximum-capacity apply to queues. The maximum-capacity of a queue is used
to do such resource provisioning for queues.

root.a.capacity = 50
root.a.maximum-capacity = 80
root.a.a1.capacity = 50
root.a.a1.maximum-capacity = 90

The guaranteed resource of a.a1 is a.capacity% * a.a1.capacity% = 25%
(0.5 * 0.5).
The maximum resource of a.a1 is a.maximum-capacity% *
a.a1.maximum-capacity% = 72% (0.8 * 0.9).

And as you said, the resource a child can use will always be <= the
resource of its parent.

If you want to let all leaf queues leverage all resources in the cluster,
you can simply set the maximum-capacity of all parent and leaf queues to
100.
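The two calculations above can be sketched as follows (plain Python; the
helper name is mine, and the values come from the example):

```python
# Absolute capacity of a nested queue is the product of the per-level
# percentages down the hierarchy.

def absolute_capacity(path_capacities_pct):
    frac = 1.0
    for pct in path_capacities_pct:
        frac *= pct / 100.0
    return round(frac * 100.0, 4)  # round away floating-point noise

# Guaranteed: root.a.capacity = 50, root.a.a1.capacity = 50 -> 25% of cluster
print(absolute_capacity([50, 50]))  # 25.0
# Maximum: root.a.maximum-capacity = 80, a1.maximum-capacity = 90 -> 72%
print(absolute_capacity([80, 90]))  # 72.0
```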

Does this make sense to you?

Thanks,
Wangda

On Mon, Nov 3, 2014 at 6:30 AM, Fabio anyte...@gmail.com wrote:

 Hi guys, I'm posting this in the user mailing list since I got no reply in
 the Yarn-dev. I have to model as well as possible the capacity scheduler
 behavior and I have some questions, hope someone can help me with this. In
 the following I will consider all containers to be equal for simplicity:

 1) In case of multiple queues having the same level of assigned resources,
 what's the policy to decide which comes first in the resource allocation?

 2) Let's consider this configuration:
 We have a cluster hosting a total of 40 containers. We have 3 queues: A is
 configured to get 39% of cluster capacity, B also gets 39% and C gets 22%.
 The number of containers is going to be 15.6, 15.6 and 8.8 for A, B an C.
 Since we can't split a container, how does the Capacity scheduler round
 these values in a real case? Who gets the two contended containers? I may
 think they are considered as extra containers, thus shared upon need among
 the three queues. Is this correct?

 3) Let's say I have queues A and B. A is configured to get 20% (20
 containers) of the total cluster capacity (100 containers), B gets 80% (80
 containers). Capacity scheduler gives available resources firstly to the
 most under-served queue.
 In case A is using 10 containers and B is using 20, who is going to get
 the first available container? A is already using 50% of it's assigned
 capacity, B just 25%, but A has less containers than B... who is considered
 to be more under-served?

 4) Does the previous question make sense at all? Because I have a doubt
 that when I have free containers I will just serve requests as they arrive,
 possibly over-provisioning a queue (that is: if I get a container request
 for an app in A, I will give it a container since I don't know that after a
 few milliseconds I will get a new request from B, or vice versa). The
 previous question may have sense if there was some sort of buffer that is
 filled with incoming requests, due to the difficulty of serving them in
 real time, thus making the scheduler able to choose the request from the
 most under-served queue. Is this what happens?

 5) According to the example presented in Apache Hadoop YARN: Moving
 beyond MapReduce and Batch Processing with Apache Hadoop 2 about the
 resource allocation with the capacity scheduler, what I understood is that
 the chance for a leaf queue to get resources above it's assigned capacity
 is always upper-limited by the fraction of cluster capacity assigned to its
 first/closer parent queue. That is: if I am a leaf queue A1, I can only get
 at most the resources dedicated to my parent A, while I can't get the ones
 from B, sibling of A, even if it doesn't have any running application.
 Actually at first I thought this over-provisioning was not limited, and
 regardless of the queue configuration a single application could get the
 whole cluster (excluding per-application limits). Did I misunderstood the
 example?

 Thanks a lot

 Fabio



Re: Memory settings in hadoop YARN

2014-08-21 Thread Wangda Tan
Hi Narayanan,
I've read a great blog post by Rohit Bakhshi; I recommend it to you:
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/. I
think most of your questions are covered by that post. Please let me know
if you have more questions.
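One relationship among the settings in the question is worth calling out
(a hedged sketch; the values are assumptions for illustration, not
recommendations): the per-task JVM heap in mapred.child.java.opts is
conventionally kept below the container size in mapreduce.map.memory.mb so
the JVM's total footprint (heap plus off-heap overhead) fits inside its
container.

```xml
<!-- mapred-site.xml sketch: illustrative values only (an assumption,
     not from this thread). -Xmx is set below the 2048 MB container so
     the task JVM's heap plus off-heap overhead stays within it. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1638m</value>
</property>
```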

Thanks,
Wangda Tan



On Wed, Aug 20, 2014 at 12:12 PM, Narayanan K knarayana...@gmail.com
wrote:

 Hi

 We run our Pig jobs in Hadoop 0.23 which has the new YARN architecture.

 I had a few questions on the memory used by the jobs:

 We have following settings for memory.

 mapred.child.java.opts

 mapreduce.map.memory.mb

 mapreduce.reduce.memory.mb

 yarn.app.mapreduce.am.resource.mb

 yarn.app.mapreduce.am.command-opts


 1. I want to understand these settings to make better use of the Hadoop
 cluster.
 2. How is memory allocated to a container? Do any of the above
 settings result in a change to the container?
 3. Any other memory settings we need to be aware of?
 4. I heard there were virtual memory and physical memory involved. Is
 there any proper documentation/guide that can make the memory
 management easier?

 Thanks
 Narayanan



Re: 100% CPU consumption by Resource Manager process

2014-08-18 Thread Wangda Tan
Hi Krishna,

4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in
your configuration?
50

I think this config is problematic: too small a heartbeat interval will
cause the NMs to contact the RM too often. I would suggest setting this
value larger, e.g. 1000.

Thanks,
Wangda



On Wed, Aug 13, 2014 at 4:42 PM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 Hi Wangda,
   Thanks for the reply, here are the details, please see if you could
 suggest anything.

 1) Number of nodes and running app in the cluster
 2 nodes, and I am running my own application that keeps asking for
 containers,
 a) running something on the containers,
 b) releasing the containers,
 c) ask for more containers with incremented priority value, and repeat the
 same process

 2) What's the version of your Hadoop?
 apache hadoop-2.4.0

 3) Have you set
 yarn.scheduler.capacity.schedule-asynchronously.enable=true?
 No

 4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in
 your configuration?
 50




 On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan wheele...@gmail.com wrote:

 Hi Krishna,
 To get more understanding about the problem, could you please share
 following information:
 1) Number of nodes and running app in the cluster
 2) What's the version of your Hadoop?
 3) Have you set
 yarn.scheduler.capacity.schedule-asynchronously.enable=true?
 4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
 in your configuration?

 Thanks,
 Wangda Tan



 On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
  My YARN ResourceManager is consuming 100% CPU when I am running an
application that runs for about 10 hours, requesting as many as 27000
containers. The CPU consumption was very low at the start of my
application and gradually went above 100%. Is this a known issue, or are
we doing something wrong?

Every dump shows the Event Processor thread running
LeafQueue::assignContainers(), specifically the for loop below from
LeafQueue.java, and it seems to be looping through some priority list.

 // Try to assign containers to applications in order
 for (FiCaSchedulerApp application : activeApplications) {
 ...
 // Schedule in priority order
 for (Priority priority : application.getPriorities()) {

 3XMTHREADINFO  ResourceManager Event Processor
 J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00,
 java/lang/Thread:0x8341D9A0, state:CW, prio=5
 3XMJAVALTHREAD(java/lang/Thread getId:0x1E, isDaemon:false)
 3XMTHREADINFO1(native thread ID:0x4B64, native priority:0x5,
 native policy:UNKNOWN)
 3XMTHREADINFO2(native stack address range
 from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
 3XMCPUTIME   *CPU usage total: 42334.614623696 secs*
 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
 (0x4FE8)
 3XMTHREADINFO3   Java callstack:
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0,
 entry count: 1)
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
 entry count: 2)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x834037C8,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
 Code))
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
 Code))
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run

Re: Synchronization among Mappers in map-reduce task

2014-08-12 Thread Wangda Tan
Hi Saurabh,
It's an interesting topic,

 So, here is the question: is it possible to make sure that when one of
the mapper tasks is writing to a file, the others wait until the first
one finishes? I read that the mapper tasks don't interact with
each other

A simple way to do this is to use the HDFS namespace:
Create the file using public FSDataOutputStream create(Path f, boolean
overwrite) with overwrite=false. Only one mapper can successfully create the
file.

After the write completes, that mapper creates a flag file (e.g. completed)
in the same folder. The other mappers can wait until the completed file appears.
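A minimal sketch of that create-as-lock idea. On HDFS it would use FileSystem.create(path, false); here java.nio's atomic Files.createFile on a local path stands in so the snippet is runnable anywhere (the class and lock-file names are made up for illustration):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateLockDemo {
    // Atomically try to create the lock file; exactly one caller succeeds.
    // This mirrors FileSystem.create(path, /*overwrite=*/false) on HDFS,
    // which throws if the file already exists.
    public static boolean tryAcquire(Path lock) throws IOException {
        try {
            Files.createFile(lock);           // atomic create-if-absent
            return true;                      // we are the single writer
        } catch (FileAlreadyExistsException e) {
            return false;                     // someone else got there first
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lockdemo");
        Path lock = dir.resolve("bank.properties.lock");
        System.out.println(tryAcquire(lock)); // first caller wins
        System.out.println(tryAcquire(lock)); // second caller loses
    }
}
```

The key property is that the create is atomic on the namespace, so no two mappers can both believe they hold the lock.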

 Is there any way to have synchronization between two independent map
reduce jobs?
I think ZK can handle more complex synchronization here, such as mutexes,
master election, etc.

Hope this helps,

Wangda Tan




On Tue, Aug 12, 2014 at 10:43 AM, saurabh jain sauravma...@gmail.com
wrote:

 Hi Folks ,

 I have been writing a map-reduce application where I am having an input
 file containing records and every field in the record is separated by some
 delimiter.

 In addition to this user will also provide a list of columns that he wants
 to lookup in a master properties file (stored in HDFS). If this columns
 (lets say it a key) is present in master properties file then get the
 corresponding value and update the key with this value and if the key is
 not present it in the master properties file then it will create a new
 value for this key and will write to this property file and will also
 update in the record.

 I have written this application , tested it and everything worked fine
 till now.

 *e.g :* *I/P Record :* This | is | the | test | record

 *Columns :* 2,4 (that means code will look up only field *is and test* in
 the master properties file.)

 Here , I have a question.

 *Q 1:* In the case when my input file is huge and it is split across
 multiple mappers, I was getting the below mentioned exception where
 all the other mappers tasks were failing. *Also initially when I started
 the job my master properties file is empty.* In my code I have a check if
 this file (master properties) doesn't exist create a new empty file before
 submitting the job itself.

 e.g : If I have 4 splits of data, then 3 map tasks are failing. But after
 this all the failed map tasks restart and finally the job becomes
 successful.

 So , *here is the question , is it possible to make sure that when one of
 the mapper tasks is writing to a file , other should wait until the first
 one is finished. ?* I read that all the mappers task don't interact with
 each other.

 Also what will happen in the scenario when I start multiple parallel
 map-reduce jobs and all of them working on the same properties files. *Is
 there any way to have synchronization between two independent map reduce
 jobs*?

 I have also read that ZooKeeper can be used in such scenarios , Is that
 correct ?


 Error: 
 com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException: 
 IOException - failed while appending data to the file -Failed to create file 
 [/user/cloudera/lob/master/bank.properties] for 
 [DFSClient_attempt_1407778869492_0032_m_02_0_1618418105_1] on client 
 [10.X.X.17], because this file is already being created by
 [DFSClient_attempt_1407778869492_0032_m_05_0_-949968337_1] on [10.X.X.17]
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2575)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)




Re: 100% CPU consumption by Resource Manager process

2014-08-12 Thread Wangda Tan
Hi Krishna,
To get more understanding about the problem, could you please share
following information:
1) Number of nodes and running app in the cluster
2) What's the version of your Hadoop?
3) Have you set
yarn.scheduler.capacity.schedule-asynchronously.enable=true?
4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in
your configuration?

Thanks,
Wangda Tan



On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 Hi,
   My YARN ResourceManager is consuming 100% CPU when I am running an
 application that runs for about 10 hours, requesting as many as 27000
 containers. The CPU consumption was very low at the start of my
 application and gradually rose to over 100%. Is this a known issue,
 or are we doing something wrong?

 Every dump shows the Event Processor thread running
 LeafQueue::assignContainers(), specifically the for loop below from
 LeafQueue.java, and it seems to be looping through some priority list.

 // Try to assign containers to applications in order
 for (FiCaSchedulerApp application : activeApplications) {
 ...
 // Schedule in priority order
 for (Priority priority : application.getPriorities()) {

 3XMTHREADINFO  ResourceManager Event Processor
 J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00,
 java/lang/Thread:0x8341D9A0, state:CW, prio=5
 3XMJAVALTHREAD(java/lang/Thread getId:0x1E, isDaemon:false)
 3XMTHREADINFO1(native thread ID:0x4B64, native priority:0x5,
 native policy:UNKNOWN)
 3XMTHREADINFO2(native stack address range
 from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
 3XMCPUTIME   *CPU usage total: 42334.614623696 secs*
 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
 (0x4FE8)
 3XMTHREADINFO3   Java callstack:
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0,
 entry count: 1)
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
 entry count: 2)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x834037C8,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
 Code))
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
 Code))
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
 4XESTACKTRACEat java/lang/Thread.run(Thread.java:853)

 3XMTHREADINFO  ResourceManager Event Processor
 J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00,
 java/lang/Thread:0x8341D9A0, state:CW, prio=5
 3XMJAVALTHREAD(java/lang/Thread getId:0x1E, isDaemon:false)
 3XMTHREADINFO1(native thread ID:0x4B64, native priority:0x5,
 native policy:UNKNOWN)
 3XMTHREADINFO2(native stack address range
 from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
 3XMCPUTIME   CPU usage total: 42379.604203548 secs
 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
 (0xDFC0)
 3XMTHREADINFO3   Java callstack:
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0,
 entry count: 1)
 5XESTACKTRACE   (entered

Re: Negative value given by getVirtualCores() or getAvailableResources()

2014-08-12 Thread Wangda Tan
By default, vcore = 1 for each resource request. If you don't like this
behavior, you can set yarn.scheduler.minimum-allocation-vcores=0
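As a sketch, the corresponding yarn-site.xml fragment would look like the following (scheduler settings are typically picked up after an RM restart):

```xml
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>0</value>
</property>
```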

Hope this helps,
Wangda Tan



On Thu, Aug 7, 2014 at 7:13 PM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 Hi,
   I am calling getAvailableResources() on AMRMClientAsync and getting a -ve
 value for the number of virtual cores, as below. Is there something wrong?

 memory:16110, vCores:-2.

 I have set the vcores in my yarn-site.xml like this, and just ran an
 application that requires two containers other than the Application
 Master's container. In the ContainerRequest setup from my
 ApplicationMaster, I haven't set anything for virtual cores, means I didn't
 call setVirtualCores() at all.

 So, I think it shouldn't be showing a -ve value for the vcores, when I
 call getAvailableResources(), am I wrong?


 <property>
   <description>Number of CPU cores that can be allocated for
   containers.</description>
   <name>yarn.nodemanager.resource.cpu-vcores</name>
   <value>4</value>
 </property>

 Thanks,
 Kishore



Re: Synchronization among Mappers in map-reduce task Please advise

2014-08-12 Thread Wangda Tan
Hi Saurabh,

 I am not sure making overwrite=false will solve the problem. As per the
javadoc, with overwrite=false it will throw an exception if the file
already exists. So, for all the remaining mappers it will throw an
exception.
You can catch the exception and wait.
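A sketch of that catch-and-wait loop, again using java.nio on a local directory in place of the HDFS FileSystem calls. The flag-file name completed follows the earlier suggestion; the timeout and poll intervals are arbitrary. On HDFS the existence check would be fs.exists(new Path(dir, "completed")):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

public class WaitForFlag {
    // Losing mappers poll until the writer's "completed" flag file
    // appears, or give up after the timeout.
    public static boolean awaitFlag(Path flag, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (Files.exists(flag)) {
                return true;                 // writer has finished
            }
            TimeUnit.MILLISECONDS.sleep(pollMs);
        }
        return false;                        // timed out waiting
    }
}
```

Polling is coarse but simple; ZooKeeper watches (below) avoid the busy-wait if you need lower latency.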

 Can you please refer to me some source or link on ZK , that can help me
in solving the problem.
You can check this: http://zookeeper.apache.org/doc/r3.4.6/recipes.html

Thanks,
Wangda



On Wed, Aug 13, 2014 at 9:34 AM, saurabh jain sauravma...@gmail.com wrote:

 Hi Wangda ,

 I am not sure making overwrite=false , will solve the problem. As per java
 doc by making overwrite=false , it will throw an exception if the file
 already exists. So, for all the remaining mappers it will throw an
 exception.

 Also I am very new to ZK and have very basic knowledge of it , I am not
 sure if it can solve the problem and if yes how. I am still going through
 available sources on the ZK.

 Can you please refer to me some source or link on ZK , that can help me in
 solving the problem.

 Best
 Saurabh

 On Tue, Aug 12, 2014 at 3:08 AM, Wangda Tan wheele...@gmail.com wrote:

 Hi Saurabh,
 It's an interesting topic,

  So , here is the question , is it possible to make sure that when one
 of
 the mapper tasks is writing to a file , other should wait until the first
 one is finished. ? I read that all the mappers task don't interact with
 each other

 A simple way to do this is using HDFS namespace:
 Create file using public FSDataOutputStream create(Path f, boolean
 overwrite), overwrite=false. Only one mapper can successfully create
 file.

 After write completed, the mapper will create a flag file like completed
 in the same folder. Other mappers can wait for the completed file
 created.

  Is there any way to have synchronization between two independent map
 reduce jobs?
 I think ZK can do some complex synchronization here. Like mutex, master
 election, etc.

 Hope this helps,

 Wangda Tan




 On Tue, Aug 12, 2014 at 10:43 AM, saurabh jain sauravma...@gmail.com
 wrote:

  Hi Folks ,
 
  I have been writing a map-reduce application where I am having an input
  file containing records and every field in the record is separated by
 some
  delimiter.
 
  In addition to this user will also provide a list of columns that he
 wants
  to lookup in a master properties file (stored in HDFS). If this columns
  (lets say it a key) is present in master properties file then get the
  corresponding value and update the key with this value and if the key is
  not present it in the master properties file then it will create a new
  value for this key and will write to this property file and will also
  update in the record.
 
  I have written this application , tested it and everything worked fine
  till now.
 
  *e.g :* *I/P Record :* This | is | the | test | record
 
  *Columns :* 2,4 (that means code will look up only field *is and
 test* in
  the master properties file.)
 
  Here , I have a question.
 
  *Q 1:* In the case when my input file is huge and it is splitted across
  the multiple mappers , I was getting the below mentioned exception where
  all the other mappers tasks were failing. *Also initially when I started
  the job my master properties file is empty.* In my code I have a check
 if
  this file (master properties) doesn't exist create a new empty file
 before
  submitting the job itself.
 
  e.g : If i have 4 splits of data , then 3 map tasks are failing. But
 after
  this all the failed map tasks restarts and finally the job become
  successful.
 
  So , *here is the question , is it possible to make sure that when one
 of
  the mapper tasks is writing to a file , other should wait until the
 first
  one is finished. ?* I read that all the mappers task don't interact with
  each other.
 
  Also what will happen in the scenario when I start multiple parallel
  map-reduce jobs and all of them working on the same properties files.
 *Is
  there any way to have synchronization between two independent map reduce
  jobs*?
 
  I have also read that ZooKeeper can be used in such scenarios , Is that
  correct ?
 
 
  Error:
 com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException:
 IOException - failed while appending data to the file -Failed to create
 file [/user/cloudera/lob/master/bank.properties] for
 [DFSClient_attempt_1407778869492_0032_m_02_0_1618418105_1] on client
 [10.X.X.17], because this file is already being created by
  [DFSClient_attempt_1407778869492_0032_m_05_0_-949968337_1] on
 [10.X.X.17]
  at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
  at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
  at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612

Re: Configuration set up questions - Container killed on request. Exit code is 143

2014-07-17 Thread Wangda Tan
Hi Chris MacKenzie,
Your output still says "2.1 GB of 2.1 GB virtual memory used. Killing",
so I guess yarn.nodemanager.vmem-pmem-ratio didn't take effect; if it had,
it should read "xx GB of 10 GB virtual memory used ...".
Have you tried restarting the NM after configuring that option?
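For reference, a minimal yarn-site.xml fragment with the ratio of 10 discussed in this thread (the NodeManager reads this file, so it has to be restarted after the change):

```xml
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>10</value>
</property>
```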

Thanks,
Wangda


On Thu, Jul 17, 2014 at 11:15 PM, Chris Mawata chris.maw...@gmail.com
wrote:

 Another thing to try is smaller input splits, if your data can be broken up
 into smaller files that can be independently processed. That way
 you get more but smaller map tasks. You could also use more but smaller
 reducers. The many files will tax your NameNode more, but you might get to
 use all your cores.
 On Jul 17, 2014 9:07 AM, Chris MacKenzie 
 stu...@chrismackenziephotography.co.uk wrote:

 Hi Chris,

 Thanks for getting back to me. I will set that value to 10

 I have just tried this.

 https://support.gopivotal.com/hc/en-us/articles/201462036-Mapreduce-YARN-Memory-Parameters

 Setting both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. Though
 after setting them I didn’t get the expected change.

 As the output was still 2.1 GB of 2.1 GB virtual memory used. Killing
 container


 Regards,

 Chris MacKenzie
 telephone: 0131 332 6967
 email: stu...@chrismackenziephotography.co.uk
 corporate: www.chrismackenziephotography.co.uk
 http://www.chrismackenziephotography.co.uk/
 http://plus.google.com/+ChrismackenziephotographyCoUk/posts
 http://www.linkedin.com/in/chrismackenziephotography/






 From:  Chris Mawata chris.maw...@gmail.com
 Reply-To:  user@hadoop.apache.org
 Date:  Thursday, 17 July 2014 13:36
 To:  Chris MacKenzie stu...@chrismackenziephotography.co.uk
 Cc:  user@hadoop.apache.org
 Subject:  Re: Configuration set up questions - Container killed on
 request. Exit code is 143


 Hi Chris MacKenzie,  I have a feeling (I am not familiar with the kind
 of work you are doing) that your application is memory intensive.  8 cores
 per node and only 12GB is tight. Try bumping up the
 yarn.nodemanager.vmem-pmem-ratio
 Chris Mawata




 On Wed, Jul 16, 2014 at 11:37 PM, Chris MacKenzie
 stu...@chrismackenziephotography.co.uk wrote:

 Hi,

 Thanks Chris Mawata
 I’m working through this myself, but wondered if anyone could point me in
 the right direction.

 I have attached my configs.


 I’m using hadoop 2.4.1

 My system is:
 32 Clusters
 8 processors per machine
 12 gb ram
 Available disk space per node 890 gb

 This is my current error:

 mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id :
 attempt_1405538067846_0006_r_00_1, Status : FAILED
 Container [pid=25848,containerID=container_1405538067846_0006_01_04]
 is running beyond virtual memory limits. Current usage: 439.0 MB of 1 GB
 physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing
 container.
 Dump of the process-tree for container_1405538067846_0006_01_04 :
 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
 SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
 |- 25853 25848 25848 25848 (java) 2262 193 2268090368 112050
 /usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true
 -Dhadoop.metrics.log.level=WARN -Xmx768m

 -Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/ap
 plication_1405538067846_0006/container_1405538067846_0006_01_04/tmp
 -Dlog4j.configuration=container-log4j.properties

 -Dyarn.app.container.log.dir=/scratch/extra/cm469/hadoop-2.4.1/logs/userlog
 s/application_1405538067846_0006/container_1405538067846_0006_01_04
 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
 org.apache.hadoop.mapred.YarnChild 137.195.143.103 59056
 attempt_1405538067846_0006_r_00_1 4
 |- 25848 25423 25848 25848 (bash) 0 0 108613632 333 /bin/bash -c
 /usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true
 -Dhadoop.metrics.log.level=WARN  -Xmx768m

 -Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/ap
 plication_1405538067846_0006/container_1405538067846_0006_01_04/tmp
 -Dlog4j.configuration=container-log4j.properties

 -Dyarn.app.container.log.dir=/scratch/extra/cm469/hadoop-2.4.1/logs/userlog
 s/application_1405538067846_0006/container_1405538067846_0006_01_04
 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
 org.apache.hadoop.mapred.YarnChild 137.195.143.103 59056
 attempt_1405538067846_0006_r_00_1 4

 1/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846
 _0006/container_1405538067846_0006_01_04/stdout

 2/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846
 _0006/container_1405538067846_0006_01_04/stderr

 Container killed on request. Exit code is 143
 Container exited with a non-zero exit code 143






 Regards,

 Chris MacKenzie
 telephone: 0131 332 6967
 email: stu...@chrismackenziephotography.co.uk
 

Re: YARN creates only 1 container

2014-06-26 Thread Wangda Tan
It should be yarn-site.xml, not yarn.xml.
yarn.xml will not be picked up from the $CLASSPATH.

Thanks,
Wangda
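For illustration, the NM resource settings from the question would need to live in a file named exactly yarn-site.xml on the daemons' classpath, e.g.:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>65536</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
  </property>
</configuration>
```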

On Wed, May 28, 2014 at 8:56 AM, hari harib...@gmail.com wrote:

 The issue was not related to the container configuration. Due to
 misconfiguration, the ApplicationMaster was not able to contact the
 ResourceManager, causing the 1-container problem.

 However, the total number of containers allocated is still not as expected.
 The configuration settings should have resulted in 16 containers per node,
 but it is allocating 64 containers per node.

 Reiterating the config parameters here again:

 mapred-site.xml
 mapreduce.map.cpu.vcores = 1
 mapreduce.reduce.cpu.vcores = 1
 mapreduce.map.memory.mb = 1024
 mapreduce.reduce.memory.mb = 1024
 mapreduce.map.java.opts = -Xmx1024m
 mapreduce.reduce.java.opts = -Xmx1024m

 yarn.xml
 yarn.nodemanager.resource.memory-mb = 65536
  yarn.nodemanager.resource.cpu-vcores = 16
 yarn.scheduler.minimum-allocation-mb = 1024
 yarn.scheduler.maximum-allocation-mb  = 2048
 yarn.scheduler.minimum-allocation-vcores = 1
 yarn.scheduler.maximum-allocation-vcores = 1

 Is there anything else that might be causing this problem ?

 thanks,
 hari
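An editor's note on the 64-vs-16 discrepancy above: assuming the CapacityScheduler's default DefaultResourceCalculator, the per-node container count is driven by memory alone, which matches the observed 64; DominantResourceCalculator would also honor vcores. A quick sketch of the arithmetic:

```java
public class ContainerCountSketch {
    public static void main(String[] args) {
        int nodeMemMb = 65536, containerMemMb = 1024;  // from the config above
        int nodeVcores = 16, containerVcores = 1;

        // DefaultResourceCalculator: memory only.
        int byMemory = nodeMemMb / containerMemMb;
        // DominantResourceCalculator: capped by every resource.
        int byBoth = Math.min(byMemory, nodeVcores / containerVcores);

        System.out.println(byMemory + " " + byBoth); // 64 vs 16
    }
}
```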





 On Tue, May 27, 2014 at 3:31 AM, hari harib...@gmail.com wrote:

 Hi,

 When using YARN 2.2.0 version, only 1 container is created
 for an application in the entire cluster.
 The single container is created at an arbitrary node
 for every run. This happens when running any application from
 the examples jar (e.g., wordcount). Currently only one application is
 run at a time. The input datasize is  200GB.

 I am setting custom values that affect concurrent container count.
 These config parameters were mostly taken from:

 http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
 There wasn't much description elsewhere on how the container count is
 decided.

 The settings are:

 mapred-site.xml
 mapreduce.map.cpu.vcores = 1
 mapreduce.reduce.cpu.vcores = 1
 mapreduce.map.memory.mb = 1024
 mapreduce.reduce.memory.mb = 1024
 mapreduce.map.java.opts = -Xmx1024m
 mapreduce.reduce.java.opts = -Xmx1024m

 yarn.xml
 yarn.nodemanager.resource.memory-mb = 65536
 yarn.nodemanager.resource.cpu-vcores = 16

 From these settings, each node should be running 16 containers.

 Let me know if there might be something else affecting the container
 count.

 thanks,
 hari









Re: Web interface not show applications

2014-05-26 Thread Wangda Tan
Hi Bo,
I've created ticket YARN-2104 to track this; you can follow progress on that
ticket.

Thanks,


On Mon, May 26, 2014 at 1:38 PM, bo yang bobyan...@gmail.com wrote:

 No problem. Thanks Wangda to follow up on this issue!


 On Sun, May 25, 2014 at 6:36 AM, Wangda Tan wheele...@gmail.com wrote:

 Thanks for reporting this, will look at this.


 On Sat, May 24, 2014 at 8:53 AM, bo yang bobyan...@gmail.com wrote:

 Yes, after application shown in main page, do step 1 again, the
 applications will disappear again.

 I use hadoop 2.4, running on Windows.



 On Fri, May 23, 2014 at 4:09 PM, Wangda Tan wheele...@gmail.com wrote:

 Bo,
 Thanks for your steps. I want to know after application shown in main
 page, do step 1 again, will this application disappear?
 And what's version of your Hadoop?

 Thanks,
 Wangda


 On Sat, May 24, 2014 at 1:33 AM, bo yang bobyan...@gmail.com wrote:

 Wangda,

 Yes, it is still reproducible.

 Steps in my side:
 1. Go to Scheduler Page, click some child queue.
 2. Go to Main Page, I see no applications there.
 3. Go to Scheduler Page, click the root queue.
 4. Go to Main Page, I can see applications as normal then.

 Thanks,
 Bo



 On Fri, May 23, 2014 at 6:45 AM, Wangda Tan wheele...@gmail.comwrote:

 Boyu,
 Your problem typically is forgetting set mapreduce.framework.name=yarn
 in yarn-site.xml or mapred-site.xml.

 Bo,
 Is this reproducible? I think it's more like a bug we need fix, and
 could you tell me how to run into this bug if it's re-producible.


 On Thu, May 15, 2014 at 3:26 AM, bo yang bobyan...@gmail.com wrote:

 Hi Boyu,

 I hit similar situation previously. I found I clicked some child
 queue in Scheduler page, then I could not see any applications in main
 page. If I click the root queue in Scheduler page, I can see 
 applications
 in main page again. You may have a similar try.

 Regards,
 Bo



 On Wed, May 14, 2014 at 10:01 AM, Boyu Zhang 
 boyuzhan...@gmail.comwrote:

 Dear all,

 I am using hadoop 2.4.0 in pseudo distributed mode, I tried to run
 the example wordcount program. It finished successfully, but I am not 
 able
 to see the application/job from the localhost:8088 web interface.

 I started the job history daemon, and nothing from localhost:19888
 either.

 Can anybody provide any intuitions?

 Thanks a lot!
 Boyu











Re: Web interface not show applications

2014-05-25 Thread Wangda Tan
Thanks for reporting this, will look at this.


On Sat, May 24, 2014 at 8:53 AM, bo yang bobyan...@gmail.com wrote:

 Yes, after application shown in main page, do step 1 again, the
 applications will disappear again.

 I use hadoop 2.4, running on Windows.



 On Fri, May 23, 2014 at 4:09 PM, Wangda Tan wheele...@gmail.com wrote:

 Bo,
 Thanks for your steps. I want to know after application shown in main
 page, do step 1 again, will this application disappear?
 And what's version of your Hadoop?

 Thanks,
 Wangda


 On Sat, May 24, 2014 at 1:33 AM, bo yang bobyan...@gmail.com wrote:

 Wangda,

 Yes, it is still reproducible.

 Steps in my side:
 1. Go to Scheduler Page, click some child queue.
 2. Go to Main Page, I see no applications there.
 3. Go to Scheduler Page, click the root queue.
 4. Go to Main Page, I can see applications as normal then.

 Thanks,
 Bo



 On Fri, May 23, 2014 at 6:45 AM, Wangda Tan wheele...@gmail.com wrote:

 Boyu,
 Your problem typically is forgetting set mapreduce.framework.name=yarn
 in yarn-site.xml or mapred-site.xml.

 Bo,
 Is this reproducible? I think it's more like a bug we need fix, and
 could you tell me how to run into this bug if it's re-producible.


 On Thu, May 15, 2014 at 3:26 AM, bo yang bobyan...@gmail.com wrote:

 Hi Boyu,

 I hit similar situation previously. I found I clicked some child queue
 in Scheduler page, then I could not see any applications in main page. If 
 I
 click the root queue in Scheduler page, I can see applications in main 
 page
 again. You may have a similar try.

 Regards,
 Bo



 On Wed, May 14, 2014 at 10:01 AM, Boyu Zhang boyuzhan...@gmail.comwrote:

 Dear all,

 I am using hadoop 2.4.0 in pseudo distributed mode, I tried to run
 the example wordcount program. It finished successfully, but I am not 
 able
 to see the application/job from the localhost:8088 web interface.

 I started the job history daemon, and nothing from localhost:19888
 either.

 Can anybody provide any intuitions?

 Thanks a lot!
 Boyu









Re: question about search log use yarn command

2014-05-23 Thread Wangda Tan
Hi,
You first need to set *yarn.log-aggregation-enable* to true in yarn-site.xml
to enable retrieving logs with yarn logs.

To your first question: you can see the port in the RM's web UI, typically at
RM_address:8088; check the available nodes and find the host/port of the NMs.
To your 2nd question: if you're not sure about this, you can check out the
whole application log with all containers (don't use -containerId and
-nodeAddress) and find what you are interested in.
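A hypothetical invocation, assuming log aggregation is enabled and the application has finished (the application and container ids below are made up; the NM host:port comes from the RM web UI's Nodes page):

```
# Whole-application log, all containers (no -containerId / -nodeAddress):
yarn logs -applicationId application_1388888888888_0001

# A single container, once its id and the NM address are known:
yarn logs -applicationId application_1388888888888_0001 \
    -containerId container_1388888888888_0001_01_000001 \
    -nodeAddress nm-host:port
```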

Thanks,
Wangda



On Thu, May 15, 2014 at 4:34 PM, ch huang justlo...@gmail.com wrote:

 hi,mailist:


   doc say

   $ $HADOOP_YARN_HOME/bin/yarn logs
 Retrieve logs for completed YARN applications.
 usage: yarn logs -applicationId <application ID> [OPTIONS]
 general options are:
 -appOwner <Application Owner>   AppOwner (assumed to be current user if
 not specified)
 -containerId <Container ID>     ContainerId (must be specified if node
 address is specified)
 -nodeAddress <Node Address>     NodeAddress in the format nodename:port
 (must be specified if container id is
 specified)

 my question is: if I want to see the NM1 log (NodeAddress nodename is NM1),
 how can I get the port? Another question is about containerId: how can I
 know the containerId? Is there any list info I can search?



Re: Web interface not show applications

2014-05-23 Thread Wangda Tan
Boyu,
Your problem is typically caused by forgetting to set
mapreduce.framework.name=yarn in yarn-site.xml or mapred-site.xml.

Bo,
Is this reproducible? I think it's more likely a bug we need to fix. Could
you tell me how to run into this bug, if it's reproducible?


On Thu, May 15, 2014 at 3:26 AM, bo yang bobyan...@gmail.com wrote:

 Hi Boyu,

 I hit a similar situation previously. I found that after clicking some child
 queue on the Scheduler page, I could not see any applications on the main
 page. If I clicked the root queue on the Scheduler page, I could see
 applications on the main page again. You may give that a try.

 Regards,
 Bo



 On Wed, May 14, 2014 at 10:01 AM, Boyu Zhang boyuzhan...@gmail.comwrote:

 Dear all,

 I am using hadoop 2.4.0 in pseudo distributed mode, I tried to run the
 example wordcount program. It finished successfully, but I am not able to
 see the application/job from the localhost:8088 web interface.

 I started the job history daemon, and nothing from localhost:19888 either.

 Can anybody provide any intuitions?

 Thanks a lot!
 Boyu





Re: Web interface not show applications

2014-05-23 Thread Wangda Tan
Bo,
Thanks for your steps. I want to know: after the application is shown on the
main page, if you do step 1 again, will it disappear?
And what's the version of your Hadoop?

Thanks,
Wangda


On Sat, May 24, 2014 at 1:33 AM, bo yang bobyan...@gmail.com wrote:

 Wangda,

 Yes, it is still reproducible.

 Steps in my side:
 1. Go to Scheduler Page, click some child queue.
 2. Go to Main Page, I see no applications there.
 3. Go to Scheduler Page, click the root queue.
 4. Go to Main Page, I can see applications as normal then.

 Thanks,
 Bo



 On Fri, May 23, 2014 at 6:45 AM, Wangda Tan wheele...@gmail.com wrote:

 Boyu,
 Your problem typically is forgetting set mapreduce.framework.name=yarn
 in yarn-site.xml or mapred-site.xml.

 Bo,
 Is this reproducible? I think it's more like a bug we need fix, and could
 you tell me how to run into this bug if it's re-producible.


 On Thu, May 15, 2014 at 3:26 AM, bo yang bobyan...@gmail.com wrote:

 Hi Boyu,

 I hit similar situation previously. I found I clicked some child queue
 in Scheduler page, then I could not see any applications in main page. If I
 click the root queue in Scheduler page, I can see applications in main page
 again. You may have a similar try.

 Regards,
 Bo



 On Wed, May 14, 2014 at 10:01 AM, Boyu Zhang boyuzhan...@gmail.com wrote:

 Dear all,

 I am using hadoop 2.4.0 in pseudo distributed mode, I tried to run the
 example wordcount program. It finished successfully, but I am not able to
 see the application/job from the localhost:8088 web interface.

 I started the job history daemon, and nothing from localhost:19888
 either.

 Can anybody provide any intuitions?

 Thanks a lot!
 Boyu







Re: 2.4 / yarn pig jobs fail due to exit code 1 from container.

2014-05-23 Thread Wangda Tan
Kevin,
From the NM logs you provided we can only tell that the container exited with
code 1; the container's own logs would be more helpful for understanding what
happened.

Thanks,


On Sat, May 24, 2014 at 6:17 AM, Kevin Burton bur...@spinn3r.com wrote:

 Trying to track down exactly what's happening.

 Right now I'm getting this (see below).

 The setup documentation for 2.4 could definitely be better.  Probably with
 a sample/working config. Looks like too much of this is left up as an
 exercise to the user.

 2014-05-23 21:20:30,652 INFO
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
 launchContainer: [nice, -n, 0, bash,
 /data/2/yarn/local/usercache/root/appcache/application_1400821083545_0004/container_1400821083545_0004_01_01/default_container_executor.sh]
 2014-05-23 21:20:30,771 WARN
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
 code from container container_1400821083545_0004_01_01 is : 1
 2014-05-23 21:20:30,771 WARN
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
 Exception from container-launch with container ID:
 container_1400821083545_0004_01_01 and exit code: 1
 org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
 at org.apache.hadoop.util.Shell.run(Shell.java:418)
 at
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
 at
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)


 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 Skype: *burtonator*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com
 War is peace. Freedom is slavery. Ignorance is strength. Corporations are
 people.




Re: YarnException: Unauthorized request to start container. This token is expired.

2014-04-02 Thread Wangda Tan
Fengyun,
I think I have met a similar problem before; you can check whether the RM's
clock and the NMs' clocks are synchronized.

Regards,
Wangda Tan
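As a worked illustration of reading this error (a sketch; the variable names below are mine, not from the thread): the two large numbers in the "This token is expired" messages quoted later in this thread are epoch timestamps in milliseconds, so the gap between them is easy to compute:

```python
# Timestamps copied verbatim from the first error message in this thread:
# "This token is expired. current time is 1395558481146 found 1395558443384"
current_ms = 1395558481146  # NM's clock when the start-container request arrived
expiry_ms = 1395558443384   # expiry time carried inside the container token

gap_s = (current_ms - expiry_ms) / 1000.0
print(f"token expired {gap_s:.1f} s before the request arrived")  # -> 37.8 s
```

A gap of roughly 38 seconds is far larger than NTP-level skew, which fits Fengyun's later observation that the clocks agreed to within a second; it suggests the token's lifetime was exhausted (for example during a slow allocation on a fully loaded cluster) rather than unsynchronized clocks.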


On Wed, Apr 2, 2014 at 9:55 PM, Fengyun RAO raofeng...@gmail.com wrote:

 thank you, omkar,

 I'm fresh to Hadoop, and all the settings are default, so I guess the
 expiration is 10 minutes.

 The exception happens when running big job, which occupies all the
 resources of all nodes.

 When running small job, with many containers remained, no exception was
 thrown.


 Actually, I didn't quite follow you on what reservation means.
 I guess you mean the RM creates the token at the time of reservation, but by
 the time it assigns the container to the AM, the token has already expired.
 Is this correct?

 Can I ask you a favor to help me find the jira? or tell me which version
 fixed the problem?

 Thanks!

 2014-03-30 0:33 GMT+08:00 omkar joshi omkar.vinit.joshi...@gmail.com:

 Can you check few things?
 What is the container expiry interval set to?
 How many containers are getting allocated?
 Is there any reservation of the containers happening..?
 if yes then that was a known problem...I don't remember the jira number
 though... Underlying problem in case of reservation was that it creates a
 token at the time of reservation and not when it issues the token to AM.



 On Fri, Mar 28, 2014 at 6:03 AM, Leibnitz se3g2...@gmail.com wrote:

 no doubt

 Sent from my iPhone 6

  On Mar 23, 2014, at 17:37, Fengyun RAO raofeng...@gmail.com wrote:
 
  What does this exception mean? I googled a lot, all the results tell
 me it's because the time is not synchronized between datanode and namenode.
  However, I checked all the servers, that the ntpd service is on, and
 the time differences are less than 1 second.
  What's more, the tasks are not always failing on certain datanodes.
  It fails and then it restarts and succeeds. If it were the time
 problem, I guess it would always fail.
 
  My hadoop version is CDH5 beta. Below is the detailed log:
 
  14/03/23 14:57:06 INFO mapreduce.Job: Running job:
 job_1394434496930_0032
  14/03/23 14:57:17 INFO mapreduce.Job: Job job_1394434496930_0032
 running in uber mode : false
  14/03/23 14:57:17 INFO mapreduce.Job:  map 0% reduce 0%
  14/03/23 15:08:01 INFO mapreduce.Job: Task Id :
 attempt_1394434496930_0032_m_34_0, Status : FAILED
  Container launch failed for container_1394434496930_0032_01_41 :
 org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
 start container.
  This token is expired. current time is 1395558481146 found
 1395558443384
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at
 java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 at
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
 at
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 
  14/03/23 15:08:02 INFO mapreduce.Job:  map 1% reduce 0%
  14/03/23 15:09:36 INFO mapreduce.Job: Task Id :
 attempt_1394434496930_0032_m_36_0, Status : FAILED
  Container launch failed for container_1394434496930_0032_01_38 :
 org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
 start container.
  This token is expired. current time is 1395558575889 found
 1395558443245
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at
 java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 at
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
 at
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run

Re: number of map tasks on yarn

2014-04-01 Thread Wangda Tan
More specifically, the number of map tasks for each job depends on
InputFormat.getSplits(...): it is exactly the number of splits returned by
InputFormat.getSplits(...). You can read the source code of FileInputFormat to
get a better understanding of this.



Regards,
Wangda Tan
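As a rough sketch of that logic (simplified from FileInputFormat's computeSplitSize and getSplits; it ignores the 10% split-slop tolerance, non-splittable compressed files, and multi-file inputs, and the function names below are mine):

```python
import math

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Mirrors FileInputFormat.computeSplitSize:
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def approx_num_map_tasks(file_size, block_size, **limits):
    # One map task per split; a single file yields about
    # ceil(fileSize / splitSize) splits.
    return math.ceil(file_size / compute_split_size(block_size, **limits))

MB = 1024 * 1024
# e.g. a 4 GB input on 128 MB blocks gives 32 splits -> 32 map tasks,
# independent of yarn.nodemanager.resource.memory-mb and friends
print(approx_num_map_tasks(4096 * MB, 128 * MB))  # -> 32
```

This is why the 32 map tasks Libo observed do not follow from the memory settings he quoted: the task count is fixed by the input size and split size, while the memory settings only bound how many of those tasks can run concurrently.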


On Wed, Apr 2, 2014 at 10:23 AM, Stanley Shi s...@gopivotal.com wrote:

 map task number is not decided by the resources you need.
 It's decided by something else.

 Regards,
 *Stanley Shi,*



 On Wed, Apr 2, 2014 at 9:08 AM, Libo Yu yu_l...@hotmail.com wrote:

 Hi all,

 I pretty much use the default yarn setting to run a word count example on
 a 3 node cluster. Here are my settings:
 yarn.nodemanager.resource.memory-mb 8192
 yarn.scheduler.minimum-allocation-mb 1024
 yarn.scheduler.maximum-allocation-vcores 32

 I would expect to see 8192/1024 * 3 = 24 map tasks.
 However, I see 32 map tasks. Anybody knows why? Thanks.

 Libo





Re: How to get locations of blocks programmatically?

2014-03-27 Thread Wangda Tan
Hi Libo,
DFSClient.getBlockLocations, is this what you want?

Regards,
Wangda Tan


On Fri, Mar 28, 2014 at 10:03 AM, Libo Yu yu_l...@hotmail.com wrote:

 Hi all,

 hadoop fsck path -files -blocks -locations can list locations for all
 blocks in the path.
 Is it possible to list all blocks and the block locations for a given path
 programmatically?
 Thanks,

 Libo



Re: Getting error message from AM container launch

2014-03-26 Thread Wangda Tan
Hi John,
Typically, this is caused by something in your program setting nice as the
AM launch command. You can check the actual script which YARN used to launch
the AM.
You need to set yarn.nodemanager.delete.debug-delay-sec in yarn-site.xml on
all NMs to a larger value (like 600, i.e. 10 minutes), so that NMs don't
remove the temporary directory of a container as soon as the container
finishes. You need to restart the NMs after setting it.
After that, you can re-run your program; the script you're looking for
should be
host-of-AM:/ephemeral02/hadoop/yarn/local/usercache/SYSTEM/appcache/app-id/container-id/launch_container.sh.
You can verify in that script whether the launch command is correct.
--
Regards,
Wangda Tan
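Concretely, the property described above would look like this in yarn-site.xml on each NM (a sketch using the 600-second value suggested above; the NMs must be restarted afterwards):

```xml
<!-- Keep a finished container's working directory (including
     launch_container.sh) around for 10 minutes instead of
     deleting it immediately -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
```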


On Thu, Mar 27, 2014 at 7:12 AM, Azuryy azury...@gmail.com wrote:

 You used 'nice' in your app?


 Sent from my iPhone5s

 On March 27, 2014, at 6:55, John Lilley john.lil...@redpoint.net wrote:

  On further examination they appear to be 369 characters long.  I’ve read
 about similar issues showing when the environment exceeds 132KB, but we
 aren’t putting anything significant in the environment.

 John





 *From:* John Lilley [mailto:john.lil...@redpoint.net]

 *Sent:* Wednesday, March 26, 2014 4:41 PM
 *To:* user@hadoop.apache.org
 *Subject:* RE: Getting error message from AM container launch



 We do have a fairly long container command-line.  Not huge, around 200
 characters.

 John



 *From:* John Lilley [mailto:john.lil...@redpoint.net]

 *Sent:* Wednesday, March 26, 2014 4:38 PM
 *To:* user@hadoop.apache.org
 *Subject:* Getting error message from AM container launch



 Running a non-MapReduce YARN application, one of the containers launched
 by the AM is failing with an error message I’ve never seen.  Any ideas?
 I’m not sure who exactly is running “nice” or why its argument list would
 be too long.

 Thanks

 john



 Container for appattempt_1395755163053_0030_01 exited with  exitCode:
 0 due to: Exception from container-launch:

 java.io.IOException: Cannot run program nice (in directory
 /ephemeral02/hadoop/yarn/local/usercache/SYSTEM/appcache/application_1395755163053_0030/container_1395755163053_0030_01_01):
 java.io.IOException: error=7, Argument list too long

 at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)

 at org.apache.hadoop.util.Shell.runCommand(Shell.java:407)

 at org.apache.hadoop.util.Shell.run(Shell.java:379)

 at
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

 at
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)

 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)

 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

 at java.util.concurrent.FutureTask.run(FutureTask.java:138)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

 at java.lang.Thread.run(Thread.java:662)

 Caused by: java.io.IOException: java.io.IOException: error=7, Argument
 list too long

 at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)

 at java.lang.ProcessImpl.start(ProcessImpl.java:65)

 at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)

 ... 11 more






-- 
Regards,
Wangda


Re: Getting error message from AM container launch

2014-03-26 Thread Wangda Tan
Glad to hear that :)
--
Regards,
Wangda Tan


On Thu, Mar 27, 2014 at 10:36 AM, John Lilley john.lil...@redpoint.net wrote:

  Wangda Tan,



 Thanks for your reply!  We did actually figure out where the problem was
 coming from, but this is a very helpful technique to know.



 John





 *From:* Wangda Tan [mailto:wheele...@gmail.com]
 *Sent:* Wednesday, March 26, 2014 6:35 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: Getting error message from AM container launch



 HI John,

 Typically, this is caused by somewhere in your program set nice as AM
 launching command. You can check the real script which YARN used to
 launch AM.

 You need set yarn.nodemanager.delete.debug-delay-sec in yarn-site.xml on
 all NMs to a larger value (like 600, 10 min), to make NMs don't remove
 temporary directory of a container as soon as the container get finished.
 You need restart NMs after you set.

 After that, you can re-run your program again, the script you can find
 should be
 host-of-AM:/ephemeral02/hadoop/yarn/local/usercache/SYSTEM/appcache/app-id/container-id/launch_container.sh.

 You can verify the launch command if correct in the script.

 --

 Regards,

 Wangda Tan



 On Thu, Mar 27, 2014 at 7:12 AM, Azuryy azury...@gmail.com wrote:

 You used 'nice' in your app?



 Sent from my iPhone5s


 On March 27, 2014, at 6:55, John Lilley john.lil...@redpoint.net wrote:

  On further examination they appear to be 369 characters long.  I’ve read
 about similar issues showing when the environment exceeds 132KB, but we
 aren’t putting anything significant in the environment.

 John





 *From:* John Lilley [mailto:john.lil...@redpoint.net]

 *Sent:* Wednesday, March 26, 2014 4:41 PM
 *To:* user@hadoop.apache.org
 *Subject:* RE: Getting error message from AM container launch



 We do have a fairly long container command-line.  Not huge, around 200
 characters.

 John



 *From:* John Lilley [mailto:john.lil...@redpoint.net]

 *Sent:* Wednesday, March 26, 2014 4:38 PM
 *To:* user@hadoop.apache.org
 *Subject:* Getting error message from AM container launch



 Running a non-MapReduce YARN application, one of the containers launched
 by the AM is failing with an error message I’ve never seen.  Any ideas?
 I’m not sure who exactly is running “nice” or why its argument list would
 be too long.

 Thanks

 john



 Container for appattempt_1395755163053_0030_01 exited with  exitCode:
 0 due to: Exception from container-launch:

 java.io.IOException: Cannot run program nice (in directory
 /ephemeral02/hadoop/yarn/local/usercache/SYSTEM/appcache/application_1395755163053_0030/container_1395755163053_0030_01_01):
 java.io.IOException: error=7, Argument list too long

 at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)

 at org.apache.hadoop.util.Shell.runCommand(Shell.java:407)

 at org.apache.hadoop.util.Shell.run(Shell.java:379)

 at
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

 at
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)

 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)

 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

 at java.util.concurrent.FutureTask.run(FutureTask.java:138)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

 at java.lang.Thread.run(Thread.java:662)

 Caused by: java.io.IOException: java.io.IOException: error=7, Argument
 list too long

 at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)

 at java.lang.ProcessImpl.start(ProcessImpl.java:65)

 at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)

 ... 11 more







 --

 Regards,

 Wangda