Re: Hadoop 3.2 Release Plan proposal

2018-09-04 Thread Haibo Chen
Hi Sunil,

For YARN-1011, we found a few minor issues, and we have not yet merged our
internal fixes for them into the upstream feature branch. Hence, given the
release timeline, YARN-1011 will not make it.

Thanks.

On Thu, Aug 30, 2018 at 2:52 PM, Virajith Jalaparti 
wrote:

> Hi Sunil,
>
> Quick correction on the task list  (missed this earlier) -- HDFS-12615 is
> being done by Inigo Goiri
>
> -Virajith
>
>
>
> On Thu, Aug 30, 2018 at 9:30 AM Sunil G  wrote:
>
> > Hi All,
> >
> > In line with the earlier communication dated 17th July 2018, I would like to
> > provide some updates.
> >
> > We are approaching the previously proposed code freeze date (Aug 31).
> >
> > The merge discussion/vote for one of the critical features, Node Attributes,
> > is ongoing. A few other Blocker bugs also need a bit more time. Given
> > this, I suggest pushing the feature/code freeze by 2 more weeks to
> > accommodate these jiras too.
> >
> > Proposing updated changes to the plan in line with this:
> > Feature freeze date: all features to merge by September 7, 2018.
> > Code freeze date: blockers/critical only, no improvements or
> >  non-blocker/critical bug-fixes, by September 14, 2018.
> > Release date: September 28, 2018
> >
> > If any features in a branch are targeted to 3.2.0, please reply to this
> > email thread.
> >
> > *Here's an updated 3.2.0 feature status:*
> >
> > 1. Merged & Completed features:
> >
> > - (Wangda) YARN-8561: Hadoop Submarine project for DeepLearning workloads
> > Initial cut.
> > - (Uma) HDFS-10285: HDFS Storage Policy Satisfier
> > - (Sunil) YARN-7494: Multi Node scheduling support in Capacity Scheduler.
> > - (Chandni/Eric) YARN-7512: Support service upgrade via YARN Service API
> > and CLI.
> >
> > 2. Features close to finish:
> >
> > - (Naga/Sunil) YARN-3409: Node Attributes support in YARN. Merge/Vote
> > Ongoing.
> > - (Rohith) YARN-5742: Serve aggregated logs of historical apps from ATSv2.
> > Patch in progress.
> > - (Virajit) HDFS-12615: Router-based HDFS federation. Improvement works.
> > - (Steve) S3Guard Phase III, S3a phase V, Support Windows Azure Storage. In
> > progress.
> >
> > 3. Tentative features:
> >
> > - (Haibo Chen) YARN-1011: Resource overcommitment. Looks challenging to be
> > done before Aug 2018.
> > - (Eric) YARN-7129: Application Catalog for YARN applications. Challenging
> > as more discussions are on-going.
> >
> > *Summary of 3.2.0 issues status:*
> >
> > 26 Blocker and Critical issues [1] are open. I am following up with owners
> > to get status on each of them to get in by Code Freeze date.
> >
> > [1] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND priority in (Blocker,
> > Critical) AND resolution = Unresolved AND "Target Version/s" = 3.2.0 ORDER
> > BY priority DESC
> >
> > Thanks,
> > Sunil
> >
> > On Tue, Aug 14, 2018 at 10:30 PM Sunil G  wrote:
> >
> > > Hi All,
> > >
> > > Thanks for the feedback. In line with the earlier communication dated 17th
> > > July 2018, I would like to provide some updates.
> > >
> > > We are approaching the previously proposed feature freeze date (Aug 21,
> > > about 7 days from today).
> > > If any features in a branch are targeted to 3.2.0, please reply to
> > > this email thread.
> > > Steve has mentioned that the s3 features will come in close to the Code
> > > Freeze Date (Aug 31st).
> > >
> > > *Here's an updated 3.2.0 feature status:*
> > >
> > > 1. Merged & Completed features:
> > >
> > > - (Wangda) YARN-8561: Hadoop Submarine project for DeepLearning workloads
> > > Initial cut.
> > > - (Uma) HDFS-10285: HDFS Storage Policy Satisfier
> > >
> > > 2. Features close to finish:
> > >
> > > - (Naga/Sunil) YARN-3409: Node Attributes support in YARN. Major patches
> > > are all in, only one last patch is in review state.
> > > - (Sunil) YARN-7494: Multi Node scheduling support in Capacity Scheduler.
> > > Close to commit.
> > > - (Chandni/Eric) YARN-7512: Support service upgrade via YARN Service API
> > > and CLI. 2 patches are pending which will be closed by Feature freeze date.
> > > - (Rohith) YARN-5742: Serve aggregated logs of historical apps from ATSv2.
> > > Patch in progress.
> > > - (Virajit) HDFS-12615: Router-based HDFS federation. I

Re: Apache Hadoop 3.0.3 Release plan

2018-05-08 Thread Haibo Chen
+1 on adding YARN-7190 to Hadoop 3.0.x despite the fact that it is
technically incompatible.
It is critical enough to justify being an exception, IMO.

Added Rohith and Vrushali

On Tue, May 8, 2018 at 6:20 AM, Wei-Chiu Chuang  wrote:

> Thanks Yongjun for driving 3.0.3 release!
>
> IMHO, could we consider adding YARN-7190 to the list?
> I understand that it is listed as an incompatible change; however, because
> of this bug, HBase considers the entire Hadoop 3.0.x line not production
> ready. I feel there's not much point releasing any more 3.0.x releases if
> downstream projects can't pick them up (HBase, after all, is one of
> the most important projects around Hadoop).
>
> On Mon, May 7, 2018 at 1:19 PM, Yongjun Zhang  wrote:
>
> > Hi Eric,
> >
> > Thanks for the feedback, good point. I will try to clean up things, then
> > cut branch before the release production and vote.
> >
> > Best,
> >
> > --Yongjun
> >
> > On Mon, May 7, 2018 at 8:39 AM, Eric Payne wrote:
> >
> > > >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and vote
> > > > for RC on May 30th
> > > I much prefer to wait to cut the branch until just before the production
> > > of the release and the vote. With so many branches, we sometimes miss
> > > putting critical bug fixes in unreleased branches if the branch is cut too
> > > early.
> > >
> > > My 2 cents...
> > > Thanks,
> > > -Eric Payne
> > >
> > >
> > >
> > >
> > >
> > > On Monday, May 7, 2018, 12:09:00 AM CDT, Yongjun Zhang <
> > > yjzhan...@apache.org> wrote:
> > >
> > >
> > >
> > >
> > >
> > > Hi All,
> > >
> > > >
> > > We have released Apache Hadoop 3.0.2 in April of this year [1]. Since
> > then,
> > > there are quite some commits done to branch-3.0. To further improve the
> > > quality of release, we plan to do 3.0.3 release now. The focus of 3.0.3
> > > will be fixing blockers (3), critical bugs (17) and bug fixes (~130),
> see
> > > [2].
> > >
> > > Usually no new feature should be included for maintenance releases, I
> > > noticed we have https://issues.apache.org/jira/browse/HADOOP-13055 in
> > the
> > > branch classified as new feature. I will talk with the developers to
> see
> > if
> > > we should include it in 3.0.3.
> > >
> > > I also noticed that there are more commits in the branch than can be
> > found
> > > by query [2], also some commits committed to 3.0.3 do not have their
> jira
> > > target release field filled in accordingly. I will go through them to
> > > update the jira.
> > >
> > > >
> > > We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and vote
> > for
> > > RC on May 30th, targeting for Jun 8th release.
> > >
> > > >
> > > Your insights are welcome.
> > >
> > > >
> > > [1] https://www.mail-archive.com/general@hadoop.apache.org/
> msg07790.html
> > >
> > > > [2] https://issues.apache.org/jira/issues/?filter=12343874  See Note
> > > below
> > > Note: seems I need some admin change so that I can make the filter in
> [2]
> > > public, I'm working on that. For now, you can use jquery
> > > (project = hadoop OR project = "Hadoop HDFS" OR project = "Hadoop YARN"
> > OR
> > > project = "Hadoop Map/Reduce") AND fixVersion in (3.0.3) ORDER BY
> > priority
> > > DESC
> > >
> > > Thanks and best regards,
> > >
> > > --Yongjun
> > >
> > > -
> > > To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
> > >
> > >
> >
>
>
>
> --
> A very happy Hadoop contributor
>


[jira] [Created] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs

2018-03-12 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-7065:
-

 Summary: Improve information stored in ATSv2 for MR jobs
 Key: MAPREDUCE-7065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Haibo Chen
Assignee: Haibo Chen


While exploring the possibility of retrieving every piece of information that 
JHS presents today through ATSv2, I found a few improvements we can make.

1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into 
MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT

2) Task attempt final states are stored in the events, so we cannot use 
infofilter to group task attempts by final state, which is what JHS does.

3) Display names of counters are not stored in JHS. We are currently storing 
(counter name, display name, value) as a metric (counter name, value). We can 
potentially store (counter name, display name) as an info. Similarly for 
sources of Job configuration properties

4) Job level counters and configuration properties are stored both in 
ApplicationTable and EntityTable. It's probably safe just to store MR specific 
counters in EntityTable.

 

One general problem I see around this area in MR:

1) We can precompute # of failed/killed/successful map/reduce task attempts and 
average map/reduce/shuffle/merge time in the AM. This would avoid iterating 
over all task attempts when JHS serves the Job Overview Page.

 

To fully replace JHS with ATSv2, three functionalities need to be supported by 
ATSv2

1) /apps/ query so that a list of all jobs can be retrieved

2) support streaming api to get all generic entities (YARN-5672)

3) support per-app data retention policy. Likely a setting in TimelineWriter 
that allows admins to specify how long information of a given application should 
be kept, in the form of a TTL in HBase.
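
As a rough illustration of the precomputation suggested under the general
problem above, the sketch below keeps running totals as task attempts finish,
so that an overview page can read the totals instead of iterating over every
attempt. It is written in Python for brevity and is not taken from the MR AM
code; the names TaskAttempt and AttemptSummary, the state strings, and the
choice to average over all finished attempts are assumptions made for this
example only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAttempt:
    """Hypothetical stand-in for a finished task attempt record."""
    task_type: str     # "MAP" or "REDUCE"
    final_state: str   # "SUCCEEDED", "FAILED" or "KILLED"
    elapsed_ms: int


class AttemptSummary:
    """Running totals updated once per finished attempt (assumed design)."""

    def __init__(self):
        self.counts = Counter()      # (task_type, final_state) -> attempts
        self.elapsed_ms = Counter()  # task_type -> total elapsed milliseconds
        self.finished = Counter()    # task_type -> attempts seen

    def record(self, attempt):
        # O(1) work per attempt; nothing is re-scanned later.
        self.counts[(attempt.task_type, attempt.final_state)] += 1
        self.elapsed_ms[attempt.task_type] += attempt.elapsed_ms
        self.finished[attempt.task_type] += 1

    def average_ms(self, task_type):
        # Average over all finished attempts of this type; 0.0 if none seen.
        seen = self.finished[task_type]
        return self.elapsed_ms[task_type] / seen if seen else 0.0


summary = AttemptSummary()
for finished_attempt in [
    TaskAttempt("MAP", "SUCCEEDED", 1200),
    TaskAttempt("MAP", "FAILED", 800),
    TaskAttempt("REDUCE", "SUCCEEDED", 3000),
]:
    summary.record(finished_attempt)
```

The totals could then be published once as job-level info, which is the kind
of data a Job Overview Page would need.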



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6955) remove unnecessary dependency from hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle

2017-09-08 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6955.
---
Resolution: Not A Problem

> remove unnecessary dependency from hadoop-mapreduce-client-app to 
> hadoop-mapreduce-client-shuffle
> -
>
> Key: MAPREDUCE-6955
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6955
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>    Reporter: Haibo Chen
>    Assignee: Haibo Chen
>







[jira] [Created] (MAPREDUCE-6955) remove unnecessary dependency from hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle

2017-09-08 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6955:
-

 Summary: remove unnecessary dependency from 
hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle
 Key: MAPREDUCE-6955
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6955
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (MAPREDUCE-6949) yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml

2017-08-30 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6949.
---
Resolution: Duplicate

> yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml
> ---
>
> Key: MAPREDUCE-6949
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6949
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha4
>    Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Minor
>







[jira] [Created] (MAPREDUCE-6949) yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml

2017-08-30 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6949:
-

 Summary: yarn.app.mapreduce.am.log.level is not documented in 
mapred-default.xml
 Key: MAPREDUCE-6949
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6949
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Minor









[jira] [Created] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed

2017-08-30 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6948:
-

 Summary: TestJobImpl.testUnusableNodeTransition failed
 Key: MAPREDUCE-6948
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen


*Error Message*
expected: but was:

*Stacktrace*
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615)






Re: [VOTE] Merge feature branch YARN-5355 (Timeline Service v2) to trunk

2017-08-25 Thread Haibo Chen
+1 from my side.

Mainly from the perspective of ensuring there is no impact from ATSv2 when it
is off (the default), I deployed the latest YARN-5355 bits into a few
clusters and ran internal smoke tests. The tests show no impact when ATSv2
is off.

Best,
Haibo

On Thu, Aug 24, 2017 at 7:51 AM, Sunil G <sun...@apache.org> wrote:

> Thank you very much Vrushali, Rohith, Varun and other folks who made this
> happen. Great work, really appreciate the same!!
>
> +1 (binding) from my side:
>
> # Tested ATSv2 cluster in a secure cluster. Ran some basic jobs
> # Accessed new YARN UI which shows various flows/flow activity etc. Seems
> fine.
> # Based on code, looks like all apis are compatible.
> # REST api docs look fine as well, I guess we could improve them a bit
> more post merge.
> # Adding to the additional thoughts discussed here, native service
> could also publish events to atsv2. I think that work has also happened in
> the branch.
>
> Looking forward to a much wider adoption of ATSv2 with more projects.
>
> Thanks
> Sunil
>
>
> On Tue, Aug 22, 2017 at 12:02 PM Vrushali Channapattan <
> vrushalic2...@gmail.com> wrote:
>
> > Hi folks,
> >
> > Per earlier discussion [1], I'd like to start a formal vote to merge
> > feature branch YARN-5355 [2] (Timeline Service v.2) to trunk. The vote will
> > run for 7 days, and will end August 29 11:00 PM PDT.
> >
> > We have previously completed one merge onto trunk [3] and Timeline Service
> > v2 has been part of Hadoop release 3.0.0-alpha1.
> >
> > Since then, we have been working on extending the capabilities of Timeline
> > Service v2 in a feature branch [2] for a while, and we are reasonably
> > confident that the state of the feature meets the criteria to be merged
> > onto trunk and we'd love folks to get their hands on it in a test capacity
> > and provide valuable feedback so that we can make it production-ready.
> >
> > In a nutshell, Timeline Service v.2 delivers significant scalability and
> > usability improvements based on a new architecture. What we would like to
> > merge to trunk is termed "alpha 2" (milestone 2). The feature has a
> > complete end-to-end read/write flow with security and read level
> > authorization via whitelists. You should be able to start setting it up and
> > testing it.
> >
> > At a high level, the following are the key features that have been
> > implemented since alpha1:
> > - Security via Kerberos Authentication and delegation tokens
> > - Read side simple authorization via whitelist
> > - Client configurable entity sort ordering
> > - Richer REST APIs for apps, app attempts, containers, fetching metrics by
> > timerange, pagination, sub-app entities
> > - Support for storing sub-application entities (entities that exist outside
> > the scope of an application)
> > - Configurable TTLs (time-to-live) for tables, configurable table prefixes,
> > configurable hbase cluster
> > - Flow level aggregations done as dynamic (table level) coprocessors
> > - Uses latest stable HBase release 1.2.6
> >
> > There are a total of 82 subtasks that were completed as part of this
> > effort.
> >
> > We paid close attention to ensure that Timeline Service v.2 does not impact
> > existing functionality when it is disabled (the default).
> >
> > Special thanks to a team of folks who worked hard and contributed towards
> > this effort with patches, reviews and guidance: Rohith Sharma K S, Varun
> > Saxena, Haibo Chen, Sangjin Lee, Li Lu, Vinod Kumar Vavilapalli, Joep
> > Rottinghuis, Jason Lowe, Jian He, Robert Kanter, Michael Stack.
> >
> > Regards,
> > Vrushali
> >
> > [1] http://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27383.html
> > [2] https://issues.apache.org/jira/browse/YARN-5355
> > [3] https://issues.apache.org/jira/browse/YARN-2928
> > [4] https://github.com/apache/hadoop/commits/YARN-5355
> >
>


[jira] [Created] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common

2017-08-10 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6936:
-

 Summary: Remove unnecessary dependency of 
hadoop-yarn-server-common from hadoop-mapreduce-client-common 
 Key: MAPREDUCE-6936
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (MAPREDUCE-6929) TimelineV2Client hangs MR AM

2017-08-03 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6929.
---
Resolution: Not A Problem

> TimelineV2Client hangs MR AM
> 
>
> Key: MAPREDUCE-6929
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6929
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>    Reporter: Haibo Chen
>
> I happened to misconfigure ATSv2 settings, that is, I enabled emitting to 
> ATSv2 in MR and did not start YARN with ATSv2. The job was stuck after it 
> finished all its work. 
> Noticed that in MRAppMaster, TimelineClient is never added as a service, 
> which I think is why the AM was hanging. 






[jira] [Created] (MAPREDUCE-6929) TimelineClient hangs MR AM

2017-08-02 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6929:
-

 Summary: TimelineClient hangs MR AM
 Key: MAPREDUCE-6929
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6929
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am
Reporter: Haibo Chen


I happened to misconfigure ATSv2 settings, that is, I enabled emitting to ATSv2 
in MR and did not start YARN with ATSv2. The job was stuck after it finished 
all its work. 

Noticed that in MRAppMaster, TimelineClient is never added as a service, which 
I think is why the AM was hanging. 






[jira] [Created] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2017-08-01 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6926:
-

 Summary: Allow MR jobs to opt out of oversubscription
 Key: MAPREDUCE-6926
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: mrv2
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (MAPREDUCE-6886) Job History File Permissions configurable

2017-05-22 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6886.
---
Resolution: Duplicate

Closing this as a duplicate of MAPREDUCE-6288.  [~Prabhu Joseph] You can 
suggest this in MAPREDUCE-6288, see what people think.

> Job History File Permissions configurable
> -
>
> Key: MAPREDUCE-6886
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6886
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>
> Currently the mapreduce job history files are written with 770 permissions,
> so they can be accessed by the job user or by other users in the hadoop group.
> Customers have users who are not part of the hadoop group but want to access
> these history files. We can make this configurable, e.g. 770 (Strict) or 755
> (All) permissions, with 770 as the default.
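
To make the two proposed modes concrete, here is a small Python sketch that
maps a policy name to an octal mode and checks whether users outside the owner
and group can read the file. The policy names "strict" and "all" follow the
wording of the proposal; the dictionary and function names are hypothetical
and do not correspond to any existing Hadoop configuration key.

```python
import stat

# Hypothetical policy names, taken from the wording of the proposal above.
HISTORY_FILE_MODES = {
    "strict": 0o770,  # owner and group only
    "all": 0o755,     # also readable/traversable by everyone else
}


def history_file_mode(policy="strict"):
    """Return the octal mode for a policy name; 'strict' (770) is the default."""
    try:
        return HISTORY_FILE_MODES[policy.lower()]
    except KeyError:
        raise ValueError("unknown history file policy: %r" % policy)


def others_can_read(mode):
    """True if users outside the owner and group have read permission."""
    return bool(mode & stat.S_IROTH)


# Symbolic form of each mode for a regular file, as `ls -l` would show it.
strict_text = stat.filemode(stat.S_IFREG | history_file_mode("strict"))
all_text = stat.filemode(stat.S_IFREG | history_file_mode("all"))
```

Note that 755 also drops group write compared with 770, so moving from one to
the other changes more than the "other" bits.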






[jira] [Resolved] (MAPREDUCE-6346) mapred.nativetask.kvtest.KVTest crashes on PPC64LE

2017-05-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6346.
---
Resolution: Duplicate

> mapred.nativetask.kvtest.KVTest crashes on PPC64LE
> --
>
> Key: MAPREDUCE-6346
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6346
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
> Environment: RHEL 7.1 - PPC64 LE - OpenJDK 
> rhel-2.5.5.1.ael7b_1-ppc64le u79-b14
>Reporter: Tony Reix
>Assignee: Ayappan
> Attachments: TR
>
>
> Test org.apache.hadoop.mapred.nativetask.kvtest.KVTest (and 5 or 6 other 
> tests) crashes on PPC64LE .
> 
> 15/04/28 10:46:06 INFO Mid-spill: { id: 4, collect: 245 ms, in-memory sort: 
> 32 ms, in-memory records: 48202, merge: 80 ms, uncompressed size: 
> 5031451, real size: 3739319 path: 
> /tmp/hadoop-reixt/mapred/local/localRunner/reixt/jobcache/job_local408221154_0008/attempt_local408221154_0008_m_00_0/output/spill4.out
>  }
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x3fff6c7d8e50, pid=945, tid=70366264881616
> #
> # JRE version: OpenJDK Runtime Environment (7.0_79-b14) (build 
> 1.7.0_79-mockbuild_2015_04_10_10_48-b00)
> # Java VM: OpenJDK 64-Bit Server VM (24.79-b02 mixed mode linux-ppc64 
> compressed oops)
> # Derivative: IcedTea 2.5.5
> # Distribution: Built on Red Hat Enterprise Linux Server release 7.1 (Maipo) 
> (Fri Apr 10 10:48:01 EDT 2015)
> # Problematic frame:
> # C  [libnativetask.so.1.0.0+0x58e50]  
> NativeTask::WritableUtils::ReadVLongInner(char const*, unsigned int&)+0x40
> #
> # Core dump written. Default location: 
> /home/reixt/HADOOP-2.7.0/hadoop-FromApache-Trunk-201504241115/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/core
>  or core.945
> #
> # An error report file with more information is saved as:
> # /tmp/jvm-945/hs_error.log
> #
> # If you would like to submit a bug report, please include
> # instructions on how to reproduce the bug and visit:
> #   http://icedtea.classpath.org/bugzilla
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> /bin/sh: line 1:   945 Aborted (core dumped) 
> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79-2.5.5.1.ael7b_1.ppc64le/jre/bin/java 
> -Xmx4096m -XX:MaxPermSize=768m -XX:+HeapDumpOnOutOfMemoryError -jar 
> /home/reixt/HADOOP-2.7.0/hadoop-FromApache-Trunk-201504241115/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/surefire/surefirebooter9078773752877532263.jar
>  
> /home/reixt/HADOOP-2.7.0/hadoop-FromApache-Trunk-201504241115/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/surefire/surefire4138802116387705281tmp
>  
> /home/reixt/HADOOP-2.7.0/hadoop-FromApache-Trunk-201504241115/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/surefire/surefire_01525011254551870798tmp
> /tmp/jvm-945/hs_error.log :
> # C  [libnativetask.so.1.0.0+0x58e50]  
> NativeTask::WritableUtils::ReadVLongInner(char const*, unsigned int&)+0x40






[jira] [Created] (MAPREDUCE-6872) RM does not blacklist node for AM launch failures

2017-03-29 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6872:
-

 Summary: RM does not blacklist node for AM launch failures
 Key: MAPREDUCE-6872
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6872
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0-alpha2
Reporter: Haibo Chen
Assignee: Haibo Chen









Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)

2017-03-21 Thread Haibo Chen
Thanks Junping for working on the new release!

+1 non-binding

1) Downloaded the source, verified the checksum
2) Built natively from source, and deployed it to a pseudo-distributed
cluster
3) Ran sleep and teragen job and checked both YARN and JHS web UI
4) Played with yarn + mapreduce command lines

Best,
Haibo Chen

On Mon, Mar 20, 2017 at 11:18 AM, Junping Du <j...@hortonworks.com> wrote:

> Thanks for the update, John. Then we should be OK with fixing this issue in
> 2.8.1.
>
> Mark the target version of HADOOP-14205 to 2.8.1 instead of 2.8.0 and bump
> up to blocker in case we could miss this in releasing 2.8.1. :)
>
>
> Thanks,
>
>
> Junping
>
> 
> From: John Zhuge <jzh...@cloudera.com>
> Sent: Monday, March 20, 2017 10:31 AM
> To: Junping Du
> Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)
>
> Yes, it only affects ADL. There is a workaround of adding these 2
> properties to core-site.xml:
>
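>   <property>
>     <name>fs.adl.impl</name>
>     <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
>   </property>
>
>   <property>
>     <name>fs.AbstractFileSystem.adl.impl</name>
>     <value>org.apache.hadoop.fs.adl.Adl</value>
>   </property>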
>
> I have the initial patch ready but hitting these live unit test failures:
>
> Failed tests:
>   
> TestAdlFileSystemContractLive.runTest:60->FileSystemContractBaseTest.testListStatus:257
> expected:<1> but was:<10>
>
> Tests in error:
>   TestAdlFileContextMainOperationsLive>FileContextMainOperationsBaseTest.
> testMkdirsFailsForSubdirectoryOfExistingFile:254 » AccessControl
>   TestAdlFileSystemContractLive.runTest:60->FileSystemContractBaseTest.
> testMkdirsFailsForSubdirectoryOfExistingFile:190 » AccessControl
>
>
> Stay tuned...
>
> John Zhuge
> Software Engineer, Cloudera
>
> On Mon, Mar 20, 2017 at 10:02 AM, Junping Du <j...@hortonworks.com> wrote:
>
> Thank you for reporting the issue, John! Does this issue only affect ADL
> (Azure Data Lake), which is a new feature for 2.8, rather than other existing
> FS? If so, I think we can leave the fix to 2.8.1, given this is not a
> regression but just a new feature that got broken.
>
>
> Thanks,
>
>
> Junping
>
> 
> From: John Zhuge <jzh...@cloudera.com>
> Sent: Monday, March 20, 2017 9:07 AM
> To: Junping Du
> Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)
>
> Discovered https://issues.apache.org/jira/browse/HADOOP-14205 "No
> FileSystem for scheme: adl".
>
> The issue was caused by backporting HADOOP-13037 to branch-2 and earlier.
> HADOOP-12666 should not be backported, but some changes are needed:
> property fs.adl.impl in core-default.xml and hadoop-tools-dist/pom.xml.
>
> I am working on a patch.
>
>
> John Zhuge
> Software Engineer, Cloudera
>
> On Fri, Mar 17, 2017 at 2:18 AM, Junping Du <j...@hortonworks.com> wrote:
> Hi all,
>  With the fix for HDFS-11431 in, I've created a new release candidate
> (RC3) for Apache Hadoop 2.8.0.
>
>  This is the next minor release to follow up 2.7.0 which has been
> released for more than 1 year. It comprises 2,900+ fixes, improvements, and
> new features. Most of these commits are released for the first time in
> branch-2.
>
>   More information about the 2.8.0 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
>
>   New RC is available at: http://home.apache.org/~junping_du/hadoop-2.8.0-RC3
>
>   The RC tag in git is: release-2.8.0-RC3, and the latest commit id
> is: 91f2b7a13d1e97be65db92ddabc627cc29ac0009
>
>   The maven artifacts are available via repository.apache.org at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1057
>
>   Please try the release and vote; the vote will run for the usual 5
> days, ending on 03/22/2017 PDT time.
>
> Thanks,
>
> Junping
>
>
>


[jira] [Created] (MAPREDUCE-6851) Terasort does not work with S3 FileSystem

2017-02-24 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6851:
-

 Summary: Terasort does not work with S3 FileSystem
 Key: MAPREDUCE-6851
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6851
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 3.0.0-alpha2
Reporter: Haibo Chen


Terasort currently writes a partition list file in the output directory and adds 
it to the distributed cache. If an S3 bucket is used as the output directory, the 
job may fail because the partition list file will be localized by NMs, which may 
not have the S3 credentials.






[jira] [Resolved] (MAPREDUCE-6598) LineReader enhencement to support text records contains "\n"

2016-12-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6598.
---
Resolution: Not A Problem

> LineReader enhencement to support text records contains "\n"
> 
>
> Key: MAPREDUCE-6598
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6598
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 2.6.0
> Environment: RHEL 7, Spark 1.3.1, Hadoop 2.6.0
>Reporter: cloudyarea
>Priority: Minor
>
> We have billions of XML message records stored in text files that need to be
> parsed in parallel by Spark. By default, Spark opens a Hadoop text file using
> LineReader, which provides a single line of text as a record.
> The XML messages contain "\n", and I believe this is a common scenario: many
> users have cross-line records. Currently, the solution is to extend the
> interface RecordReader.
> To reduce the repeated work, I wrote a class named MessageRecordReader that
> extends the interface RecordReader. The user can set a string as the record
> delimiter, and MessageRecordReader then provides a multi-line record to the user.
> I would like to contribute the code to the community. Please let me know if you
> are interested in this simple but useful implementation.
> Thank you very much and happy new year!
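
For illustration only, here is a minimal plain-Java sketch of the multi-line
record idea described above: the input is split on a caller-supplied delimiter
string instead of "\n". The class and method names are hypothetical, this is not
the MessageRecordReader code offered in the issue, and it assumes the whole
input fits in memory.

```java
import java.util.ArrayList;
import java.util.List;

final class DelimitedRecordSketch {
  /**
   * Splits text into records on a literal delimiter. Records may contain "\n".
   * Empty records (two delimiters in a row, or a trailing delimiter) are skipped.
   */
  static List<String> splitRecords(String text, String delimiter) {
    if (delimiter.isEmpty()) {
      throw new IllegalArgumentException("delimiter must not be empty");
    }
    List<String> records = new ArrayList<>();
    int start = 0;
    int idx;
    while ((idx = text.indexOf(delimiter, start)) >= 0) {
      if (idx > start) {
        records.add(text.substring(start, idx));
      }
      start = idx + delimiter.length();
    }
    if (start < text.length()) {
      records.add(text.substring(start)); // last record without a trailing delimiter
    }
    return records;
  }

  public static void main(String[] args) {
    String input = "<msg>\n  <id>1</id>\n</msg>||<msg>\n  <id>2</id>\n</msg>||";
    for (String record : splitRecords(input, "||")) {
      System.out.println("record: " + record.replace("\n", "\\n"));
    }
  }
}
```

A streaming reader would need to handle a delimiter that straddles a buffer
boundary, which this sketch deliberately leaves out.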



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6817) The format of job start time in JHS is different from those of submit and finish time

2016-12-02 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6817:
-

 Summary: The format of job start time in JHS is different from 
those of submit and finish time
 Key: MAPREDUCE-6817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 3.0.0-alpha1
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (MAPREDUCE-6815) Fix flaky TestKill.testKillTask()

2016-12-01 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6815:
-

 Summary: Fix flaky TestKill.testKillTask()
 Key: MAPREDUCE-6815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.0.0-alpha1
Reporter: Haibo Chen
Assignee: Haibo Chen


Error Message
Job state is not correct (timedout) expected: but was:
Stacktrace
java.lang.AssertionError: Job state is not correct (timedout) expected: but was:
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.apache.hadoop.mapreduce.v2.app.MRApp.waitForState(MRApp.java:416)
    at org.apache.hadoop.mapreduce.v2.app.TestKill.testKillTask(TestKill.java:124)






[jira] [Resolved] (MAPREDUCE-6802) TestKill.testKillJob() fails intermittently on Power

2016-11-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6802.
---
Resolution: Duplicate

> TestKill.testKillJob() fails intermittently on Power
> 
>
> Key: MAPREDUCE-6802
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6802
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.3
> Environment: # uname -a
> Linux pts00452-vm10 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 
> 2015 ppc64le ppc64le ppc64le GNU/Linux
> # cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
>Reporter: Yussuf Shaikh
>
> Running org.apache.hadoop.mapreduce.v2.app.TestKill
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.86 sec <<< 
> FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestKill
> testKillJob(org.apache.hadoop.mapreduce.v2.app.TestKill)  Time elapsed: 0.377 
> sec  <<< FAILURE!
> java.lang.AssertionError: Task state not correct expected: but 
> was:
>at org.junit.Assert.fail(Assert.java:88)
>at org.junit.Assert.failNotEquals(Assert.java:743)
>at org.junit.Assert.assertEquals(Assert.java:118)
>at 
> org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob(TestKill.java:99)
> Results :
> Failed tests:
>  TestKill.testKillJob:99 Task state not correct expected: but 
> was:
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0






[jira] [Created] (MAPREDUCE-6801) Fix flaky TestKill.testKillJob()

2016-10-26 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6801:
-

 Summary: Fix flaky TestKill.testKillJob()
 Key: MAPREDUCE-6801
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6801
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.0.0-alpha1
Reporter: Haibo Chen
Assignee: Haibo Chen


TestKill.testKillJob often fails for the same reason with the following error 
message:

{code}
1 tests failed.
FAILED:  org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob

Error Message:
Task state not correct expected: but was:

Stack Trace:
java.lang.AssertionError: Task state not correct expected: but was:
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob(TestKill.java:84)
{code}
The root cause is that when the job is in KILLED state from an external view, 
TaskKillEvents and TaskAttemptKillEvents placed on the event loop queue may not 
have been processed by the dispatcher thread.
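
The race described above can be sketched generically: a test that checks state
right after enqueueing events may run before the dispatcher thread has handled
them, so it should first wait for the queue to drain. The sketch below uses a
single-threaded executor as a stand-in dispatcher. All names are hypothetical;
this is neither Hadoop's dispatcher nor the fix committed for this issue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DispatcherDrainSketch {
  // Stand-in for the dispatcher thread; daemon so it never blocks JVM exit.
  private final ExecutorService dispatcher = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "sketch-dispatcher");
    t.setDaemon(true);
    return t;
  });
  private final AtomicInteger killedTasks = new AtomicInteger();

  /** Queues one hypothetical "task kill" event for the dispatcher thread. */
  void enqueueTaskKill() {
    dispatcher.execute(() -> {
      killedTasks.incrementAndGet();
    });
  }

  /** Blocks until every event queued before this call has been handled. */
  void drain() {
    try {
      // A single-threaded executor runs tasks in submission order, so once this
      // marker task completes, all earlier events have been processed.
      dispatcher.submit(() -> { }).get(10, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException("dispatcher did not drain", e);
    }
  }

  int killedTasks() {
    return killedTasks.get();
  }

  public static void main(String[] args) {
    DispatcherDrainSketch app = new DispatcherDrainSketch();
    for (int i = 0; i < 3; i++) {
      app.enqueueTaskKill();
    }
    app.drain(); // without this, the count read below could still be less than 3
    System.out.println("killed tasks seen: " + app.killedTasks());
  }
}
```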






[jira] [Created] (MAPREDUCE-6791) module dependency from hadoop-mapreduce-client-jobclient to hadoop-mapreduce-client-shuffle is unnecessary

2016-10-11 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6791:
-

 Summary: module dependency from hadoop-mapreduce-client-jobclient 
to hadoop-mapreduce-client-shuffle is unnecessary
 Key: MAPREDUCE-6791
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6791
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.0.0-alpha1
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Minor
 Fix For: 3.0.0-alpha2









[jira] [Reopened] (MAPREDUCE-6741) add MR support to redact job conf properties

2016-10-04 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reopened MAPREDUCE-6741:
---

Reopening to upload a branch-2.8 patch.

> add MR support to redact job conf properties
> 
>
> Key: MAPREDUCE-6741
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6741
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 2.7.2
>    Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: mapreduce6741.001.patch, mapreduce6741.002.patch, 
> mapreduce6741.003.patch, mapreduce6741.004.patch, mapreduce6741.005.patch, 
> mapreduce6741.006.patch
>
>
> JHS today displays all job conf properties in the Web UI directly. Users may have
> added credentials or other sensitive information to the job conf that they
> do not want shown in the Web UI. It'd be nice if we could allow users to
> specify a set of properties which JHS will filter out when the job conf is
> displayed.






[jira] [Created] (MAPREDUCE-6771) Diagnostics information is lost in .jhist if task containers are killed by Node Manager.

2016-08-26 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6771:
-

 Summary: Diagnostics information is lost in .jhist if task 
containers are killed by Node Manager.
 Key: MAPREDUCE-6771
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.7.3
Reporter: Haibo Chen
Assignee: Haibo Chen


Task containers can go over their resource limit and be killed by the Node Manager.
The MR AM then gets notified of the container status and diagnostics information
through its heartbeat with the RM. However, it is possible that the diagnostics
information never gets into the .jhist file, so when the job completes, the
diagnostics information associated with the failed task attempts is empty.
This makes it hard for users to find the root cause of job failures, which are
often caused by memory leaks.






[jira] [Created] (MAPREDUCE-6768) TestRecovery.testSpeculative failed with NPE

2016-08-25 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6768:
-

 Summary: TestRecovery.testSpeculative failed with NPE
 Key: MAPREDUCE-6768
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6768
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Haibo Chen
Assignee: Haibo Chen


1 tests failed.
REGRESSION:  org.apache.hadoop.mapreduce.v2.app.TestRecovery.testSpeculative

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
at 
org.apache.hadoop.mapreduce.v2.app.TestRecovery.testSpeculative(TestRecovery.java:1201)






[jira] [Created] (MAPREDUCE-6765) MR should not schedule container requests in cases where reducer or mapper containers demand resource larger than the maximum supported

2016-08-23 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6765:
-

 Summary: MR should not schedule container requests in cases where 
reducer or mapper containers demand resource larger than the maximum supported
 Key: MAPREDUCE-6765
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6765
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.2
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Minor
 Fix For: 2.9.0


When mapper or reducer containers request more resources than the
maxResourceRequest in the cluster, the job is going to be killed. In such cases,
it is unnecessary to still schedule container requests.






[jira] [Resolved] (MAPREDUCE-6723) Turn log level to Debug in test

2016-08-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6723.
---
Resolution: Won't Fix

> Turn log level to Debug in test
> ---
>
> Key: MAPREDUCE-6723
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6723
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>    Reporter: Haibo Chen
>    Assignee: Haibo Chen
> Attachments: mapreduce6723.001.patch
>
>
> The current log level in the test environment for all mapreduce projects is INFO.
> Often, when we are investigating intermittent test failures, DEBUG-level
> messages in the log file can be very useful for identifying problems.






[jira] [Created] (MAPREDUCE-6741) add JHS support to hide job conf properties from Web UI

2016-07-21 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6741:
-

 Summary: add JHS support to hide job conf properties from Web UI
 Key: MAPREDUCE-6741
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6741
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 2.7.2
Reporter: Haibo Chen
Assignee: Haibo Chen


JHS today displays all Job conf properties in Web UI directly. Anyone who has 
access to JHS web UI essentially has access to Job Conf of any other users' 
job. Users may have some credentials or any sensitive information they added to 
the job conf. It'd be nice if we can allow users to specify a set of properties 
which JHS will filter out when Job conf is displayed.
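
As a rough sketch of the proposal, the filtering could look like the following.
The names are hypothetical and the issue does not specify a design; this version
assumes the job conf is a simple key/value map and that hidden values are masked
rather than dropped, so the viewer can still see that the property was set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ConfRedactionSketch {
  // Placeholder shown instead of a hidden value (an arbitrary choice).
  static final String REDACTED = "*********(redacted)";

  /** Returns a copy of conf for display, masking every property named in hidden. */
  static Map<String, String> forDisplay(Map<String, String> conf, Set<String> hidden) {
    Map<String, String> shown = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      shown.put(e.getKey(), hidden.contains(e.getKey()) ? REDACTED : e.getValue());
    }
    return shown;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("mapreduce.job.name", "wordcount");
    conf.put("my.app.db.password", "secret"); // hypothetical sensitive property
    Set<String> hidden = new HashSet<>(Arrays.asList("my.app.db.password"));
    System.out.println(forDisplay(conf, hidden));
  }
}
```

The original map is left untouched, so the job itself still sees the real
values; only the copy handed to the UI is masked.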






[jira] [Resolved] (MAPREDUCE-6669) Jobs with encrypted spills don't tolerate AM failures

2016-07-21 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6669.
---
Resolution: Duplicate

> Jobs with encrypted spills don't tolerate AM failures
> -
>
> Key: MAPREDUCE-6669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6669
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Haibo Chen
>Priority: Critical
>
> The key used for encrypting intermediate data is not persisted anywhere, and 
> hence can't be recovered the same way other MR jobs can be. We should support 
> recovering these jobs as well, hopefully without having to re-run completed 
> tasks.






[jira] [Resolved] (MAPREDUCE-5739) DirectoryCollection#createNonExistentDirs() may use an invalid iterator

2016-07-21 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-5739.
---
Resolution: Fixed

> DirectoryCollection#createNonExistentDirs() may use an invalid iterator
> ---
>
> Key: MAPREDUCE-5739
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5739
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>
> Here is related code:
> {code}
> for (final String dir : localDirs) {
>   try {
>     createDir(localFs, new Path(dir), perm);
>   } catch (IOException e) {
>     LOG.warn("Unable to create directory " + dir + " error " +
>         e.getMessage() + ", removing from the list of valid directories.");
>     localDirs.remove(dir);
> {code}
> Call to localDirs.remove() modifies Iterable "localDirs" which invalidates 
> the iterator.
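
A minimal sketch of the problem pattern and one common remedy, assuming the
collection is a plain java.util.ArrayList (the report does not say which list
type localDirs is). canCreate() is a hypothetical stand-in for createDir(), and
this is not the change that was committed for the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

final class IteratorRemovalSketch {
  /** Stand-in for createDir(): pretend that paths starting with "/bad" fail. */
  static boolean canCreate(String dir) {
    return !dir.startsWith("/bad");
  }

  /** Safe variant: removes failed entries through the iterator itself. */
  static List<String> keepCreatable(List<String> dirs) {
    List<String> valid = new ArrayList<>(dirs);
    for (Iterator<String> it = valid.iterator(); it.hasNext();) {
      String dir = it.next();
      if (!canCreate(dir)) {
        it.remove(); // keeps the iterator consistent with the list
      }
    }
    return valid;
  }

  /**
   * Problem pattern: calls remove() on the list inside a for-each loop.
   * Returns true if that threw ConcurrentModificationException. An ArrayList
   * usually throws here; removing the second-to-last element is a known case
   * where the loop silently stops early instead.
   */
  static boolean forEachRemovalThrows(List<String> dirs) {
    List<String> copy = new ArrayList<>(dirs);
    try {
      for (String dir : copy) {
        if (!canCreate(dir)) {
          copy.remove(dir);
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<String> dirs = Arrays.asList("/a", "/bad1", "/b", "/c");
    System.out.println("for-each removal throws: " + forEachRemovalThrows(dirs));
    System.out.println("iterator removal keeps: " + keepCreatable(dirs));
  }
}
```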






[jira] [Resolved] (MAPREDUCE-5752) Potential invalid iterator in NMClientImpl#cleanupRunningContainers()

2016-07-21 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-5752.
---
Resolution: Invalid

> Potential invalid iterator in NMClientImpl#cleanupRunningContainers()
> -
>
> Key: MAPREDUCE-5752
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5752
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> In cleanupRunningContainers() :
> {code}
> for (StartedContainer startedContainer : startedContainers.values()) {
>   try {
>     stopContainer(startedContainer.getContainerId(),
>         startedContainer.getNodeId());
> {code}
> Removal of container is done in removeStartedContainer():
> {code}
> startedContainers.remove(container.containerId);
> {code}
> This may result in invalid iterator for the loop on startedContainers.values()
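
Whether removal during iteration is a problem depends on the map
implementation, which the report does not state. The sketch below contrasts
java.util.HashMap, whose iterators are fail-fast, with
java.util.concurrent.ConcurrentHashMap, whose iterators are weakly consistent
and do not throw. The container ids are hypothetical integers, and that
startedContainers is a concurrent map is only an assumption here.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MapIterationSketch {
  /**
   * Fills the map with four entries, then removes each entry by key while
   * iterating over values(). Returns true if the iteration threw
   * ConcurrentModificationException.
   */
  static boolean removalWhileIteratingThrows(Map<Integer, Integer> started) {
    for (int id = 0; id < 4; id++) {
      started.put(id, id); // the value doubles as the key of a hypothetical container
    }
    try {
      for (Integer containerId : started.values()) {
        started.remove(containerId);
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("HashMap throws: "
        + removalWhileIteratingThrows(new HashMap<>()));
    System.out.println("ConcurrentHashMap throws: "
        + removalWhileIteratingThrows(new ConcurrentHashMap<>()));
  }
}
```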






[jira] [Resolved] (MAPREDUCE-6131) Integer overflow in RMContainerAllocator results in starvation of applications

2016-07-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6131.
---
Resolution: Invalid

> Integer overflow in RMContainerAllocator results in starvation of applications
> --
>
> Key: MAPREDUCE-6131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6131
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Kamal Kc
> Attachments: MAPREDUCE-6131-2.2.0.patch
>
>
> When processing large datasets, Hadoop encounters a scenario where all
>  containers run reduce tasks and no map tasks are scheduled. The 
> application does not fail but rather remains in this state without making 
> any forward progress. It then has to be manually terminated. 
> This bug is due to integer overflow in scheduleReduces() of
> RMContainerAllocator. The variable netScheduledMapMem overflows for
> large data sizes, takes a negative value, and results in a large
> finalReduceMemLimit and a large rampup value. In almost all cases, this
> large rampup value is greater than the total number of reduce tasks.
> Therefore, the AM tries to assign all reduce tasks. If the total number
> of reduce tasks is greater than the total container slots, then all slots are
> taken up by reduce tasks, leaving none for maps.
> With 128MB block size and 2GB map container size, overflow occurs with 128 TB 
> data size. An example scenario for the reproduction is: 
> - Input data size of 32TB, block size 128MB, Map container size = 10GB,
> reduce container size = 10GB, #reducers = 50,  cluster mem capacity =  7 x 
> 40GB, slowstart=0.0
> A better resolution might be to change the variables used in
> RMContainerAllocator from int to long. A simpler fix would be to
> change only the local variables of scheduleReduces() to long.
> Patch is attached for 2.2.0. 
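
The arithmetic behind the report can be checked with a small sketch. It assumes
memory is tracked in MB and that the overflowing quantity is the number of map
tasks times the per-map container size; the method names are hypothetical and
do not come from RMContainerAllocator. With 128 TB of input and 128 MB blocks
there are 2^20 maps, and 2^20 x 2048 MB = 2^31 MB, one more than
Integer.MAX_VALUE.

```java
final class MemOverflowSketch {
  /** int arithmetic: silently wraps around once the product exceeds 2^31 - 1. */
  static int scheduledMemInt(int numMaps, int mapMemMb) {
    return numMaps * mapMemMb;
  }

  /** long arithmetic: widen one operand before multiplying. */
  static long scheduledMemLong(int numMaps, int mapMemMb) {
    return (long) numMaps * mapMemMb;
  }

  public static void main(String[] args) {
    long inputBytes = 128L << 40; // 128 TB
    long blockBytes = 128L << 20; // 128 MB
    int numMaps = (int) (inputBytes / blockBytes); // 1048576 map tasks
    int mapMemMb = 2048; // 2 GB map container

    System.out.println(scheduledMemInt(numMaps, mapMemMb));  // prints -2147483648
    System.out.println(scheduledMemLong(numMaps, mapMemMb)); // prints 2147483648
  }
}
```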






[jira] [Resolved] (MAPREDUCE-6687) Allow specifing java home via job configuration

2016-07-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6687.
---
Resolution: Implemented

> Allow specifing java home via job configuration
> ---
>
> Key: MAPREDUCE-6687
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6687
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: applicationmaster
>Reporter: He Tianyi
>Priority: Minor
>
> Suggest allowing users to choose a preferred JVM implementation (or version) by
> specifying the Java home via JobConf when launching Map/Reduce tasks.
> This is especially useful for running A/B tests on real workloads or for
> benchmarking JVM implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6740) Enforce mapreduce.task.timeout to be at least mapreduce.task.progress-report.interval

2016-07-20 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6740:
-

 Summary: Enforce mapreduce.task.timeout to be at least 
mapreduce.task.progress-report.interval
 Key: MAPREDUCE-6740
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6740
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am
Affects Versions: 2.8.0
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Minor


MAPREDUCE-6242 made the task status update interval configurable to ease the
pressure on the MR AM when processing status updates, but it did not ensure that
mapreduce.task.timeout is no smaller than the configured value of the task report
interval. 
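
One possible way to enforce the constraint is sketched below. It is a
hypothetical policy, not the committed change: it raises a too-small timeout to
the report interval rather than rejecting the configuration, works on plain
millisecond values instead of a Hadoop Configuration object, and assumes that a
timeout of 0 means the timeout is disabled.

```java
final class TaskTimeoutSketch {
  /**
   * Returns the timeout to use, in milliseconds. A positive timeout shorter
   * than the report interval is raised to the interval, because such a task
   * could be timed out before it has had a chance to report.
   */
  static long effectiveTimeoutMs(long timeoutMs, long reportIntervalMs) {
    if (timeoutMs == 0) {
      return 0; // assumption: 0 disables the timeout, so leave it alone
    }
    return Math.max(timeoutMs, reportIntervalMs);
  }

  public static void main(String[] args) {
    System.out.println(effectiveTimeoutMs(1000, 3000));   // prints 3000
    System.out.println(effectiveTimeoutMs(600000, 3000)); // prints 600000
    System.out.println(effectiveTimeoutMs(0, 3000));      // prints 0
  }
}
```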






[jira] [Created] (MAPREDUCE-6739) allow specifying range on the port that MR AM web server binds to

2016-07-20 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6739:
-

 Summary: allow specifying range on the port that MR AM web server 
binds to
 Key: MAPREDUCE-6739
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6739
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Haibo Chen
Assignee: Haibo Chen


The MR AM web server binds itself to an arbitrary port. This means that if the RM web
proxy lives outside of a cluster, the whole port range needs to be wide open.
It'd be nice to reuse yarn.app.mapreduce.am.job.client.port-range to place a
port range restriction on the MR AM web server, so that connections from outside the
cluster can be restricted within a range of ports.
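
A rough sketch of what binding within a configured range could look like. It
assumes the range is a single "low-high" string such as "50100-50200" (the
format accepted by the real property may be richer), and it uses a plain
java.net.ServerSocket rather than the AM's actual web server. All names are
hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortRangeSketch {
  /** Parses a single "low-high" range and returns {low, high}. */
  static int[] parseRange(String range) {
    String[] parts = range.trim().split("-");
    if (parts.length != 2) {
      throw new IllegalArgumentException("expected low-high, got: " + range);
    }
    int low = Integer.parseInt(parts[0].trim());
    int high = Integer.parseInt(parts[1].trim());
    if (low < 1 || high > 65535 || low > high) {
      throw new IllegalArgumentException("invalid port range: " + range);
    }
    return new int[] {low, high};
  }

  /** Tries each port in the range in order and returns the first socket that binds. */
  static ServerSocket bindInRange(String range) throws IOException {
    int[] bounds = parseRange(range);
    for (int port = bounds[0]; port <= bounds[1]; port++) {
      try {
        return new ServerSocket(port);
      } catch (IOException unavailable) {
        // Port is in use or not permitted; try the next one.
      }
    }
    throw new IOException("no free port in range " + range);
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket socket = bindInRange("50100-50200")) {
      System.out.println("bound to port " + socket.getLocalPort());
    }
  }
}
```

With a restriction like this, a firewall between the web proxy and the cluster
only needs to allow the configured range instead of every port.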






[jira] [Created] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection

2016-07-05 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6728:
-

 Summary: Give fetchers hint when ShuffleHandler rejects a 
shuffling connection
 Key: MAPREDUCE-6728
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Haibo Chen
Assignee: Haibo Chen


If the number of open shuffle connections to a node goes over the maximum, ShuffleHandler
closes the connection immediately without giving fetchers any hint of the
reason, which causes fetchers to fail with exceptions such as:

java.net.SocketException: Unexpected end of file from server
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)


OR 

java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java

Such failures are counted as fetcher failures.






[jira] [Created] (MAPREDUCE-6724) Unsafe conversion from long to int in MergeManagerImpl.unconditionalReserve()

2016-06-22 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6724:
-

 Summary: Unsafe conversion from long to int in 
MergeManagerImpl.unconditionalReserve()
 Key: MAPREDUCE-6724
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6724
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Haibo Chen
Assignee: Haibo Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6723) Turn log level to Debug in test

2016-06-22 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6723:
-

 Summary: Turn log level to Debug in test
 Key: MAPREDUCE-6723
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6723
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Haibo Chen
Assignee: Haibo Chen


The current log level in the test environment for all mapreduce projects is INFO.
Often, when we are investigating intermittent test failures, DEBUG-level
messages in the log file can be very useful for identifying problems.






[jira] [Reopened] (MAPREDUCE-6718) add progress log to JHS during startup

2016-06-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reopened MAPREDUCE-6718:
---

> add progress log to JHS during startup
> --
>
> Key: MAPREDUCE-6718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6718
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>    Reporter: Haibo Chen
>    Assignee: Haibo Chen
>Priority: Minor
>  Labels: supportability
>
> When the JHS starts up, it initializes the internal caches and storage via
> the HistoryFileManager. If we have a large number of existing finished jobs,
> then we could spend minutes in this startup phase without logging progress:
> 2016-03-14 10:56:01,444 INFO 
> org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file 
> system [hdfs://hadoopcdh.itnas01.ieee.org:8020]
> 2016-03-14 10:56:11,455 INFO 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Initializing Existing 
> Jobs...
> 2016-03-14 12:01:36,926 INFO 
> org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage: CachedHistoryStorage 
> Init
> This makes it really difficult to assess if things are working correctly (it 
> looks hung). We can add logs to notify users of progress.






[jira] [Resolved] (MAPREDUCE-6718) add progress log to JHS during startup

2016-06-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6718.
---
Resolution: Not A Problem

It turns out that MAPREDUCE-6059 limits the number of files to load into the cache.
As a result, we will not see the long startup time.

> add progress log to JHS during startup
> --
>
> Key: MAPREDUCE-6718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6718
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>    Reporter: Haibo Chen
>    Assignee: Haibo Chen
>Priority: Minor
>  Labels: supportability
>
> When the JHS starts up, it initializes the internal caches and storage via
> the HistoryFileManager. If we have a large number of existing finished jobs,
> then we could spend minutes in this startup phase without logging progress:
> 2016-03-14 10:56:01,444 INFO 
> org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file 
> system [hdfs://hadoopcdh.itnas01.ieee.org:8020]
> 2016-03-14 10:56:11,455 INFO 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Initializing Existing 
> Jobs...
> 2016-03-14 12:01:36,926 INFO 
> org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage: CachedHistoryStorage 
> Init
> This makes it really difficult to assess if things are working correctly (it 
> looks hung). We can add logs to notify users of progress.






[jira] [Created] (MAPREDUCE-6698) Increase timeout on TestUnnecessaryBlockingOnHistoryFileInfo.testTwoThreadsQueryingDifferentJobOfSameUser

2016-05-16 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6698:
-

 Summary: Increase timeout on 
TestUnnecessaryBlockingOnHistoryFileInfo.testTwoThreadsQueryingDifferentJobOfSameUser
 Key: MAPREDUCE-6698
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6698
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.7.3
Reporter: Haibo Chen
Assignee: Haibo Chen


The timeout on
TestUnnecessaryBlockingOnHistoryFileInfo.testTwoThreadsQueryingDifferentJobOfSameUser
 was added to verify that the fix for MAPREDUCE-6684 works. When two threads are
requesting different jobs owned by the same user, the thread requesting jobA
should not be blocked by the other thread, which is processing a large job, jobB. A
timeout exception, if it happens, should ideally indicate that the fix does not work.
But the timeout period is set too aggressively, so the test always fails on slow
VMs.






[jira] [Resolved] (MAPREDUCE-6681) TestUberAM fails intermittently

2016-05-05 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6681.
---
Resolution: Fixed

MAPREDUCE-6677 has been committed to fix the issue reported here.

> TestUberAM  fails intermittently 
> -
>
> Key: MAPREDUCE-6681
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6681
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>    Assignee: Haibo Chen
>
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.hadoop.mapreduce.v2.TestMRJobs.verifySleepJobCounters(TestMRJobs.java:474)
>   at org.apache.hadoop.mapreduce.v2.TestUberAM.verifySleepJobCounters(TestUberAM.java:71)
> {noformat}
> *PreCommit Build* 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6434/testReport/






[jira] [Created] (MAPREDUCE-6684) High contention on scanning of user directory under immediate_done in Job History Server

2016-04-20 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6684:
-

 Summary: High contention on scanning of user directory under 
immediate_done in Job History Server
 Key: MAPREDUCE-6684
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6684
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.7.0
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Critical


HistoryFileManager.scanIntermediateDirectory() in JHS acquires a lock on each 
user directory it tries to scan (moving or deleting files under the user 
directory as necessary). This method is called by a thread in JobHistory that 
periodically scans the intermediate directory, and it can also be called by web 
server threads for each Web API call made by a JHS client. When there are many 
concurrent Web API calls/connections to JHS, all but one thread are blocked on 
the lock on the user directory. Eventually, client connections time out, but 
the blocked threads in JHS are not killed, which leaves many TCP connections in 
the CLOSE_WAIT state. 

[systest@vb1120 ~]$ sudo netstat -nap | grep 63729 | sort -k 4
tcp    0  0 10.17.202.19:10020  0.0.0.0:*           LISTEN      63729/java
tcp    0  0 10.17.202.19:10020  10.17.198.30:33010  ESTABLISHED 63729/java
tcp    0  0 10.17.202.19:10020  10.17.200.30:33980  ESTABLISHED 63729/java
tcp    0  0 10.17.202.19:10020  10.17.202.10:59625  ESTABLISHED 63729/java
tcp    0  0 10.17.202.19:10020  10.17.202.13:35765  ESTABLISHED 63729/java
tcp    0  0 10.17.202.19:10033  0.0.0.0:*           LISTEN      63729/java
tcp    0  0 10.17.202.19:19888  0.0.0.0:*           LISTEN      63729/java
tcp    0  0 10.17.202.19:19888  10.17.198.30:35103  ESTABLISHED 63729/java
tcp  277  0 10.17.202.19:19888  10.17.198.30:43670  ESTABLISHED 63729/java
tcp    0  0 10.17.202.19:19888  10.17.198.30:45453  ESTABLISHED 63729/java
tcp  277  0 10.17.202.19:19888  10.17.198.30:49184  ESTABLISHED 63729/java
tcp    1  0 10.17.202.19:19888  10.17.202.13:49992  CLOSE_WAIT  63729/java
tcp  261  0 10.17.202.19:19888  10.17.202.13:52703  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52707  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52708  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52710  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52714  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52723  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52726  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52727  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52739  CLOSE_WAIT  63729/java
tcp  261  0 10.17.202.19:19888  10.17.202.13:52749  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52753  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52757  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52760  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52820  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52827  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52829  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52831  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52833  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52836  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52839  CLOSE_WAIT  63729/java
tcp  256  0 10.17.202.19:19888  10.17.202.13:52841  CLOSE_WAIT  63729/java
tcp  261  0 10.17.202.19:19888  10.17.202.13:52843  CLOSE_WAIT  63729/java
tcp
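
The lock contention described in this report can be pictured with a small, 
self-contained sketch. It is not the fix adopted for MAPREDUCE-6684 and uses no 
Hadoop classes: UserDirScanSketch, scanIfIdle and secondCallerIsSkipped are 
hypothetical names invented for illustration. It assumes that a caller who 
finds a scan of the same user directory already in progress can simply skip its 
own scan instead of queueing behind the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class UserDirScanSketch {
    // Stand-in for the per-user-directory lock described in the report.
    private final ReentrantLock userDirLock = new ReentrantLock();

    /** Runs the scan unless another thread already holds the lock; never waits. */
    boolean scanIfIdle(Runnable scan) {
        if (!userDirLock.tryLock()) {
            return false; // another thread is scanning this user directory
        }
        try {
            scan.run();
            return true;
        } finally {
            userDirLock.unlock();
        }
    }

    /** Returns true if a second caller was turned away while a slow scan ran. */
    static boolean secondCallerIsSkipped() {
        UserDirScanSketch dir = new UserDirScanSketch();
        CountDownLatch scanStarted = new CountDownLatch(1);
        CountDownLatch releaseScan = new CountDownLatch(1);
        Thread slowScanner = new Thread(() -> dir.scanIfIdle(() -> {
            scanStarted.countDown();
            awaitQuietly(releaseScan); // simulate a long scan holding the lock
        }));
        slowScanner.start();
        awaitQuietly(scanStarted);     // the lock is now held by slowScanner
        boolean ran = dir.scanIfIdle(() -> { });
        releaseScan.countDown();
        try {
            slowScanner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !ran;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("second caller skipped: " + secondCallerIsSkipped());
    }
}
```

With a blocking lock, every web server thread in this position would wait for 
the slow scan to finish, which is the pile-up the netstat output shows.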

[jira] [Created] (MAPREDUCE-6677) LocalContainerAllocator doesn't specify resource of the containers allocated.

2016-04-15 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6677:
-

 Summary: LocalContainerAllocator doesn't specify resource of the 
containers allocated.
 Key: MAPREDUCE-6677
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6677
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Reporter: Haibo Chen
Assignee: Haibo Chen








[jira] [Created] (MAPREDUCE-6675) TestJobImpl.testUnusableNode failed

2016-04-14 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6675:
-

 Summary: TestJobImpl.testUnusableNode failed 
 Key: MAPREDUCE-6675
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6675
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.7.3
Reporter: Haibo Chen
Assignee: Haibo Chen


TestJobImpl#testUnusableNodeTransition is flaky.

2016-02-13 09:16:42 Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.324 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 testUnusableNodeTransition(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl)  Time elapsed: 5.165 sec  <<< FAILURE!
2016-02-13 09:16:50 java.lang.AssertionError: expected: but was:
2016-02-13 09:16:50 at org.junit.Assert.fail(Assert.java:88)
2016-02-13 09:16:50 at org.junit.Assert.failNotEquals(Assert.java:743)
2016-02-13 09:16:50 at org.junit.Assert.assertEquals(Assert.java:118)
2016-02-13 09:16:50 at org.junit.Assert.assertEquals(Assert.java:144)
2016-02-13 09:16:50 at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:977)
2016-02-13 09:16:50 at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:627)
2016-02-13 09:16:50 
2016-02-13 09:16:50 
2016-02-13 09:16:50 Results :
2016-02-13 09:16:50 
2016-02-13 09:16:50 Failed tests: 
2016-02-13 09:16:50   TestJobImpl.testUnusableNodeTransition:627->assertJobState:977 expected: but was:
2016-02-13 09:16:50 
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0.


Looking at the code, a JobUpdatedNodesEvent is handled by putting a 
TaskAttemptKill event on the async dispatcher queue and returning immediately, 
but that event might not have been processed by the time all JobTaskEvents are 
seen by the job (the jobTaskSucceeded events are handed to the Job immediately, 
without going through the dispatcher). Therefore, there is a slight chance that 
the job will see all three succeeded attempts and transition to the Committing 
state before the taskAttemptKill event is handled by the dispatcher. A 
Committing job rejects JobTaskEvents received later, which causes the failure.
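
The ordering problem can be reproduced in a deliberately simplified, 
single-threaded model. This is only a sketch: DispatcherRaceSketch and all of 
its methods are hypothetical, the "dispatcher" is a plain queue drained by 
hand rather than a separate thread, and the job state machine is reduced to 
two states. It assumes three tasks, one of which had already succeeded on the 
node that becomes unusable.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the race; not Hadoop code.
final class DispatcherRaceSketch {
    enum State { RUNNING, COMMITTING }

    private static final int TOTAL_TASKS = 3;
    private final Deque<Runnable> dispatcherQueue = new ArrayDeque<>();
    private State state = State.RUNNING;
    private int succeededTasks = 0;
    private int rejectedEvents = 0;

    /** Node update: enqueue the kill on the dispatcher and return immediately. */
    void onNodeUnusable() {
        dispatcherQueue.add(this::onAttemptKilled);
    }

    /** Task success is delivered to the job directly, bypassing the queue. */
    void onTaskSucceeded() {
        if (state == State.COMMITTING) {
            rejectedEvents++;
            return;
        }
        succeededTasks++;
        if (succeededTasks == TOTAL_TASKS) {
            state = State.COMMITTING;
        }
    }

    /** Kill handled: the previously succeeded task has to run again. */
    private void onAttemptKilled() {
        if (state == State.COMMITTING) {
            rejectedEvents++; // too late, the job no longer accepts task events
            return;
        }
        succeededTasks--;
    }

    void drainDispatcher() {
        while (!dispatcherQueue.isEmpty()) {
            dispatcherQueue.poll().run();
        }
    }

    /** Returns the number of events the job rejected for the given ordering. */
    static int run(boolean drainBeforeSuccessEvents) {
        DispatcherRaceSketch job = new DispatcherRaceSketch();
        job.onTaskSucceeded();        // one task finished on the soon-unusable node
        job.onNodeUnusable();         // kill is queued, not yet handled
        if (drainBeforeSuccessEvents) {
            job.drainDispatcher();    // kill handled first: the safe ordering
        }
        job.onTaskSucceeded();
        job.onTaskSucceeded();
        job.drainDispatcher();
        return job.rejectedEvents;
    }

    public static void main(String[] args) {
        System.out.println("rejected without drain: " + run(false));
        System.out.println("rejected with drain:    " + run(true));
    }
}
```

In the model, waiting for the queue to drain before delivering the success 
events removes the rejected event, which suggests the test could be made 
deterministic by waiting for the dispatcher in the same way.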





[jira] [Created] (MAPREDUCE-6657) job history server can fail on startup when NameNode is in start phase

2016-03-22 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6657:
-

 Summary: job history server can fail on startup when NameNode is 
in start phase
 Key: MAPREDUCE-6657
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6657
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Haibo Chen
Assignee: Haibo Chen


The job history server tries to create a history directory in HDFS on startup. 
When the NameNode is in safe mode, it keeps retrying for a configurable time 
period. However, it should also keep retrying if the NameNode is still in its 
startup phase, because safe mode does not begin until the NN is out of the 
startup phase. 

A RetriableException with the text "NameNode still not started" is thrown when 
the NN is in its internal service startup phase. We should add a check for 
this specific exception in isBecauseSafeMode() to account for that.
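
A minimal sketch of such a check, written without Hadoop dependencies so it 
can be run on its own. The class and method names are hypothetical, and 
matching on the substring "SafeModeException" is an assumption about how the 
existing safe-mode test works; only the "NameNode still not started" text 
comes from this report.

```java
// Hypothetical sketch; not the actual JHS code.
final class NameNodeStartupCheckSketch {
    // Message text quoted in the report.
    static final String NN_NOT_STARTED = "NameNode still not started";

    /** Returns true if the failure looks transient and startup should retry. */
    static boolean shouldRetry(Throwable ex) {
        String text = String.valueOf(ex);
        // Assumed existing behaviour: retry while the NameNode is in safe mode.
        boolean safeMode = text.contains("SafeModeException");
        // Proposed addition: also retry while the NameNode is still starting.
        boolean notStarted = text.contains(NN_NOT_STARTED);
        return safeMode || notStarted;
    }

    public static void main(String[] args) {
        Exception starting = new RuntimeException(NN_NOT_STARTED);
        System.out.println("retry on startup phase: " + shouldRetry(starting));
    }
}
```

Matching on message text is fragile if the wording ever changes, so a shared 
constant for the message would be a safer basis for the real check.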





[jira] [Created] (MAPREDUCE-6652) Add configuration property to prevent JHS from loading jobs with a task count greater than X

2016-03-19 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6652:
-

 Summary: Add configuration property to prevent JHS from loading 
jobs with a task count greater than X
 Key: MAPREDUCE-6652
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6652
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Reporter: Haibo Chen
Assignee: Haibo Chen


Jobs with a large number of tasks can have job history files that are large and 
resource-consuming (mainly in memory) to parse in the Job History Server. If 
there are many such jobs, the job history server can very easily hang.

It would be a good usability feature to add a new config property that can be 
set to X, so that the JHS does not load the details of a job with more than X 
tasks. The job would still show up on the list-of-jobs page, but clicking on it 
would give a warning message that the job is too big, instead of actually 
loading the job. This way we can prevent users from loading a job that is far 
too big for the JHS, which currently makes the JHS hang. The default value can 
be -1, which disables the limit.
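
The proposed behaviour comes down to a single comparison. The sketch below 
shows it; JobLoadGuardSketch, shouldLoadDetails and the maxTasks parameter are 
hypothetical names, and treating every negative value (not just -1) as 
"disabled" is an assumption.

```java
// Hypothetical sketch of the proposed limit; not the actual JHS code.
final class JobLoadGuardSketch {
    // Proposed default: no limit.
    static final int DISABLED = -1;

    /**
     * Returns true if the JHS should parse and load the full job details,
     * false if it should show a "job too big" warning instead.
     */
    static boolean shouldLoadDetails(int taskCount, int maxTasks) {
        if (maxTasks < 0) {
            return true; // limit disabled
        }
        return taskCount <= maxTasks; // only jobs with more than X tasks are skipped
    }

    public static void main(String[] args) {
        System.out.println(shouldLoadDetails(500_000, DISABLED));
        System.out.println(shouldLoadDetails(500_000, 100_000));
    }
}
```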






[jira] [Created] (MAPREDUCE-6647) MR usage counters use the resources requested instead of the resources allocated

2016-03-02 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6647:
-

 Summary: MR usage counters use the resources requested instead of 
the resources allocated
 Key: MAPREDUCE-6647
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6647
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Haibo Chen
Assignee: Haibo Chen


As can be seen in the following snippet, the MR usage counters use the 
resources requested instead of the resources allocated. The scheduler's 
increment-allocation-mb configs can cause these two values to differ. We could 
change the counters to use the allocated resources to account for this.

{code}
  private static void updateMillisCounters(JobCounterUpdateEvent jce,
      TaskAttemptImpl taskAttempt) {
    /***omitted**/
    long duration = (taskAttempt.getFinishTime() - taskAttempt.getLaunchTime());
    // Both values below are what the task requested, not what was allocated.
    int mbRequired =
        taskAttempt.getMemoryRequired(taskAttempt.conf, taskType);
    int vcoresRequired = taskAttempt.getCpuRequired(taskAttempt.conf, taskType);

    int minSlotMemSize = taskAttempt.conf.getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);

    int simSlotsRequired =
        minSlotMemSize == 0 ? 0 : (int) Math.ceil((float) mbRequired
            / minSlotMemSize);

    if (taskType == TaskType.MAP) {
      jce.addCounterUpdate(JobCounter.SLOTS_MILLIS_MAPS, simSlotsRequired * duration);
      jce.addCounterUpdate(JobCounter.MB_MILLIS_MAPS, duration * mbRequired);
      jce.addCounterUpdate(JobCounter.VCORES_MILLIS_MAPS, duration * vcoresRequired);
      jce.addCounterUpdate(JobCounter.MILLIS_MAPS, duration);
    } else {
      jce.addCounterUpdate(JobCounter.SLOTS_MILLIS_REDUCES, simSlotsRequired * duration);
      jce.addCounterUpdate(JobCounter.MB_MILLIS_REDUCES, duration * mbRequired);
      jce.addCounterUpdate(JobCounter.VCORES_MILLIS_REDUCES, duration * vcoresRequired);
      jce.addCounterUpdate(JobCounter.MILLIS_REDUCES, duration);
    }
  }
{code}
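
To make the size of the discrepancy concrete, here is a small worked example. 
It is a sketch under stated assumptions: the scheduler is assumed to round a 
memory request up to the next multiple of the increment, the numbers (a 1100 MB 
request, a 512 MB increment, a 10-second task) are made up, and the class and 
method names are hypothetical.

```java
// Hypothetical arithmetic sketch; not Hadoop code.
final class AllocatedVsRequestedSketch {
    /** Rounds a request up to the next multiple of the increment (assumed policy). */
    static int roundUpToIncrement(int requestedMb, int incrementMb) {
        return ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb;
    }

    /** Same formula as the MB_MILLIS_* counters: duration times memory. */
    static long mbMillis(long durationMs, int mb) {
        return durationMs * mb;
    }

    public static void main(String[] args) {
        int requestedMb = 1100;
        int allocatedMb = roundUpToIncrement(requestedMb, 512); // 1536
        long durationMs = 10_000L;
        System.out.println("by request:    " + mbMillis(durationMs, requestedMb));
        System.out.println("by allocation: " + mbMillis(durationMs, allocatedMb));
    }
}
```

Under these assumptions the counter based on the request reports 11,000,000 
MB-ms while the container actually held 15,360,000 MB-ms, an undercount of 
roughly 28 percent.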






Question about Retroactive[Failure/Killed]Transition in TaskImpl

2016-03-01 Thread Haibo Chen
Hi All,

I was reading the TaskImpl.java source code and came across
RetroactiveFailureTransition and RetroactiveKilledTransition, which apply when
the Task is in the SUCCEEDED state.

This seems a bit weird to me. How can a task, after it has succeeded,
transition to the Failed/Scheduled state again? Can anyone shed some light on
what the purpose of these two Retroactive transitions is? Thanks a lot in
advance for your explanation.

Best,
Haibo Chen


[jira] [Created] (MAPREDUCE-6643) org.apache.hadoop.mapred.TestTextInputFormat.testSplitableCodecs failed

2016-02-23 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6643:
-

 Summary: 
org.apache.hadoop.mapred.TestTextInputFormat.testSplitableCodecs failed
 Key: MAPREDUCE-6643
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6643
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Haibo Chen
Assignee: Haibo Chen


The unit test TestTextInputFormat.testSplitableCodecs() failed when the seed 
was -839658807.

Stacktrace
java.lang.AssertionError: Key in multiple partitions.
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.apache.hadoop.mapred.TestTextInputFormat.testSplitableCodecs(TestTextInputFormat.java:223)



