Re: Next releases

2013-11-08 Thread Jun Ping Du
Hi Arun,
   Thanks for putting together this list; it looks great to me. In addition, I 
would like to propose adding YARN-291 to the 2.3 release. It enhances YARN's 
resource elasticity in cloud scenarios and can benefit other scenarios, e.g. 
graceful NM decommission (YARN-914) and avoiding job/app disruption (or a 
maintenance mode) during NM rolling upgrade (YARN-671). With great help from 
Luke, Bikas and Vinod, we have already gotten the first and most important 
piece (YARN-311) in. I am now working on the remaining parts, including 
interfaces (RPC, CLI, REST, etc.) and a few enhancements (persistence, support 
for different policies, etc.), and I am optimistic about completing most of the 
work by the end of 2013. Would you help include it if we can make it on time? :)

Thanks,

Junping

- Original Message -
From: Arun C Murthy a...@hortonworks.com
To: common-...@hadoop.apache.org, hdfs-...@hadoop.apache.org, 
yarn-...@hadoop.apache.org, mapreduce-dev@hadoop.apache.org
Sent: Friday, November 8, 2013 10:42:36 AM
Subject: Next releases

Gang,

 Thinking through the next couple of releases here, appreciate f/b.

 # hadoop-2.2.1

 I was looking through commit logs and there is a *lot* of content here (81 
commits as of 11/7). Some are features/improvements and some are fixes - it's 
really hard to distinguish what is important and what isn't.

 I propose we start with a blank slate (i.e. blow away branch-2.2 and start 
fresh from a copy of branch-2.2.0) and then be very careful and meticulous 
about including only *blocker* fixes in branch-2.2. So, most of the content 
here would come via the next minor release (i.e. hadoop-2.3).

 In the future, we should continue to be *very* parsimonious about what gets 
into a patch release (major.minor.patch) - in general, these should be only 
*blocker* fixes or key operational issues.

 # hadoop-2.3
 
 I'd like to propose the following features for YARN/MR to make it into 
hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:
 * Application History Server - This is happening in a branch and is close; 
with it we can provide a reasonable experience for new frameworks being built 
on top of YARN.
 * Bug-fixes in RM Restart
 * Minimal support for long-running applications (e.g. security) via YARN-896
 * RM Fail-over via ZKFC
 * Anything else?

 HDFS???

 Overall, I feel like we have a decent chance of rolling hadoop-2.3 by the end 
of the year.

 Thoughts?

thanks,
Arun
 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You


Re: [VOTE] Release Apache Hadoop 2.2.0

2013-10-09 Thread Jun Ping Du
+1 (non-binding). 
Built and deployed it on a tiny cluster and ran a few jobs.

Thanks,

Junping 

- Original Message -
From: Arun C Murthy a...@hortonworks.com
To: common-...@hadoop.apache.org, hdfs-...@hadoop.apache.org, 
yarn-...@hadoop.apache.org, mapreduce-dev@hadoop.apache.org
Sent: Monday, October 7, 2013 3:00:52 PM
Subject: [VOTE] Release Apache Hadoop 2.2.0

Folks,

I've created a release candidate (rc0) for hadoop-2.2.0 that I would like to 
get released - this release fixes a small number of bugs and some protocol/API 
issues, which should ensure the protocols and APIs are now stable and will not 
change in hadoop-2.x.

The RC is available at: http://people.apache.org/~acmurthy/hadoop-2.2.0-rc0
The RC tag in svn is here: 
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.2.0-rc0

The maven artifacts are available via repository.apache.org.

Please try the release and vote; the vote will run for the usual 7 days.

thanks,
Arun

P.S.: Thanks to Colin, Andrew, Daryn, Chris and others for helping nail down 
the symlinks-related issues. I'll release note the fact that we have disabled 
it in 2.2. Also, thanks to Vinod for some heavy-lifting on the YARN side in the 
last couple of weeks.





--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: [ACTION NEEDED]: protoc 2.5.0 in trunk/branch-2/branch-2.1-beta/branch-2.1.0-beta

2013-08-16 Thread Jun Ping Du
Hi Tsuyoshi,
   I just checked the HowToContribute page on the Hadoop wiki; for 
ProtocolBuffer setup it points to the YARN README, which has already been 
updated to 2.5.0.

Thanks,

Junping

- Original Message -
From: Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
To: hdfs-...@hadoop.apache.org
Cc: common-...@hadoop.apache.org, yarn-...@hadoop.apache.org, 
mapreduce-dev@hadoop.apache.org
Sent: Friday, August 16, 2013 1:55:23 PM
Subject: Re: [ACTION NEEDED]: protoc 2.5.0 in 
trunk/branch-2/branch-2.1-beta/branch-2.1.0-beta

Thanks for sharing! We also need to update the wiki or other documents, don't we?
http://wiki.apache.org/hadoop/HowToContribute

On Thu, Aug 15, 2013 at 8:03 AM, Alejandro Abdelnur t...@cloudera.com wrote:
 Following up on this.

 HADOOP-9845 and HADOOP-9872 have been committed
 to trunk/branch-2/branch-2.1-beta/branch-2.1.0-beta.

 All Hadoop developers must install protoc 2.5.0 in their development
 machines for the build to run.

 All Hadoop Jenkins boxes are using protoc 2.5.0.

 The BUILDING.txt file has been updated to reflect that protoc 2.5.0 is the
 required one and includes instructions on how to use a different protoc
 from multiple local versions (using an ENV var). This may be handy for
 folks working with Hadoop versions using protoc 2.4.1.

 INTERIM SOLUTION IF YOU CANNOT UPGRADE TO PROTOC 2.5.0 IMMEDIATELY

 Use the following option with all your Maven commands
  '-Dprotobuf.version=2.4.1'.

 Note that this option will make the build use protoc and protobuf 2.4.1.

 However, you should upgrade to 2.5.0 as soon as possible.

 As soon as we start using the new goodies from protobuf 2.5.0 (like the
 non-copy bytearrays), 2.4.1 will no longer work.

 Thanks, and apologies again for the noise throughout this change.

 --
 Alejandro



-- 
- Tsuyoshi
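Since the thread says every developer machine needs protoc 2.5.0 for the build 
to run, a quick local check before kicking off Maven can save a failed build. 
The Java below is a minimal sketch under stated assumptions, not part of 
Hadoop's build: it runs "protoc --version" for whichever protoc is first on the 
PATH and compares the reported version (protoc prints a line of the form 
"libprotoc 2.5.0") with 2.5.0. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: verifies the local protoc matches the version the build expects.
public class ProtocVersionCheck {
    // Version required on trunk/branch-2 per the thread above.
    static final String REQUIRED = "2.5.0";
    private static final String PREFIX = "libprotoc ";

    // Extracts "2.5.0" from output such as "libprotoc 2.5.0"; returns null if unrecognized.
    static String parseVersion(String output) {
        if (output == null) {
            return null;
        }
        String trimmed = output.trim();
        if (!trimmed.startsWith(PREFIX)) {
            return null;
        }
        return trimmed.substring(PREFIX.length()).trim();
    }

    static boolean isRequiredVersion(String output) {
        return REQUIRED.equals(parseVersion(output));
    }

    // Runs "protoc --version" (first protoc on the PATH) and returns its first output line.
    static String runProtocVersion() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("protoc", "--version")
                .redirectErrorStream(true)
                .start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line = r.readLine();
            p.waitFor();
            return line;
        }
    }

    public static void main(String[] args) {
        try {
            String out = runProtocVersion();
            if (isRequiredVersion(out)) {
                System.out.println("OK: " + out);
            } else {
                System.out.println("Expected libprotoc " + REQUIRED + " but protoc reported: " + out);
            }
        } catch (IOException e) {
            // Typically thrown when no protoc executable is found on the PATH.
            System.out.println("Could not run protoc: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            System.out.println("Interrupted while waiting for protoc");
        }
    }
}
```

If protoc reports a different version (for example 2.4.1), the interim 
-Dprotobuf.version=2.4.1 Maven option from the thread still applies until you 
upgrade.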


Re: Cannot create a new Jira issue for MapReduce

2012-08-12 Thread Jun Ping Du
Thanks Ted. Those are very good suggestions for fallbacks when JIRA is down.
Besides alleviating the impact of JIRA downtime as you mentioned, could we 
think of some way to keep the JIRA system highly available? It is a little 
embarrassing that we deliver all kinds of HA systems to the rest of the world 
but suffer from this ourselves. :(

- Original Message -
From: Ted Yu yuzhih...@gmail.com
To: mapreduce-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org, common-...@hadoop.apache.org
Sent: Sunday, August 12, 2012 12:17:36 PM
Subject: Re: Cannot create a new Jira issue for MapReduce

I made some suggestions on the HBase dev mailing list a few weeks ago. The
following suggestions are about HBase development but can be extrapolated
to other Apache projects.


People can continue discussions on the dev mailing list when JIRA is down.
When JIRA comes back up, a transcript of such discussions can be posted back
on the related issues.
Use of https://reviews.apache.org is encouraged. The review board wasn't
affected by the JIRA downtime.
Contributors and committers are encouraged to run the test suite themselves,
which alleviates the burden on Hadoop QA.

The goal of the above suggestions is to alleviate the impact of JIRA
downtime.

BTW I have kept notifications from iss...@hbase.apache.org in my Inbox.
This pays off when JIRA is down.

Cheers

On Sat, Aug 11, 2012 at 7:14 PM, Jun Ping Du j...@vmware.com wrote:

 Yes. I see JIRA is under maintenance now; the schedule is below:

 Host Name:    ull.zones.apache.org
 Service:      Issues - JIRA - General
 Entry Time:   2012-08-11 19:06:08
 Author:       danielsh
 Comment:      Migrating to a different physical host
 Start Time:   2012-08-11 19:06:08
 End Time:     2012-08-13 19:06:08
 Type:         Fixed
 Duration:     2d 0h 0m 0s
 Downtime ID:  1663
 Trigger ID:   N/A

 It looks like it will take 2 days to migrate to a different host. As JIRA
 is a key component of the community's development process, could we think
 of ways to lower the maintenance overhead?


 Thanks,

 Junping

 - Original Message -
 From: Steve Loughran steve.lough...@gmail.com
 To: mapreduce-dev@hadoop.apache.org
 Sent: Friday, August 10, 2012 7:33:04 AM
 Subject: Re: Cannot create a new Jira issue for MapReduce

 There have been disk problems w/ Jira recently. GitHub's been playing up
 this morning too. Time to put away the dev tools and get PowerPoint out
 instead

 On 9 August 2012 13:38, Robert Evans ev...@yahoo-inc.com wrote:
  It is a bit worse than that, though. I found that it did create the JIRA,
  but it is in a bad state where you cannot put it in patch available or
  close it. So we may need to do some cleanup of these JIRAs later.
 
  --Bobby
 
  On 8/9/12 3:19 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 This has been reported by HBase developers as well.
 
 See https://issues.apache.org/jira/browse/INFRA-5131
 
 On Thu, Aug 9, 2012 at 1:10 PM, Benoy Antony bant...@gmail.com wrote:
 
  Hi,
 
  I am getting the following error when I try to create a Jira issue.
 
  Error creating issue: com.atlassian.jira.util.RuntimeIOException:
  java.io.IOException: read past EOF
 
  Has anyone else faced the same problem?
 
  Thanks,
  Benoy
 
 



Re: Cannot create a new Jira issue for MapReduce

2012-08-11 Thread Jun Ping Du
Yes. I see JIRA is under maintenance now; the schedule is below:

Host Name:    ull.zones.apache.org
Service:      Issues - JIRA - General
Entry Time:   2012-08-11 19:06:08
Author:       danielsh
Comment:      Migrating to a different physical host
Start Time:   2012-08-11 19:06:08
End Time:     2012-08-13 19:06:08
Type:         Fixed
Duration:     2d 0h 0m 0s
Downtime ID:  1663
Trigger ID:   N/A

It looks like it will take 2 days to migrate to a different host. As JIRA is a 
key component of the community's development process, could we think of ways 
to lower the maintenance overhead?


Thanks,

Junping

- Original Message -
From: Steve Loughran steve.lough...@gmail.com
To: mapreduce-dev@hadoop.apache.org
Sent: Friday, August 10, 2012 7:33:04 AM
Subject: Re: Cannot create a new Jira issue for MapReduce

There have been disk problems w/ Jira recently. GitHub's been playing up
this morning too. Time to put away the dev tools and get PowerPoint out
instead

On 9 August 2012 13:38, Robert Evans ev...@yahoo-inc.com wrote:
 It is a bit worse than that, though. I found that it did create the JIRA,
 but it is in a bad state where you cannot put it in patch available or
 close it. So we may need to do some cleanup of these JIRAs later.

 --Bobby

 On 8/9/12 3:19 PM, Ted Yu yuzhih...@gmail.com wrote:

This has been reported by HBase developers as well.

See https://issues.apache.org/jira/browse/INFRA-5131

On Thu, Aug 9, 2012 at 1:10 PM, Benoy Antony bant...@gmail.com wrote:

 Hi,

 I am getting the following error when I try to create a Jira issue.

 Error creating issue: com.atlassian.jira.util.RuntimeIOException:
 java.io.IOException: read past EOF

 Has anyone else faced the same problem?

 Thanks,
 Benoy




Can someone review MAPREDUCE-4309 and MAPREDUCE-4310?

2012-08-01 Thread Jun Ping Du
These two patches are the YARN part of the Hadoop network topology extension 
for virtualized environments.

Thanks,

Junping

- Original Message -
From: Jun Ping Du j...@vmware.com
To: common-...@hadoop.apache.org, hdfs-...@hadoop.apache.org, 
mapreduce-dev@hadoop.apache.org
Cc: Mark Pollack mpoll...@vmware.com, Jurgen Leschner 
jlesch...@vmware.com, Richard McDougall r...@vmware.com
Sent: Monday, June 4, 2012 11:48:35 PM
Subject: Make Hadoop NetworkTopology and data locality more pluggable for other 
deploying topology like: virtualization.

Hello Folks,
  I just filed an umbrella JIRA today to address the current NetworkTopology 
issue of being bound strictly to a three-tier network. The motivation is to 
make Hadoop more flexible for different deployment topologies (especially the 
cloud/virtualization case) and more configurable in data-locality-related 
policies such as replica placement, task scheduling, choosing blocks for 
DFSClient reads, and balancing.
  We have submitted a draft proposal to this umbrella JIRA along with the 
implementation code. As the code base is large (~260K), the code is split into 
7 sub-JIRA issues, which should be more convenient to review. However, we split 
the code by functionality, which introduces some dependencies between patches, 
and we are not sure this is the best approach. Comments and suggestions on the 
doc and code are welcome, and we look forward to working with all of you to 
enhance Hadoop for these new deployment scenarios.
  Hope this is a good start.

Cheers,

Junping

- Original Message -
From: Junping Du (JIRA) j...@apache.org
To: common-iss...@hadoop.apache.org
Sent: Monday, June 4, 2012 12:09:22 PM
Subject: [jira] [Created] (HADOOP-8468) Umbrella of enhancements to support 
different failure and locality topologies

Junping Du created HADOOP-8468:
--

 Summary: Umbrella of enhancements to support different failure and 
locality topologies
 Key: HADOOP-8468
 URL: https://issues.apache.org/jira/browse/HADOOP-8468
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha, io
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical


The current Hadoop network topology (described in previous issues such as 
HADOOP-692) works well in the classic three-tier network it was designed for. 
However, it does not take into account other failure models or infrastructure 
changes, such as virtualization, that can affect network bandwidth efficiency. 
A virtualized platform has the following characteristics that should not be 
ignored by the Hadoop topology when scheduling tasks, placing replicas, 
balancing, or fetching blocks for reading: 
1. VMs on the same physical host are affected by the same hardware failure. In 
order to match the reliability of a physical deployment, replication of data 
across two virtual machines on the same host should be avoided.
2. The network between VMs on the same physical host has higher throughput and 
lower latency and does not consume any physical switch bandwidth.
Thus, we propose to make the Hadoop network topology extensible and to 
introduce a new level in the hierarchical topology, a node group level, which 
maps well onto a virtualized infrastructure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
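The node group proposal above adds one level between rack and host, which is 
what lets the topology express point 2 (cheaper traffic between VMs on the same 
physical host). As a rough illustration only, the Java below computes a 
hop-count distance over paths such as /rack1/nodegroup1/vm1, counting one hop 
per edge between a node and its parent up to their closest common ancestor. 
The path layout, class name and method names are assumptions for this sketch, 
not the NetworkTopology API from the patches.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: hop-count distance in a hierarchy with a node group level.
public class NodeGroupDistance {

    // Splits "/rack1/nodegroup1/vm1" into [rack1, nodegroup1, vm1].
    static List<String> components(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return Arrays.asList(trimmed.split("/"));
    }

    // Sum of hops from each node up to their closest common ancestor.
    static int distance(String a, String b) {
        List<String> pa = components(a);
        List<String> pb = components(b);
        int common = 0;
        while (common < pa.size() && common < pb.size()
                && pa.get(common).equals(pb.get(common))) {
            common++;
        }
        return (pa.size() - common) + (pb.size() - common);
    }

    public static void main(String[] args) {
        String vm1 = "/rack1/nodegroup1/vm1";
        System.out.println(distance(vm1, "/rack1/nodegroup1/vm1")); // same VM
        System.out.println(distance(vm1, "/rack1/nodegroup1/vm2")); // same physical host (node group)
        System.out.println(distance(vm1, "/rack1/nodegroup2/vm3")); // same rack, different host
        System.out.println(distance(vm1, "/rack2/nodegroup3/vm4")); // different rack
    }
}
```

Running main prints 0, 2, 4 and 6, one per line. Paths without a node group 
level (e.g. /rack1/vm1 vs /rack1/vm2) still come out at distance 2, so the same 
function covers the classic three-tier layout.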




Re: PreCommit-Admin not running

2012-07-02 Thread Jun Ping Du
Moving this to the dev alias; it seems to have stopped working since the weekend. 

Thanks,

Junping

- Original Message -
From: Kihwal Lee kih...@yahoo-inc.com
To: gene...@hadoop.apache.org
Sent: Tuesday, July 3, 2012 3:59:28 AM
Subject: PreCommit-Admin not running

It looks like the PreCommit-Admin build job is not running.
Can anyone give it a gentle nudge?

Kihwal



Make Hadoop NetworkTopology and data locality more pluggable for other deploying topology like: virtualization.

2012-06-04 Thread Jun Ping Du
Hello Folks,
  I just filed an umbrella JIRA today to address the current NetworkTopology 
issue of being bound strictly to a three-tier network. The motivation is to 
make Hadoop more flexible for different deployment topologies (especially the 
cloud/virtualization case) and more configurable in data-locality-related 
policies such as replica placement, task scheduling, choosing blocks for 
DFSClient reads, and balancing.
  We have submitted a draft proposal to this umbrella JIRA along with the 
implementation code. As the code base is large (~260K), the code is split into 
7 sub-JIRA issues, which should be more convenient to review. However, we split 
the code by functionality, which introduces some dependencies between patches, 
and we are not sure this is the best approach. Comments and suggestions on the 
doc and code are welcome, and we look forward to working with all of you to 
enhance Hadoop for these new deployment scenarios.
  Hope this is a good start.

Cheers,

Junping

- Original Message -
From: Junping Du (JIRA) j...@apache.org
To: common-iss...@hadoop.apache.org
Sent: Monday, June 4, 2012 12:09:22 PM
Subject: [jira] [Created] (HADOOP-8468) Umbrella of enhancements to support 
different failure and locality topologies

Junping Du created HADOOP-8468:
--

 Summary: Umbrella of enhancements to support different failure and 
locality topologies
 Key: HADOOP-8468
 URL: https://issues.apache.org/jira/browse/HADOOP-8468
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha, io
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical


The current Hadoop network topology (described in previous issues such as 
HADOOP-692) works well in the classic three-tier network it was designed for. 
However, it does not take into account other failure models or infrastructure 
changes, such as virtualization, that can affect network bandwidth efficiency. 
A virtualized platform has the following characteristics that should not be 
ignored by the Hadoop topology when scheduling tasks, placing replicas, 
balancing, or fetching blocks for reading: 
1. VMs on the same physical host are affected by the same hardware failure. In 
order to match the reliability of a physical deployment, replication of data 
across two virtual machines on the same host should be avoided.
2. The network between VMs on the same physical host has higher throughput and 
lower latency and does not consume any physical switch bandwidth.
Thus, we propose to make the Hadoop network topology extensible and to 
introduce a new level in the hierarchical topology, a node group level, which 
maps well onto a virtualized infrastructure.

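Point 1 of the proposal (avoid placing two replicas on VMs that share a 
physical host) can be pictured as a filter over candidate targets. The Java 
below is a hypothetical, greedy sketch: NodeGroupAwarePlacement, chooseTargets 
and the /rack/nodegroup/vm path layout are assumptions, it ignores rack 
awareness and load, and it is not the block placement code from the patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: never pick two replica targets in the same node group.
public class NodeGroupAwarePlacement {

    // Returns "/rack1/nodegroup1" for "/rack1/nodegroup1/vm1" (the VM's parent path).
    static String nodeGroupOf(String vmPath) {
        int idx = vmPath.lastIndexOf('/');
        return idx <= 0 ? "/" : vmPath.substring(0, idx);
    }

    // Walks candidates in order and keeps up to `replicas` VMs with distinct node groups.
    static List<String> chooseTargets(List<String> candidates, int replicas) {
        List<String> chosen = new ArrayList<>();
        Set<String> usedGroups = new HashSet<>();
        for (String vm : candidates) {
            if (chosen.size() == replicas) {
                break;
            }
            // Set.add returns false when this physical host already holds a replica.
            if (usedGroups.add(nodeGroupOf(vm))) {
                chosen.add(vm);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "/rack1/nodegroup1/vm1",
                "/rack1/nodegroup1/vm2", // same physical host as vm1, so skipped
                "/rack1/nodegroup2/vm3",
                "/rack2/nodegroup3/vm4");
        System.out.println(chooseTargets(candidates, 3));
    }
}
```

Running main prints [/rack1/nodegroup1/vm1, /rack1/nodegroup2/vm3, 
/rack2/nodegroup3/vm4]: vm2 is skipped because it shares nodegroup1 with vm1. 
A real policy would combine this check with the existing rack-level rules.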