Re: Next releases
Hi Arun,

Thanks for working out this list, which looks great to me. In addition, I would like to add one item to the 2.3 release: YARN-291, which enhances YARN's resource elasticity in cloud scenarios and can benefit other scenarios as well, e.g. graceful NM decommission (YARN-914), non job/app regression (or maintenance model) in NM rolling upgrade (YARN-671), etc. With great help from Luke, Bikas and Vinod, we have already got the first and most important piece of work (YARN-311) in. I am now working on the remaining parts, which include interfaces (RPC, CLI, REST, etc.) and a few enhancements (persistence, support for different policies, etc.), and I am optimistic about completing most of the work by the end of 2013. Would you help to include it if we can make it on time? :)

Thanks,
Junping

- Original Message -
From: Arun C Murthy a...@hortonworks.com
To: common-...@hadoop.apache.org, hdfs-...@hadoop.apache.org, yarn-...@hadoop.apache.org, mapreduce-dev@hadoop.apache.org
Sent: Friday, November 8, 2013 10:42:36 AM
Subject: Next releases

Gang,

Thinking through the next couple of releases here, appreciate f/b.

# hadoop-2.2.1

I was looking through commit logs and there is a *lot* of content here (81 commits as of 11/7). Some are features/improvements and some are fixes - it's really hard to distinguish what is important and what isn't.

I propose we start with a blank slate (i.e. blow away branch-2.2 and start fresh from a copy of branch-2.2.0) and then be very careful and meticulous about including only *blocker* fixes in branch-2.2. So, most of the content here comes via the next minor release (i.e. hadoop-2.3)

In future, we continue to be *very* parsimonious about what gets into a patch release (major.minor.patch) - in general, these should be only *blocker* fixes or key operational issues.
# hadoop-2.3

I'd like to propose the following features for YARN/MR to make it into hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:
* Application History Server - This is happening in a branch and is close; with it we can provide a reasonable experience for new frameworks being built on top of YARN.
* Bug-fixes in RM Restart
* Minimal support for long-running applications (e.g. security) via YARN-896
* RM Fail-over via ZKFC
* Anything else?

HDFS???

Overall, I feel like we have a decent chance of rolling hadoop-2.3 by the end of the year.

Thoughts?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You
Re: [VOTE] Release Apache Hadoop 2.2.0
+1 (non-binding). I tested the build, deployed it on a tiny cluster, and ran a few jobs.

Thanks,
Junping

- Original Message -
From: Arun C Murthy a...@hortonworks.com
To: common-...@hadoop.apache.org, hdfs-...@hadoop.apache.org, yarn-...@hadoop.apache.org, mapreduce-dev@hadoop.apache.org
Sent: Monday, October 7, 2013 3:00:52 PM
Subject: [VOTE] Release Apache Hadoop 2.2.0

Folks,

I've created a release candidate (rc0) for hadoop-2.2.0 that I would like to get released - this release fixes a small number of bugs and some protocol/api issues which should ensure they are now stable and will not change in hadoop-2.x.

The RC is available at: http://people.apache.org/~acmurthy/hadoop-2.2.0-rc0
The RC tag in svn is here: http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.2.0-rc0

The maven artifacts are available via repository.apache.org.

Please try the release and vote; the vote will run for the usual 7 days.

thanks,
Arun

P.S.: Thanks to Colin, Andrew, Daryn, Chris and others for helping nail down the symlinks-related issues. I'll release note the fact that we have disabled it in 2.2. Also, thanks to Vinod for some heavy-lifting on the YARN side in the last couple of weeks.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: [ACTION NEEDED]: protoc 2.5.0 in trunk/branch-2/branch-2.1-beta/branch-2.1.0-beta
Hi Tsuyoshi,

I just checked the Hadoop wiki page on HowToContribute. For the Protocol Buffers requirement it points to the YARN README, which has already been updated to 2.5.0.

Thanks,
Junping

- Original Message -
From: Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
To: hdfs-...@hadoop.apache.org
Cc: common-...@hadoop.apache.org, yarn-...@hadoop.apache.org, mapreduce-dev@hadoop.apache.org
Sent: Friday, August 16, 2013 1:55:23 PM
Subject: Re: [ACTION NEEDED]: protoc 2.5.0 in trunk/branch-2/branch-2.1-beta/branch-2.1.0-beta

Thanks for sharing! We also need to update Wiki or some documents, don't we?

http://wiki.apache.org/hadoop/HowToContribute

On Thu, Aug 15, 2013 at 8:03 AM, Alejandro Abdelnur t...@cloudera.com wrote:

Following up on this.

HADOOP-9845 and HADOOP-9872 have been committed to trunk/branch-2/branch-2.1-beta/branch-2.1.0-beta.

All Hadoop developers must install protoc 2.5.0 in their development machines for the build to run.

All Hadoop jenkins boxes are using protoc 2.5.0

The BUILDING.txt file has been updated to reflect that protoc 2.5.0 is the required one and includes instructions on how to use a different protoc from multiple local versions (using an ENV var). This may be handy for folks working with Hadoop versions using protoc 2.4.1.

INTERIM SOLUTION IF YOU CANNOT UPGRADE TO PROTOC 2.5.0 IMMEDIATELY

Use the following option with all your Maven commands '-Dprotobuf.version=2.4.1'.

Note that this option will make the build use protoc and protobuf 2.4.1.

Though you should upgrade to 2.5.0 at the earliest. As soon as we start using the new goodies from protobuf 2.5.0 (like the non-copy bytearrays) 2.4.1 will not work anymore.

Thanks and apologies again for the noise throughout this change.

--
Alejandro

--
- Tsuyoshi
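As a rough illustration of the interim workaround quoted above, the sketch below parses the text printed by `protoc --version` (for example `libprotoc 2.4.1`) and adds the `-Dprotobuf.version=2.4.1` option to a Maven command line only when the local compiler is still 2.4.1. The helper names `parse_protoc_version` and `maven_command`, and the default goals `clean install -DskipTests`, are assumptions made for this example and are not part of the Hadoop build; only the `-Dprotobuf.version=2.4.1` option itself comes from the message.

```python
import re

REQUIRED_PROTOC = "2.5.0"  # version required by the branches named in the message
INTERIM_PROTOC = "2.4.1"   # the only older version the interim option is quoted for


def parse_protoc_version(version_output):
    """Extract 'X.Y.Z' from text such as 'libprotoc 2.4.1'; None if not found."""
    match = re.search(r"(\d+\.\d+\.\d+)", version_output)
    return match.group(1) if match else None


def maven_command(local_protoc_version, goals=("clean", "install", "-DskipTests")):
    """Build a Maven argument list (hypothetical helper, not a Hadoop tool).

    Adds the interim override from the message when protoc 2.4.1 is installed,
    adds nothing for 2.5.0, and refuses anything else.
    """
    cmd = ["mvn", *goals]
    if local_protoc_version == REQUIRED_PROTOC:
        return cmd
    if local_protoc_version == INTERIM_PROTOC:
        return cmd + ["-Dprotobuf.version=" + INTERIM_PROTOC]
    raise ValueError("unsupported protoc version: %r" % (local_protoc_version,))


if __name__ == "__main__":
    version = parse_protoc_version("libprotoc 2.4.1")
    print(" ".join(maven_command(version)))
```

In practice the version string would come from running `protoc --version` on the build machine; it is hard-coded here so the sketch runs without protoc installed.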
Re: Cannot create a new Jira issue for MapReduce
Thanks Ted. Those are very good suggestions as backup solutions when JIRA is down. Besides alleviating the impact of JIRA downtime as you mentioned, can we think of some way to keep the JIRA system highly available? It is a little embarrassing that we deliver all kinds of HA systems to the rest of the world but suffer from this ourselves. :(

- Original Message -
From: Ted Yu yuzhih...@gmail.com
To: mapreduce-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org, common-...@hadoop.apache.org
Sent: Sunday, August 12, 2012 12:17:36 PM
Subject: Re: Cannot create a new Jira issue for MapReduce

I made some suggestions to hbase dev mailing list a few weeks ago. The following suggestion is about hbase development which can be extrapolated to other Apache projects.

People can continue discussion through dev mailing list when JIRA is down. When JIRA comes back up, transcript of such discussion can be posted back on related issues.

Use of https://reviews.apache.org is encouraged. The review board wasn't affected by JIRA downtime.

Running test suite by contributors and committers is encouraged which alleviates the burden on Hadoop QA.

The goal of the above suggestions is to alleviate the impact of JIRA downtime.

BTW I have kept notifications from iss...@hbase.apache.org in my Inbox. This shows benefit when JIRA is down.

Cheers

On Sat, Aug 11, 2012 at 7:14 PM, Jun Ping Du j...@vmware.com wrote:

Yes. I saw JIRA is in maintenance now and the schedule is as below:

Host Name:    ull.zones.apache.org
Service:      Issues - JIRA - General
Entry Time:   2012-08-11 19:06:08
Author:       danielsh
Comment:      Migrating to a different physical host
Start Time:   2012-08-11 19:06:08
End Time:     2012-08-13 19:06:08
Type:         Fixed
Duration:     2d 0h 0m 0s
Downtime ID:  1663
Trigger ID:   N/A
Actions:      Delete/Cancel This Scheduled Downtime Entry

Looks like it will take 2 days to migrate to a different host. As JIRA is a key component of the dev process in the community, can we think of some ways to lower the maintenance overhead?
Thanks,
Junping

- Original Message -
From: Steve Loughran steve.lough...@gmail.com
To: mapreduce-dev@hadoop.apache.org
Sent: Friday, August 10, 2012 7:33:04 AM
Subject: Re: Cannot create a new Jira issue for MapReduce

There have been disk problems w/ Jira recently. GitHub's been playing up this morning too.

Time to put away the dev tools and get powerpoint out instead

On 9 August 2012 13:38, Robert Evans ev...@yahoo-inc.com wrote:

It is a bit worse than that though. I found that it did create the JIRA, but it is in a bad state where you cannot put it in patch available or close it. So we may need to do some cleanup of these JIRAs later.

--Bobby

On 8/9/12 3:19 PM, Ted Yu yuzhih...@gmail.com wrote:

This has been reported by HBase developers as well. See https://issues.apache.org/jira/browse/INFRA-5131

On Thu, Aug 9, 2012 at 1:10 PM, Benoy Antony bant...@gmail.com wrote:

Hi,

I am getting the following error when I try to create a Jira issue.

Error creating issue: com.atlassian.jira.util.RuntimeIOException: java.io.IOException: read past EOF

Anyone else face the same problem?

Thanks,
Benoy
Re: Cannot create a new Jira issue for MapReduce
Yes. I see JIRA is in maintenance now, and the schedule is as below:

Host Name:    ull.zones.apache.org
Service:      Issues - JIRA - General
Entry Time:   2012-08-11 19:06:08
Author:       danielsh
Comment:      Migrating to a different physical host
Start Time:   2012-08-11 19:06:08
End Time:     2012-08-13 19:06:08
Type:         Fixed
Duration:     2d 0h 0m 0s
Downtime ID:  1663
Trigger ID:   N/A
Actions:      Delete/Cancel This Scheduled Downtime Entry

Looks like it will take 2 days to migrate to a different host. As JIRA is a key component of the dev process in the community, can we think of some ways to lower the maintenance overhead?

Thanks,
Junping

- Original Message -
From: Steve Loughran steve.lough...@gmail.com
To: mapreduce-dev@hadoop.apache.org
Sent: Friday, August 10, 2012 7:33:04 AM
Subject: Re: Cannot create a new Jira issue for MapReduce

There have been disk problems w/ Jira recently. GitHub's been playing up this morning too.

Time to put away the dev tools and get powerpoint out instead

On 9 August 2012 13:38, Robert Evans ev...@yahoo-inc.com wrote:

It is a bit worse than that though. I found that it did create the JIRA, but it is in a bad state where you cannot put it in patch available or close it. So we may need to do some cleanup of these JIRAs later.

--Bobby

On 8/9/12 3:19 PM, Ted Yu yuzhih...@gmail.com wrote:

This has been reported by HBase developers as well. See https://issues.apache.org/jira/browse/INFRA-5131

On Thu, Aug 9, 2012 at 1:10 PM, Benoy Antony bant...@gmail.com wrote:

Hi,

I am getting the following error when I try to create a Jira issue.

Error creating issue: com.atlassian.jira.util.RuntimeIOException: java.io.IOException: read past EOF

Anyone else face the same problem?

Thanks,
Benoy
Can someone review MAPREDUCE-4309 and MAPREDUCE-4310?
These two patches are the YARN part of the Hadoop network topology extension for virtualized environments.

Thanks,
Junping

- Original Message -
From: Jun Ping Du j...@vmware.com
To: common-...@hadoop.apache.org, hdfs-...@hadoop.apache.org, mapreduce-dev@hadoop.apache.org
Cc: Mark Pollack mpoll...@vmware.com, Jurgen Leschner jlesch...@vmware.com, Richard McDougall r...@vmware.com
Sent: Monday, June 4, 2012 11:48:35 PM
Subject: Make Hadoop NetworkTopology and data locality more pluggable for other deploying topology like: virtualization.

Hello Folks,

I just filed an umbrella JIRA today to address the current NetworkTopology issue of being bound strictly to a three-tier network. The motivation is to make Hadoop more flexible with respect to deployment topology (especially for the cloud/virtualization case) and more configurable in data-locality-related policies such as replica placement, task scheduling, choosing a block for DFSClient reads, and balancing.

We have submitted a draft proposal in this umbrella issue along with the implementation code. As the code base is large (~260K), the code is separated into 7 sub-JIRA issues, which should be more convenient for reviewing. However, we split the code by functionality, which causes some dependencies between patches, and we are not sure this is the best way.

Comments and suggestions on the doc and code are welcome, and we look forward to working with all of you to enhance Hadoop for these new situations. Hope this is a good start.
Cheers,
Junping

- Original Message -
From: Junping Du (JIRA) j...@apache.org
To: common-iss...@hadoop.apache.org
Sent: Monday, June 4, 2012 12:09:22 PM
Subject: [jira] [Created] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Junping Du created HADOOP-8468:
--

Summary: Umbrella of enhancements to support different failure and locality topologies
Key: HADOOP-8468
URL: https://issues.apache.org/jira/browse/HADOOP-8468
Project: Hadoop Common
Issue Type: Bug
Components: ha, io
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical

The current hadoop network topology (described in some previous issues like: Hadoop-692) worked well for the classic three-tier network it was designed for. However, it does not take into account other failure models or changes in the infrastructure that can affect network bandwidth efficiency, such as virtualization.

A virtualized platform has the following characteristics that shouldn't be ignored by the hadoop topology when scheduling tasks, placing replicas, balancing, or fetching blocks for reading:
1. VMs on the same physical host are affected by the same hardware failure. In order to match the reliability of a physical deployment, replication of data across two virtual machines on the same host should be avoided.
2. The network between VMs on the same physical host has higher throughput and lower latency and does not consume any physical switch bandwidth.

Thus, we propose to make the hadoop network topology extensible and introduce a new level in the hierarchical topology, a node group level, which maps well onto an infrastructure that is based on a virtualized environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
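To make the task-scheduling side of the proposal concrete, here is a minimal sketch of how an extra node group level could refine locality decisions. It assumes each node is identified by a hypothetical path of the form `/rack/nodegroup/node`, where the node group stands for a physical host and the node for a VM on it. The names `Location`, `locality_level` and `pick_node`, and the path format, are made up for this example and are not taken from the patches under review.

```python
from typing import NamedTuple

# Preference order, best first. "nodegroup-local" is the level the proposal adds
# between node-local and rack-local.
ORDER = ["node-local", "nodegroup-local", "rack-local", "off-rack"]


class Location(NamedTuple):
    rack: str
    node_group: str  # physical host shared by several VMs (assumed meaning)
    node: str        # the VM running the Hadoop daemon


def parse_location(path):
    """Parse a hypothetical '/rack/nodegroup/node' path into a Location."""
    rack, node_group, node = path.strip("/").split("/")
    return Location(rack, node_group, node)


def locality_level(task_path, data_path):
    """Classify how close a candidate task node is to the node holding the data."""
    task, data = parse_location(task_path), parse_location(data_path)
    if task == data:
        return "node-local"
    if (task.rack, task.node_group) == (data.rack, data.node_group):
        return "nodegroup-local"  # different VM, same physical host
    if task.rack == data.rack:
        return "rack-local"
    return "off-rack"


def pick_node(candidates, data_path):
    """Choose the candidate with the best locality; ties go to the first listed."""
    return min(candidates, key=lambda p: ORDER.index(locality_level(p, data_path)))


if __name__ == "__main__":
    nodes = ["/r2/ng9/vm9", "/r1/ng2/vm3", "/r1/ng1/vm2"]
    print(pick_node(nodes, "/r1/ng1/vm1"))
```

The point of the extra level is point 2 of the issue description: a task on a sibling VM of the same physical host reads data without touching a physical switch, so it should be preferred over a merely rack-local node.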
Re: PreCommit-Admin not running
Moving this to the dev alias; it seems to have stopped working since the weekend.

Thanks,
Junping

- Original Message -
From: Kihwal Lee kih...@yahoo-inc.com
To: gene...@hadoop.apache.org
Sent: Tuesday, July 3, 2012 3:59:28 AM
Subject: PreCommit-Admin not running

It looks like the PreCommit-Admin build job is not running. Can anyone give it a gentle nudge?

Kihwal
Make Hadoop NetworkTopology and data locality more pluggable for other deploying topology like: virtualization.
Hello Folks,

I just filed an umbrella JIRA today to address the current NetworkTopology issue of being bound strictly to a three-tier network. The motivation is to make Hadoop more flexible with respect to deployment topology (especially for the cloud/virtualization case) and more configurable in data-locality-related policies such as replica placement, task scheduling, choosing a block for DFSClient reads, and balancing.

We have submitted a draft proposal in this umbrella issue along with the implementation code. As the code base is large (~260K), the code is separated into 7 sub-JIRA issues, which should be more convenient for reviewing. However, we split the code by functionality, which causes some dependencies between patches, and we are not sure this is the best way.

Comments and suggestions on the doc and code are welcome, and we look forward to working with all of you to enhance Hadoop for these new situations. Hope this is a good start.

Cheers,
Junping

- Original Message -
From: Junping Du (JIRA) j...@apache.org
To: common-iss...@hadoop.apache.org
Sent: Monday, June 4, 2012 12:09:22 PM
Subject: [jira] [Created] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Junping Du created HADOOP-8468:
--

Summary: Umbrella of enhancements to support different failure and locality topologies
Key: HADOOP-8468
URL: https://issues.apache.org/jira/browse/HADOOP-8468
Project: Hadoop Common
Issue Type: Bug
Components: ha, io
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical

The current hadoop network topology (described in some previous issues like: Hadoop-692) worked well for the classic three-tier network it was designed for. However, it does not take into account other failure models or changes in the infrastructure that can affect network bandwidth efficiency, such as virtualization.
A virtualized platform has the following characteristics that shouldn't be ignored by the hadoop topology when scheduling tasks, placing replicas, balancing, or fetching blocks for reading:
1. VMs on the same physical host are affected by the same hardware failure. In order to match the reliability of a physical deployment, replication of data across two virtual machines on the same host should be avoided.
2. The network between VMs on the same physical host has higher throughput and lower latency and does not consume any physical switch bandwidth.

Thus, we propose to make the hadoop network topology extensible and introduce a new level in the hierarchical topology, a node group level, which maps well onto an infrastructure that is based on a virtualized environment.
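Point 1 of the description can be illustrated with a small sketch of a placement filter that never puts two replicas of a block in the same node group. It assumes hypothetical node paths of the form `/rack/nodegroup/node`, with the node group standing for one physical host. The function names and the simple greedy strategy are assumptions for this example; it is not the block placement policy from the patches, and it deliberately ignores the usual rack-aware rules to keep the node group constraint visible.

```python
def node_group_of(path):
    """Return (rack, node_group) for a hypothetical '/rack/nodegroup/node' path."""
    rack, node_group, _node = path.strip("/").split("/")
    return (rack, node_group)


def choose_replica_targets(candidates, replication):
    """Greedily pick up to `replication` targets, at most one per node group.

    Candidates are taken in the order given. A candidate is skipped when an
    earlier choice already sits in the same node group, because two VMs on one
    physical host would be lost together in a hardware failure.
    """
    chosen = []
    used_groups = set()
    for path in candidates:
        group = node_group_of(path)
        if group in used_groups:
            continue  # same physical host as an earlier replica
        chosen.append(path)
        used_groups.add(group)
        if len(chosen) == replication:
            break
    return chosen


if __name__ == "__main__":
    vms = ["/r1/ng1/vm1", "/r1/ng1/vm2", "/r1/ng2/vm3", "/r2/ng3/vm4"]
    print(choose_replica_targets(vms, 3))
```

With a plain rack-level topology, `vm1` and `vm2` would look like two independent nodes on rack `r1`; the node group level is what lets the filter see that they share a host.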