Re: Branch merges and 3.0.0-beta1 scope
We should avoid turning this into a replay of Apache Hadoop 2.6.0 (and, to a lesser degree, 2.7.0 and 2.8.0), where a bunch of last-minute “experimental” features derailed stability for a significant period of time.
Re: Map reduce sample program
On 8/19/17 3:28 AM, Remil Mohanan wrote:
> I am trying to pass multiple non-key values from mapper to reducer.

The only way to pass data from the mapper to the reducer is through key-value pairs. One common trick is to designate a special key as the out-of-band information key and then use a custom sorting comparator to make sure that key comes first in the sort order. I'm sure you can find examples online.

> Similarly for reading and writing a file inside the hdfs system other than normal read and write.

I don't understand. Reading and writing a file in HDFS from an MR task works exactly the same as doing it from a stand-alone program. You probably want to do it in the setup() method, though.

Daniel
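A minimal sketch of the setup()-based side-data read that Daniel describes, assuming the standard org.apache.hadoop.mapreduce API; the class name, the /user/example/side-data.txt path, and the sideData field are hypothetical, made up purely for illustration:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SideDataReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private String sideData;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Open a file in HDFS exactly as a stand-alone program would,
    // obtaining the FileSystem from the job configuration.
    // The path below is a placeholder for illustration only.
    Path sidePath = new Path("/user/example/side-data.txt");
    FileSystem fs = sidePath.getFileSystem(context.getConfiguration());
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(sidePath), StandardCharsets.UTF_8))) {
      sideData = reader.readLine();
    }
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    // The value read once in setup() is available to every reduce() call.
    context.write(new Text(key + ":" + sideData), new IntWritable(sum));
  }
}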
Re: Branch merges and 3.0.0-beta1 scope
On 8/22/17 3:20 AM, Steve Loughran wrote:
>> On 21 Aug 2017, at 22:22, Vinod Kumar Vavilapalli wrote:
>>
>> Steve,
>>
>> You can be strict & ruthless about the timelines. Anything that doesn’t get in by mid-September, as was originally planned, can move to the next release - whether it is feature work on branches or feature work on trunk.
>>
>> The problem I see here is that code & branches being worked on for a year are now (apparently) close to being done and we are telling them to hold for 7 more months - this is not a reasonable ask.
>>
>> If you are advocating for a 3.1 plan, I’m sure one of these branch ‘owners’ can volunteer. But this is how you get competing releases and split bandwidth.
>>
>> As for compatibility / testing etc., it seems like there is a belief that the current ‘scoped’ features are all well tested in these areas and so adding more is going to hurt the release. There is no way this is the reality: trunk has so many features that have been landing for years. The only way we can collectively attempt to make this stable is by getting as many parties together as possible, each verifying the stuff that they need - not by excluding specific features.
>
> If everyone is confident & it’s coming together, it does make sense. I think those of us (myself included) who are merging stuff in do have to recognise that we really need to follow it through by being responsive to any problems, and with the release manager having the right to pull things out if it’s felt to be significantly threatening the stability of the final 3.0 release.
>
> I think we should also consider making the 3.0 beta the feature freeze; after that, fixes to the features go in, but nothing else of significance, otherwise the value of the beta ("test this code more broadly") is diminished.

At this point, there have been three planned alphas from September 2016 until July 2017 to "get in features". While a couple of upcoming features are "a few weeks" away, I think all of us are aware of how predictable software development schedules can be. I think we can also all agree that rushing just to meet a release deadline isn't the best practice in software development either.

Andrew has been very clear about his goals at each step, and I think Wangda's willingness not to rush in resource types was an appropriate response. I'm sympathetic to the goal of getting a feature into 3.0, but it might be a good idea for each project that is "a few weeks away" to take a serious look at its readiness compared to the features that have already been in testing for 6+ months.

-Ray
Re: [VOTE] Merge feature branch YARN-5355 (Timeline Service v2) to trunk
Such a great community effort - hats off, team!

Thanks
+Vinod

> On Aug 21, 2017, at 11:32 PM, Vrushali Channapattan wrote:
>
> Hi folks,
>
> Per earlier discussion [1], I'd like to start a formal vote to merge feature branch YARN-5355 [2] (Timeline Service v.2) to trunk. The vote will run for 7 days and will end August 29, 11:00 PM PDT.
>
> We have previously completed one merge onto trunk [3], and Timeline Service v.2 has been part of Hadoop release 3.0.0-alpha1.
>
> Since then, we have been working on extending the capabilities of Timeline Service v.2 in a feature branch [2] for a while, and we are reasonably confident that the state of the feature meets the criteria to be merged onto trunk. We'd love folks to get their hands on it in a test capacity and provide valuable feedback so that we can make it production-ready.
>
> In a nutshell, Timeline Service v.2 delivers significant scalability and usability improvements based on a new architecture. What we would like to merge to trunk is termed "alpha 2" (milestone 2). The feature has a complete end-to-end read/write flow with security and read-level authorization via whitelists. You should be able to start setting it up and testing it.
>
> At a high level, the following are the key features that have been implemented since alpha 1:
> - Security via Kerberos authentication and delegation tokens
> - Read-side simple authorization via whitelist
> - Client-configurable entity sort ordering
> - Richer REST APIs for apps, app attempts, containers, fetching metrics by time range, pagination, sub-app entities
> - Support for storing sub-application entities (entities that exist outside the scope of an application)
> - Configurable TTLs (time-to-live) for tables, configurable table prefixes, configurable HBase cluster
> - Flow-level aggregations done as dynamic (table-level) coprocessors
> - Uses the latest stable HBase release, 1.2.6
>
> There are a total of 82 subtasks that were completed as part of this effort.
>
> We paid close attention to ensure that Timeline Service v.2 does not impact existing functionality when disabled (which is the default).
>
> Special thanks to a team of folks who worked hard and contributed towards this effort with patches, reviews and guidance: Rohith Sharma K S, Varun Saxena, Haibo Chen, Sangjin Lee, Li Lu, Vinod Kumar Vavilapalli, Joep Rottinghuis, Jason Lowe, Jian He, Robert Kanter, Michael Stack.
>
> Regards,
> Vrushali
>
> [1] http://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27383.html
> [2] https://issues.apache.org/jira/browse/YARN-5355
> [3] https://issues.apache.org/jira/browse/YARN-2928
> [4] https://github.com/apache/hadoop/commits/YARN-5355
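A minimal sketch of what switching Timeline Service v.2 on for a test setup might look like programmatically. The property names ("yarn.timeline-service.enabled" and "yarn.timeline-service.version") are assumptions taken from the TSv2 setup documentation and should be verified against the build being tested; the class and method names are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineV2Enablement {
  // Returns a configuration with Timeline Service v.2 switched on.
  // Property keys are assumptions based on the TSv2 docs; the same
  // settings would normally live in yarn-site.xml on a real cluster.
  public static Configuration enableTimelineV2() {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.timeline-service.enabled", true);
    conf.setFloat("yarn.timeline-service.version", 2.0f);
    return conf;
  }
}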
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/

[Aug 21, 2017 6:08:38 PM] (manojpec) HDFS-11988. Verify HDFS Snapshots with open files captured are
[Aug 21, 2017 6:48:51 PM] (arp) HDFS-12325. SFTPFileSystem operations should restore cwd. Contributed by
[Aug 21, 2017 8:45:30 PM] (jzhuge) HDFS-11738. Hedged pread takes more time when block moved from initial
[Aug 22, 2017 5:43:08 AM] (Arun Suresh) YARN-5603. Metrics for Federation StateStore. (Ellen Hui via asuresh)
[Aug 22, 2017 5:50:24 AM] (Arun Suresh) YARN-6923. Metrics for Federation Router. (Giovanni Matteo Fumarola via

-1 overall

The following subsystems voted -1:
    findbugs unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

FindBugs:
    module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
    Hard coded reference to an absolute pathname in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(ContainerRuntimeContext) At DockerLinuxContainerRuntime.java:[line 490]

Failed junit tests:
    hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
    hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
    hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
    hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
    hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
    hadoop.hdfs.TestReconstructStripedFile
    hadoop.hdfs.server.datanode.TestDirectoryScanner
    hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
    hadoop.hdfs.server.namenode.TestDecommissioningStatus
    hadoop.hdfs.TestPread
    hadoop.hdfs.server.datanode.TestDataNodeUUID
    hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
    hadoop.yarn.sls.appmaster.TestAMSimulator
    hadoop.yarn.sls.nodemanager.TestNMSimulator

Timed out junit tests:
    org.apache.hadoop.hdfs.TestLeaseRecovery2
    org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA

cc:         https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-compile-cc-root.txt [4.0K]
javac:      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-compile-javac-root.txt [296K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-checkstyle-root.txt [17M]
pylint:     https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-patch-pylint.txt [20K]
shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-patch-shellcheck.txt [20K]
shelldocs:  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-patch-shelldocs.txt [12K]
whitespace: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/whitespace-eol.txt [11M]
            https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/whitespace-tabs.txt [1.2M]
findbugs:   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html [8.0K]
javadoc:    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/diff-javadoc-javadoc-root.txt [1.9M]
unit:       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [672K]
            https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [64K]
            https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/500/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt [16K]

Powered by Apache Yetus 0.6.0-SNAPSHOT
http://yetus.apache.org
Re: Branch merges and 3.0.0-beta1 scope
> On 21 Aug 2017, at 22:22, Vinod Kumar Vavilapalli wrote:
>
> Steve,
>
> You can be strict & ruthless about the timelines. Anything that doesn’t get in by mid-September, as was originally planned, can move to the next release - whether it is feature work on branches or feature work on trunk.
>
> The problem I see here is that code & branches being worked on for a year are now (apparently) close to being done and we are telling them to hold for 7 more months - this is not a reasonable ask.
>
> If you are advocating for a 3.1 plan, I’m sure one of these branch ‘owners’ can volunteer. But this is how you get competing releases and split bandwidth.
>
> As for compatibility / testing etc., it seems like there is a belief that the current ‘scoped’ features are all well tested in these areas and so adding more is going to hurt the release. There is no way this is the reality: trunk has so many features that have been landing for years. The only way we can collectively attempt to make this stable is by getting as many parties together as possible, each verifying the stuff that they need - not by excluding specific features.

If everyone is confident & it’s coming together, it does make sense. I think those of us (myself included) who are merging stuff in do have to recognise that we really need to follow it through by being responsive to any problems, and with the release manager having the right to pull things out if it’s felt to be significantly threatening the stability of the final 3.0 release.

I think we should also consider making the 3.0 beta the feature freeze; after that, fixes to the features go in, but nothing else of significance, otherwise the value of the beta ("test this code more broadly") is diminished.

-steve