Re: [DISCUSS] Making 2.10 the last minor 2.x release
Hey guys,

I think we diverged a bit from the initial topic of this discussion, which is removing branch-2.10 and changing the version of branch-2 from 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT.

It sounds like the subject line for this thread, "Making 2.10 the last minor 2.x release", confused people. That is in fact a wider matter, which can be discussed when somebody actually proposes to release 2.11 - and as I understand it, nobody does at the moment.

So if anybody objects to removing branch-2.10, please make an argument. Otherwise we should go ahead and just do it next week. I see people still struggling to keep branch-2 and branch-2.10 in sync.

Thanks,
--Konstantin

On Thu, Nov 21, 2019 at 3:49 PM Jonathan Hung wrote:

> Thanks for the detailed thoughts, everyone.
>
> Eric (Badger), my understanding is the same as yours re. minor vs patch
> releases. As for putting features into minor/patch releases, if we keep
> the convention of putting new features only into minor releases, my
> assumption is still that it's unlikely people will want to get them into
> branch-2 (based on the 2.10.0 release process). As for the Java 11 issue,
> we haven't even really removed support for Java 7 in branch-2 (much less
> Java 8), so I feel moving to Java 11 would go along with a move to
> branch 3. And as you mentioned, if people really want to use Java 11 on
> branch-2, we can always revive branch-2. But for now I think the
> convenience of not needing to port to both branch-2 and branch-2.10 (and
> below) outweighs the cost of potentially needing to revive branch-2.
>
> Jonathan Hung
>
> On Wed, Nov 20, 2019 at 10:50 AM Eric Yang wrote:
>
>> +1 for 2.10.x as the last release line for 2.x.
>>
>> Software becomes more compatible when more companies stress-test the
>> same software and make improvements in trunk. Some may be extra
>> cautious about moving up the version because of internal obligations
>> to keep things running. Company obligations should not be the driving
>> force for maintaining Hadoop branches. There is no proper collaboration
>> in the community when every name-brand company maintains its own Hadoop
>> 2.x version. I think it would be healthier for the community to reduce
>> the branch forking and spend energy on trunk to harden the software.
>> This will give more confidence to move up the version than trying to
>> fix n permutations of breakage, like Flash fixing the timeline.
>>
>> The Apache license states that there is no warranty of any kind for
>> code contributions. Fewer community release processes should improve
>> software quality, since more eyes are on trunk, and help steer everyone
>> toward the same end goals.
>>
>> regards,
>> Eric
>>
>> On Tue, Nov 19, 2019 at 3:03 PM Eric Badger wrote:
>>
>>> Hello all,
>>>
>>> Is it written anywhere what the difference is between a minor release
>>> and a point/dot/maintenance (I'll use "point" from here on out)
>>> release? I have looked around and I can't find anything other than
>>> some compatibility documentation in 2.x that has since been removed
>>> in 3.x [1] [2]. I think this would help shape my opinion on whether
>>> or not to keep branch-2 alive. My current understanding is that we
>>> can't really break compatibility in either a minor or point release.
>>> But the only mention of the difference between minor and point
>>> releases is how to deal with Stable, Evolving, and Unstable tags, and
>>> how to deal with changing default configuration values. So it seems
>>> like there really isn't a big official difference between the two.
>>>
>>> In my mind, the functional difference between the two is that minor
>>> releases may add features and rewrites, while point releases only
>>> contain bug fixes. This might be an incorrect understanding, but
>>> that's what I have gathered from watching the releases over the last
>>> few years. Whether or not it is correct, I think this needs to be
>>> documented somewhere, even if it is just a convention.
>>>
>>> Given my assumed understanding of minor vs point releases, here are
>>> the pros/cons I can think of for keeping a branch-2. Please add on or
>>> correct me for anything you feel is missing or inadequate.
>>>
>>> Pros:
>>> - Features/rewrites/higher-risk patches are less likely to be put
>>>   into 2.10.x
>>> - It is less necessary to move to 3.x
>>>
>>> Cons:
>>> - Bug fixes are less likely to be put into 2.10.x
>>> - An extra branch to maintain
>>> - Committers have an extra branch (5 vs 4 total branches) to commit
>>>   patches to if they should go all the way back to 2.10.x
>>> - It is less necessary to move to 3.x
>>>
>>> So on the one hand you get added stability, with fewer features being
>>> committed to 2.10.x, but on the other you get fewer bug fixes being
>>> committed. In a perfect world, we wouldn't have to make this
>>> tradeoff. But we don't live in a perfect world, and committers will
>>> make mistakes, either because of lack of knowledge or simply because
>>> they made
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1333/

[Nov 26, 2019 12:41:41 PM] (snemeth) YARN-9937. addendum: Add missing queue configs in
[Nov 26, 2019 3:36:19 PM] (github) HADOOP-16709. S3Guard: Make authoritative mode exclusive for metadata -
[Nov 26, 2019 3:42:59 PM] (snemeth) YARN-9444. YARN API ResourceUtils's getRequestedResourcesFromConfig
[Nov 26, 2019 7:11:26 PM] (weichiu) HADOOP-16685: FileSystem#listStatusIterator does not check if given path
[Nov 26, 2019 8:22:35 PM] (snemeth) YARN-9899. Migration tool that help to generate CS config based on FS
[Nov 26, 2019 8:29:12 PM] (prabhujoseph) YARN-9991. Fix Application Tag prefix to userid. Contributed by Szilard
[Nov 26, 2019 8:45:12 PM] (snemeth) YARN-9362. Code cleanup in TestNMLeveldbStateStoreService. Contributed
[Nov 26, 2019 9:04:07 PM] (snemeth) YARN-9290. Invalid SchedulingRequest not rejected in Scheduler

-1 overall

The following subsystems voted -1:
    asflicense findbugs pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

   XML : Parsing Error(s):

      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

   FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core

      Class org.apache.hadoop.applications.mawo.server.common.TaskStatus implements Cloneable but does not define or use clone method. At TaskStatus.java:[lines 39-346]
      Equals method for org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument is of type WorkerId. At WorkerId.java:[line 114]
      org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does not check for null argument. At WorkerId.java:[lines 114-115]

   FindBugs : module:hadoop-cloud-storage-project/hadoop-cos

      Redundant nullcheck of dir, which is known to be non-null, in org.apache.hadoop.fs.cosn.BufferPool.createDir(String). At BufferPool.java:[line 66]
      org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may expose internal representation by returning CosNInputStream$ReadBuffer.buffer. At CosNInputStream.java:[line 87]
      Found reliance on default encoding (new String(byte[])) in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, byte[]). At CosNativeFileSystemStore.java:[line 199]
      Found reliance on default encoding (new String(byte[])) in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, InputStream, byte[], long). At CosNativeFileSystemStore.java:[line 178]
      org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, String, String, int) may fail to clean up a java.io.InputStream; the obligation to clean up the resource created at CosNativeFileSystemStore.java:[line 252] is not discharged

   Failed junit tests :

      hadoop.hdfs.server.balancer.TestBalancer
      hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
      hadoop.hdfs.server.namenode.TestRedudantBlocks
      hadoop.hdfs.tools.TestDFSZKFailoverController
      hadoop.hdfs.server.federation.router.TestRouterFaultTolerant
      hadoop.yarn.server.webproxy.amfilter.TestAmFilter
      hadoop.yarn.server.webproxy.TestWebA
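A note on the two WorkerId findings above, since they describe one bug: an equals() that casts its argument without a type check and never handles null breaks the Object.equals() contract. A minimal sketch of the null-safe pattern FindBugs expects; WorkerKey and its single field are hypothetical stand-ins, not the actual mawo class:

{code}
import java.util.Objects;

// Hypothetical stand-in for the WorkerId pattern flagged above.
public final class WorkerKey {
  private final String hostname;

  public WorkerKey(String hostname) {
    this.hostname = hostname;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;              // fast path for self-comparison
    }
    if (!(obj instanceof WorkerKey)) {
      return false;             // instanceof is false for null, so this
    }                           // single test clears both warnings
    WorkerKey other = (WorkerKey) obj;
    return Objects.equals(hostname, other.hostname);
  }

  @Override
  public int hashCode() {
    return Objects.hash(hostname); // keep hashCode consistent with equals
  }
}
{code}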
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/

[Nov 27, 2019 12:46:38 AM] (xkrogen) HDFS-14973. More strictly enforce Balancer/Mover/SPS throttling of

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

   XML : Parsing Error(s):

      hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
      hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

   FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client

      Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean). At ColumnRWHelper.java:[line 335]

   Failed junit tests :

      hadoop.util.TestReadWriteDiskValidator
      hadoop.fs.sftp.TestSFTPFileSystem
      hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
      hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
      hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
      hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
      hadoop.registry.secure.TestSecureLogins
      hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2
      hadoop.yarn.client.api.impl.TestAMRMClient

   cc:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]

   javac:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [328K]

   cc:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-compile-cc-root-jdk1.8.0_222.txt [4.0K]

   javac:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-compile-javac-root-jdk1.8.0_222.txt [308K]

   checkstyle:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-checkstyle-root.txt [16M]

   hadolint:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-patch-hadolint.txt [4.0K]

   pathlen:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/pathlen.txt [12K]

   pylint:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-patch-pylint.txt [24K]

   shellcheck:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-patch-shellcheck.txt [72K]

   shelldocs:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-patch-shelldocs.txt [8.0K]

   whitespace:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/whitespace-eol.txt [12M]
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/whitespace-tabs.txt [1.3M]

   xml:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/xml.txt [12K]

   findbugs:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html [8.0K]

   javadoc:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt [16K]
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_222.txt [1.1M]

   unit:
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [168K]
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [324K]
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/518/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.tx
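For readers wondering what the "boxed value is unboxed and then immediately reboxed" FindBugs warning above means: it flags a pointless Long-to-long-to-Long round trip. A tiny illustration with hypothetical names, not the actual ColumnRWHelper code:

{code}
import java.util.NavigableMap;
import java.util.TreeMap;

public class ReboxingDemo {
  public static void main(String[] args) {
    NavigableMap<Long, String> cellsByTimestamp = new TreeMap<>();
    Long timestamp = 1574812800000L;

    // What FindBugs flags: longValue() unboxes the Long only for
    // Long.valueOf() to box the same value right back.
    cellsByTimestamp.put(Long.valueOf(timestamp.longValue()), "cell");

    // The fix: use the already-boxed value directly.
    cellsByTimestamp.put(timestamp, "cell");
  }
}
{code}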
[jira] [Created] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP cause job
Wilfred Spiegelenburg created MAPREDUCE-7249:
------------------------------------------------

             Summary: Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP cause job
                 Key: MAPREDUCE-7249
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7249
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster, mrv2
    Affects Versions: 3.1.0
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg

Same issue as in MAPREDUCE-7240, but this one has a different state in which the {{TA_TOO_MANY_FETCH_FAILURE}} event is received:

{code}
2019-11-18 23:03:40,270 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1568654141590_630203_m_003108_1
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1183)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:148)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1388)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1380)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
{code}

The stack trace is from a CDH release, which is a heavily patched 2.6 release.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
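For readers less familiar with the YARN state-machine framework: transitions are registered per (state, event type) pair, and any event arriving in a state with no registered arc throws the exception above (InvalidStateTransitonException in older releases, later renamed InvalidStateTransitionException). A stripped-down sketch of the pattern with toy states and events, not the real TaskAttemptImpl topology, assuming hadoop-yarn-common on the classpath:

{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class AttemptStateDemo {
  enum State { SUCCESS_CONTAINER_CLEANUP, SUCCEEDED, FAILED }
  enum EventType { TA_CONTAINER_CLEANED, TA_TOO_MANY_FETCH_FAILURE }
  static class Event { }

  // Only one arc is registered out of SUCCESS_CONTAINER_CLEANUP; nothing
  // is declared for TA_TOO_MANY_FETCH_FAILURE in that state.
  private static final StateMachineFactory<AttemptStateDemo, State, EventType, Event>
      FACTORY =
          new StateMachineFactory<AttemptStateDemo, State, EventType, Event>(
              State.SUCCESS_CONTAINER_CLEANUP)
          .addTransition(State.SUCCESS_CONTAINER_CLEANUP, State.SUCCEEDED,
              EventType.TA_CONTAINER_CLEANED)
          .installTopology();

  public static void main(String[] args) throws Exception {
    StateMachine<State, EventType, Event> sm = FACTORY.make(new AttemptStateDemo());
    // No arc exists for this (state, event) pair, so dispatch throws
    // "Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP",
    // the same error the AsyncDispatcher logs above.
    sm.doTransition(EventType.TA_TOO_MANY_FETCH_FAILURE, new Event());
  }
}
{code}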
[jira] [Reopened] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko reopened MAPREDUCE-7240:
-------------------------------------

Reopening it to attach patches for branch-3.2 and branch-3.1.

> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7240
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.8.2
>            Reporter: luhuachao
>            Assignee: luhuachao
>            Priority: Critical
>              Labels: Reviewed, applicationmaster, mrv2
>             Fix For: 3.3.0
>
>         Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures for output of task attempt: attempt_1566552310686_260041_m_52_0 ... raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures for output of task attempt: attempt_1566552310686_260041_m_49_0 ... raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures for output of task attempt: attempt_1566552310686_260041_m_51_0 ... raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures for output of task attempt: attempt_1566552310686_260041_m_50_0 ... raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures for output of task attempt: attempt_1566552310686_260041_m_53_0 ... raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>         at org.apache.hadoop.
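For context on the shape of such patches: the usual fix for this class of bug is to register an explicit arc for the late event, so the attempt is transitioned (or the event deliberately ignored) instead of crashing the dispatcher. A hypothetical sketch in that spirit, in the same toy style as the MAPREDUCE-7249 note above; it is not the committed MAPREDUCE-7240 patch, whose actual transition and bookkeeping are in the attached patch files:

{code}
import org.apache.hadoop.yarn.state.SingleArcTransition;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class AttemptStateDemoFixed {
  enum State { SUCCESS_FINISHING_CONTAINER, SUCCEEDED, FAILED }
  enum EventType { TA_CONTAINER_CLEANED, TA_TOO_MANY_FETCH_FAILURE }
  static class Event { }

  static final StateMachineFactory<AttemptStateDemoFixed, State, EventType, Event>
      FACTORY =
          new StateMachineFactory<AttemptStateDemoFixed, State, EventType, Event>(
              State.SUCCESS_FINISHING_CONTAINER)
          .addTransition(State.SUCCESS_FINISHING_CONTAINER, State.SUCCEEDED,
              EventType.TA_CONTAINER_CLEANED)
          // The extra arc: route the late fetch-failure report to FAILED
          // instead of letting doTransition() throw.
          .addTransition(State.SUCCESS_FINISHING_CONTAINER, State.FAILED,
              EventType.TA_TOO_MANY_FETCH_FAILURE,
              new SingleArcTransition<AttemptStateDemoFixed, Event>() {
                @Override
                public void transition(AttemptStateDemoFixed attempt, Event event) {
                  // A real patch would undo the success bookkeeping here and
                  // tell the job to reschedule the map; omitted in this sketch.
                }
              })
          .installTopology();
}
{code}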