[jira] [Comment Edited] (MAPREDUCE-6638) Do not attempt to recover jobs if encrypted spill is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542845#comment-15542845 ]

Hitesh Shah edited comment on MAPREDUCE-6638 at 10/3/16 4:51 PM:
-----------------------------------------------------------------

Haven't looked at the patch in detail, but [~haibochen]'s clarifying comments make sense. The Jira title could be modified accordingly. +0 from my side.

was (Author: hitesh):
Haven't looked at the patch in detail, but [~haibochen]'s clarifying comments make sense. +0 from my side.

> Do not attempt to recover jobs if encrypted spill is enabled
> ------------------------------------------------------------
>
> Key: MAPREDUCE-6638
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6638
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster
> Affects Versions: 2.7.2
> Reporter: Karthik Kambatla
> Assignee: Haibo Chen
> Attachments: mapreduce6638.001.patch, mapreduce6638.002.patch, mapreduce6638.003.patch, mapreduce6638.004.patch, mapreduce6683.005.patch
>
> Post the fix to CVE-2015-1776, jobs with encrypted spills enabled cannot be recovered if the AM fails. We should store the key some place safe so they can actually be recovered. If there is no "safe" place, at least we should restart the job by re-running all mappers/reducers.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6638) Do not attempt to recover jobs if encrypted spill is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542845#comment-15542845 ]

Hitesh Shah commented on MAPREDUCE-6638:
----------------------------------------

Haven't looked at the patch in detail, but [~haibochen]'s clarifying comments make sense. +0 from my side.
[jira] [Commented] (MAPREDUCE-6776) yarn.app.mapreduce.client.job.max-retries should have a more useful default
[ https://issues.apache.org/jira/browse/MAPREDUCE-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15537287#comment-15537287 ]

Hitesh Shah commented on MAPREDUCE-6776:
----------------------------------------

FWIW, I do agree that this is a useful behavioral change that makes sense to push to branch-2. However, it might be better to call it out as incompatible and, at the same time, release-note it carefully to indicate that it will improve the user experience and have no detrimental impact apart from the retry delay in some edge cases.

> yarn.app.mapreduce.client.job.max-retries should have a more useful default
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-6776
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6776
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Affects Versions: 2.8.0
> Reporter: Daniel Templeton
> Assignee: Miklos Szegedi
> Attachments: MAPREDUCE-6776.001.patch, MAPREDUCE-6776.002.patch, MAPREDUCE-6776.003.patch
>
> The default is 0, so any communication failure results in a client failure. Oozie doesn't like that. If the RM is failing over and Oozie gets a communication failure, it assumes the target job has failed. I propose raising the default to something modest like 3 or 5. The default retry interval is 2s.
[jira] [Commented] (MAPREDUCE-6776) yarn.app.mapreduce.client.job.max-retries should have a more useful default
[ https://issues.apache.org/jira/browse/MAPREDUCE-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15537275#comment-15537275 ]

Hitesh Shah commented on MAPREDUCE-6776:
----------------------------------------

From a practical sense, this is not really an incompatible change, as there are some internal behavioral aspects that are being changed to retry 3 times instead of no retries. However, from a purely theoretical compat perspective, a public default value is being changed, as well as the value in mapred-default.xml. Tests that were earlier doing some verification would expect immediate failures, whereas now the client might reconnect or fail after 6 seconds or so.

I suggest pushing this to trunk for sure, as we are still in the alpha stage of releases. As for branch-2, I would check with the 2.8 release manager.
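The "6 seconds or so" figure follows from 3 retries at the default 2s interval. A minimal sketch of that arithmetic and of a generic retry loop is shown here; the class and method names are hypothetical and this is not the Hadoop client code, only an illustration under the assumption that the client sleeps once between consecutive tries.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; this is not the Hadoop client implementation.
final class RetryDelaySketch {

    /** Worst-case time spent sleeping before giving up: one sleep per retry. */
    static long worstCaseDelayMillis(int maxRetries, long retryIntervalMillis) {
        return (long) maxRetries * retryIntervalMillis;
    }

    /** Tries the action once, then up to maxRetries more times, sleeping between tries. */
    static <T> T callWithRetries(Callable<T> action, int maxRetries, long retryIntervalMillis)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(retryIntervalMillis);
                }
            }
        }
        throw last; // non-null here: the loop ran at least once and every try failed
    }

    public static void main(String[] args) {
        System.out.println("0 retries: " + worstCaseDelayMillis(0, 2000L) + " ms");
        System.out.println("3 retries: " + worstCaseDelayMillis(3, 2000L) + " ms");
    }
}
```

With 0 retries a single communication failure surfaces immediately, which is the behavior existing tests may rely on; with 3 retries the same failure surfaces only after the sleeps have elapsed.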
[jira] [Comment Edited] (MAPREDUCE-6638) Do not attempt to recover jobs if encrypted spill is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514897#comment-15514897 ]

Hitesh Shah edited comment on MAPREDUCE-6638 at 9/23/16 12:17 AM:
------------------------------------------------------------------

bq. (1) Avoid recovering an AM if encrypted spill is enabled

Encrypted spill w.r.t. recovery is not the same as a committer not supporting recovery. Any reason we cannot just re-run the job from scratch if not all reducers have completed (or re-run all maps and incomplete reducers)?

Ideally speaking, you could just re-run most of the job tasks again if needed to support proper fault tolerance, even in scenarios where the key cannot be stored securely. In this scenario, the new AM can generate a new key. I would agree that this might not be a performant solution, but it at least solves the problem of the user having to re-submit the job. If performance is an issue, users can turn off recovery when encryption is enabled for scenarios where the key cannot be stored securely.

was (Author: hitesh):
bq. (1) Avoid recovering an AM if encrypted spill is enabled

Encrypted spill w.r.t. recovery is not the same as a committer not supporting recovery. Any reason we cannot just re-run the job from scratch if not all reducers have completed?

Ideally speaking, you could just re-run most of the job tasks again if needed to support proper fault tolerance, even in scenarios where the key cannot be stored securely. In this scenario, the new AM can generate a new key. I would agree that this might not be a performant solution, but it at least solves the problem of the user having to re-submit the job. If performance is an issue, users can turn off recovery when encryption is enabled for scenarios where the key cannot be stored securely.
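The alternative proposed in the comment (re-run tasks with a freshly generated key instead of failing the job) can be sketched as a small decision function. Everything here is hypothetical: the enum, the method and its three boolean inputs are illustrative names, not part of the MR AM, and the sketch assumes the new AM can tell whether the old spill key is still available.

```java
// Hypothetical sketch; the names below are illustrative and not part of the MR AM.
final class SpillRecoverySketch {

    enum Plan {
        RECOVER_COMPLETED_TASKS,   // reuse output of tasks finished by the previous AM attempt
        RERUN_TASKS,               // recovery is off: start the tasks again
        RERUN_TASKS_WITH_NEW_KEY   // encrypted spill: start again, new AM generates a new key
    }

    static Plan planFor(boolean recoveryEnabled, boolean encryptedSpill, boolean oldKeyAvailable) {
        // Recovery is only safe when old spill data can still be read.
        if (recoveryEnabled && (!encryptedSpill || oldKeyAvailable)) {
            return Plan.RECOVER_COMPLETED_TASKS;
        }
        return encryptedSpill ? Plan.RERUN_TASKS_WITH_NEW_KEY : Plan.RERUN_TASKS;
    }

    public static void main(String[] args) {
        // Encrypted spill, recovery on, but the key was not stored anywhere safe.
        System.out.println(planFor(true, true, false));
    }
}
```

In every branch the job keeps running without the user re-submitting it; the cost of the re-run branches is repeated work, which is the performance trade-off the comment mentions.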
[jira] [Commented] (MAPREDUCE-6638) Do not attempt to recover jobs if encrypted spill is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514897#comment-15514897 ]

Hitesh Shah commented on MAPREDUCE-6638:
----------------------------------------

bq. (1) Avoid recovering an AM if encrypted spill is enabled

Encrypted spill w.r.t. recovery is not the same as a committer not supporting recovery. Any reason we cannot just re-run the job from scratch if not all reducers have completed?

Ideally speaking, you could just re-run most of the job tasks again if needed to support proper fault tolerance, even in scenarios where the key cannot be stored securely. In this scenario, the new AM can generate a new key. I would agree that this might not be a performant solution, but it at least solves the problem of the user having to re-submit the job. If performance is an issue, users can turn off recovery when encryption is enabled for scenarios where the key cannot be stored securely.
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491425#comment-15491425 ]

Hitesh Shah commented on MAPREDUCE-6484:
----------------------------------------

[~asuresh] thanks for the pointer. Any reason why MR does not use that function in that case?

> Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
> ----------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client, security
> Reporter: zhihai xu
> Assignee: zhihai xu
> Fix For: 2.8.0, 3.0.0-alpha1
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
> Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled. This will cause HDFS token renew failure for renewer "nobody" if the rules from {{hadoop.security.auth_to_local}} exclude the client address in HDFS {{DelegationTokenIdentifier}}.
> The local address is returned for the following reason: when HA is enabled, "yarn.resourcemanager.address" may not be set. If {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", the default address "0.0.0.0:8032" will be used. Based on the following code in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>       throws IOException {
>     String fqdn = hostname;
>     if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>       fqdn = getLocalHostName();
>     }
>     return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" +
>         components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
>     return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>       InetAddress addr) throws IOException {
>     String[] components = getComponents(principalConfig);
>     if (components == null || components.length != 3
>         || !components[1].equals(HOSTNAME_PATTERN)) {
>       return principalConfig;
>     } else {
>       if (addr == null) {
>         throw new IOException("Can't replace " + HOSTNAME_PATTERN
>             + " pattern since client address is null");
>       }
>       return replacePattern(components, addr.getCanonicalHostName());
>     }
>   }
> {code}
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: PriviledgedActionException as:t...@example.com (auth:KERBEROS) cause:java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
>   at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
>   at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClie
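The substitution rule in the quoted replacePattern can be tried in isolation. The sketch below copies only that rule into a standalone class and passes the local host name in as a parameter so the result does not depend on the machine it runs on; the class name, the host names and the realm are made-up values for illustration, and nothing here calls Hadoop itself.

```java
import java.util.Locale;

// Standalone illustration of the substitution rule quoted from SecurityUtil.java.
// Class name, host names and realm are hypothetical.
final class HostPatternSketch {

    static final String HOSTNAME_PATTERN = "_HOST";

    /** Same rule as the quoted code, with the local host name injected by the caller. */
    static String replacePattern(String[] components, String hostname, String localHostName) {
        String fqdn = hostname;
        if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
            fqdn = localHostName; // the wildcard address silently becomes the local host
        }
        return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
    }

    public static void main(String[] args) {
        String[] principal = {"yarn", HOSTNAME_PATTERN, "EXAMPLE.COM"};
        // RM address left at its default: the client's own host ends up in the principal.
        System.out.println(replacePattern(principal, "0.0.0.0", "client01.example.com"));
        // RM address configured explicitly: the RM host is used, as intended.
        System.out.println(replacePattern(principal, "rm1.example.com", "client01.example.com"));
    }
}
```

The first call shows why the renewer can end up naming the client machine rather than the RM when "yarn.resourcemanager.address" is unset.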
[jira] [Commented] (MAPREDUCE-6776) yarn.app.mapreduce.client.job.max-retries should have a more useful default
[ https://issues.apache.org/jira/browse/MAPREDUCE-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485365#comment-15485365 ]

Hitesh Shah commented on MAPREDUCE-6776:
----------------------------------------

Changing this in 2.x would be an incompatible change.

> yarn.app.mapreduce.client.job.max-retries should have a more useful default
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-6776
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6776
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Affects Versions: 2.8.0
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
>
> The default is 0, so any communication failure results in a client failure. Oozie doesn't like that. If the RM is failing over and Oozie gets a communication failure, it assumes the target job has failed. I propose raising the default to something modest like 3 or 5. The default retry interval is 2s.
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478696#comment-15478696 ]

Hitesh Shah commented on MAPREDUCE-6484:
----------------------------------------

[~asuresh] [~zxu] It seems like the getMasterAddress() functionality ideally belongs in YARN and not in MR so that other applications that make use of YARN can always leverage the same functionality. Would you agree?
[jira] [Commented] (MAPREDUCE-6062) Use TestDFSIO test random read : job failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299224#comment-15299224 ]

Hitesh Shah commented on MAPREDUCE-6062:
----------------------------------------

[~tfukudom] please hit "submit patch" to trigger the pre-commit build. https://wiki.apache.org/hadoop/HowToContribute has more info on dos and don'ts when contributing patches.

In this case, I will defer to someone who has been looking at MR code in more recent times to do a review. If you do not see any updates on the jira within the next couple of days, please feel free to drop a polite email on the mapreduce-dev list asking for review help.

> Use TestDFSIO test random read : job failed
> -------------------------------------------
>
> Key: MAPREDUCE-6062
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6062
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: benchmarks
> Affects Versions: 2.2.0
> Environment: command : hadoop jar $JAR_PATH TestDFSIO-read -random -nrFiles 12 -size 8000
> Reporter: chongyuanhuang
> Assignee: Takuya Fukudome
> Attachments: MAPREDUCE-6062.patch
>
> This is the log:
> 2014-09-01 13:57:29,876 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: n must be positive
>   at java.util.Random.nextInt(Random.java:300)
>   at org.apache.hadoop.fs.TestDFSIO$RandomReadMapper.nextOffset(TestDFSIO.java:601)
>   at org.apache.hadoop.fs.TestDFSIO$RandomReadMapper.doIO(TestDFSIO.java:580)
>   at org.apache.hadoop.fs.TestDFSIO$RandomReadMapper.doIO(TestDFSIO.java:546)
>   at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:134)
>   at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:37)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> 2014-09-01 13:57:29,886 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> 2014-09-01 13:57:29,894 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://m101:8020/benchmarks/TestDFSIO/io_random_read/_temporary/1/_temporary/attempt_1409538816633_0005_m_01_0
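The exception in the log comes from java.util.Random.nextInt(int), which rejects a bound that is zero or negative. The sketch below shows that behavior and one possible way to pick a random offset with long arithmetic instead. It is not the TestDFSIO code and not the attached patch; the class and method names are hypothetical, and the idea that the bound became negative through a long-to-int narrowing of the file size (assuming -size 8000 means 8000 MB) is a guess that the quoted log does not confirm.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch; not the TestDFSIO implementation or the attached patch.
final class RandomOffsetSketch {

    /** Returns true if Random.nextInt(bound) rejects the given bound. */
    static boolean nextIntRejects(int bound) {
        try {
            new Random().nextInt(bound);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // thrown for any bound <= 0
        }
    }

    /** Picks an offset in [0, fileSize - bufferSize) using long arithmetic; 0 if there is no room. */
    static long nextOffset(long fileSize, long bufferSize) {
        long range = fileSize - bufferSize;
        if (range <= 0) {
            return 0L;
        }
        return ThreadLocalRandom.current().nextLong(range);
    }

    public static void main(String[] args) {
        long fileSize = 8000L * 1024 * 1024;           // 8000 MB in bytes (assumed unit)
        System.out.println("fits in int: " + (fileSize <= Integer.MAX_VALUE));
        System.out.println("narrowed to int: " + (int) fileSize);
        System.out.println("nextInt rejects it: " + nextIntRejects((int) fileSize));
        System.out.println("long-based offset: " + nextOffset(fileSize, 1_000_000L));
    }
}
```

Keeping the whole computation in long avoids any dependence on the file size fitting into an int.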
[jira] [Assigned] (MAPREDUCE-6062) Use TestDFSIO test random read : job failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah reassigned MAPREDUCE-6062:
--------------------------------------

Assignee: Takuya Fukudome

[~tfukudom] added you to MR contributors. Hopefully this should get you unblocked.
[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222592#comment-14222592 ]

Hitesh Shah commented on MAPREDUCE-5785:
----------------------------------------

bq. I think we should commit this to branch-2 as well

This change is incompatible, especially as it modifies mapred-default.xml. Not sure why it would be committed to branch-2.

> Derive heap size or mapreduce.*.memory.mb automatically
> -------------------------------------------------------
>
> Key: MAPREDUCE-5785
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: mr-am, task
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Fix For: 3.0.0
> Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch
>
> Currently users have to set 2 memory-related configs per Job / per task type. One first chooses some container size mapreduce.\*.memory.mb and then a corresponding maximum Java heap size Xmx < mapreduce.\*.memory.mb. This makes sure that the JVM's C-heap (native memory + Java heap) does not exceed this mapreduce.*.memory.mb. If one forgets to tune Xmx, MR-AM might be
> - allocating big containers whereas the JVM will only use the default -Xmx200m.
> - allocating small containers that will OOM because Xmx is too high.
> With this JIRA, we propose to set Xmx automatically based on an empirical ratio that can be adjusted. Xmx is not changed automatically if provided by the user.
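The proposal in the quoted description (derive Xmx from the container size by a ratio, but leave a user-supplied Xmx alone) can be sketched as follows. The class, the method and the 0.8 ratio used in main are assumptions made for illustration; the description only says the ratio is empirical and adjustable, and this is not the code from the attached patches.

```java
// Hypothetical sketch; names and the 0.8 ratio are illustrative assumptions.
final class HeapSizeSketch {

    /**
     * Returns JVM options for a task: keeps a user-provided -Xmx untouched,
     * otherwise appends an -Xmx derived from the container size.
     */
    static String javaOpts(String userOpts, int containerMb, double heapRatio) {
        boolean hasOpts = userOpts != null && !userOpts.isEmpty();
        if (hasOpts && userOpts.contains("-Xmx")) {
            return userOpts; // the user's explicit choice wins
        }
        int heapMb = (int) Math.floor(containerMb * heapRatio);
        String xmx = "-Xmx" + heapMb + "m";
        return hasOpts ? userOpts + " " + xmx : xmx;
    }

    public static void main(String[] args) {
        System.out.println(javaOpts("", 1024, 0.8));
        System.out.println(javaOpts("-Xmx2g", 1024, 0.8));
        System.out.println(javaOpts("-verbose:gc", 4096, 0.8));
    }
}
```

Because the derived value replaces what used to be a fixed default whenever the user sets nothing, jobs relying on the old default would see different heap sizes, which is the compatibility concern raised in the comment.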
[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry
[ https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057866#comment-14057866 ]

Hitesh Shah commented on MAPREDUCE-5956:
----------------------------------------

To add to [~mayank_bansal]'s comment, this is the 4th (and last) attempt of the AM and there have been no preemptions.

> MapReduce AM should not use maxAttempts to determine if this is the last retry
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5956
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5956
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: applicationmaster, mrv2
> Affects Versions: 2.4.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Wangda Tan
> Priority: Blocker
> Attachments: MR-5956.patch
>
> Found this while reviewing YARN-2074. The problem is that after YARN-2074, we don't count AM preemption towards AM failures on the RM side, but the MapReduce AM itself checks the attempt id against the max-attempt count to determine if this is the last attempt.
> {code}
> public void computeIsLastAMRetry() {
>   isLastAMRetry = appAttemptID.getAttemptId() >= maxAppAttempts;
> }
> {code}
> This causes issues w.r.t. deletion of the staging directory etc.
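The mismatch described in the quoted issue can be made concrete with two small predicates. The first mirrors the quoted computeIsLastAMRetry check; the second is a hypothetical alternative that counts only attempts the RM treats as failures. How the AM would learn that count is not specified in the thread, so the second method and its parameter are purely illustrative.

```java
// Hypothetical sketch of the two ways of deciding "is this the last AM attempt?".
final class LastRetrySketch {

    /** Mirrors the quoted check: every attempt, preempted or not, moves the id forward. */
    static boolean isLastByAttemptId(int attemptId, int maxAppAttempts) {
        return attemptId >= maxAppAttempts;
    }

    /**
     * Illustrative alternative: the current attempt is the last one only if
     * one more counted failure would reach the limit.
     */
    static boolean isLastByFailureCount(int countedFailuresSoFar, int maxAppAttempts) {
        return countedFailuresSoFar + 1 >= maxAppAttempts;
    }

    public static void main(String[] args) {
        int maxAppAttempts = 2;
        // Attempt 1 was preempted (not counted as a failure); attempt 2 is now running.
        System.out.println("by attempt id:    " + isLastByAttemptId(2, maxAppAttempts));
        System.out.println("by failure count: " + isLastByFailureCount(0, maxAppAttempts));
    }
}
```

In that scenario the id-based check reports a last attempt even though the RM would still grant another one, which is how cleanup such as staging-directory deletion can run too early. When no attempt has been preempted, as in the case described in the comment, the two predicates agree.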
[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry
[ https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055396#comment-14055396 ]

Hitesh Shah commented on MAPREDUCE-5956:
----------------------------------------

[~vinodkv] By definition, if an AM calls unregister, it is telling the RM that this is its last attempt and the app should not be retried. Are you now saying that all attempts should call unregisterAttempt(), which will tell the app whether it is the final attempt and should call a final unregister()? If not, I think something else is needed, as an AM will only call unregister() on an error if it thinks it is the last attempt.
[jira] [Commented] (MAPREDUCE-5696) Add Localization counters to MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856997#comment-13856997 ]

Hitesh Shah commented on MAPREDUCE-5696:
----------------------------------------

The introduction of localization counters in the env is akin to introducing a new API in YARN. Could you split this jira into two: one in YARN for the YARN changes, where the new API/interface is introduced, while this jira is used for the MR-specific changes?

> Add Localization counters to MR
> -------------------------------
>
> Key: MAPREDUCE-5696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5696
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: LocalizationCounters.png, MAPREDUCE-5696.v01.patch
>
> Users are often unaware of the localization cost that their jobs incur. To measure the effectiveness of localization caches, it is necessary to expose the overhead in the form of user-visible metrics. The purpose of this JIRA is to complement YARN-1529. While YARN-1529 attempts to provide a cluster-wide view to cluster admins, this JIRA focuses on exposing the localization overhead on a per-job basis to the job owner/user.
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838320#comment-13838320 ] Hitesh Shah commented on MAPREDUCE-5487: A little bit late on this. Did anyone look into how this affects jobs where a user modifies the counter limit to be higher than the cluster configured value and what happens in the case where the jobhistory server is configured with a limit less than the user supplied limit? > In task processes, JobConf is unnecessarily loaded again in Limits > -- > > Key: MAPREDUCE-5487 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: performance, task >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.4.0 > > Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch > > > Limits statically loads a JobConf, which incurs costs of reading files from > disk and parsing XML. The contents of this JobConf are identical to the one > loaded by YarnChild (before adding job.xml as a resource). Allowing Limits > to initialize with the JobConf loaded in YarnChild would reduce task startup > time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (MAPREDUCE-5633) Can Hadoop use multi-cores of a processor under single machine
[ https://issues.apache.org/jira/browse/MAPREDUCE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved MAPREDUCE-5633. Resolution: Invalid Please ask such questions on the user list. http://hadoop.apache.org/mailing_lists.html#User > Can Hadoop use multi-cores of a processor under single machine > -- > > Key: MAPREDUCE-5633 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5633 > Project: Hadoop Map/Reduce > Issue Type: Task >Reporter: Asif > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1378#comment-1378 ] Hitesh Shah commented on MAPREDUCE-4421: [~jlowe] Thanks for the clarification. I believe the performance issues should hold regardless of any filesystem implementation used as long as the distributed cache layer ends up correctly interpreting the permissions to the appropriate LocalResource visibility. +1. Latest patch looks good to me. Let me know if you are waiting on anyone else to chime in on this. If not, please feel free to go ahead and commit or I shall commit later today. > Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421-2.patch, MAPREDUCE-4421-3.patch, > MAPREDUCE-4421-4.patch, MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782438#comment-13782438 ] Hitesh Shah commented on MAPREDUCE-4421: Sorry for the delay in the review. Regarding addMRFrameworkToDistributedCache() - one minor question: the code allows for a non-qualified URI. Should we enforce provision of a fully-qualified path always? Minor nit: I believe there should be nothing in the implementation that requires HDFS as the storage for the MR tarball? Documentation needs to change as a result unless you believe there are reasons for not mentioning other filesystems ( except maybe from a testing point of view )? Patch looks good otherwise. Thanks for adding the detailed docs. > Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421-2.patch, MAPREDUCE-4421.patch, > MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773326#comment-13773326 ] Hitesh Shah commented on MAPREDUCE-4421: [~jlowe] Thanks for the detailed answers to my queries. I believe this initial patch is a good start to making MR a user-land library. As it stands, it provides the additional flexibility which can be used by anyone to deploy MR with either the full tarball or a mix-match approach. Though it might be good to have some documentation on the 2 possible approaches ( full tarball vs MR tarball ) and explain how the classpath should be setup. Depending on your viewpoint, the classpath-to-hdfs path mapping - whether it comes in from an additional file on HDFS could be considered in a follow-up jira if others believe this is a better solution. The one thing to change in the patch is the documentation for mapreduce.application.framework.path - it does not mention the use of the URI fragment and how that interacts with the configured classpath. Could you file a follow-up jira for the config handling? > Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
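The interaction between the URI fragment of mapreduce.application.framework.path and the configured classpath can be illustrated with a small standalone sketch. It uses only `java.net.URI`; the class `FrameworkPathSketch`, its methods, the example URI, the comma-separated classpath format and the fallback to the last path component when no fragment is given are all assumptions made for illustration, not the patch's actual behavior.

```java
import java.net.URI;
import java.util.Arrays;

// Hypothetical sketch; not taken from the MAPREDUCE-4421 patch.
final class FrameworkPathSketch {

    // A fully qualified path carries a scheme (e.g. hdfs://...).
    static boolean isFullyQualified(URI uri) {
        return uri.getScheme() != null && uri.getPath() != null;
    }

    // Assumed rule: the fragment names the link under which the archive is
    // localized; without a fragment, fall back to the last path component.
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Sanity check: some classpath entry must contain the link name as a
    // whole path segment (classpath assumed to be comma-separated).
    static boolean classpathMentions(String linkName, String classpath) {
        return Arrays.stream(classpath.split(","))
                .flatMap(entry -> Arrays.stream(entry.trim().split("/")))
                .anyMatch(linkName::equals);
    }

    public static void main(String[] args) {
        // Example values are made up.
        URI uri = URI.create("hdfs://namenode:8020/apps/mr/hadoop-mapreduce.tar.gz#mr-framework");
        String classpath = "$PWD/mr-framework/share/hadoop/mapreduce/*,"
                + "$PWD/mr-framework/share/hadoop/mapreduce/lib/*";
        String name = linkName(uri);
        System.out.println("fully qualified: " + isFullyQualified(uri));
        System.out.println("link name: " + name);
        System.out.println("classpath mentions link name: " + classpathMentions(name, classpath));
    }
}
```

A check of this kind would reject a configuration where the tarball is uploaded under one name while the classpath still points at another location, which is the sort of mismatch the documentation request is about.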
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765961#comment-13765961 ] Hitesh Shah commented on MAPREDUCE-4421: s/Configuration/Jobconf/ in the previous comment. > Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765889#comment-13765889 ] Hitesh Shah commented on MAPREDUCE-4421: [~jlowe] Had a few questions/comments related to the implementation/patch:
- Why does the classpath need to include all of the common, hdfs and yarn jar locations? Assuming that MR is running on a YARN-based cluster, shouldn't the location of the core dependencies come from the cluster deployment, i.e. via the env that the NM sets for a container? I believe the only jars that MR should have in its uploaded tarball should be the client jars. I understand that there is no clear boundary for client-side-only jars for common and hdfs today (for YARN, I believe it should be simple to split out the client-side requirements), but it is something we should aim for, or else assume that the jars deployed on the cluster are compatible.
- I guess the underlying question is why use the full hadoop tarball and not just the mapreduce-only tarball? If MR is truly a user-land library, it should be treated as such and have a separate deployment approach.
- I would vote to make the tarball in HDFS the only way to run MR on YARN. Obviously, this cannot be done for 2.x, but we should move to this model on trunk and not support the current approach at all there. Comments?
- The other point is related to configs. Configuration still loads the mapred-site and mapred-default files, and new Configuration objects are created on the cluster. Are these files still expected on the cluster? job.xml does override these, but cluster configs could still have final params. If this is meant to be addressed in a follow-up jira to ensure all MR configs come from the client, you can ignore this point for now.
- How do you see the framework name extracted from the path being used? Is it just a safety check to ensure that it is found in the classpath? Will it have any relation to a version? A minor nit: framework name seems confusing in relation to the framework name in use from earlier, i.e. yarn vs local framework.
- The description in the default-xml for mapreduce.application.framework.path does not mention the need for the URI fragment and how the fragment is used as a sanity check against the classpath.
- Regarding versions, it seems like users will need to do 2 things: change the location of the tarball on HDFS and modify the classpath. Users will need to know the exact structure of the classpath. In such a scenario, do defaults even make sense? On the other hand, if we define a common standard, i.e. a base path for all MR tarballs, with each tarball in a defined structure (possibly with version info added later on for the code to infer the structure of the tarball), all the user would need to do is specify the base path (which could have a default value) and a version, which again has a default value. The latter approach would require the code to construct the necessary classpath if the upload path is in use. Do you have any comments on which of the 2 approaches makes more sense? The former is far more flexible but a bit more complex. The latter is brittle/inflexible with respect to changing tarball structures, but it is likely easier to enforce a standard on.
> Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message is automatically generated by JIRA.
[jira] [Commented] (MAPREDUCE-5130) Add missing job config options to mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725902#comment-13725902 ] Hitesh Shah commented on MAPREDUCE-5130: [~sandyr] Was a bit thrown off by the jira description which mentions documenting *child.java.opts instead of the property names not using "child". > Add missing job config options to mapred-default.xml > > > Key: MAPREDUCE-5130 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5130 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: documentation >Affects Versions: 2.0.4-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5130-1.patch, MAPREDUCE-5130-1.patch, > MAPREDUCE-5130-2.patch, MAPREDUCE-5130-3.patch, MAPREDUCE-5130-4.patch, > MAPREDUCE-5130-5.patch, MAPREDUCE-5130.patch, MAPREDUCE-5130.patch > > > I came across that mapreduce.map.child.java.opts and > mapreduce.reduce.child.java.opts were missing in mapred-default.xml. I'll do > a fuller sweep to see what else is missing before posting a patch. > List so far: > mapreduce.map/reduce.child.java.opts > mapreduce.map/reduce.memory.mb > mapreduce.job.jvm.numtasks > mapreduce.input.lineinputformat.linespermap > mapreduce.task.combine.progress.records > mapreduce.map/reduce.env -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5130) Add missing job config options to mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725728#comment-13725728 ] Hitesh Shah commented on MAPREDUCE-5130: Regarding mapreduce.map/reduce.child.java.opts, aren't they to be deprecated in favor of mapreduce.[map|reduce].java.opts? > Add missing job config options to mapred-default.xml > > > Key: MAPREDUCE-5130 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5130 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: documentation >Affects Versions: 2.0.4-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5130-1.patch, MAPREDUCE-5130-1.patch, > MAPREDUCE-5130-2.patch, MAPREDUCE-5130-3.patch, MAPREDUCE-5130-4.patch, > MAPREDUCE-5130-5.patch, MAPREDUCE-5130.patch, MAPREDUCE-5130.patch > > > I came across that mapreduce.map.child.java.opts and > mapreduce.reduce.child.java.opts were missing in mapred-default.xml. I'll do > a fuller sweep to see what else is missing before posting a patch. > List so far: > mapreduce.map/reduce.child.java.opts > mapreduce.map/reduce.memory.mb > mapreduce.job.jvm.numtasks > mapreduce.input.lineinputformat.linespermap > mapreduce.task.combine.progress.records > mapreduce.map/reduce.env -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5416) hadoop-mapreduce-client-common depends on hadoop-yarn-server-common
Hitesh Shah created MAPREDUCE-5416: -- Summary: hadoop-mapreduce-client-common depends on hadoop-yarn-server-common Key: MAPREDUCE-5416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5416 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Hitesh Shah mapreduce-client-app and mapreduce-client-jobclient modules also depend on yarn-server-common but only in test scope. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5408) CLONE - The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716470#comment-13716470 ] Hitesh Shah commented on MAPREDUCE-5408: Mostly looks good. A couple of minor comments: - DEFAULT_LOG_LEVEL could be renamed to DEFAULT_TASK_LOG_LEVEL and the type changed to a string. Having the type as Level is not buying much, as it always ends up being converted to a string when used. If the intention is to retain the backport as is, this comment can be ignored for now. - Level.toLevel() has an overload that takes a default value. In the event that the user has a typo, the current usage falls back to DEBUG, whereas the default-based overload can be made to fall back to INFO. > CLONE - The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-5408 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5408 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 1.3.0 > > Attachments: MAPREDUCE-336_branch1.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
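The fallback difference raised in the second review comment can be shown with a small stand-in. To keep the example runnable without the log4j jar, the sketch below does not call log4j; `TaskLogLevelSketch` and its nested `Level` enum are hypothetical and only imitate the documented behavior of log4j 1.x, where `Level.toLevel(String)` yields DEBUG for an unrecognized name and `Level.toLevel(String, Level)` yields the supplied default.

```java
import java.util.Locale;

// Stand-in for log4j 1.x Level so the sketch runs without the library.
// All names here are hypothetical.
final class TaskLogLevelSketch {

    enum Level { DEBUG, INFO, WARN, ERROR }

    // Imitates the two-argument overload: unknown names yield the given default.
    static Level toLevel(String name, Level defaultLevel) {
        if (name == null) {
            return defaultLevel;
        }
        try {
            return Level.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return defaultLevel;
        }
    }

    // Imitates the single-argument overload, which falls back to DEBUG.
    static Level toLevel(String name) {
        return toLevel(name, Level.DEBUG);
    }

    public static void main(String[] args) {
        String typo = "INFOO"; // a user typo in the job configuration
        System.out.println("single-argument fallback: " + toLevel(typo));
        System.out.println("default-based fallback:   " + toLevel(typo, Level.INFO));
    }
}
```

With the default-based variant, a mistyped level leaves task logging at INFO rather than silently switching every task to the much more verbose DEBUG.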
[jira] [Resolved] (MAPREDUCE-5399) Large number of map tasks cause slow sort at reduce phase, invariant to amount of data to sort
[ https://issues.apache.org/jira/browse/MAPREDUCE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved MAPREDUCE-5399. Resolution: Invalid If this is indeed an issue with Apache Hadoop-1.x, please feel free to file a jira with details specific to that. Issues with a particular vendor's distro should be redirected to the vendor in question. > Large number of map tasks cause slow sort at reduce phase, invariant to > amount of data to sort > -- > > Key: MAPREDUCE-5399 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5399 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Reporter: Stanislav Barton >Priority: Critical > > We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After the upgrade from > 4.1.2 to 4.3.0, I have noticed some performance deterioration in our MR job > in the Reduce phase. The MR job usually has 10 000 map tasks (10 000 files on > input, each about 100MB) and 6 000 reducers (one reducer per table region). I > was trying to figure out at which phase the slowdown appears (at first I > suspected that the slow gathering of the 1 map output files is the > culprit) and found out that the problem is not reading the map output (the > shuffle) but the sort/merge phase that follows; the last and actual reduce > phase is fast. I have tried to raise io.sort.factor because I thought that > lots of small files were being merged on disk, but raising it to 1000 > did not make any difference. 
I have then printed the stack trace and found out that the problem is initialization of the org.apache.hadoop.mapred.IFileInputStream namely the creation of the Configuration object which is not propagated along from earlier context, see the stack trace:
> Thread 13332: (state = IN_NATIVE)
> - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
> - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
> - java.io.File.exists() @bci=20, line=733 (Compiled frame)
> - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) @bci=136, line=999 (Compiled frame)
> - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) @bci=3, line=966 (Compiled frame)
> - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, line=146 (Compiled frame)
> - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
> - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 (Compiled frame)
> - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 (Compiled frame)
> - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, line=1192 (Compiled frame)
> - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 (Compiled frame)
> - javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, java.lang.String) @bci=10, line=89 (Compiled frame)
> - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) @bci=38, line=250 (Interpreted frame)
> - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) @bci=273, line=223 (Interpreted frame)
> - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 (Compiled frame)
> - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 (Compiled frame)
> - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
> - org.apache.hadoop.conf.Configuration.getProps() @bci=43, line=1785 (Compiled frame)
> - org.apache.hadoop.conf.Configuration.get(java.lang.String) @bci=35, line=712 (Compiled frame)
> - org.apache.hadoop.conf.Configuration.getTrimmed(java.lang.String) @bci=2, line=731 (Compiled frame)
> - org.apache.hadoop.conf.Configuration.getBoolean(java.lang.String, boolean) @bci=2, line=1047 (Interpreted frame)
> - org.apache.hadoop.mapred.IFileInputStream.<init>(java.io.InputStream, long, org.apache.hadoop.conf.Configuration) @bci=111, line=93 (Interpreted frame)
> - org.apache.hadoop.mapred.IFile$Reader.<init>(org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FSDataInputStream, long, org.apache.hadoop.io.compress.CompressionCodec, org.apache.hadoop.mapred.Counters$Counter) @bci=
[jira] [Resolved] (MAPREDUCE-5325) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter---MR changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved MAPREDUCE-5325. Resolution: Fixed Fix Version/s: 2.1.0-beta Committed to trunk, branch-2, branch-2.1-beta and branch-2.1.0-beta. Thanks Xuan. > ClientRMProtocol.getAllApplications should accept ApplicationType as a > parameter---MR changes > - > > Key: MAPREDUCE-5325 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5325 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.1.0-beta > > Attachments: MR-5325.1.patch, MR-5325.2.patch, MR-5325.3.patch, > MR-5325.4.patch, MR-5325.5.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5325) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter---MR changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703862#comment-13703862 ] Hitesh Shah commented on MAPREDUCE-5325: Overall patch being reviewed as part of YARN-727. Will be committed together to ensure build does not break. > ClientRMProtocol.getAllApplications should accept ApplicationType as a > parameter---MR changes > - > > Key: MAPREDUCE-5325 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5325 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: MR-5325.1.patch, MR-5325.2.patch, MR-5325.3.patch, > MR-5325.4.patch, MR-5325.5.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5325) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter---MR changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684234#comment-13684234 ] Hitesh Shah commented on MAPREDUCE-5325: @Xuan, will mapreduce jobs have different application types or only a single fixed type for all MR jobs? If the latter, the getAllJobs() should not be taking application type as an argument. > ClientRMProtocol.getAllApplications should accept ApplicationType as a > parameter---MR changes > - > > Key: MAPREDUCE-5325 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5325 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: MR-5325.1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5324) Admin-provided user environment can be overridden by user provided values for the AM
Hitesh Shah created MAPREDUCE-5324: -- Summary: Admin-provided user environment can be overridden by user provided values for the AM Key: MAPREDUCE-5324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5324 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Hitesh Shah Priority: Minor MRJobConfig.MR_AM_ADMIN_USER_ENV can be overridden by MRJobConfig.MR_AM_ENV. Either the variable should be renamed to something along the lines of DEFAULT_ENV or the code fixed to have the correct overrides. Current documentation clearly states user env overrides admin env. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
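The precedence question in this report comes down to the order in which two environment maps are merged. The following standalone sketch shows both orders; `AmEnvMergeSketch`, `merge` and the example variable are hypothetical and do not reflect how MRJobConfig.MR_AM_ADMIN_USER_ENV and MRJobConfig.MR_AM_ENV are actually applied in the AM launch code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of merge order; not Hadoop code.
final class AmEnvMergeSketch {

    // Entries from 'later' replace entries from 'earlier' on key collisions.
    static Map<String, String> merge(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> result = new LinkedHashMap<>(earlier);
        result.putAll(later);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> adminEnv = new LinkedHashMap<>();
        adminEnv.put("LD_LIBRARY_PATH", "/opt/admin/native");
        Map<String, String> userEnv = new LinkedHashMap<>();
        userEnv.put("LD_LIBRARY_PATH", "/home/user/native");

        // Admin values act as defaults: the user value wins.
        System.out.println("user overrides admin: " + merge(adminEnv, userEnv));
        // Admin values are enforced: the admin value wins.
        System.out.println("admin overrides user: " + merge(userEnv, adminEnv));
    }
}
```

If the first order is the intended one, the admin-provided values behave as defaults, which is why the report suggests a name along the lines of DEFAULT_ENV.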
[jira] [Commented] (MAPREDUCE-5095) TestShuffleExceptionCount#testCheckException fails occasionally with JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662321#comment-13662321 ] Hitesh Shah commented on MAPREDUCE-5095: Thanks Arpit. Committed to branch-1. > TestShuffleExceptionCount#testCheckException fails occasionally with JDK7 > - > > Key: MAPREDUCE-5095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.1.2 > Environment: Open JDK7 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5095.patch > > Original Estimate: 1h > Time Spent: 1h > Remaining Estimate: 0h > > The test fails due a test-order dependency that can be violated when running > with JDK 7. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5191: --- Release Note: (was: Thanks Ivan. Committed to trunk.) > TestQueue#testQueue fails with timeout on Windows > - > > Key: MAPREDUCE-5191 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Fix For: 3.0.0 > > Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.3.patch, > MAPREDUCE-5191.patch > > > Test times out on my machine after 5 seconds always on the below stack:
> {code}
> testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:485)
>   at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330)
>   at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319)
>   at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117)
>   at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
>   at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
>   at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
>   at java.security.SecureRandom.next(SecureRandom.java:455)
>   at java.util.Random.nextLong(Random.java:284)
>   at java.io.File.generateFile(File.java:1682)
>   at java.io.File.createTempFile(File.java:1791)
>   at java.io.File.createTempFile(File.java:1828)
>   at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221)
>   at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53)
> {code}
-- This message is automatically generated by JIRA.
[jira] [Updated] (MAPREDUCE-5095) TestShuffleExceptionCount#testCheckException fails occasionally with JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5095: --- Release Note: (was: Thanks Arpit. Committed to branch-1. ) > TestShuffleExceptionCount#testCheckException fails occasionally with JDK7 > - > > Key: MAPREDUCE-5095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.1.2 > Environment: Open JDK7 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5095.patch > > Original Estimate: 1h > Time Spent: 1h > Remaining Estimate: 0h > > The test fails due a test-order dependency that can be violated when running > with JDK 7. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662320#comment-13662320 ] Hitesh Shah commented on MAPREDUCE-5191: Thanks Ivan. Committed to trunk. > TestQueue#testQueue fails with timeout on Windows > - > > Key: MAPREDUCE-5191 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Fix For: 3.0.0 > > Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.3.patch, > MAPREDUCE-5191.patch > > > Test times out on my machine after 5 seconds always on the below stack:
> {code}
> testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:485)
>   at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330)
>   at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319)
>   at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117)
>   at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
>   at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
>   at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
>   at java.security.SecureRandom.next(SecureRandom.java:455)
>   at java.util.Random.nextLong(Random.java:284)
>   at java.io.File.generateFile(File.java:1682)
>   at java.io.File.createTempFile(File.java:1791)
>   at java.io.File.createTempFile(File.java:1828)
>   at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221)
>   at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53)
> {code}
-- This message is automatically generated by JIRA.
[jira] [Resolved] (MAPREDUCE-5095) TestShuffleExceptionCount#testCheckException fails occasionally with JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved MAPREDUCE-5095. Resolution: Fixed Release Note: Thanks Arpit. Committed to branch-1. > TestShuffleExceptionCount#testCheckException fails occasionally with JDK7 > - > > Key: MAPREDUCE-5095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.1.2 > Environment: Open JDK7 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5095.patch > > Original Estimate: 1h > Time Spent: 1h > Remaining Estimate: 0h > > The test fails due to a test-order dependency that can be violated when running > with JDK 7.
[jira] [Commented] (MAPREDUCE-5095) TestShuffleExceptionCount#testCheckException fails occasionally with JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662316#comment-13662316 ] Hitesh Shah commented on MAPREDUCE-5095: [~arpitagarwal] Should have reviewed the whole patch in context. Thanks for the clarification. +1. Will commit shortly. > TestShuffleExceptionCount#testCheckException fails occasionally with JDK7 > - > > Key: MAPREDUCE-5095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.1.2 > Environment: Open JDK7 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5095.patch > > Original Estimate: 1h > Time Spent: 1h > Remaining Estimate: 0h > > The test fails due to a test-order dependency that can be violated when running > with JDK 7.
[jira] [Updated] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5191: --- Resolution: Fixed Fix Version/s: 3.0.0 Release Note: Thanks Ivan. Committed to trunk. Status: Resolved (was: Patch Available) > TestQueue#testQueue fails with timeout on Windows > - > > Key: MAPREDUCE-5191 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Fix For: 3.0.0 > > Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.3.patch, > MAPREDUCE-5191.patch > > > Test times out on my machine after 5 seconds always on the below stack: > {code} > testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec <<< > ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:485) > at > sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) > at > sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) > at > sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) > at > sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) > at > sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) > at java.security.SecureRandom.nextBytes(SecureRandom.java:433) > at java.security.SecureRandom.next(SecureRandom.java:455) > at java.util.Random.nextLong(Random.java:284) > at java.io.File.generateFile(File.java:1682) > at java.io.File.createTempFile(File.java:1791) > at java.io.File.createTempFile(File.java:1828) > at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) > at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) > {code} -- This message is automatically generated by JIRA. 
[jira] [Commented] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662313#comment-13662313 ] Hitesh Shah commented on MAPREDUCE-5191: +1. Committing shortly. > TestQueue#testQueue fails with timeout on Windows > - > > Key: MAPREDUCE-5191 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.3.patch, > MAPREDUCE-5191.patch > > > Test times out on my machine after 5 seconds always on the below stack: > {code} > testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec <<< > ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:485) > at > sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) > at > sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) > at > sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) > at > sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) > at > sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) > at java.security.SecureRandom.nextBytes(SecureRandom.java:433) > at java.security.SecureRandom.next(SecureRandom.java:455) > at java.util.Random.nextLong(Random.java:284) > at java.io.File.generateFile(File.java:1682) > at java.io.File.createTempFile(File.java:1791) > at java.io.File.createTempFile(File.java:1828) > at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) > at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) > {code} -- This message is automatically generated by JIRA. 
[jira] [Updated] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5191: --- Status: Open (was: Patch Available) > TestQueue#testQueue fails with timeout on Windows > - > > Key: MAPREDUCE-5191 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.patch > > > Test times out on my machine after 5 seconds always on the below stack: > {code} > testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec <<< > ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:485) > at > sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) > at > sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) > at > sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) > at > sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) > at > sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) > at java.security.SecureRandom.nextBytes(SecureRandom.java:433) > at java.security.SecureRandom.next(SecureRandom.java:455) > at java.util.Random.nextLong(Random.java:284) > at java.io.File.generateFile(File.java:1682) > at java.io.File.createTempFile(File.java:1791) > at java.io.File.createTempFile(File.java:1828) > at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) > at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5240) inside of FileOutputCommitter the initialized Credentials cache appears to be empty
[ https://issues.apache.org/jira/browse/MAPREDUCE-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5240: --- Component/s: (was: mrv1) mrv2 > inside of FileOutputCommitter the initialized Credentials cache appears to be > empty > --- > > Key: MAPREDUCE-5240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5240 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.4-alpha >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.0.5-beta > > Attachments: LostCreds.java > > > I am attaching a modified wordcount job that clearly demonstrates the problem > we've encountered in running Sqoop2 on YARN (BIGTOP-949). > Here's what running it produces: > {noformat} > $ hadoop fs -mkdir in > $ hadoop fs -put /etc/passwd in > $ hadoop jar ./bug.jar org.myorg.LostCreds > 13/05/12 03:13:46 WARN mapred.JobConf: The variable mapred.child.ulimit is no > longer used. > numberOfSecretKeys: 1 > numberOfTokens: 0 > .. > .. > .. 
> 13/05/12 03:05:35 INFO mapreduce.Job: Job job_1368318686284_0013 failed with > state FAILED due to: Job commit failed: java.io.IOException: > numberOfSecretKeys: 0 > numberOfTokens: 0 > at > org.myorg.LostCreds$DestroyerFileOutputCommitter.commitJob(LostCreds.java:43) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:249) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:212) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > {noformat} > As you can see, even though we've clearly initialized the creds via: > {noformat} > job.getCredentials().addSecretKey(new Text("mykey"), "mysecret".getBytes()); > {noformat} > It doesn't seem to appear later in the job. > This is a pretty critical issue for Sqoop 2 since it appears to be DOA for > YARN in Hadoop 2.0.4-alpha -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654947#comment-13654947 ] Hitesh Shah commented on MAPREDUCE-5191: Does it make sense to avoid the temp file method in this scenario to reduce the time the test takes to run? How about just creating a file under target/ with the name of the test as the filename? On a Mac, this test took about 1 second on average across multiple runs. > TestQueue#testQueue fails with timeout on Windows > - > > Key: MAPREDUCE-5191 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.patch > > > Test times out on my machine after 5 seconds always on the below stack: > {code} > testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec <<< > ERROR! > java.lang.Exception: test timed out after 5000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:485) > at > sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) > at > sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) > at > sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) > at > sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) > at > sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) > at java.security.SecureRandom.nextBytes(SecureRandom.java:433) > at java.security.SecureRandom.next(SecureRandom.java:455) > at java.util.Random.nextLong(Random.java:284) > at java.io.File.generateFile(File.java:1682) > at java.io.File.createTempFile(File.java:1791) > at java.io.File.createTempFile(File.java:1828) > at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) > at 
org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) > {code}
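The suggestion in the comment above (write to a fixed file under target/ named after the test, rather than calling `File.createTempFile`, which in the quoted stack trace blocks on `SecureRandom` seeding) could look roughly like the sketch below. It is only an illustration and not the committed patch: the class `TestFileSketch`, the helper `writeTestFile`, the `target/test-data` directory and the `.xml` suffix are all hypothetical names chosen for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class TestFileSketch {

  // Hypothetical helper: writes <baseDir>/<testName>.xml with the given contents.
  // No random name is generated, so no SecureRandom seeding is involved.
  static File writeTestFile(File baseDir, String testName, String contents)
      throws IOException {
    if (!baseDir.isDirectory() && !baseDir.mkdirs()) {
      throw new IOException("Could not create directory " + baseDir);
    }
    File file = new File(baseDir, testName + ".xml");
    // Files.write creates the file or truncates an existing one.
    Files.write(file.toPath(), contents.getBytes(StandardCharsets.UTF_8));
    return file;
  }

  public static void main(String[] args) throws IOException {
    // "target/test-data" is an assumed location for this sketch.
    File file = writeTestFile(new File("target", "test-data"),
        "TestQueue-testQueue", "<queues/>");
    System.out.println("Wrote " + file.getPath());
  }
}
```

The trade-off is that a fixed name is not unique across concurrent runs of the same test, so this only suits tests that own their output directory.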
[jira] [Commented] (MAPREDUCE-5095) TestShuffleExceptionCount#testCheckException fails occasionally with JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654710#comment-13654710 ] Hitesh Shah commented on MAPREDUCE-5095: Should abortCalled also be changed to a non-static? > TestShuffleExceptionCount#testCheckException fails occasionally with JDK7 > - > > Key: MAPREDUCE-5095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.1.2 > Environment: Open JDK7 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5095.patch > > Original Estimate: 1h > Time Spent: 1h > Remaining Estimate: 0h > > The test fails due to a test-order dependency that can be violated when running > with JDK 7.
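Why a static flag such as `abortCalled` can create a test-order dependency is easiest to see in isolation. The sketch below is not taken from the Hadoop test: `StaticTracker`, `InstanceTracker` and the helper methods are hypothetical stand-ins, and only the field name `abortCalled` comes from the comment above. It shows that a static flag set by one test is still set when a later test creates a fresh object, whereas an instance field starts clean each time.

```java
class StaticFlagSketch {

  // Hypothetical stand-in: the flag is shared by every instance and every test.
  static class StaticTracker {
    static boolean abortCalled = false;
    void abort() { abortCalled = true; }
  }

  // Same idea with an instance field: each new object starts with a clean flag.
  static class InstanceTracker {
    boolean abortCalled = false;
    void abort() { abortCalled = true; }
  }

  // Plays the role of a test that triggers the abort path on both trackers.
  static void testThatAborts(StaticTracker s, InstanceTracker i) {
    s.abort();
    i.abort();
  }

  // Plays the role of a later test that expects an untouched tracker.
  static boolean freshStaticLooksClean() {
    new StaticTracker();
    return !StaticTracker.abortCalled;
  }

  static boolean freshInstanceLooksClean() {
    return !new InstanceTracker().abortCalled;
  }

  public static void main(String[] args) {
    testThatAborts(new StaticTracker(), new InstanceTracker());
    // After the aborting test has run, only the instance-based tracker is clean.
    System.out.println("static clean:   " + freshStaticLooksClean());
    System.out.println("instance clean: " + freshInstanceLooksClean());
  }
}
```

If the checking test happened to run before the aborting one, both variants would pass, which is how such a failure ends up looking intermittent when the runner does not guarantee method order.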
[jira] [Updated] (MAPREDUCE-5179) Change TestHSWebServices to do string equal check on hadoop build version similar to YARN-605
[ https://issues.apache.org/jira/browse/MAPREDUCE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5179: --- Status: Patch Available (was: Open) > Change TestHSWebServices to do string equal check on hadoop build version > similar to YARN-605 > - > > Key: MAPREDUCE-5179 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5179 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: MAPREDUCE-5179.1.patch
[jira] [Commented] (MAPREDUCE-5179) Change TestHSWebServices to do string equal check on hadoop build version similar to YARN-605
[ https://issues.apache.org/jira/browse/MAPREDUCE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642073#comment-13642073 ] Hitesh Shah commented on MAPREDUCE-5179: [~vinodkv], no others found. > Change TestHSWebServices to do string equal check on hadoop build version > similar to YARN-605 > - > > Key: MAPREDUCE-5179 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5179 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: MAPREDUCE-5179.1.patch
[jira] [Commented] (MAPREDUCE-5179) Change TestHSWebServices to do string equal check on hadoop build version similar to YARN-605
[ https://issues.apache.org/jira/browse/MAPREDUCE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640672#comment-13640672 ] Hitesh Shah commented on MAPREDUCE-5179: Needs YARN-605 to be committed before this can go in. > Change TestHSWebServices to do string equal check on hadoop build version > similar to YARN-605 > - > > Key: MAPREDUCE-5179 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5179 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: MAPREDUCE-5179.1.patch
[jira] [Commented] (MAPREDUCE-5178) Fix use of BuilderUtils#newApplicationReport as a result of YARN-577.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640674#comment-13640674 ] Hitesh Shah commented on MAPREDUCE-5178: Needs YARN-577 to go in before this can be committed. > Fix use of BuilderUtils#newApplicationReport as a result of YARN-577. > - > > Key: MAPREDUCE-5178 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5178 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: MAPREDUCE-5178.1.patch
[jira] [Updated] (MAPREDUCE-5179) Change TestHSWebServices to do string equal check on hadoop build version similar to YARN-605
[ https://issues.apache.org/jira/browse/MAPREDUCE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5179: --- Attachment: MAPREDUCE-5179.1.patch > Change TestHSWebServices to do string equal check on hadoop build version > similar to YARN-605 > - > > Key: MAPREDUCE-5179 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5179 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: MAPREDUCE-5179.1.patch
[jira] [Created] (MAPREDUCE-5179) Change TestHSWebServices to do string equal check on hadoop build version similar to YARN-605
Hitesh Shah created MAPREDUCE-5179: -- Summary: Change TestHSWebServices to do string equal check on hadoop build version similar to YARN-605 Key: MAPREDUCE-5179 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5179 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah
[jira] [Assigned] (MAPREDUCE-5178) Fix use of BuilderUtils#newApplicationReport as a result of YARN-577.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned MAPREDUCE-5178: -- Assignee: Hitesh Shah > Fix use of BuilderUtils#newApplicationReport as a result of YARN-577. > - > > Key: MAPREDUCE-5178 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5178 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: MAPREDUCE-5178.1.patch
[jira] [Updated] (MAPREDUCE-5178) Fix use of BuilderUtils#newApplicationReport as a result of YARN-577.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5178: --- Attachment: MAPREDUCE-5178.1.patch > Fix use of BuilderUtils#newApplicationReport as a result of YARN-577. > - > > Key: MAPREDUCE-5178 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5178 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Hitesh Shah > Attachments: MAPREDUCE-5178.1.patch
[jira] [Created] (MAPREDUCE-5178) Fix use of BuilderUtils#newApplicationReport as a result of YARN-577.
Hitesh Shah created MAPREDUCE-5178: -- Summary: Fix use of BuilderUtils#newApplicationReport as a result of YARN-577. Key: MAPREDUCE-5178 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5178 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Hitesh Shah
[jira] [Updated] (MAPREDUCE-5142) MR AM unregisters with state KILLED when an error causes dispatcher to shutdown
[ https://issues.apache.org/jira/browse/MAPREDUCE-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5142: --- Description: RMCommunicator sets final state to KILLED if the job is in a running state and isSignalled is set to true. {code} } else if (jobImpl.getInternalState() == JobStateInternal.KILLED || (jobImpl.getInternalState() == JobStateInternal.RUNNING && isSignalled)) { finishState = FinalApplicationStatus.KILLED; } else if (jobImpl.getInternalState() == JobStateInternal.FAILED || jobImpl.getInternalState() == JobStateInternal.ERROR) { finishState = FinalApplicationStatus.FAILED; {code} This happens when any uncaught exception in any event handler ends up causing the AsyncDispatcher to trigger a shutdown. In such a scenario, even though the AM actually failed due to some error, its actual state ends up as KILLED. was: RMCommunicator sets final state to KILLED if the job is in a running state and isSignalled is set to true. {code} } else if (jobImpl.getInternalState() == JobStateInternal.KILLED || (jobImpl.getInternalState() == JobStateInternal.RUNNING && isSignalled)) { finishState = FinalApplicationStatus.KILLED; } else if (jobImpl.getInternalState() == JobStateInternal.FAILED || jobImpl.getInternalState() == JobStateInternal.ERROR) { finishState = FinalApplicationStatus.FAILED; {code} This happens when for some reason, there is an exception in a state machine's event handler causing AsyncDispatcher to trigger a shutdown. In such a scenario, even though the AM actually failed due to some error, its actual state ends up as KILLED. 
> MR AM unregisters with state KILLED when an error causes dispatcher to > shutdown > --- > > Key: MAPREDUCE-5142 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5142 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Hitesh Shah > > RMCommunicator sets final state to KILLED if the job is in a running state > and isSignalled is set to true. > {code} > } else if (jobImpl.getInternalState() == JobStateInternal.KILLED > || (jobImpl.getInternalState() == JobStateInternal.RUNNING && > isSignalled)) { > finishState = FinalApplicationStatus.KILLED; > } else if (jobImpl.getInternalState() == JobStateInternal.FAILED > || jobImpl.getInternalState() == JobStateInternal.ERROR) { > finishState = FinalApplicationStatus.FAILED; > {code} > This happens when any uncaught exception in any event handler ends up causing > the AsyncDispatcher to trigger a shutdown. In such a scenario, even though > the AM actually failed due to some error, its actual state ends up as KILLED. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
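The branch logic quoted in the description can be modelled on its own to show why an error-driven shutdown is reported as KILLED. The sketch below is a simplified stand-in and not the RMCommunicator source: the enums `JobState` and `FinalStatus` are local substitutes for `JobStateInternal` and `FinalApplicationStatus`, the leading SUCCEEDED branch is an assumption (the quoted snippet starts mid-chain), and the `shutdownDueToError` parameter is a purely hypothetical way of telling the two cases apart, not the fix adopted for this issue.

```java
class FinalStatusSketch {

  // Local stand-ins for JobStateInternal and FinalApplicationStatus.
  enum JobState { RUNNING, SUCCEEDED, KILLED, FAILED, ERROR }
  enum FinalStatus { UNDEFINED, SUCCEEDED, KILLED, FAILED }

  // Mirrors the branches quoted in the description. The first branch is assumed.
  static FinalStatus asQuoted(JobState state, boolean isSignalled) {
    if (state == JobState.SUCCEEDED) {
      return FinalStatus.SUCCEEDED;
    } else if (state == JobState.KILLED
        || (state == JobState.RUNNING && isSignalled)) {
      return FinalStatus.KILLED;
    } else if (state == JobState.FAILED || state == JobState.ERROR) {
      return FinalStatus.FAILED;
    }
    return FinalStatus.UNDEFINED;
  }

  // Hypothetical variant: an extra flag marks shutdowns caused by an internal error,
  // so a RUNNING job that is torn down by such a shutdown is reported as FAILED.
  static FinalStatus withErrorFlag(JobState state, boolean isSignalled,
      boolean shutdownDueToError) {
    if (shutdownDueToError && state == JobState.RUNNING) {
      return FinalStatus.FAILED;
    }
    return asQuoted(state, isSignalled);
  }

  public static void main(String[] args) {
    // Assumes, as the description implies, that the shutdown path sets isSignalled
    // while the job is still RUNNING.
    System.out.println(asQuoted(JobState.RUNNING, true));
    System.out.println(withErrorFlag(JobState.RUNNING, true, true));
  }
}
```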
[jira] [Updated] (MAPREDUCE-5142) MR AM unregisters with state KILLED when an error causes dispatcher to shutdown
[ https://issues.apache.org/jira/browse/MAPREDUCE-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5142: --- Affects Version/s: 2.0.3-alpha 0.23.5 > MR AM unregisters with state KILLED when an error causes dispatcher to > shutdown > --- > > Key: MAPREDUCE-5142 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5142 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Hitesh Shah > > RMCommunicator sets final state to KILLED if the job is in a running state > and isSignalled is set to true. > {code} > } else if (jobImpl.getInternalState() == JobStateInternal.KILLED > || (jobImpl.getInternalState() == JobStateInternal.RUNNING && > isSignalled)) { > finishState = FinalApplicationStatus.KILLED; > } else if (jobImpl.getInternalState() == JobStateInternal.FAILED > || jobImpl.getInternalState() == JobStateInternal.ERROR) { > finishState = FinalApplicationStatus.FAILED; > {code} > This happens when for some reason, there is an exception in a state machine's > event handler causing AsyncDispatcher to trigger a shutdown. In such a > scenario, even though the AM actually failed due to some error, its actual > state ends up as KILLED. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5142) MR AM unregisters with state KILLED when an error causes dispatcher to shutdown
[ https://issues.apache.org/jira/browse/MAPREDUCE-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628187#comment-13628187 ] Hitesh Shah commented on MAPREDUCE-5142: @Jason, yes - definitely the same underlying issue. Addressing the CLC creation would address a part of the issue but currently all uncaught exceptions will end up with the AM in a KILLED state. > MR AM unregisters with state KILLED when an error causes dispatcher to > shutdown > --- > > Key: MAPREDUCE-5142 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5142 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Hitesh Shah > > RMCommunicator sets final state to KILLED if the job is in a running state > and isSignalled is set to true. > {code} > } else if (jobImpl.getInternalState() == JobStateInternal.KILLED > || (jobImpl.getInternalState() == JobStateInternal.RUNNING && > isSignalled)) { > finishState = FinalApplicationStatus.KILLED; > } else if (jobImpl.getInternalState() == JobStateInternal.FAILED > || jobImpl.getInternalState() == JobStateInternal.ERROR) { > finishState = FinalApplicationStatus.FAILED; > {code} > This happens when for some reason, there is an exception in a state machine's > event handler causing AsyncDispatcher to trigger a shutdown. In such a > scenario, even though the AM actually failed due to some error, its actual > state ends up as KILLED. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5142) MR AM unregisters with state KILLED when an error causes dispatcher to shutdown
Hitesh Shah created MAPREDUCE-5142: -- Summary: MR AM unregisters with state KILLED when an error causes dispatcher to shutdown Key: MAPREDUCE-5142 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5142 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Hitesh Shah RMCommunicator sets final state to KILLED if the job is in a running state and isSignalled is set to true. {code} } else if (jobImpl.getInternalState() == JobStateInternal.KILLED || (jobImpl.getInternalState() == JobStateInternal.RUNNING && isSignalled)) { finishState = FinalApplicationStatus.KILLED; } else if (jobImpl.getInternalState() == JobStateInternal.FAILED || jobImpl.getInternalState() == JobStateInternal.ERROR) { finishState = FinalApplicationStatus.FAILED; {code} This happens when for some reason, there is an exception in a state machine's event handler causing AsyncDispatcher to trigger a shutdown. In such a scenario, even though the AM actually failed due to some error, its actual state ends up as KILLED. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622537#comment-13622537 ] Hitesh Shah commented on MAPREDUCE-5083: Minor clarification - changes.txt was modified in branch-2 and branch-2.0.4 - trunk has some additional mayhem to clear out first. > MiniMRCluster should use a random component when creating an actual cluster > --- > > Key: MAPREDUCE-5083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.0.4-alpha > > Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, > MAPREDUCE-5083-trunk.txt > > > Currently all unit tests end up using the same work dir - which can affect > anyone trying to run parallel instances.
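The idea in the issue description, giving each cluster instance a work dir with a random component so that parallel runs do not collide, can be sketched as follows. This is an illustration under assumptions and not the attached patch: the class `WorkDirSketch`, the helper `uniqueWorkDir`, the `target/test-dir` base and the suffix format are hypothetical. The suffix uses `java.util.Random` and `System.nanoTime()`, which makes collisions unlikely but does not strictly rule them out.

```java
import java.io.File;
import java.util.Random;

class WorkDirSketch {

  private static final Random RANDOM = new Random();

  // Hypothetical helper: returns <base>/<testName>-<time>-<random>, so two
  // instances started for the same test are very unlikely to share a directory.
  static File uniqueWorkDir(File base, String testName) {
    String suffix = Long.toHexString(System.nanoTime())
        + "-" + Integer.toHexString(RANDOM.nextInt());
    return new File(base, testName + "-" + suffix);
  }

  public static void main(String[] args) {
    // "target/test-dir" is an assumed base directory for this sketch.
    File base = new File("target", "test-dir");
    File first = uniqueWorkDir(base, "TestMiniMRCluster");
    File second = uniqueWorkDir(base, "TestMiniMRCluster");
    System.out.println(first.getPath());
    System.out.println(second.getPath());
  }
}
```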
[jira] [Resolved] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved MAPREDUCE-5083. Resolution: Fixed Fix Version/s: (was: 2.0.5-beta) 2.0.4-alpha Target Version/s: (was: 2.0.5-beta) Release Note: Committed to branch-2.0.4. Modified changes.txt in trunk, branch-2 and branch-2.0.4 accordingly. > MiniMRCluster should use a random component when creating an actual cluster > --- > > Key: MAPREDUCE-5083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.0.4-alpha > > Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, > MAPREDUCE-5083-trunk.txt > > > Currently all unit tests end up using the same work dir - which can affect > anyone trying to run parallel instances.
[jira] [Reopened] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reopened MAPREDUCE-5083: > MiniMRCluster should use a random component when creating an actual cluster > --- > > Key: MAPREDUCE-5083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, > MAPREDUCE-5083-trunk.txt > > > Currently all unit tests end up using the same work dir - which can affect > anyone trying to run parallel instances.
[jira] [Commented] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622520#comment-13622520 ] Hitesh Shah commented on MAPREDUCE-5083: @Stack, I will be committing shortly to branch-2.0.4. > MiniMRCluster should use a random component when creating an actual cluster > --- > > Key: MAPREDUCE-5083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, > MAPREDUCE-5083-trunk.txt > > > Currently all unit tests end up using the same work dir - which can affect > anyone trying to run parallel instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5088) MR Client gets an renewer token exception while Oozie is submitting a job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5088: --- Fix Version/s: (was: 2.0.5-beta) (was: 3.0.0) > MR Client gets an renewer token exception while Oozie is submitting a job > - > > Key: MAPREDUCE-5088 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5088 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Roman Shaposhnik >Assignee: Daryn Sharp >Priority: Blocker > Fix For: 2.0.4-alpha > > Attachments: HADOOP-9409.patch, HADOOP-9409.patch, > MAPREDUCE-5088.patch, MAPREDUCE-5088.patch, MAPREDUCE-5088.txt > > > After the fix for HADOOP-9299 I'm now getting the following bizarre exception > in Oozie while trying to submit a job. This also seems to be KRB related: > {noformat} > 2013-03-15 13:34:16,555 WARN ActionStartXCommand:542 - USER[hue] GROUP[-] > TOKEN[] APP[MapReduce] JOB[001-130315123130987-oozie-oozi-W] > ACTION[001-130315123130987-oozie-oozi-W@Sleep] Error starting action > [Sleep]. 
ErrorType [ERROR], ErrorCode [UninitializedMessageException], > Message [UninitializedMessageException: Message missing required fields: > renewer] > org.apache.oozie.action.ActionExecutorException: > UninitializedMessageException: Message missing required fields: renewer > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:401) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:738) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:889) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59) > at org.apache.oozie.command.XCommand.call(XCommand.java:277) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: com.google.protobuf.UninitializedMessageException: Message missing > required fields: renewer > at > com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:605) > at > org.apache.hadoop.security.proto.SecurityProtos$GetDelegationTokenRequestProto$Builder.build(SecurityProtos.java:973) > at > org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetDelegationTokenRequestPBImpl.mergeLocalToProto(GetDelegationTokenRequestPBImpl.java:84) > at > org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetDelegationTokenRequestPBImpl.getProto(GetDelegationTokenRequestPBImpl.java:67) > at > 
org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getDelegationToken(MRClientProtocolPBClientImpl.java:200) > at > org.apache.hadoop.mapred.YARNRunner.getDelegationTokenFromHS(YARNRunner.java:194) > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:273) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1439) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:581) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1439) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:576) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:723) > ... 10 more > 2013-03-15 13:34:16,555 WARN ActionStartXCommand:542 - USER[hue] GROUP[-] > TOKEN[] APP[MapReduce] JOB[001-13031512313 > {noform
[jira] [Commented] (MAPREDUCE-5088) MR Client gets an renewer token exception while Oozie is submitting a job
[ https://issues.apache.org/jira/browse/MAPREDUCE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621482#comment-13621482 ] Hitesh Shah commented on MAPREDUCE-5088: Updated the fix version to 2.0.4-alpha, on the assumption that anything committed to 2.0.4-alpha has also been committed to trunk and branch-2. > MR Client gets an renewer token exception while Oozie is submitting a job > - > > Key: MAPREDUCE-5088 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5088 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Roman Shaposhnik >Assignee: Daryn Sharp >Priority: Blocker > Fix For: 2.0.4-alpha > > Attachments: HADOOP-9409.patch, HADOOP-9409.patch, > MAPREDUCE-5088.patch, MAPREDUCE-5088.patch, MAPREDUCE-5088.txt > > > After the fix for HADOOP-9299 I'm now getting the following bizarre exception > in Oozie while trying to submit a job. This also seems to be KRB related: > {noformat} > 2013-03-15 13:34:16,555 WARN ActionStartXCommand:542 - USER[hue] GROUP[-] > TOKEN[] APP[MapReduce] JOB[001-130315123130987-oozie-oozi-W] > ACTION[001-130315123130987-oozie-oozi-W@Sleep] Error starting action > [Sleep]. 
ErrorType [ERROR], ErrorCode [UninitializedMessageException], > Message [UninitializedMessageException: Message missing required fields: > renewer] > org.apache.oozie.action.ActionExecutorException: > UninitializedMessageException: Message missing required fields: renewer > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:401) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:738) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:889) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59) > at org.apache.oozie.command.XCommand.call(XCommand.java:277) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: com.google.protobuf.UninitializedMessageException: Message missing > required fields: renewer > at > com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:605) > at > org.apache.hadoop.security.proto.SecurityProtos$GetDelegationTokenRequestProto$Builder.build(SecurityProtos.java:973) > at > org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetDelegationTokenRequestPBImpl.mergeLocalToProto(GetDelegationTokenRequestPBImpl.java:84) > at > org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetDelegationTokenRequestPBImpl.getProto(GetDelegationTokenRequestPBImpl.java:67) > at > 
org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getDelegationToken(MRClientProtocolPBClientImpl.java:200) > at > org.apache.hadoop.mapred.YARNRunner.getDelegationTokenFromHS(YARNRunner.java:194) > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:273) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1439) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:581) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1439) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:576) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:723) > ... 10
[jira] [Commented] (MAPREDUCE-5094) Disable mem monitoring by default in MiniMRYarnCluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611837#comment-13611837 ] Hitesh Shah commented on MAPREDUCE-5094: @Stack, only max pmem is configurable directly but max vmem can be configured indirectly via the vmem-pmem ratio ( default ratio is 2.1 ). > Disable mem monitoring by default in MiniMRYarnCluster > -- > > Key: MAPREDUCE-5094 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5094 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > > YARN-449. Some hbase tests were failing since containers were getting killed. > I believe these checks are disabled by default on the branch-1 MiniMRCluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
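To make the indirect vmem setting in the comment above concrete, here is a minimal arithmetic sketch. It assumes the virtual-memory limit is simply the physical-memory limit multiplied by the vmem-pmem ratio and truncated to whole bytes; the class and method names are hypothetical and are not taken from the YARN code base.

```java
final class VmemLimitSketch {
    // Default ratio quoted in the comment above.
    static final double DEFAULT_VMEM_PMEM_RATIO = 2.1;

    // Assumption: vmem limit = pmem limit * ratio, truncated to whole bytes.
    static long vmemLimitBytes(long pmemLimitBytes, double vmemPmemRatio) {
        return (long) (pmemLimitBytes * vmemPmemRatio);
    }
}
```

Under that assumption, a container with a 1 GiB pmem limit (1073741824 bytes) and the default ratio would get a vmem limit of 2254857830 bytes, roughly 2.1 GiB. Raising the ratio is then the only way to raise max vmem without touching max pmem.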
[jira] [Updated] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5083: --- Resolution: Fixed Fix Version/s: 2.0.5-beta Status: Resolved (was: Patch Available) Thanks Sid. Committed to branch-2 and trunk. > MiniMRCluster should use a random component when creating an actual cluster > --- > > Key: MAPREDUCE-5083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, > MAPREDUCE-5083-trunk.txt > > > Currently all unit tests end up using the same work dir - which can affect > anyone trying to run parallel instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
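The problem described in this issue (every test sharing one work dir) can be illustrated with a small sketch. This is not the committed patch: the class name, the method name, and the choice of a UUID as the random component are all assumptions made for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

final class WorkDirSketch {
    // Hypothetical helper: appends a random component to the identifier so that
    // two cluster instances started in parallel do not share a work dir.
    static Path uniqueWorkDir(Path base, String identifier) {
        return base.resolve(identifier + "-" + UUID.randomUUID());
    }

    // Convenience overload using a relative "target" base dir (an assumption).
    static Path uniqueWorkDir(String identifier) {
        return uniqueWorkDir(Paths.get("target"), identifier);
    }
}
```

Two calls with the same identifier return different paths, so parallel test runs would no longer overwrite each other's files. Any other source of uniqueness (a timestamp, a counter, a temp-dir API) would serve the same purpose.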
[jira] [Commented] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611084#comment-13611084 ] Hitesh Shah commented on MAPREDUCE-5083: +1. Will commit shortly. > MiniMRCluster should use a random component when creating an actual cluster > --- > > Key: MAPREDUCE-5083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, > MAPREDUCE-5083-trunk.txt > > > Currently all unit tests end up using the same work dir - which can affect > anyone trying to run parallel instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608254#comment-13608254 ] Hitesh Shah commented on MAPREDUCE-5083: ( above change needed in MiniMRCluster.java ) > MiniMRCluster should use a random component when creating an actual cluster > --- > > Key: MAPREDUCE-5083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk.txt > > > Currently all unit tests end up using the same work dir - which can affect > anyone trying to run parallel instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608252#comment-13608252 ] Hitesh Shah commented on MAPREDUCE-5083: {code} String identifier = this.getClass().getName() {code} getClass().getName() should be replaced with getClass().getSimpleName() to ensure things don't break on Windows. > MiniMRCluster should use a random component when creating an actual cluster > --- > > Key: MAPREDUCE-5083 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk.txt > > > Currently all unit tests end up using the same work dir - which can affect > anyone trying to run parallel instances.
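The difference between the two calls discussed in this comment can be shown with a short sketch. The thread does not say why the fully qualified name causes trouble on Windows; a plausible reading (an assumption here) is that the longer name, with its package prefix and a `$` for nested classes, makes a poor directory name. The class below is hypothetical.

```java
import java.util.Map;

final class IdentifierSketch {
    // Fully qualified binary name, e.g. "java.util.Map$Entry" for a nested type.
    static String fullName(Class<?> c) {
        return c.getName();
    }

    // Short name without the package or enclosing class, e.g. "Entry".
    static String simpleName(Class<?> c) {
        return c.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(fullName(Map.Entry.class));   // java.util.Map$Entry
        System.out.println(simpleName(Map.Entry.class)); // Entry
    }
}
```

Note that the simple name is shorter but no longer unique across packages, which is one more reason a random component in the work dir name is useful.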
[jira] [Commented] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603885#comment-13603885 ] Hitesh Shah commented on MAPREDUCE-5066: Job notification also exists in 2.x which may face the same set of issues. > JobTracker should set a timeout when calling into job.end.notification.url > -- > > Key: MAPREDUCE-5066 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > > In current code, timeout is not specified when JobTracker (JobEndNotifier) > calls into the notification URL. When the given URL points to a server that > will not respond for a long time, job notifications are completely stuck > (given that we have only a single thread processing all notifications). We've > seen this cause noticeable delays in job execution in components that rely on > job end notifications (like Oozie workflows). > I propose we introduce a configurable timeout option and set a default to a > reasonably small value. > If we want, we can also introduce a configurable number of workers processing > the notification queue (not sure if this is needed though at this point). > I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
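The proposal in this issue, a timeout on the notification call, can be sketched with the standard `HttpURLConnection` API. This is only an illustration under stated assumptions: the class and method names are hypothetical, the timeout is passed in directly rather than read from any configuration key, and the actual JobEndNotifier code is not shown in this thread.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class JobEndNotifierSketch {
    // Hypothetical helper: returns true only if the URL answers 200 within the timeout.
    static boolean notifyUrl(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis); // bound the time spent connecting
            conn.setReadTimeout(timeoutMillis);    // bound the wait for a response
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            // Covers SocketTimeoutException: an unresponsive server no longer blocks the caller.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

With both timeouts set, a server that accepts the connection but never responds costs the single notification thread at most the configured timeout instead of stalling the whole queue.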
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5066: --- Affects Version/s: 2.0.3-alpha > JobTracker should set a timeout when calling into job.end.notification.url > -- > > Key: MAPREDUCE-5066 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > > In current code, timeout is not specified when JobTracker (JobEndNotifier) > calls into the notification URL. When the given URL points to a server that > will not respond for a long time, job notifications are completely stuck > (given that we have only a single thread processing all notifications). We've > seen this cause noticeable delays in job execution in components that rely on > job end notifications (like Oozie workflows). > I propose we introduce a configurable timeout option and set a default to a > reasonably small value. > If we want, we can also introduce a configurable number of workers processing > the notification queue (not sure if this is needed though at this point). > I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4442) Accessing hadoop counters from a job is unreliable in yarn during AM process cleanup window
[ https://issues.apache.org/jira/browse/MAPREDUCE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4442: --- Labels: usability (was: ) > Accessing hadoop counters from a job is unreliable in yarn during AM process > cleanup window > > > Key: MAPREDUCE-4442 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4442 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Rahul Jain > Labels: usability > Attachments: am_logs_counter_failure.html, > rsrc_mgr_logs_counter_failed.txt > > > We found this issue during our tests moving from MapReduceV1 to MapReduceV2. > A few of our applications access job counters multiple times: > a) After submission of the job, while the job is executing (works fine) > b) Right after the job complete notification is received (works fine) > c) A few seconds after the job complete notification (fails most of the time). > The error snippet is as follows: > {code} > 2012-07-12 19:12:29,039 WARN [Client] Unexpected error reading responses on > connection Thread[IPC Client (1252749669) connection to > sjc1-ciq-ibm-grid07.carrieriq.com/10.202.50.187:47944 from hadoop,5,main] > java.lang.NullPointerException > at > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781) > 2012-07-12 19:12:29,044 INFO [ClientServiceDelegate] Application state is > completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server > 2012-07-12 19:12:29,132 INFO [ClientServiceDelegate] Application state is > completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server > 2012-07-12 19:12:29,216 ERROR [UserGroupInformation] > PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException > 2012-07-12 19:12:29,216 WARN [BaseOutputStageJob] getJobCounters: Unable to > retrieve counters. 
null > java.io.IOException > at > org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:315) > at > org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:335) > at > org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:470) > at org.apache.hadoop.mapreduce.Job$8.run(Job.java:719) > at org.apache.hadoop.mapreduce.Job$8.run(Job.java:716) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:716) > at > org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396) > {code} > The connection to 10.202.50.187:47944 is actually the connection to AM; > appears that we are connecting to AM to get the counters for the successful > job and not yet to the history server. > > I'll attach the logs for AM and resource mgr separately, however no unusual > activity is seen in those. > This makes me suspect that we have a race condition in the code trying to > access job counters when AM is finishing up and the job hasn't moved to > history server yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4648) Diagnostics from AM are missing from job history
[ https://issues.apache.org/jira/browse/MAPREDUCE-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4648: --- Labels: usability (was: ) > Diagnostics from AM are missing from job history > > > Key: MAPREDUCE-4648 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4648 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.0, 2.0.0-alpha >Reporter: Jason Lowe > Labels: usability > > When a job fails during setup or commit, any diagnostics from the MapReduce > ApplicationMaster are not available in the job history. Currently the > diagnostics for the job are collected from the diagnostics of tasks run for > the job, but the AM has no corresponding task record in the job history. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4693: --- Labels: usability (was: ) > Historyserver should provide counters for failed tasks > -- > > Key: MAPREDUCE-4693 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, mrv2 >Reporter: Jason Lowe > Labels: usability > > Currently the historyserver is not providing counters for failed tasks, even > though they are available via the AM as long as the job is still running. > Those counters are lost when the client needs to redirect to the > historyserver after the job completes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4692) Investigate and remove MR1 JTConfig and its constants use in the MR project on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-4692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4692: --- Labels: usability (was: ) > Investigate and remove MR1 JTConfig and its constants use in the MR project > on trunk > > > Key: MAPREDUCE-4692 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4692 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Reporter: Harsh J >Priority: Minor > Labels: usability > > Filed on behalf of Robert from MAPREDUCE-3223 > {quote} > Are there any JIRAs to deprecate the configs from where they reside in the > code? > ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/server/jobtracker/JTConfig.java > for example. I know we cannot delete them out just yet, because MRV1 code > still exists and may be using it, but it would be good to mark all of those > configs as deprecated. So that we can delete them in trunk once the MRV1 code > is completely removed. > {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4704) TaskHeartbeatHandler misreports a ping timeout as a task timeout
[ https://issues.apache.org/jira/browse/MAPREDUCE-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4704: --- Labels: usability (was: ) > TaskHeartbeatHandler misreports a ping timeout as a task timeout > > > Key: MAPREDUCE-4704 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4704 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am, mrv2 >Affects Versions: 0.23.3 >Reporter: Jason Lowe >Priority: Minor > Labels: usability > > When a task fails to ping within the hardcoded ping timeout of 5 minutes, > TaskHeartbeatHandler logs a message reporting the wrong timeout value. It > reports a timeout of mapreduce.task.timeout seconds rather than the 5 minute > ping timeout. > This can lead to user confusion if they try increasing mapreduce.task.timeout > and see the log message showing the larger value but the task continues to > timeout after only 5 minutes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
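The confusion described in this issue comes from two different timeouts sharing one log message. A minimal sketch of the intended behaviour follows, assuming each check reports its own limit; the class, method, and message wording are hypothetical and do not reproduce the real TaskHeartbeatHandler.

```java
final class TimeoutMessageSketch {
    // Hardcoded ping timeout of 5 minutes, as described in the issue.
    static final long PING_TIMEOUT_MS = 5 * 60 * 1000L;

    // Returns a diagnostic naming the timeout that actually fired, or null if none did.
    static String check(long nowMs, long lastPingMs, long lastProgressMs, long taskTimeoutMs) {
        if (nowMs - lastPingMs > PING_TIMEOUT_MS) {
            return "Timed out: no ping for " + (PING_TIMEOUT_MS / 1000) + " secs (ping timeout)";
        }
        if (taskTimeoutMs > 0 && nowMs - lastProgressMs > taskTimeoutMs) {
            return "Timed out: no progress for " + (taskTimeoutMs / 1000)
                + " secs (mapreduce.task.timeout)";
        }
        return null;
    }
}
```

With separate messages, a user who raises mapreduce.task.timeout and still sees tasks killed after 5 minutes is told that the ping timeout, not the task timeout, was the cause.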
[jira] [Updated] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4818: --- Labels: usability (was: ) > Easier identification of tasks that timeout during localization > --- > > Key: MAPREDUCE-4818 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am >Affects Versions: 0.23.3, 2.0.3-alpha >Reporter: Jason Lowe > Labels: usability > > When a task is taking too long to localize and is killed by the AM due to > task timeout, the job UI/history is not very helpful. The attempt simply > lists a diagnostic stating it was killed due to timeout, but there are no > logs for the attempt since it never actually got started. There are log > messages on the NM that show the container never made it past localization by > the time it was killed, but users often do not have access to those logs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4794) DefaultSpeculator generates error messages on normal shutdown
[ https://issues.apache.org/jira/browse/MAPREDUCE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4794: --- Labels: usability (was: ) > DefaultSpeculator generates error messages on normal shutdown > - > > Key: MAPREDUCE-4794 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4794 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: Jason Lowe >Assignee: Jason Lowe > Labels: usability > Attachments: MAPREDUCE-4794.patch > > > DefaultSpeculator can log the following error message on a normal shutdown of > the ApplicationMaster: > {noformat} > 2012-11-13 01:35:31,841 ERROR [DefaultSpeculator background processing] > org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: Background > thread returning, interrupted : java.lang.InterruptedException > {noformat} > and in addition for some reason it logs the corresponding backtrace to stdout. > Like the errors fixed in MAPREDUCE-4741, this error message in the syslog and > backtrace on stdout can be confusing to users as to whether the job really > succeeded. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
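One common way to avoid the misleading error described in this issue is to record that a shutdown was requested and to treat the resulting interrupt as expected. The sketch below illustrates that pattern only; it is not the attached patch, and the class, its fields, and the in-memory error list standing in for a logger are all assumptions.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class QuietShutdownSketch {
    private volatile boolean stopped = false;
    // Stand-in for an ERROR-level logger, so the behaviour can be inspected.
    final List<String> errors = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this::loop, "background processing");

    private void loop() {
        while (!stopped && !Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(1000); // placeholder for one round of background work
            } catch (InterruptedException e) {
                if (!stopped) {
                    // Only an interrupt that was not part of a requested stop is an error.
                    errors.add("Background thread interrupted unexpectedly: " + e);
                }
                return;
            }
        }
    }

    void start() {
        worker.start();
    }

    void stop() throws InterruptedException {
        stopped = true;     // set the flag before interrupting
        worker.interrupt();
        worker.join();
    }
}
```

Because the flag is set before the interrupt is delivered, a normal shutdown leaves the error list empty, while an interrupt from any other source is still reported.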
[jira] [Commented] (MAPREDUCE-4997) Deprecate mapreduce.jobtracker.address
[ https://issues.apache.org/jira/browse/MAPREDUCE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576013#comment-13576013 ] Hitesh Shah commented on MAPREDUCE-4997: For users to transition from MR1 to MR2, we are talking about a full cluster change - replacing the JT/TTs with new daemons - RM/NMs. In this scenario, the users would be well aware of the change and would have to make the necessary config changes anyway. It therefore seems preferable not to support mapreduce.jobtracker.address, so as not to give them the wrong impression that the RM is a JT replacement. > Deprecate mapreduce.jobtracker.address > -- > > Key: MAPREDUCE-4997 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4997 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > mapreduce.jobtracker.address currently is not used, but users transitioning > from mr1 to mr2 may expect their previous job configs to work, so it should > be deprecated in favor of yarn.resourcemanager.address.
[jira] [Commented] (MAPREDUCE-4994) -jt generic command line option does not work
[ https://issues.apache.org/jira/browse/MAPREDUCE-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575341#comment-13575341 ] Hitesh Shah commented on MAPREDUCE-4994: Also, is there any reason to make the hadoop command line YARN- and resourcemanager-aware? Ignoring what was supported in earlier versions, would it be preferable going forward to make the local runner option part of, say, a mapred command-line option? > -jt generic command line option does not work > - > > Key: MAPREDUCE-4994 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4994 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-4994-1.patch, MAPREDUCE-4994.patch > > > hadoop jar myjar.jar MyDriver -fs file:/// -jt local input.txt output/ > should run a job using the local file system and the local job runner. > Instead it tries to connect to a jobtracker. > hadoop jar myjar.jar MyDriver -fs file:/// -jt host:port input.txt output/ > does not use the given host/port > This appears to be because Cluster#initialize, which loads the > ClientProtocol, contains no special handling for mapred.job.tracker.
[jira] [Commented] (MAPREDUCE-4994) -jt generic command line option does not work
[ https://issues.apache.org/jira/browse/MAPREDUCE-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575340#comment-13575340 ] Hitesh Shah commented on MAPREDUCE-4994: It makes sense to remove -jt as there is no notion of jobtracker anywhere in 2.x > -jt generic command line option does not work > - > > Key: MAPREDUCE-4994 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4994 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-4994-1.patch, MAPREDUCE-4994.patch > > > hadoop jar myjar.jar MyDriver -fs file:/// -jt local input.txt output/ > should run a job using the local file system and the local job runner. > Instead it tries to connect to a jobtracker. > hadoop jar myjar.jar MyDriver -fs file:/// -jt host:port input.txt output/ > does not use the given host/port > This appears to be because Cluster#initialize, which loads the > ClientProtocol, contains no special handling for mapred.job.tracker. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4143) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/MAPREDUCE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572070#comment-13572070 ] Hitesh Shah commented on MAPREDUCE-4143: Seems like a reasonable feature to have with a slight caveat that the retry limit should be bounded by the limit configured on the RM. A client should not be able to set retry limit to 1000 for example. > ApplicationMaster retry times should be set by Client > - > > Key: MAPREDUCE-4143 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4143 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.1 > Environment: suse >Reporter: xieguiming >Priority: Minor > > We should support that different client or user have different > ApplicationMaster retry times. It also say that > "yarn.resourcemanager.am.max-retries" should be set by client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
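The bounding rule suggested in the comment above could look roughly like the following. This is a minimal sketch, not code from any patch: the class and method names are hypothetical, and treating a non-positive client value as "not set" is an assumption made only for illustration.

```java
/** Hypothetical helper: caps a client-requested AM retry count by the RM-wide limit. */
final class AmRetryLimit {

    /**
     * Returns the retry count the RM would honor.
     * Assumption: a client value below 1 means "not set", so the RM limit applies.
     */
    static int effectiveMaxRetries(int clientRequested, int rmConfiguredMax) {
        if (rmConfiguredMax < 1) {
            throw new IllegalArgumentException("RM retry limit must be at least 1");
        }
        if (clientRequested < 1) {
            return rmConfiguredMax;
        }
        // The client may lower the limit but never raise it above the RM's bound.
        return Math.min(clientRequested, rmConfiguredMax);
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxRetries(1000, 4)); // capped by the RM limit
        System.out.println(effectiveMaxRetries(2, 4));    // client value honored
    }
}
```

With this shape, a client asking for 1000 retries against an RM limit of 4 ends up with 4, which is the behavior the comment asks for.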
[jira] [Updated] (MAPREDUCE-4837) Add webservices for jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4837: --- Resolution: Fixed Fix Version/s: 1.2.0 Status: Resolved (was: Patch Available) Thanks Arun. Committed to branch-1. > Add webservices for jobtracker > -- > > Key: MAPREDUCE-4837 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4837 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Fix For: 1.2.0 > > Attachments: MAPREDUCE-4837.patch > > > Add MR-AM web-services to branch-1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4837) Add webservices for jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-4837: --- Summary: Add webservices for jobtracker (was: Add MR-AM web-services to branch-1) > Add webservices for jobtracker > -- > > Key: MAPREDUCE-4837 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4837 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Attachments: MAPREDUCE-4837.patch > > > Add MR-AM web-services to branch-1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4837) Add MR-AM web-services to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564701#comment-13564701 ] Hitesh Shah commented on MAPREDUCE-4837: Code changes seem straight-forward and look fine. Applied patch and verified "format=json"-based calls manually against branch 1. +1 assuming output of test-patch on branch-1 does not throw up any issues. > Add MR-AM web-services to branch-1 > -- > > Key: MAPREDUCE-4837 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4837 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Attachments: MAPREDUCE-4837.patch > > > Add MR-AM web-services to branch-1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560174#comment-13560174 ] Hitesh Shah commented on MAPREDUCE-4951: @Jason, having the RM ask the AM to kill the container in case of preemption would likely not work as the AM cannot be trusted. Obviously, there could be a different approach where the RM informs the AM that a particular container will be preempted soon but the RM eventually would need to trigger a kill for that container after a certain delay if it is still up. > Container preemption interpreted as task failure > > > Key: MAPREDUCE-4951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster, mr-am, mrv2 >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951.patch > > > When YARN reports a completed container to the MR AM, it always interprets it > as a failure. This can lead to a job failing because too many of its tasks > failed, when in fact they only failed because the scheduler preempted them. > MR needs to recognize the special exit code value of -100 and interpret it as > a container being killed instead of a container failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
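The distinction the issue description asks for (a preempted container counts as killed, not failed) can be sketched as below. The names are hypothetical and this is not the MR AM code; the value -100 is taken from the issue description, and treating exit status 0 as success is an assumption.

```java
/** Hypothetical classifier for completed-container exit statuses. */
final class ContainerOutcome {

    enum Outcome { SUCCEEDED, KILLED, FAILED }

    // Special exit code quoted in the issue description for preempted containers.
    static final int PREEMPTED_EXIT_STATUS = -100;

    static Outcome classify(int exitStatus) {
        if (exitStatus == 0) {
            return Outcome.SUCCEEDED; // assumption: 0 denotes a clean exit
        }
        if (exitStatus == PREEMPTED_EXIT_STATUS) {
            return Outcome.KILLED;    // preemption must not count against the task
        }
        return Outcome.FAILED;
    }

    /** Counts only genuine failures, so preempted containers cannot fail the job. */
    static int countFailures(int[] exitStatuses) {
        int failures = 0;
        for (int status : exitStatuses) {
            if (classify(status) == Outcome.FAILED) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int[] statuses = {0, -100, -100, 1};
        System.out.println(countFailures(statuses));
    }
}
```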
[jira] [Commented] (MAPREDUCE-4508) YARN needs to properly check the NM,AM memory properties in yarn-site.xml and mapred.xml and report errors accordingly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443685#comment-13443685 ] Hitesh Shah commented on MAPREDUCE-4508: Filed MAPREDUCE-4578 for the issue mentioned in the previous comment. > YARN needs to properly check the NM,AM memory properties in yarn-site.xml and > mapred.xml and report errors accordingly. > --- > > Key: MAPREDUCE-4508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.0-alpha > Environment: CentOs6.0, Hadoop2.0.0 Alpha >Reporter: Anil Gupta > Labels: Map, Reduce, YARN > > Please refer to this discussion on the Hadoop Mailing list: > http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/33110 > Summary: > I was running YARN(Hadoop2.0.0 Alpha) on a 8 datanode, 4 admin node > Hadoop/HBase cluster. My datanodes were only having 3.2GB of memory. So, i > configured the yarn.nodemanager.resource.memory-mb property in yarn-site.xml > to 1200. After setting the property if i run any Yarn Job then the > NodemManager wont be able to start any Map task since by default the > yarn.app.mapreduce.am.resource.mb property is set to 1500 MB in > mapred-site.xml. > Expected Behavior: NodeManager should give an error if > yarn.app.mapreduce.am.resource.mb >= yarn.nodemanager.resource.memory-mb. > Please let me know if more information is required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4578) Handle container requests that request more resources than available in the cluster
Hitesh Shah created MAPREDUCE-4578: -- Summary: Handle container requests that request more resources than available in the cluster Key: MAPREDUCE-4578 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4578 Project: Hadoop Map/Reduce Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.0-alpha, 0.23.0 Reporter: Hitesh Shah In heterogeneous clusters, a simple scheduler check that the allocation request is within the maximum allocatable range is not enough. If the large nodes in the cluster are unavailable, some allocation requests may never be fulfilled. We need an approach to decide when to invalidate such requests. For application submissions, there will need to be a feedback loop for applications that could not be launched. For running AMs, AllocationResponse may need to be augmented with information about invalidated/cancelled container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
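One possible shape for the invalidation decision described in this issue is sketched below. Everything here is hypothetical: the issue does not propose a specific policy, so the grace-period timeout, the class name, and the method names are assumptions used only to make the idea concrete.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical check for container requests that no live node can ever satisfy. */
final class UnsatisfiableRequestCheck {

    /** True if at least one currently live node is large enough to host the request. */
    static boolean fitsOnSomeLiveNode(int requestMb, List<Integer> liveNodeCapacitiesMb) {
        for (int capacityMb : liveNodeCapacitiesMb) {
            if (requestMb <= capacityMb) {
                return true;
            }
        }
        return false;
    }

    /**
     * Assumed policy: invalidate only after the request has been unsatisfiable
     * for a grace period, so a briefly restarting large node does not cancel it.
     */
    static boolean shouldInvalidate(int requestMb, List<Integer> liveNodeCapacitiesMb,
                                    long unsatisfiableForMs, long graceMs) {
        return !fitsOnSomeLiveNode(requestMb, liveNodeCapacitiesMb)
                && unsatisfiableForMs >= graceMs;
    }

    public static void main(String[] args) {
        // Only small nodes are up; the 8 GB request passed the max-allocation check
        // earlier but cannot be placed anywhere right now.
        List<Integer> liveNodes = Arrays.asList(2048, 4096);
        System.out.println(shouldInvalidate(8192, liveNodes, 600_000L, 300_000L));
    }
}
```

The per-node comparison is what the plain max-allocation check misses: the request is legal for the cluster as configured, yet unplaceable on the nodes that are actually alive.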
[jira] [Commented] (MAPREDUCE-4508) YARN needs to properly check the NM,AM memory properties in yarn-site.xml and mapred.xml and report errors accordingly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439943#comment-13439943 ] Hitesh Shah commented on MAPREDUCE-4508: Sorry for the late reply. I don't believe that an error should be thrown when the AM requested memory is greater than the NM memory. I believe this is more of a configuration bug where the scheduler max allocation should be set such that an error is thrown for any AM requesting more than that. The RM should error out if the max scheduler allocation for a single container is less than the resources required to launch a new AM. Please let me know if you have seen something contrary to this. However, depending on how the scheduler max allocation is configured, there will be situations in heterogeneous clusters where certain nodes may be down, creating holes that cause requests for large amounts of resources/memory to wait indefinitely. This needs to be addressed separately and is trickier in terms of deciding when an allocation request cannot be fulfilled (both for a new AM and for container requests by an AM). I will file a separate jira for that. > YARN needs to properly check the NM,AM memory properties in yarn-site.xml and > mapred.xml and report errors accordingly. > --- > > Key: MAPREDUCE-4508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.0-alpha > Environment: CentOs6.0, Hadoop2.0.0 Alpha >Reporter: Anil Gupta > Labels: Map, Reduce, YARN > > Please refer to this discussion on the Hadoop Mailing list: > http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/33110 > Summary: > I was running YARN(Hadoop2.0.0 Alpha) on a 8 datanode, 4 admin node > Hadoop/HBase cluster. My datanodes were only having 3.2GB of memory. 
So, i > configured the yarn.nodemanager.resource.memory-mb property in yarn-site.xml > to 1200. After setting the property if i run any Yarn Job then the > NodemManager wont be able to start any Map task since by default the > yarn.app.mapreduce.am.resource.mb property is set to 1500 MB in > mapred-site.xml. > Expected Behavior: NodeManager should give an error if > yarn.app.mapreduce.am.resource.mb >= yarn.nodemanager.resource.memory-mb. > Please let me know if more information is required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
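The check proposed in the comment above (the RM errors out when the scheduler's per-container maximum is smaller than what a new AM needs) might be sketched as follows. The class and method are hypothetical, and the parameter names stand in for whatever configuration values the RM would actually read; only the 1200 and 1500 figures come from the report.

```java
/** Hypothetical startup validation comparing the scheduler maximum with the AM request. */
final class AmMemoryConfigCheck {

    /**
     * Throws if no single container could ever be large enough to host the AM.
     * schedulerMaxAllocationMb: largest container the scheduler will grant (assumed input).
     * amResourceMb: memory the MR AM asks for, e.g. yarn.app.mapreduce.am.resource.mb.
     */
    static void validate(int schedulerMaxAllocationMb, int amResourceMb) {
        if (amResourceMb > schedulerMaxAllocationMb) {
            throw new IllegalStateException(
                "AM requires " + amResourceMb + " MB but the scheduler allows at most "
                + schedulerMaxAllocationMb + " MB per container");
        }
    }

    public static void main(String[] args) {
        validate(2048, 1500); // passes: the AM fits within the maximum allocation
        try {
            validate(1200, 1500); // the values from the report, if the max were set to 1200
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast here turns the silent hang described in the report into an explicit configuration error.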
[jira] [Commented] (MAPREDUCE-4508) YARN needs to properly check the NM,AM memory properties in yarn-site.xml and mapred.xml and report errors accordingly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427751#comment-13427751 ] Hitesh Shah commented on MAPREDUCE-4508: Seems like a dup of MAPREDUCE-3796 > YARN needs to properly check the NM,AM memory properties in yarn-site.xml and > mapred.xml and report errors accordingly. > --- > > Key: MAPREDUCE-4508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.0-alpha > Environment: CentOs6.0, Hadoop2.0.0 Alpha >Reporter: Anil Gupta > Labels: Map, Reduce, YARN > > Please refer to this discussion on the Hadoop Mailing list: > http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/33110 > Summary: > I was running YARN(Hadoop2.0.0 Alpha) on a 8 datanode, 4 admin node > Hadoop/HBase cluster. My datanodes were only having 3.2GB of memory. So, i > configured the yarn.nodemanager.resource.memory-mb property in yarn-site.xml > to 1200. After setting the property if i run any Yarn Job then the > NodemManager wont be able to start any Map task since by default the > yarn.app.mapreduce.am.resource.mb property is set to 1500 MB in > mapred-site.xml. > Expected Behavior: NodeManager should give an error if > yarn.app.mapreduce.am.resource.mb >= yarn.nodemanager.resource.memory-mb. > Please let me know if more information is required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-2719: --- Status: Open (was: Patch Available) Cancelling patch as the unit test should fail until 3067 is addressed. > MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Fix For: 0.23.0 > > Attachments: MR-2179.1.patch, MR-2179.2.patch, MR-2179.3.patch, > mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114388#comment-13114388 ] Hitesh Shah commented on MAPREDUCE-2719: Additional javac warnings due to:
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.hadoop:hadoop-yarn-applications-distributedshell:jar:0.24.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.version' for org.apache.rat:apache-rat-plugin is missing. @ org.apache.hadoop:hadoop-yarn:${yarn.version}, /Users/Hitesh/dev/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/pom.xml, line 389, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.hadoop:hadoop-yarn-applications:pom:0.24.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.version' for org.apache.rat:apache-rat-plugin is missing. @ org.apache.hadoop:hadoop-yarn:${yarn.version}, /Users/Hitesh/dev/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/pom.xml, line 389, column 15
> MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Fix For: 0.23.0 > > Attachments: MR-2179.1.patch, MR-2179.2.patch, MR-2179.3.patch, > mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-2719: --- Status: Patch Available (was: Open) Local build still shows some javadoc warnings: [WARNING] hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationToken.java:33: warning - Tag @link: reference not found: HttpServletRequest - these are not addressed in the patch. > MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Fix For: 0.23.0 > > Attachments: MR-2179.1.patch, MR-2179.2.patch, MR-2179.3.patch, > mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-2719: --- Attachment: MR-2179.3.patch Addressed patch warnings reported by automated build. Changed resource allocation to a higher number as the container was getting killed by the monitoring layer causing unit test to fail. > MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Fix For: 0.23.0 > > Attachments: MR-2179.1.patch, MR-2179.2.patch, MR-2179.3.patch, > mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-2719: --- Status: Open (was: Patch Available) > MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Fix For: 0.23.0 > > Attachments: MR-2179.1.patch, MR-2179.2.patch, mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-2719: --- Attachment: MR-2179.2.patch Updated patch with a very simple integration test that deploys and runs the ds app master on the miniyarncluster. > MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Attachments: MR-2179.1.patch, MR-2179.2.patch, mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-2719: --- Status: Patch Available (was: Open) > MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Attachments: MR-2179.1.patch, MR-2179.2.patch, mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-2719: --- Attachment: MR-2179.1.patch Attaching code with relevant pom files to create a new module. Current structure is hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/ > MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Attachments: MR-2179.1.patch, mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2719) MR-279: Write a shell command application
[ https://issues.apache.org/jira/browse/MAPREDUCE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113067#comment-13113067 ] Hitesh Shah commented on MAPREDUCE-2719: Tests still pending. > MR-279: Write a shell command application > - > > Key: MAPREDUCE-2719 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2719 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv2 >Reporter: Sharad Agarwal >Assignee: Hitesh Shah > Attachments: MR-2179.1.patch, mr-2719.wip.patch > > > With nextgen hadoop (mrv2), it is simple to write non-MR applications. Write > an AplicationMaster (also corresponding simple client), to submit and run a > shell command application in the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3067) Container exit status not set properly to launched process's exit code on successful completion of process
[ https://issues.apache.org/jira/browse/MAPREDUCE-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112349#comment-13112349 ] Hitesh Shah commented on MAPREDUCE-3067: Ate a couple of words in that statement. The code in RMContainerAllocator currently keeps a count of completed maps and reduces but does not seem to check the exit status. For the sake of documentation, it would be good if you could clarify as to why the exit status does not need to be checked for map/reduce task containers. > Container exit status not set properly to launched process's exit code on > successful completion of process > -- > > Key: MAPREDUCE-3067 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3067 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.0 >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Fix For: 0.23.0 > > > When testing the distributed shell sample app master, the container exit > status was being returned incorrectly. > 11/09/21 11:32:58 INFO DistributedShell.ApplicationMaster: Got container > status for containerID= container_1316629955324_0001_01_02, > state=COMPLETE, exitStatus=-1000, diagnostics= -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3067) Container exit status not set properly to launched process's exit code on successful completion of process
[ https://issues.apache.org/jira/browse/MAPREDUCE-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112331#comment-13112331 ] Hitesh Shah commented on MAPREDUCE-3067: Second aspect to this is the exit status is checked on completion of map or reduce tasks. > Container exit status not set properly to launched process's exit code on > successful completion of process > -- > > Key: MAPREDUCE-3067 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3067 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.0 >Reporter: Hitesh Shah > Fix For: 0.23.0 > > > When testing the distributed shell sample app master, the container exit > status was being returned incorrectly. > 11/09/21 11:32:58 INFO DistributedShell.ApplicationMaster: Got container > status for containerID= container_1316629955324_0001_01_02, > state=COMPLETE, exitStatus=-1000, diagnostics= -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3067) Container exit status not set properly to launched process's exit code on successful completion of process
[ https://issues.apache.org/jira/browse/MAPREDUCE-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112329#comment-13112329 ] Hitesh Shah commented on MAPREDUCE-3067: Possible patch for addressing part of the issue.
--- a/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
+++ b/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
@@ -554,6 +554,9 @@ public class ContainerImpl implements Container {
   static class ExitedWithSuccessTransition extends ContainerTransition {
     @Override
     public void transition(ContainerImpl container, ContainerEvent event) {
+      // Set exit code to 0 to denote success
+      container.exitCode = 0;
+
       // TODO: Add containerWorkDir to the deletion service.
 
       // Inform the localizer to decrement reference counts and cleanup
> Container exit status not set properly to launched process's exit code on > successful completion of process > -- > > Key: MAPREDUCE-3067 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3067 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.0 >Reporter: Hitesh Shah > Fix For: 0.23.0 > > > When testing the distributed shell sample app master, the container exit > status was being returned incorrectly. > 11/09/21 11:32:58 INFO DistributedShell.ApplicationMaster: Got container > status for containerID= container_1316629955324_0001_01_02, > state=COMPLETE, exitStatus=-1000, diagnostics= -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira