Re: user notification upon application error

2019-12-04 Thread Prabhu Josephraj
A MapReduce application can be configured to report its final status on
completion through mapreduce.job.end-notification.url.
You need to write a web service that receives the callback and emails the users.
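For illustration, the property can be set per job on the command line (the
jar, class, and callback endpoint here are hypothetical; MapReduce itself
substitutes the $jobId and $jobStatus placeholders when it calls the URL):

hadoop jar myjob.jar MyJob \
  -Dmapreduce.job.end-notification.url='http://notifier-host:8080/notify?jobId=$jobId&status=$jobStatus' \
  /input /output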
The links below walk through complete examples:

https://hadoopi.wordpress.com/2013/09/18/hadoop-get-a-callback-on-mapreduce-job-completion/
https://amalgjose.com/2015/07/21/notification-on-completion-of-mapreduce-jobs/

Other applications (Spark, Tez) do not have a similar mechanism, and YARN
does not provide a generic way to notify either.
I would suggest writing a service that periodically checks job status through
the YARN REST API and notifies users with the status and application logs.
Keep the polling interval reasonably long, or the service will overload YARN.

https://community.cloudera.com/t5/Support-Questions/Hi-I-have-a-query-in-Spark-How-to-set-up-email-notification/td-p/117924
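A minimal polling sketch along those lines (assumptions: curl, jq and a mail
command are installed, the RM REST endpoint at rm-host:8088 is unsecured, and
the user@example.com address scheme is a placeholder):

#!/bin/bash
# Every 10 minutes, mail the owner of each application that finished
# as FAILED or KILLED within the last polling window.
RM="http://rm-host:8088"
INTERVAL=600                      # keep this high to avoid overloading YARN
while true; do
  SINCE=$(( ($(date +%s) - INTERVAL) * 1000 ))
  curl -s "$RM/ws/v1/cluster/apps?states=FAILED,KILLED&finishedTimeBegin=$SINCE" |
    jq -r '.apps.app[]? | [.id, .user, .state] | @tsv' |
    while IFS=$'\t' read -r id user state; do
      yarn logs -applicationId "$id" 2>/dev/null | tail -100 |
        mail -s "YARN application $id finished as $state" "$user@example.com"
    done
  sleep "$INTERVAL"
done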






On Thu, Dec 5, 2019 at 10:07 AM Manuel Sopena Ballesteros <
manuel...@garvan.org.au> wrote:

> Dear Hadoop community,
>
>
>
> I have a Hadoop cluster with yarn, spark and hdfs. Users don’t have access
> to the yarn web interface, and I would like to ask what options I have to
> notify users when a job fails, along with the error message.
>
>
>
> Is there anything I can setup for that?
>
>
>
> Thank you very much
>
>
>
> Manuel Sopena Ballesteros
>
> Big Data Engineer | Kinghorn Centre for Clinical Genomics
>
>
>
> *a:* 384 Victoria Street, Darlinghurst NSW 2010
> *p:* +61 2 9355 5760  |  +61 4 12 123 123
> *e:* manuel...@garvan.org.au
>
>
>


Re: how to list yarn applications by creation time and filter by username?

2019-11-27 Thread Prabhu Josephraj
The startTime field of ApplicationReport is the application creation time.
The yarn application -list CLI does not print the application startTime,
but the YARN REST API (http://<rm-host>:<rm-port>/ws/v1/cluster/apps)
provides it.
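For example, with curl and jq (a sketch; rm-host:8088 is a placeholder, and
the JSON field is named startedTime, in milliseconds since the epoch):

curl -s "http://rm-host:8088/ws/v1/cluster/apps" \
  | jq -r '.apps.app[] | "\(.startedTime)\t\(.id)\t\(.user)"' \
  | sort -n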

On Thu, Nov 28, 2019 at 5:10 AM Manuel Sopena Ballesteros <
manuel...@garvan.org.au> wrote:

> Thanks Prabhu,
>
>
>
> Do you know which yarn command can I use in order to get application
> creation time?
>
>
>
> Thank you
>
>
>
> Manuel
>
>
>
> *From:* Prabhu Josephraj [mailto:pjos...@cloudera.com]
> *Sent:* Wednesday, November 27, 2019 8:09 PM
> *To:* Manuel Sopena Ballesteros
> *Cc:* user@hadoop.apache.org
> *Subject:* Re: how to list yarn applications by creation time and filter
> by username?
>
>
>
> The YARN CLI does not do that; I think you need to write a script that
> post-processes the output provided by the YARN CLI.
>
>
>
> On Wed, Nov 27, 2019 at 9:19 AM Manuel Sopena Ballesteros <
> manuel...@garvan.org.au> wrote:
>
> Dear Hadoop community,
>
>
>
> I am learning yarn and would like to find an application id based on
> creation time.
>
>
>
> This is what I am trying to do:
>
>
>
> yarn app -list -appStates KILLED,FAILED
>
>
>
> But I would like to show the creation time in the output, be able to sort
> by creation time, and if possible filter out all applications that do not
> belong to me.
>
>
>
> I know that the web interface can do what I am after, but I am more keen
> on working with the command line.
>
>
>
> Any advice?
>
>
>
> Thank you
>
>


Re: how to list yarn applications by creation time and filter by username?

2019-11-27 Thread Prabhu Josephraj
The YARN CLI does not do that; I think you need to write a script that
post-processes the output provided by the YARN CLI.
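A rough sketch of such a script (assumptions: the User column is the fourth
field of the CLI output, application names contain no whitespace,
rm-host:8088 is a placeholder, and curl/jq are installed):

#!/bin/bash
# List my own KILLED/FAILED applications, sorted by start time (ms since
# epoch), pulling startedTime from the RM REST API since the CLI omits it.
for app in $(yarn application -list -appStates KILLED,FAILED 2>/dev/null |
             awk -v u="$USER" '$4 == u {print $1}'); do
  curl -s "http://rm-host:8088/ws/v1/cluster/apps/$app" |
    jq -r '.app | "\(.startedTime)\t\(.id)\t\(.state)"'
done | sort -n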

On Wed, Nov 27, 2019 at 9:19 AM Manuel Sopena Ballesteros <
manuel...@garvan.org.au> wrote:

> Dear Hadoop community,
>
>
>
> I am learning yarn and would like to find an application id based on
> creation time.
>
>
>
> This is what I am trying to do:
>
>
>
> yarn app -list -appStates KILLED,FAILED
>
>
>
> But I would like to show the creation time in the output, be able to sort
> by creation time, and if possible filter out all applications that do not
> belong to me.
>
>
>
> I know that the web interface can do what I am after, but I am more keen
> on working with the command line.
>
>
>
> Any advice?
>
>
>
> Thank you
>


Re: Can't Change Retention Period for YARN Log Aggregation

2019-11-21 Thread Prabhu Josephraj
The deletion service runs as part of the MapReduce JobHistoryServer. Can you
try restarting it?
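On a plain Apache Hadoop install that would be something like the commands
below (a sketch; on an Ambari-managed HDP cluster such as yours, restart the
JobHistoryServer from Ambari instead):

# as the history server user on the JHS host
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver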

On Fri, Nov 22, 2019 at 3:42 AM David M  wrote:

> All,
>
>
>
> I have an HDP 2.6.1 cluster where we’ve had
> yarn.log-aggregation.retain-seconds set to 30 days for a while, and
> everything was working properly. Four days ago we changed the property to
> 15 days instead and restarted the services. The check interval is set to
> the default, so we expected within 1.5 days, we’d see the logs older than
> 15 days deleted.
>
>
>
> For some reason, we are still seeing 30 days of logs kept. The other
> properties all seem to be set properly. The only weird setting I can find
> is that we are using the LogAggregationIndexedFileController as our primary
> file controller class. The LogAggregationTFileController is still available
> as the second in the list.
>
>
>
> I found YARN-8279 (https://issues.apache.org/jira/browse/YARN-8279),
> which seems sort of related, except that we are still seeing logs being put
> into the right suffix folder, and it still seems to be deleting logs older
> than 30 days. It just doesn’t seem to have updated to 15 days as the cutoff
> instead.
>
>
>
> I’ve looked in the logs for the Resource Manager, Timeline Server, and one
> of the Name Nodes, and nothing that would explain this has popped up. Any
> ideas where to go to figure out what is happening? Additionally, can
> someone confirm in which process the deletion service actually runs? Is it
> the resource manager, timeline server, or something else?
>
>
>
> Thanks!
>
>
>
> David McGinnis
>
>
>


Re: Exception in thread "main" org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "Token" (Class org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse

2019-10-17 Thread Prabhu Josephraj
I suspect the TimelineClient and the ApplicationHistoryServer are using
different Hadoop libraries. Can you make sure the client uses the same Hadoop
jars and dependency jars as the ApplicationHistoryServer process? A simple
workaround is to disable the timeline service for this job:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dyarn.timeline-service.enabled=false \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns="HBASE_ROW_KEY,id,temp:in,temp:out,vibration,pressure:in,pressure:out" \
  sensor /tmp/hbase.csv


On Thu, Oct 17, 2019 at 6:36 PM Alex Wang  wrote:

> Hello everyone:
> Our Hadoop and HBase cluster has Kerberos authentication enabled.
> The hadoop version is 2.7.3 and the hbase version is 1.3.5.
>
> 1. Kinit initializes the ticket.
> Ticket cache: FILE:/tmp/krb5cc_
> Default principal: myu...@xx.com
>
> Valid starting Expires Service principal
> 10/17/2019 18:00:38 10/18/2019 18:00:38 krbtgt/xx@xx.com
> Renew until 10/24/2019 18:00:38
>
> 2. hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
> -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,
> id,temp:in,temp:out,vibration,pressure:in,pressure:out" sensor
> /tmp/hbase.csv
>
> The error is as follows. Can someone give me some advice?
>
> Exception in thread "main"
> org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized
> field "Token" (Class
> org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse),
> not marked as ignorable
> at [Source: N/A; line: -1, column: -1] (through reference chain:
> org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse["Token"])
> at org.codehaus.jackson.map.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:53)
> at org.codehaus.jackson.map.deser.StdDeserializationContext.unknownFieldException(StdDeserializationContext.java:267)
> at org.codehaus.jackson.map.deser.std.StdDeserializer.reportUnknownProperty(StdDeserializer.java:673)
> at org.codehaus.jackson.map.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:659)
> at org.codehaus.jackson.map.deser.BeanDeserializer.handleUnknownProperty(BeanDeserializer.java:1365)
> at org.codehaus.jackson.map.deser.BeanDeserializer._handleUnknown(BeanDeserializer.java:725)
> at org.codehaus.jackson.map.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:703)
> at org.codehaus.jackson.map.deser.BeanDeserializer.deserialize(BeanDeserializer.java:580)
> at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2704)
> at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1999)
> at org.apache.hadoop.yarn.client.api.impl.TimelineAuthenticator.validateAndParseResponse(TimelineAuthenticator.java:222)
> at org.apache.hadoop.yarn.client.api.impl.TimelineAuthenticator.getDelegationToken(TimelineAuthenticator.java:114)
> at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:167)
> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:275)
> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:221)
> at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:282)
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:289)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:782)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:794)
>
>
> --
> Best
>


Re: can't start spark thrift after Configuring YARN container executor

2019-10-10 Thread Prabhu Josephraj
As per the error, the spark user does not have permission to create a
directory under the NodeManager local directory, or the existing spark
usercache directory has a stale uid or gid.

*Permission denied Can't create directory
/d1/hadoop/yarn/local/usercache/spark/appcache/application_1570681803028_0018*

1. Check whether the spark user is able to create a directory under the NM
local dirs.
2. Remove /d1/hadoop/yarn/local/usercache/spark from all NMs and rerun the
job.
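For example, on each NodeManager (a sketch; the paths are taken from the
error message above):

ls -ldn /d0/hadoop/yarn/local/usercache/spark /d1/hadoop/yarn/local/usercache/spark
id spark    # compare uid/gid with the numeric owner shown above
rm -rf /d0/hadoop/yarn/local/usercache/spark /d1/hadoop/yarn/local/usercache/spark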


On Thu, Oct 10, 2019 at 12:13 PM Manuel Sopena Ballesteros <
manuel...@garvan.org.au> wrote:

> Dear Hadoop community,
>
>
>
> I am trying to configure yarn container executor following this document
> https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.2.5/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/inst_adv_yarn_config.html
>
>
>
> I follow all the steps but after restart YARN I can’t start spark thrift
> server.
>
>
>
> This is the error I can see in yarn
>
>
>
> Application application_1570681803028_0018 failed 1 times (global limit
> =2; local limit is =1) due to AM Container for
> appattempt_1570681803028_0018_01 exited with exitCode: -1000 Failing
> this attempt.Diagnostics: [2019-10-10 16:49:35.322]Application
> application_1570681803028_0018 initialization failed (exitCode=255) with
> output: main : command provided 0 main : run as user is spark main :
> requested yarn user is spark Can't create directory
> /d0/hadoop/yarn/local/usercache/spark/appcache/application_1570681803028_0018
> - Permission denied Can't create directory
> /d1/hadoop/yarn/local/usercache/spark/appcache/application_1570681803028_0018
> - Permission denied Did not create any app directories For more detailed
> output, check the application tracking page:
> http://gl-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1570681803028_0018
> Then click on links to logs of each attempt. . Failing the application.
>
>
>
>
>
> This is the content of container-executor.cfg
>
>
>
> [luffy@gl-hdp-ctrl01-mlx ~]$ cat
> /etc/hadoop/3.1.0.0-78/0/container-executor.cfg
>
>
>
>
>
> #/*
>
> # * Licensed to the Apache Software Foundation (ASF) under one
>
> # * or more contributor license agreements.  See the NOTICE file
>
> # * distributed with this work for additional information
>
> # * regarding copyright ownership.  The ASF licenses this file
>
> # * to you under the Apache License, Version 2.0 (the
>
> # * "License"); you may not use this file except in compliance
>
> # * with the License.  You may obtain a copy of the License at
>
> # *
>
> # * http://www.apache.org/licenses/LICENSE-2.0
>
> # *
>
> # * Unless required by applicable law or agreed to in writing, software
>
> # * distributed under the License is distributed on an "AS IS" BASIS,
>
> # * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
>
> # * See the License for the specific language governing permissions and
>
> # * limitations under the License.
>
> # */
>
> yarn.nodemanager.local-dirs=/d0/hadoop/yarn/local,/d1/hadoop/yarn/local
>
> yarn.nodemanager.log-dirs=/d0/hadoop/yarn/log,/d1/hadoop/yarn/log
>
> yarn.nodemanager.linux-container-executor.group=hadoop
>
> banned.users=hdfs,yarn,mapred,bin
>
> # min.user.id=1000
>
> min.user.id=80
>
>
>
> [docker]
>
>   module.enabled=false
>
>   docker.binary=/usr/bin/docker
>
>
> docker.allowed.capabilities=CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE
>
>   docker.allowed.devices=
>
>   docker.allowed.networks=host,none,bridge
>
>   docker.allowed.ro-mounts=/d0/hadoop/yarn/local,/d1/hadoop/yarn/local,
>
>
> docker.allowed.rw-mounts=/d0/hadoop/yarn/local,/d1/hadoop/yarn/local,/d0/hadoop/yarn/log,/d1/hadoop/yarn/log,
>
>   docker.privileged-containers.enabled=false
>
>   docker.trusted.registries=
>
>   docker.allowed.volume-drivers=
>
>
>
> [gpu]
>
>   module.enabled=false
>
>
>
> [cgroups]
>
>   root=
>
>   yarn-hierarchy=
>
>
>
> I was hoping someone could help me troubleshoot what YARN is
> trying to do and how to fix this configuration issue.
>
>
>
> Thank you very much
>
>
>
> Manuel
>


Re: NegativeArraySizeException during map segment merge

2019-09-04 Thread Prabhu Josephraj
1. Looking at IFile$Reader#nextRawValue, I am not sure why we create a
valBytes array of size 2 * currentValueLength even though it only reads
currentValueLength bytes of data.
If there is no reason for it, fixing that would fix the problem.
public void nextRawValue(DataInputBuffer value) throws IOException {
  // currentValueLength << 1 overflows to a negative array size once
  // currentValueLength exceeds Integer.MAX_VALUE / 2, which is what
  // produces the NegativeArraySizeException reported below.
  final byte[] valBytes = (value.getData().length < currentValueLength)
      ? new byte[currentValueLength << 1]
      : value.getData();
  int i = readData(valBytes, 0, currentValueLength);
  if (i != currentValueLength) {
    throw new IOException("Asked for " + currentValueLength + " Got: " + i);
  }

2. Also, the stack trace shows mapreduce.job.map.output.collector.class is
set to MapRFsOutputBuffer, which is used on top of MapR-FS. Can you run the
same job on HDFS with MapOutputBuffer to isolate the issue?



On Thu, Sep 5, 2019 at 2:36 AM Stephen Durfey  wrote:

> I ran into an issue and am struggling to find a way around it. I have a
> job failing with the following output (version 2.7.0 of hadoop):
>
> 2019-09-04 13:20:30,026 DEBUG [main]
> org.apache.hadoop.mapred.MapRFsOutputBuffer:
> MapId=attempt_1567541971569_2612_m_003447_0 Reducer=133Spill
> =0(110526690,1099443164, 96132208)
> 2019-09-04 13:20:30,026 DEBUG [main]
> org.apache.hadoop.mapred.MapRFsOutputBuffer:
> MapId=attempt_1567541971569_2612_m_003447_0 Reducer=133Spill =1(4123,2, 31)
> 2019-09-04 13:20:30,026 INFO [main] org.apache.hadoop.mapred.Merger:
> Merging 2 sorted segments
> 2019-09-04 13:20:30,026 DEBUG [main] com.mapr.fs.jni.MapRClient: Open:
> path =
> /var/.../mapred/nodeManager/spill/job_1567541971569_2612/attempt_1567541971569_2612_m_003447_0/spill0.out
> 2019-09-04 13:20:30,039 DEBUG [main] com.mapr.fs.Inode: Inode Open
> file:
> /var/.../mapred/nodeManager/spill/job_1567541971569_2612/attempt_1567541971569_2612_m_003447_0/spill0.out,
> size: 243373048, chunkSize: 268435456, fid: 315986.88557.30377956
> 2019-09-04 13:20:30,057 DEBUG [main]
> org.apache.hadoop.io.compress.CodecPool: Got recycled decompressor
> 2019-09-04 13:20:30,058 DEBUG [main] com.mapr.fs.jni.MapRClient: Open:
> path =
> /var/.../mapred/nodeManager/spill/job_1567541971569_2612/attempt_1567541971569_2612_m_003447_0/spill1.out
> 2019-09-04 13:20:30,058 DEBUG [main] com.mapr.fs.Inode: Inode Open
> file:
> /var/.../mapred/nodeManager/spill/job_1567541971569_2612/attempt_1567541971569_2612_m_003447_0/spill1.out,
> size: 69143436, chunkSize: 268435456, fid: 315986.88362.30378046
> 2019-09-04 13:20:30,064 DEBUG [main]
> org.apache.hadoop.io.compress.CodecPool: Got recycled decompressor
> 2019-09-04 13:20:30,064 INFO [main] org.apache.hadoop.mapred.Merger: Down
> to the last merge-pass, with 1 segments left of total size: 96132217 bytes
> 2019-09-04 13:20:30,065 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.lang.NegativeArraySizeException
> at org.apache.hadoop.mapred.IFile$Reader.nextRawValue(IFile.java:488)
> at org.apache.hadoop.mapred.Merger$Segment.nextRawValue(Merger.java:341)
> at org.apache.hadoop.mapred.Merger$Segment.getValue(Merger.java:323)
> at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:567)
> at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:209)
> at
> org.apache.hadoop.mapred.MapRFsOutputBuffer.mergeParts(MapRFsOutputBuffer.java:1403)
> at
> org.apache.hadoop.mapred.MapRFsOutputBuffer.flush(MapRFsOutputBuffer.java:1609)
> at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:732)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:802)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
> Looking at "MapRFsOutputBuffer:
> MapId=attempt_1567541971569_2612_m_003447_0 Reducer=133Spill
> =0(110526690,1099443164, 96132208)", the value '1,099,443,164' is the raw
> length of the segment, during buffer allocation in
> IFile$Reader#nextRawValue, when that is bit shifted, it causes the integer
> overflow, and the exception I am seeing. At least, that is what it looks
> like. I'm not sure what tuning options are at my disposal to try to fix
> this issue, if any. I tried changing mapreduce.task.io.sort.mb to a small
> number (240mb), but that still resulted the same issue. Any
> help/suggestions would be appreciated :)
>
> - Stephen
>


Re: Docker container executor is failing

2019-08-30 Thread Prabhu Josephraj
Can you test after adding "local" to docker.trusted.registries in
container-executor.cfg?

Fyi
https://community.cloudera.com/t5/Support-Questions/Not-able-to-run-docker-container-on-yarn-even-after/m-p/224259
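The script below uses the image local/openjdk:8.1, so the entry in
container-executor.cfg would look like this (a sketch; "local" is assumed to
be the registry prefix of that image tag):

  docker.trusted.registries=local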

On Fri, Aug 30, 2019 at 2:07 PM Yen-Onn Hiu  wrote:

> hi all,
>
> I have a bash script that tests the Docker container executor; it
> configures distributed shell as shown below, but it keeps failing with the
> error shown below.
>
> Any help please... Thanks!
>
>
> #!/bin/bash
> export HADOOP_HOME="/usr/hdp/3.1.0.0-78/hadoop"
> export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native"
> export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
> export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
> export JAVA_LIBRARY_PATH="$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH"
> export
> DSHELL_JAR="/usr/hdp/3.1.0.0-78/hadoop-yarn/hadoop-yarn-applications-distributedshell-3.2.0.jar"
> #export DOCKER_IMAGE="local/centos"
> export DOCKER_IMAGE="local/openjdk:8.1"
> export DSHELL_CMD="ls"
> export NUM_OF_CONTAINERS=1
>
> yarn --loglevel DEBUG jar $DSHELL_JAR \
> -shell_command $DSHELL_CMD \
> -jar $DSHELL_JAR \
> -shell_env YARN_CONTAINER_RUNTIME_TYPE="$RUNTIME" \
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE="$DOCKER_IMAGE" \
> -num_containers $NUM_OF_CONTAINERS
>
>
> 19/08/30 15:22:12 INFO distributedshell.ApplicationMaster: placementSpecs null
> 19/08/30 15:22:12 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[<memory:1024, vCores:1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution 
> Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=1
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_e101_1567140885858_0043_01_02, yarnShellId=1, 
> containerNode=hk-hdpoc-2001.agprod1.agoda.local:45454, 
> containerNodeURI=hk-hdpoc-2001.agprod1.agoda.local:8042, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Setting up 
> container launch container for 
> containerid=container_e101_1567140885858_0043_01_02 with shellid=1
> 19/08/30 15:22:14 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_e101_1567140885858_0043_01_02
> 19/08/30 15:22:14 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> QUERY_CONTAINER for Container container_e101_1567140885858_0043_01_02
> 19/08/30 15:22:15 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, completedCnt=1
> 19/08/30 15:22:15 ERROR distributedshell.ApplicationMaster: 
> appattempt_1567140885858_0043_01 got container status for 
> containerID=container_e101_1567140885858_0043_01_02, state=COMPLETE, 
> exitStatus=127, diagnostics=[2019-08-30 15:22:15.671]Exception from 
> container-launch.
> Container id: container_e101_1567140885858_0043_01_02
> Exit code: 127
> Exception message: Launch container failed
> Shell output: main : command provided 4
> main : run as user is ambari-qa
> main : requested yarn user is ambari-qa
> 802b0a68c8332e819912e51eafc9527f382f48dbc91365bf5beb6ed54e14389c
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Inspecting docker container...
> Docker inspect command: /usr/bin/docker inspect --format {{.State.Pid}} 
> container_e101_1567140885858_0043_01_02
> pid from docker inspect: 0
> Obtaining the exit code...
> Docker inspect command: /usr/bin/docker inspect --format {{.State.ExitCode}} 
> container_e101_1567140885858_0043_01_02
> Exit code from docker inspect: 127
> Wrote the exit code 127 to 
> /hadoop/yarn/local/nmPrivate/application_1567140885858_0043/container_e101_1567140885858_0043_01_02/container_e101_1567140885858_0043_01_02.pid.exitcode
>
>
> [2019-08-30 15:22:15.672]Container exited with a non-zero exit code 127. Last 
> 4096 bytes of stderr.txt :
>
>
> [2019-08-30 15:22:15.673]Container exited with a non-zero exit code 127. Last 
> 4096 bytes of stderr.txt :
>
>
>
> 19/08/30 15:22:16 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 19/08/30 15:22:16 INFO distributedshell.ApplicationMaster: Application 
> completed. Signalling finished to RM
> 19/08/30 15:22:16 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 19/08/30 15:22:16 ERROR distributedshell.ApplicationMaster: Application 
> Master failed. exiting
>
>
> --
> Hiu Yen Onn
>
>
>


Re: Could not find or load main class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer

2019-08-20 Thread Prabhu Josephraj
On a secure cluster, the ContainerLocalizer JVM runs as the job user. The
below issue happens when the job user does not have access to the
hadoop-yarn-server-nodemanager-<version>.jar present on the Hadoop classpath
of the NodeManager machine:

Could not find or load main class org.apache.hadoop.yarn.server.
nodemanager.containermanager.localizer.ContainerLocalizer

Make sure the job user has access to
hadoop-yarn-server-nodemanager-<version>.jar.
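For example, a quick check on the NodeManager machine (a sketch; myuser
stands for the failing job user, and the jar location varies by
distribution):

JAR=$(find "$HADOOP_HOME" -name 'hadoop-yarn-server-nodemanager-*.jar' | head -1)
ls -l "$JAR"
sudo -u myuser test -r "$JAR" && echo readable || echo NOT readable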

On Tue, Aug 20, 2019 at 3:40 PM alex noure  wrote:

> Hi all,
>
> When we configure Kerberos, only the primary account can submit tasks
> successfully; other accounts cannot.
>
> They report: Could not find or load main class
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer
>
> My question is the same as this
> https://issues.apache.org/jira/browse/MAPREDUCE-7224 issue.
>
> I saw that my cluster environment variables are fine.
>
> How can I solve this problem?
>
>
>


Re: Set yarn.nodemanager.resource.memory-mb higher than node physical memory

2019-08-15 Thread Prabhu Josephraj
1. An easy way to make a container exceed its configured physical memory
limit is to set the container's heap size (500MB) above the container size
(100MB):

yarn-site.xml:   yarn.scheduler.minimum-allocation-mb = 100
mapred-site.xml: yarn.app.mapreduce.am.resource.mb = 100
                 yarn.app.mapreduce.am.command-opts = -Xmx500m

Note: this is only for testing purposes. Usually the heap size should be
about 80% of the container size.

2. There is no job setting that increases the memory usage of a container;
it depends on the application code. Try adding memory-intensive code inside
the MapReduce application, for example:

https://alvinalexander.com/blog/post/java/java-program-consume-all-memory-ram-on-computer
https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/QuasiMonteCarlo.java

Running the pi job on a large number of samples will also require huge
memory:

yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar pi 1 10

There is a chance that the JVM crashes with OutOfMemoryError before YARN
kills the container for exceeding its memory usage.

On Thu, Aug 15, 2019 at 10:51 PM . .  wrote:

>
> Prabhu,
>
> I reformulate my question:
>
> I successfully run following job: yarn jar
> /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar pi 3
> 10
>
> and noticed that the highest node physical memory usage was always <512MB
> for the job duration; the job nevertheless completed (see details below)
>
> quote
> Every 2.0s: yarn node -status hadoop-1.mydomain.local:44718
>
> 19/08/15 19:10:54 INFO client.RMProxy: Connecting to ResourceManager at
> hadoop-1.mydomain.local/192.168.100.11:8032
> Node Report :
> Node-Id : hadoop-1.mydomain.local:44718
> Rack : /default-rack
> Node-State : RUNNING
> Node-Http-Address : hadoop-1.mydomain.local:8042
> Last-Health-Update : Thu 15/Aug/19 07:10:22:75CEST
> Health-Report :
> Containers : 2
> Memory-Used : 2048MB
> Memory-Capacity : 5120MB
> CPU-Used : 2 vcores
> CPU-Capacity : 6 vcores
> Node-Labels :
> Resource Utilization by Node : *PMem:471 MB*, VMem:1413 MB,
> VCores:0.80463576
> Resource Utilization by Containers : PMem:110 MB, VMem:4014 MB,
> VCores:0.9735
> ...unquote
>
> My question is: which job setting may I use to force a node physical
> memory usage >512MB and force a job kill due to (or thanks to) the pmem
> check? Hope the above better explains my question ;)
>
> thanks/Guido
>
>
> On Thu, Aug 15, 2019 at 5:09 PM Prabhu Josephraj
>  wrote:
>
>> Jeff, Available node size for YARN is the value of
>> yarn.nodemanager.resource.memory-mb which is set ten times of 512MB.
>>
>> Guido, Did not get the below question, can you explain the same.
>>
>>Are you aware of any job syntax to tune the 'container physical
>> memory usage' to 'force' job kill/log?
>>
>>
>> On Thu, Aug 15, 2019 at 7:20 PM . . 
>> wrote:
>>
>>> Hi Prabhu,
>>>
>>> thanks for your explanation. It makes sense, but I wonder why YARN allows
>>> you to define 'yarn.nodemanager.resource.memory-mb' higher than the node
>>> physical memory without logging any entry in the resourcemanager log.
>>>
>>> Are you aware of any job syntax to tune the 'container physical memory
>>> usage' to 'force' job kill/log?
>>>
>>> thanks/Guido
>>>
>>>
>>>
>>> On Thu, Aug 15, 2019 at 1:50 PM Prabhu Josephraj
>>>  wrote:
>>>
>>>> YARN allocates based on the configuration
>>>> (yarn.nodemanager.resource.memory-mb) user has configured. It has allocated
>>>> the AM Container of size 1536MB as it can fit in 5120MB Available Node
>>>> Size.
>>>>
>>>> yarn.nodemanager.pmem-check-enabled will kill the container if the
>>>> physical memory usage of the container process is above
>>>> 1536MB. MR ApplicationMaster for a pi job is light weight and it won't
>>>> require that much memory and so not got killed.
>>>>
>>>>
>>>>
>>>> On Thu, Aug 15, 2019 at 4:02 PM . . 
>>>> wrote:
>>>>
>>>>> Correct:  I set 'yarn.nodemanager.resource.memory-mb' ten times the
>>>>> node physical memory (512MB) and I was able to successfully execute a  'pi
>>>>> 1 10' mapreduce job.
>>>>>
>>>>> Since default 'yarn.app.mapreduce.am.resource.mb' value is 1536MB I
>>>>> expected the job to never start / be allocated, and I have no valid
>>>>> explanation.

Re: Set yarn.nodemanager.resource.memory-mb higher than node physical memory

2019-08-15 Thread Prabhu Josephraj
Jeff, the available node size for YARN is the value of
yarn.nodemanager.resource.memory-mb, which is set to ten times the 512MB of
physical memory.

Guido, I did not get the below question; can you explain it?

   Are you aware of any job syntax to tune the 'container physical
memory usage' to 'force' a job kill/log?


On Thu, Aug 15, 2019 at 7:20 PM . . 
wrote:

> Hi Prabhu,
>
> thanks for your explanation. It makes sense, but I wonder why YARN allows
> you to define 'yarn.nodemanager.resource.memory-mb' higher than the node
> physical memory without logging any entry in the resourcemanager log.
>
> Are you aware of any job syntax to tune the 'container physical memory
> usage' to 'force' job kill/log?
>
> thanks/Guido
>
>
>
> On Thu, Aug 15, 2019 at 1:50 PM Prabhu Josephraj
>  wrote:
>
>> YARN allocates based on the configuration
>> (yarn.nodemanager.resource.memory-mb) user has configured. It has allocated
>> the AM Container of size 1536MB as it can fit in 5120MB Available Node
>> Size.
>>
>> yarn.nodemanager.pmem-check-enabled will kill the container if the
>> physical memory usage of the container process is above
>> 1536MB. MR ApplicationMaster for a pi job is light weight and it won't
>> require that much memory and so not got killed.
>>
>>
>>
>> On Thu, Aug 15, 2019 at 4:02 PM . . 
>> wrote:
>>
>>> Correct:  I set 'yarn.nodemanager.resource.memory-mb' ten times the node
>>> physical memory (512MB) and I was able to successfully execute a  'pi 1 10'
>>> mapreduce job.
>>>
>>> Since default 'yarn.app.mapreduce.am.resource.mb' value is 1536MB I
>>> expected the job to never start / be allocated and I have no valid
>>> explanation.
>>>
>>>
>>> On Wed, Aug 14, 2019 at 10:32 PM . .  wrote:
>>>
>>>> Correct:  I set 'yarn.nodemanager.resource.memory-mb' ten times the
>>>> node physical memory (512MB) and I was able to successfully execute a  'pi
>>>> 1 10' mapreduce job.
>>>>
>>>> Since default 'yarn.app.mapreduce.am.resource.mb' value is 1536MB I
>>>> expected the job to never start / be allocated and I have no valid
>>>> explanation.
>>>>
>>>>
>>>>
>>>> On Wed, Aug 14, 2019 at 8:31 PM Jeff Hubbs  wrote:
>>>>
>>>>> To make sure I understand...you've allocated *ten times* your
>>>>> physical RAM for containers? If so, I think that's your issue.
>>>>>
>>>>> For reference, under Hadoop 3.x I didn't have a cluster that would
>>>>> really do anything until its worker nodes had at least 8GiB.
>>>>>
>>>>> On 8/14/19 12:10 PM, . . wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I installed a basic 3-node Hadoop 2.9.1 cluster and am playing with
>>>>> YARN settings.
>>>>> The 3 nodes have the following configuration:
>>>>> 1 cpu / 1 core / 512MB RAM
>>>>>
>>>>> I wonder why I was able to configure yarn-site.xml with the following
>>>>> settings (higher than the node physical limits) and still successfully
>>>>> run a mapreduce 'pi 1 10' job
>>>>>
>>>>> quote...
>>>>> <property>
>>>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>>>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>   <value>5120</value>
>>>>>   <description>Amount of physical memory, in MB, that can be
>>>>>   allocated for containers. If set to -1 and
>>>>>   yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
>>>>>   automatically calculated. In other cases, the default is
>>>>>   8192MB</description>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>   <name>yarn.nodemanager.resource.cpu-vcores</name>
>>>>>   <value>6</value>
>>>>>   <description>Number of CPU cores that can be allocated for
>>>>>   containers.</description>
>>>>> </property>
>>>>> ...unquote
>>>>>
>>>>> Can anyone provide an explanation please?
>>>>>
>>>>> Should 'yarn.nodemanager.vmem-check-enabled' and
>>>>> 'yarn.nodemanager.pmem-check-enabled' properties (set to 'true' as 
>>>>> default)
>>>>> check that my YARN settings are higher than physical limits?
>>>>>
>>>>> Which mapreduce 'pi' job settings can I use, to 'force' containers to
>>>>> use more than node physical resources?
>>>>>
>>>>> Many thanks in advance!
>>>>> Guido
>>>>>
>>>>>
>>>>>


Re: Set yarn.nodemanager.resource.memory-mb higher than node physical memory

2019-08-15 Thread Prabhu Josephraj
YARN allocates based on the configuration
(yarn.nodemanager.resource.memory-mb) the user has set. It allocated the
1536MB AM container because it fits within the 5120MB available node size.

yarn.nodemanager.pmem-check-enabled kills the container only if the physical
memory usage of the container process goes above 1536MB. The MR
ApplicationMaster for a pi job is lightweight and won't require that much
memory, so it did not get killed.



On Thu, Aug 15, 2019 at 4:02 PM . . 
wrote:

> Correct:  I set 'yarn.nodemanager.resource.memory-mb' ten times the node
> physical memory (512MB) and I was able to successfully execute a  'pi 1 10'
> mapreduce job.
>
> Since default 'yarn.app.mapreduce.am.resource.mb' value is 1536MB I
> expected the job to never start / be allocated and I have no valid
> explanation.
>
>
> On Wed, Aug 14, 2019 at 10:32 PM . .  wrote:
>
>> Correct:  I set 'yarn.nodemanager.resource.memory-mb' ten times the node
>> physical memory (512MB) and I was able to successfully execute a  'pi 1 10'
>> mapreduce job.
>>
>> Since default 'yarn.app.mapreduce.am.resource.mb' value is 1536MB I
>> expected the job to never start / be allocated and I have no valid
>> explanation.
>>
>>
>>
>> On Wed, Aug 14, 2019 at 8:31 PM Jeff Hubbs  wrote:
>>
>>> To make sure I understand...you've allocated *ten times* your physical
>>> RAM for containers? If so, I think that's your issue.
>>>
>>> For reference, under Hadoop 3.x I didn't have a cluster that would
>>> really do anything until its worker nodes had at least 8GiB.
>>>
>>> On 8/14/19 12:10 PM, . . wrote:
>>>
>>> Hi all,
>>>
>>> I installed a basic 3-node Hadoop 2.9.1 cluster and am playing with YARN
>>> settings.
>>> The 3 nodes have the following configuration:
>>> 1 cpu / 1 core / 512MB RAM
>>>
>>> I wonder why I was able to configure yarn-site.xml with the following
>>> settings (higher than the node physical limits) and still successfully
>>> run a mapreduce 'pi 1 10' job
>>>
>>> quote...
>>> <property>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>> </property>
>>>
>>> <property>
>>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>>   <value>5120</value>
>>>   <description>Amount of physical memory, in MB, that can be
>>>   allocated for containers. If set to -1 and
>>>   yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
>>>   automatically calculated. In other cases, the default is
>>>   8192MB</description>
>>> </property>
>>>
>>> <property>
>>>   <name>yarn.nodemanager.resource.cpu-vcores</name>
>>>   <value>6</value>
>>>   <description>Number of CPU cores that can be allocated for
>>>   containers.</description>
>>> </property>
>>> ...unquote
>>>
>>> Can anyone provide an explanation please?
>>>
>>> Should 'yarn.nodemanager.vmem-check-enabled' and
>>> 'yarn.nodemanager.pmem-check-enabled' properties (set to 'true' as default)
>>> check that my YARN settings are higher than physical limits?
>>>
>>> Which mapreduce 'pi' job settings can I use, to 'force' containers to
>>> use more than node physical resources?
>>>
>>> Many thanks in advance!
>>> Guido
>>>
>>>
>>>


Re: RM web got HTTP ERROR 500

2019-06-12 Thread Prabhu Josephraj
Hi Kevin,

 It looks like different versions of the hadoop-yarn-api jar are on the
classpath of the YARN ResourceManager. Can you remove the older jars, if any,
from the classpath? lsof -p <RM pid>, or adding -verbose to YARN_OPTS in the
yarn.cmd file, will help find the wrong jars.
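On a Linux RM host, for example (a sketch; jps ships with the JDK and lsof is
assumed to be installed):

RM_PID=$(jps | awk '/ResourceManager/ {print $1}')
lsof -p "$RM_PID" | grep hadoop-yarn-api   # lists every copy the RM has open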

Thanks,
Prabhu Joseph

On Wed, Jun 12, 2019 at 8:36 PM kevin su  wrote:

> Hi all,
>
> I have already restarted my cluster; I still get the same error.
>
> my *yarn-site.xml*
> <configuration>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
> </configuration>
>
>


Re: How to restrict users who can post domains/entities to the YARN Timeline Server?

2019-05-30 Thread Prabhu Josephraj
Hi Junseung,

  You are right: anyone who has a valid Kerberos ticket is allowed to put
a domain, but the owner of a domain can decide who can write and read
entities in that domain. You can write a custom Filter with extra logic to
restrict certain users from creating domains, and add the corresponding
custom FilterInitializer to hadoop.http.filter.initializers.


Thanks,
Prabhu Joseph




On Thu, May 30, 2019 at 5:31 PM Junseung Hwang  wrote:

> Hi,
>
> I’m using the YARN Timeline Server v1 from Hadoop 2.7.7, and I want the
> Timeline Server to be secure.
>
> To configure Kerberos authentication and authorization, I set the
> followings in yarn-site.xml:
> - yarn.timeline-service.http-authentication.type: kerberos
> - yarn.timeline-service.http-authentication.kerberos.principal
> - yarn.timeline-service.http-authentication.kerberos.keytab
> - yarn.acl.enable: true
> - yarn.admin.acl: (space)
>
> However, as far as I know, anyone who has a Kerberos ticket can create a
> new Timeline domain unless the ID of the domain already exists. After that,
> they can post timeline entities to the domain.
>
> My question is, is there any way to restrict users who can post domains
> and entities to Timeline Server without modifying Hadoop source codes?
>
> Best regards,
>
> Junseung.
>


Re: Resource Manager UI showing running jobs but no actual jobs running

2019-04-02 Thread Prabhu Josephraj
Hi George,

 The symptoms of YARN-7163 are: the RM UI shows old completed jobs, plus
high heap and CPU usage.

High CPU usage often comes from continuous full GC, which in turn causes OOM
when no more heap is available to allocate new objects. High CPU usage can
therefore be a symptom of high heap usage.

1. Can you check if the jobs shown as Running are already completed ones.

2. A heap dump from the RM, taken while the UI shows old completed jobs as
Running, will help to prove this; it should match the image [1], where the
RMActiveServiceContext applications map holds the completed applications
list.

Also check comment [2] and YARN-7065 (Dup of YARN-7163) which matches the
issue reported.

[1] https://issues.apache.org/jira/secure/attachment/12885607/suspect-1.png
[2]
https://issues.apache.org/jira/browse/YARN-7163?focusedCommentId=16158652=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16158652

Thanks,
Prabhu Joseph



On Tue, Apr 2, 2019 at 8:27 PM George Liaw  wrote:

> Hi Prabhu,
>
> Unfortunately I don't believe that is the same issue we are seeing. We are
> experiencing high cpu usage and we are not getting OOM errors.
>
> Is there reason to believe they're the same issue?
>
>
> On Tue, Apr 2, 2019, 2:15 AM Prabhu Josephraj 
> wrote:
>
>> Hi George,
>>
>> Have seen this issue - RM UI will show the old job list and the RM
>> process heap usage will be high. This is due to a Bug fixed by YARN-7163.
>> Can you test with patch from YARN-7163.
>>
>> Thanks,
>> Prabhu Joseph
>>
>>
>> On Tue, Apr 2, 2019 at 4:59 AM George Liaw 
>> wrote:
>>
>>> Hi all,
>>>
>>> Using Hadoop 2.7.2.
>>> Wondering if anyone's seen an issue before where every once in a while
>>> the resource manager gets into a weird state where the Applications
>>> dashboard shows jobs running, but there are no actual jobs running on the
>>> cluster. When this happens we'll see RM cpu usage flat-lining at very high
>>> levels (around 85%), but the datanodes/nodemanagers will have no load
>>> because of no jobs running. If we restart the RM and let it fail over to
>>> the stand-by, the cluster will go back to normal behavior and start running
>>> jobs again after 15-30 minutes.
>>>
>>> Bit of a strange situation - not entirely sure why the RM would fail to
>>> realize that the jobs have finished running and that the jobs sitting in
>>> accepted state are free to run. Also strange that the RM gets stuck at high
>>> cpu usage.
>>>
>>> If anyone can point me in the right direction on how to debug or resolve
>>> this, that would be much appreciated!
>>>
>>> --
>>> George A. Liaw
>>>
>>> (408) 318-7920
>>> george.a.l...@gmail.com
>>> LinkedIn <http://www.linkedin.com/in/georgeliaw/>
>>>
>>


Re: Resource Manager UI showing running jobs but no actual jobs running

2019-04-02 Thread Prabhu Josephraj
Hi George,

I have seen this issue: the RM UI shows the old job list and the RM process
heap usage stays high. This is due to a bug fixed by YARN-7163. Can you test
with the patch from YARN-7163?

Thanks,
Prabhu Joseph


On Tue, Apr 2, 2019 at 4:59 AM George Liaw  wrote:

> Hi all,
>
> Using Hadoop 2.7.2.
> Wondering if anyone's seen an issue before where every once in a while the
> resource manager gets into a weird state where the Applications dashboard
> shows jobs running, but there are no actual jobs running on the cluster.
> When this happens we'll see RM cpu usage flat-lining at very high levels
> (around 85%), but the datanodes/nodemanagers will have no load because of
> no jobs running. If we restart the RM and let it fail over to the stand-by,
> the cluster will go back to normal behavior and start running jobs again
> after 15-30 minutes.
>
> Bit of a strange situation - not entirely sure why the RM would fail to
> realize that the jobs have finished running and that the jobs sitting in
> accepted state are free to run. Also strange that the RM gets stuck at high
> cpu usage.
>
> If anyone can point me in the right direction on how to debug or resolve
> this, that would be much appreciated!
>
> --
> George A. Liaw
>
> (408) 318-7920
> george.a.l...@gmail.com
> LinkedIn 
>


Re: Spark pools support on Yarn

2019-02-26 Thread Prabhu Josephraj
Hi Anton,

   Spark pools (Spark's fair scheduler) schedule the tasks within a Spark
application: each Spark job has multiple stages, and each stage has multiple
tasks. This is different from the YARN fair scheduler, which schedules the
jobs submitted to the YARN cluster. Spark pools within a Spark application
work on a YARN cluster as well.

Thanks,
Prabhu Joseph

On Tue, Feb 26, 2019 at 11:53 AM Anton Puzanov 
wrote:

> Hi everyone,
>
> Spark supports in application, job concurrency execution by using pools
> and Spark's Fair scheduler (different than Yarn's Fair scheduler).
> link:
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
>
> Is this feature supported when Yarn is used as a cluster manager? Are
> there special configurations I have to set or common pitfalls I need to be
> aware of?
>
> Thanks,
> Anton
>
>


Re: The AvailableVCores of the scheduler queue is a negative number

2019-02-19 Thread Prabhu Josephraj
This is expected with the DefaultResourceCalculator (memory-based
scheduling), which allocates the requested memory and 1 (logical) vcore per
container. Say a node has 100GB and 5 cores and 15 containers are requested,
each with 10GB: 10 containers get allocated, and the available node resource
becomes 0GB and -5 cores. Scheduling stops once memory is exhausted, without
considering the cores.

With the DominantResourceCalculator, both memory and CPU are considered, and
scheduling stops once either is exhausted. In the above example, it stops
after 5 containers are allocated (50GB and 5 cores), leaving 50GB and 0
cores.
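If the cluster runs the CapacityScheduler, the calculator is switched in
capacity-scheduler.xml (a sketch; restart the ResourceManager after changing
it):

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>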

On Tue, Feb 19, 2019 at 9:59 PM Huang Meilong  wrote:

> Hi,
> I'm getting metrics of scheduler queue with jmx:
>
> http://localhost:8088/jmx?qry=Hadoop:service=ResourceManager,name=QueueMetrics,*
>
> I found some negative data points of AvailableVCores, is this a bug of
> YARN?
>
>
> timestamp: 1550565127000, yarn.QueueMetrics.root.AvailableVCores=-31
> timestamp: 1550565156000, yarn.QueueMetrics.root.AvailableVCores=-31
> timestamp: 1550565186000, yarn.QueueMetrics.root.AvailableVCores=-32
> timestamp: 155056522, yarn.QueueMetrics.root.AvailableVCores=14
> timestamp: 155056525, yarn.QueueMetrics.root.AvailableVCores=14
>


Re: yarn usercache dir not resolved properly when running an example application

2019-02-14 Thread Prabhu Josephraj
In the case of a Distributed Shell job, the ApplicationMaster runs in a
normal Linux container and the subsequent shell command runs inside a Docker
container. The job fails even before launching the AM, that is, before the
Docker container is started. I think the Distributed Shell job would fail
even without the Docker settings.

As per error code 20, it is most likely related to access to the NM local
directory:

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html

20 = INITIALIZE_USER_FAILED: Couldn't get, stat, or secure the per-user
NodeManager directory.

Can we try the below steps on (all) NodeManager machines?

Remove all contents under /data/yarn, and make sure the /data and /data/yarn
directories have permission 755 with owner root:root, while the local
directory is owned by yarn:hadoop, e.g.:

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /
drwxr-xr-x.   5 root root44 Oct 24 11:47 data

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/
drwxr-xr-x. 4 root  root   28 Oct 24 14:30 yarn

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/yarn/
total 4
drwxr-xr-x.  5 yarn hadoop   54 Feb 14 17:32 local
drwxrwxr-x. 10 yarn hadoop 4096 Feb 14 17:32 log

And also check whether the Distributed Shell job runs fine without the Docker settings.





On Thu, Feb 14, 2019 at 10:15 PM Vinay Kashyap  wrote:

> Hi Prabhu,
>
> Thanks for your reply.
> I tried the configurations as per your suggestion, but I get the same
> error.
> Is this related to container localization, by any chance?
> Also, is there any log or .out information that says the Docker container
> runtime has been picked up?
>
>
>
> On Thu, Feb 14, 2019 at 9:38 PM Prabhu Josephraj 
> wrote:
>
>> Hi Vinay,
>>
>> Can you try specifying below configs under Docker section in
>> container-executor.cfg which will allow Docker Containers to use the NM
>> Local Dirs.
>>
>>
>> docker.allowed.ro-mounts=/data/yarn/local,,/usr/jdk64/jdk1.8.0_112/bin
>>   docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log
>>
>> Thanks,
>> Prabhu Joseph
>>
>> On Thu, Feb 14, 2019 at 9:28 PM Vinay Kashyap 
>> wrote:
>>
>>>
>>> I am using Hadoop 3.2.0 and trying to run a simple application in a
>>> docker container and I have made the required configuration changes both in
>>> *yarn-site.xml* and *container-executor.cfg* to choose
>>> LinuxContainerExecutor and docker runtime.
>>>
>>> I use the example of distributed shell in one of the hortonworks blog.
>>> https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
>>>
>>> The problem I face here is when the application is submitted to YARN it
>>> fails with a reason related to directory creation issue with the below error
>>>
>>> 2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application
>>> report from ASM for, appId=2, clientToAMToken=null,
>>> appDiagnostics=Application application_1550156488785_0002 failed 2 times
>>> due to AM Container for appattempt_1550156488785_0002_02 exited with
>>> exitCode: -1000 Failing this attempt.Diagnostics: [2019-02-14
>>> 20:51:16.282]Application application_1550156488785_0002 initialization
>>> failed (exitCode=20) with output: main : command provided 0 main : user is
>>> myuser main : requested yarn user is myuser Failed to create directory
>>> /data/yarn/local/nmPrivate/container_1550156488785_0002_02_01.tokens/usercache/myuser
>>> - Not a directory
>>>
>>> I have configured *yarn.nodemanager.local-dirs* in yarn-site.xml and I
>>> can see the same reflected in YARN web ui *localhost:8088/conf*
>>>
>>> 
>>> yarn.nodemanager.local-dirs
>>> /data/yarn/local
>>> false
>>> yarn-site.xml
>>> 
>>>
>>> I do not understand why is it trying to create usercache dir inside the
>>> nmPrivate directory.
>>>
>>> Note : I have verified the permissions for myuser to the directories and
>>> also have tried clearing the directories manually as suggested in a related
>>> post. But no fruit. I do not see any additional information about container
>>> launch failure in any other logs.
>>>
>>> How do I debug why the usercache dir is not resolved properly??
>>>
>>> Really appreciate any help on this.
>>>
>>> Thanks
>>>
>>> Vinay Kashyap
>>>
>>
>
> --
> *Thanks and regards*
> *Vinay Kashyap*
>


Re: yarn usercache dir not resolved properly when running an example application

2019-02-14 Thread Prabhu Josephraj
Hi Vinay,

Can you try specifying the below configs under the Docker section in
container-executor.cfg? They allow Docker containers to use the NM local
dirs.

  docker.allowed.ro-mounts=/data/yarn/local,,/usr/jdk64/jdk1.8.0_112/bin
  docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log

Thanks,
Prabhu Joseph

On Thu, Feb 14, 2019 at 9:28 PM Vinay Kashyap  wrote:

>
> I am using Hadoop 3.2.0 and trying to run a simple application in a docker
> container and I have made the required configuration changes both in
> *yarn-site.xml* and *container-executor.cfg* to choose
> LinuxContainerExecutor and docker runtime.
>
> I use the example of distributed shell in one of the hortonworks blog.
> https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
>
> The problem I face here is when the application is submitted to YARN it
> fails with a reason related to directory creation issue with the below error
>
> 2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application
> report from ASM for, appId=2, clientToAMToken=null,
> appDiagnostics=Application application_1550156488785_0002 failed 2 times
> due to AM Container for appattempt_1550156488785_0002_02 exited with
> exitCode: -1000 Failing this attempt.Diagnostics: [2019-02-14
> 20:51:16.282]Application application_1550156488785_0002 initialization
> failed (exitCode=20) with output: main : command provided 0 main : user is
> myuser main : requested yarn user is myuser Failed to create directory
> /data/yarn/local/nmPrivate/container_1550156488785_0002_02_01.tokens/usercache/myuser
> - Not a directory
>
> I have configured *yarn.nodemanager.local-dirs* in yarn-site.xml and I
> can see the same reflected in YARN web ui *localhost:8088/conf*
>
> 
> yarn.nodemanager.local-dirs
> /data/yarn/local
> false
> yarn-site.xml
> 
>
> I do not understand why is it trying to create usercache dir inside the
> nmPrivate directory.
>
> Note : I have verified the permissions for myuser to the directories and
> also have tried clearing the directories manually as suggested in a related
> post. But no fruit. I do not see any additional information about container
> launch failure in any other logs.
>
> How do I debug why the usercache dir is not resolved properly??
>
> Really appreciate any help on this.
>
> Thanks
>
> Vinay Kashyap
>


How Useful YARN Placement Constraint for MapReduce Jobs

2019-02-08 Thread Prabhu Josephraj
Hi,

  Was thinking on supporting YARN Placement Constraint for MapReduce
Applications, but want to check with you on how useful it will be?. Mappers
usually runs on Data Local machine and so won't need but Reducers can gain
by distributing the reducers to different machine using Anti Affinity. And
also by having Affinity to say HRegionServer Machine in case of writing
output to Hbase.

Thanks,
Prabhu Joseph