http://git-wip-us.apache.org/repos/asf/hadoop/blob/aafe5713/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm deleted file mode 100644 index 57a47fd..0000000 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm +++ /dev/null @@ -1,757 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Hadoop Map Reduce Next Generation-${project.version} - Writing YARN - Applications - --- - --- - ${maven.build.timestamp} - -Hadoop MapReduce Next Generation - Writing YARN Applications - -%{toc|section=1|fromDepth=0} - -* Purpose - - This document describes, at a high-level, the way to implement new - Applications for YARN. - -* Concepts and Flow - - The general concept is that an <application submission client> submits an - <application> to the YARN <ResourceManager> (RM). This can be done through - setting up a <<<YarnClient>>> object. After <<<YarnClient>>> is started, the - client can then set up application context, prepare the very first container of - the application that contains the <ApplicationMaster> (AM), and then submit - the application. You need to provide information such as the details about the - local files/jars that need to be available for your application to run, the - actual command that needs to be executed (with the necessary command line - arguments), any OS environment settings (optional), etc. Effectively, you - need to describe the Unix process(es) that needs to be launched for your - ApplicationMaster. - - The YARN ResourceManager will then launch the ApplicationMaster (as - specified) on an allocated container. The ApplicationMaster communicates with - YARN cluster, and handles application execution. It performs operations in an - asynchronous fashion. During application launch time, the main tasks of the - ApplicationMaster are: a) communicating with the ResourceManager to negotiate - and allocate resources for future containers, and b) after container - allocation, communicating YARN <NodeManager>s (NMs) to launch application - containers on them. Task a) can be performed asynchronously through an - <<<AMRMClientAsync>>> object, with event handling methods specified in a - <<<AMRMClientAsync.CallbackHandler>>> type of event handler. The event handler - needs to be set to the client explicitly. Task b) can be performed by launching - a runnable object that then launches containers when there are containers - allocated. As part of launching this container, the AM has to - specify the <<<ContainerLaunchContext>>> that has the launch information such as - command line specification, environment, etc. 
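-
-  The skeleton below is a minimal sketch of that callback-driven pattern. The
-  class name, empty method bodies and comments are illustrative only; the
-  complete distributed-shell versions of these handlers appear later in this
-  document.
-
-+---+
-  import java.util.List;
-
-  import org.apache.hadoop.yarn.api.records.Container;
-  import org.apache.hadoop.yarn.api.records.ContainerStatus;
-  import org.apache.hadoop.yarn.api.records.NodeReport;
-  import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
-
-  // Event handler passed to AMRMClientAsync.createAMRMClientAsync(...)
-  class SketchRMCallbackHandler implements AMRMClientAsync.CallbackHandler {
-
-    @Override
-    public void onContainersAllocated(List<Container> allocated) {
-      // Task b): for each allocated container, build a ContainerLaunchContext
-      // (command, environment, local resources) and start it via NMClientAsync,
-      // typically from a separate thread so this event thread is not blocked.
-    }
-
-    @Override
-    public void onContainersCompleted(List<ContainerStatus> statuses) {
-      // Track completed containers and decide whether any need to be re-requested.
-    }
-
-    @Override
-    public void onShutdownRequest() {
-      // The ResourceManager has asked the ApplicationMaster to shut down.
-    }
-
-    @Override
-    public void onNodesUpdated(List<NodeReport> updatedNodes) {
-      // Node membership or health changed; often a no-op for simple AMs.
-    }
-
-    @Override
-    public void onError(Throwable e) {
-      // Fatal client error; stop the clients and exit.
-    }
-
-    @Override
-    public float getProgress() {
-      // Progress reported to the ResourceManager on every heartbeat.
-      return 0.0f;
-    }
-  }
-+---+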
- - During the execution of an application, the ApplicationMaster communicates - NodeManagers through <<<NMClientAsync>>> object. All container events are - handled by <<<NMClientAsync.CallbackHandler>>>, associated with - <<<NMClientAsync>>>. A typical callback handler handles client start, stop, - status update and error. ApplicationMaster also reports execution progress to - ResourceManager by handling the <<<getProgress()>>> method of - <<<AMRMClientAsync.CallbackHandler>>>. - - Other than asynchronous clients, there are synchronous versions for certain - workflows (<<<AMRMClient>>> and <<<NMClient>>>). The asynchronous clients are - recommended because of (subjectively) simpler usages, and this article - will mainly cover the asynchronous clients. Please refer to <<<AMRMClient>>> - and <<<NMClient>>> for more information on synchronous clients. - -* Interfaces - - The interfaces you'd most like be concerned with are: - - * <<Client>>\<--\><<ResourceManager>>\ - By using <<<YarnClient>>> objects. - - * <<ApplicationMaster>>\<--\><<ResourceManager>>\ - By using <<<AMRMClientAsync>>> objects, handling events asynchronously by - <<<AMRMClientAsync.CallbackHandler>>> - - * <<ApplicationMaster>>\<--\><<NodeManager>>\ - Launch containers. Communicate with NodeManagers - by using <<<NMClientAsync>>> objects, handling container events by - <<<NMClientAsync.CallbackHandler>>> - - [] - - <<Note>> - - * The three main protocols for YARN application (ApplicationClientProtocol, - ApplicationMasterProtocol and ContainerManagementProtocol) are still - preserved. The 3 clients wrap these 3 protocols to provide simpler - programming model for YARN applications. - - * Under very rare circumstances, programmer may want to directly use the 3 - protocols to implement an application. However, note that <such behaviors - are no longer encouraged for general use cases>. - - [] - -* Writing a Simple Yarn Application - -** Writing a simple Client - - * The first step that a client needs to do is to initialize and start a - YarnClient. - -+---+ - YarnClient yarnClient = YarnClient.createYarnClient(); - yarnClient.init(conf); - yarnClient.start(); -+---+ - - * Once a client is set up, the client needs to create an application, and get - its application id. - -+---+ - YarnClientApplication app = yarnClient.createApplication(); - GetNewApplicationResponse appResponse = app.getNewApplicationResponse(); -+---+ - - * The response from the <<<YarnClientApplication>>> for a new application also - contains information about the cluster such as the minimum/maximum resource - capabilities of the cluster. This is required so that to ensure that you can - correctly set the specifications of the container in which the - ApplicationMaster would be launched. Please refer to - <<<GetNewApplicationResponse>>> for more details. - - * The main crux of a client is to setup the <<<ApplicationSubmissionContext>>> - which defines all the information needed by the RM to launch the AM. A client - needs to set the following into the context: - - * Application info: id, name - - * Queue, priority info: Queue to which the application will be submitted, - the priority to be assigned for the application. - - * User: The user submitting the application - - * <<<ContainerLaunchContext>>>: The information defining the container in - which the AM will be launched and run. 
The <<<ContainerLaunchContext>>>, as - mentioned previously, defines all the required information needed to run - the application such as the local <<R>>esources (binaries, jars, files - etc.), <<E>>nvironment settings (CLASSPATH etc.), the <<C>>ommand to be - executed and security <<T>>okens (<RECT>). - - [] - -+---+ - // set the application submission context - ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext(); - ApplicationId appId = appContext.getApplicationId(); - - appContext.setKeepContainersAcrossApplicationAttempts(keepContainers); - appContext.setApplicationName(appName); - - // set local resources for the application master - // local files or archives as needed - // In this scenario, the jar file for the application master is part of the local resources - Map<String, LocalResource> localResources = new HashMap<String, LocalResource>(); - - LOG.info("Copy App Master jar from local filesystem and add to local environment"); - // Copy the application master jar to the filesystem - // Create a local resource to point to the destination jar path - FileSystem fs = FileSystem.get(conf); - addToLocalResources(fs, appMasterJar, appMasterJarPath, appId.toString(), - localResources, null); - - // Set the log4j properties if needed - if (!log4jPropFile.isEmpty()) { - addToLocalResources(fs, log4jPropFile, log4jPath, appId.toString(), - localResources, null); - } - - // The shell script has to be made available on the final container(s) - // where it will be executed. - // To do this, we need to first copy into the filesystem that is visible - // to the yarn framework. - // We do not need to set this as a local resource for the application - // master as the application master does not need it. - String hdfsShellScriptLocation = ""; - long hdfsShellScriptLen = 0; - long hdfsShellScriptTimestamp = 0; - if (!shellScriptPath.isEmpty()) { - Path shellSrc = new Path(shellScriptPath); - String shellPathSuffix = - appName + "/" + appId.toString() + "/" + SCRIPT_PATH; - Path shellDst = - new Path(fs.getHomeDirectory(), shellPathSuffix); - fs.copyFromLocalFile(false, true, shellSrc, shellDst); - hdfsShellScriptLocation = shellDst.toUri().toString(); - FileStatus shellFileStatus = fs.getFileStatus(shellDst); - hdfsShellScriptLen = shellFileStatus.getLen(); - hdfsShellScriptTimestamp = shellFileStatus.getModificationTime(); - } - - if (!shellCommand.isEmpty()) { - addToLocalResources(fs, null, shellCommandPath, appId.toString(), - localResources, shellCommand); - } - - if (shellArgs.length > 0) { - addToLocalResources(fs, null, shellArgsPath, appId.toString(), - localResources, StringUtils.join(shellArgs, " ")); - } - - // Set the env variables to be setup in the env where the application master will be run - LOG.info("Set the environment for the application master"); - Map<String, String> env = new HashMap<String, String>(); - - // put location of shell script into env - // using the env info, the application master will create the correct local resource for the - // eventual containers that will be launched to execute the shell scripts - env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION, hdfsShellScriptLocation); - env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTTIMESTAMP, Long.toString(hdfsShellScriptTimestamp)); - env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLEN, Long.toString(hdfsShellScriptLen)); - - // Add AppMaster.jar location to classpath - // At some point we should not be required to add - // the hadoop specific classpaths to the env. 
- // It should be provided out of the box. - // For now setting all required classpaths including - // the classpath to "." for the application jar - StringBuilder classPathEnv = new StringBuilder(Environment.CLASSPATH.$$()) - .append(ApplicationConstants.CLASS_PATH_SEPARATOR).append("./*"); - for (String c : conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH, - YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH)) { - classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR); - classPathEnv.append(c.trim()); - } - classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append( - "./log4j.properties"); - - // Set the necessary command to execute the application master - Vector<CharSequence> vargs = new Vector<CharSequence>(30); - - // Set java executable command - LOG.info("Setting up app master command"); - vargs.add(Environment.JAVA_HOME.$$() + "/bin/java"); - // Set Xmx based on am memory size - vargs.add("-Xmx" + amMemory + "m"); - // Set class name - vargs.add(appMasterMainClass); - // Set params for Application Master - vargs.add("--container_memory " + String.valueOf(containerMemory)); - vargs.add("--container_vcores " + String.valueOf(containerVirtualCores)); - vargs.add("--num_containers " + String.valueOf(numContainers)); - vargs.add("--priority " + String.valueOf(shellCmdPriority)); - - for (Map.Entry<String, String> entry : shellEnv.entrySet()) { - vargs.add("--shell_env " + entry.getKey() + "=" + entry.getValue()); - } - if (debugFlag) { - vargs.add("--debug"); - } - - vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stdout"); - vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stderr"); - - // Get final commmand - StringBuilder command = new StringBuilder(); - for (CharSequence str : vargs) { - command.append(str).append(" "); - } - - LOG.info("Completed setting up app master command " + command.toString()); - List<String> commands = new ArrayList<String>(); - commands.add(command.toString()); - - // Set up the container launch context for the application master - ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance( - localResources, env, commands, null, null, null); - - // Set up resource type requirements - // For now, both memory and vcores are supported, so we set memory and - // vcores requirements - Resource capability = Resource.newInstance(amMemory, amVCores); - appContext.setResource(capability); - - // Service data is a binary blob that can be passed to the application - // Not needed in this scenario - // amContainer.setServiceData(serviceData); - - // Setup security tokens - if (UserGroupInformation.isSecurityEnabled()) { - // Note: Credentials class is marked as LimitedPrivate for HDFS and MapReduce - Credentials credentials = new Credentials(); - String tokenRenewer = conf.get(YarnConfiguration.RM_PRINCIPAL); - if (tokenRenewer == null || tokenRenewer.length() == 0) { - throw new IOException( - "Can't get Master Kerberos principal for the RM to use as renewer"); - } - - // For now, only getting tokens for the default file-system. 
- final Token<?> tokens[] = - fs.addDelegationTokens(tokenRenewer, credentials); - if (tokens != null) { - for (Token<?> token : tokens) { - LOG.info("Got dt for " + fs.getUri() + "; " + token); - } - } - DataOutputBuffer dob = new DataOutputBuffer(); - credentials.writeTokenStorageToStream(dob); - ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength()); - amContainer.setTokens(fsTokens); - } - - appContext.setAMContainerSpec(amContainer); -+---+ - - * After the setup process is complete, the client is ready to submit - the application with specified priority and queue. - -+---+ - // Set the priority for the application master - Priority pri = Priority.newInstance(amPriority); - appContext.setPriority(pri); - - // Set the queue to which this application is to be submitted in the RM - appContext.setQueue(amQueue); - - // Submit the application to the applications manager - // SubmitApplicationResponse submitResp = applicationsManager.submitApplication(appRequest); - - yarnClient.submitApplication(appContext); -+---+ - - * At this point, the RM will have accepted the application and in the - background, will go through the process of allocating a container with the - required specifications and then eventually setting up and launching the AM - on the allocated container. - - * There are multiple ways a client can track progress of the actual task. - - * It can communicate with the RM and request for a report of the application - via the <<<getApplicationReport()>>> method of <<<YarnClient>>>. - -+-----+ - // Get application report for the appId we are interested in - ApplicationReport report = yarnClient.getApplicationReport(appId); -+-----+ - - The <<<ApplicationReport>>> received from the RM consists of the following: - - * General application information: Application id, queue to which the - application was submitted, user who submitted the application and the - start time for the application. - - * ApplicationMaster details: the host on which the AM is running, the - rpc port (if any) on which it is listening for requests from clients - and a token that the client needs to communicate with the AM. - - * Application tracking information: If the application supports some form - of progress tracking, it can set a tracking url which is available via - <<<ApplicationReport>>>'s <<<getTrackingUrl()>>> method that a client - can look at to monitor progress. - - * Application status: The state of the application as seen by the - ResourceManager is available via - <<<ApplicationReport#getYarnApplicationState>>>. If the - <<<YarnApplicationState>>> is set to <<<FINISHED>>>, the client should - refer to <<<ApplicationReport#getFinalApplicationStatus>>> to check for - the actual success/failure of the application task itself. In case of - failures, <<<ApplicationReport#getDiagnostics>>> may be useful to shed - some more light on the the failure. - - * If the ApplicationMaster supports it, a client can directly query the AM - itself for progress updates via the host:rpcport information obtained from - the application report. It can also use the tracking url obtained from the - report if available. - - * In certain situations, if the application is taking too long or due to other - factors, the client may wish to kill the application. <<<YarnClient>>> - supports the <<<killApplication>>> call that allows a client to send a kill - signal to the AM via the ResourceManager. 
An ApplicationMaster if so - designed may also support an abort call via its rpc layer that a client may - be able to leverage. - -+---+ - yarnClient.killApplication(appId); -+---+ - -** Writing an ApplicationMaster (AM) - - * The AM is the actual owner of the job. It will be launched - by the RM and via the client will be provided all the - necessary information and resources about the job that it has been tasked - with to oversee and complete. - - * As the AM is launched within a container that may (likely - will) be sharing a physical host with other containers, given the - multi-tenancy nature, amongst other issues, it cannot make any assumptions - of things like pre-configured ports that it can listen on. - - * When the AM starts up, several parameters are made available - to it via the environment. These include the <<<ContainerId>>> for the - AM container, the application submission time and details - about the NM (NodeManager) host running the ApplicationMaster. - Ref <<<ApplicationConstants>>> for parameter names. - - * All interactions with the RM require an <<<ApplicationAttemptId>>> (there can - be multiple attempts per application in case of failures). The - <<<ApplicationAttemptId>>> can be obtained from the AM's container id. There - are helper APIs to convert the value obtained from the environment into - objects. - -+---+ - Map<String, String> envs = System.getenv(); - String containerIdString = - envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV); - if (containerIdString == null) { - // container id should always be set in the env by the framework - throw new IllegalArgumentException( - "ContainerId not set in the environment"); - } - ContainerId containerId = ConverterUtils.toContainerId(containerIdString); - ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId(); -+---+ - - * After an AM has initialized itself completely, we can start the two clients: - one to ResourceManager, and one to NodeManagers. We set them up with our - customized event handler, and we will talk about those event handlers in - detail later in this article. - -+---+ - AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler(); - amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); - amRMClient.init(conf); - amRMClient.start(); - - containerListener = createNMCallbackHandler(); - nmClientAsync = new NMClientAsyncImpl(containerListener); - nmClientAsync.init(conf); - nmClientAsync.start(); -+---+ - - * The AM has to emit heartbeats to the RM to keep it informed that the AM is - alive and still running. The timeout expiry interval at the RM is defined by - a config setting accessible via - <<<YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS>>> with the default being - defined by <<<YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS>>>. The - ApplicationMaster needs to register itself with the ResourceManager to - start hearbeating. - -+---+ - // Register self with ResourceManager - // This will start heartbeating to the RM - appMasterHostname = NetUtils.getHostname(); - RegisterApplicationMasterResponse response = amRMClient - .registerApplicationMaster(appMasterHostname, appMasterRpcPort, - appMasterTrackingUrl); -+---+ - - * In the response of the registration, maximum resource capability if included. You may want to use this to check the application's request. 
-
-+---+
-  // Dump out information about cluster capability as seen by the
-  // resource manager
-  int maxMem = response.getMaximumResourceCapability().getMemory();
-  LOG.info("Max mem capability of resources in this cluster " + maxMem);
-
-  int maxVCores = response.getMaximumResourceCapability().getVirtualCores();
-  LOG.info("Max vcores capability of resources in this cluster " + maxVCores);
-
-  // A resource ask cannot exceed the max.
-  if (containerMemory > maxMem) {
-    LOG.info("Container memory specified above max threshold of cluster."
-        + " Using max value." + ", specified=" + containerMemory + ", max="
-        + maxMem);
-    containerMemory = maxMem;
-  }
-
-  if (containerVirtualCores > maxVCores) {
-    LOG.info("Container virtual cores specified above max threshold of cluster."
-        + " Using max value." + ", specified=" + containerVirtualCores + ", max="
-        + maxVCores);
-    containerVirtualCores = maxVCores;
-  }
-
-  List<Container> previousAMRunningContainers =
-      response.getContainersFromPreviousAttempts();
-  LOG.info("Received " + previousAMRunningContainers.size()
-      + " previous AM's running containers on AM registration.");
-+---+
-
-  * Based on the task requirements, the AM can ask for a set of containers to run
-    its tasks on. We can now calculate how many containers we need, and request
-    that many containers. The <<<previousAMRunningContainers>>> list obtained at
-    registration above is reused here.
-
-+---+
-  int numTotalContainersToRequest =
-      numTotalContainers - previousAMRunningContainers.size();
-  // Setup ask for containers from RM
-  // Send request for containers to RM
-  // Until we get our fully allocated quota, we keep on polling RM for
-  // containers.
-  // Keep looping until all the containers are launched and shell script
-  // executed on them (regardless of success/failure).
-  for (int i = 0; i < numTotalContainersToRequest; ++i) {
-    ContainerRequest containerAsk = setupContainerAskForRM();
-    amRMClient.addContainerRequest(containerAsk);
-  }
-+---+
-
-  * In <<<setupContainerAskForRM()>>>, the following two things need to be set up:
-
-    * Resource capability: Currently, YARN supports memory-based resource
-      requirements, so the request should define how much memory is needed. The
-      value is defined in MB and has to be less than the max capability of the
-      cluster and an exact multiple of the min capability. Memory resources
-      correspond to physical memory limits imposed on the task containers. YARN
-      also supports computation-based resources (vCores), as shown in the code.
-
-    * Priority: When asking for sets of containers, an AM may assign different
-      priorities to each set. For example, the Map-Reduce AM may assign a higher
-      priority to containers needed for the Map tasks and a lower priority for
-      the Reduce tasks' containers.
- - [] - -+---+ - private ContainerRequest setupContainerAskForRM() { - // setup requirements for hosts - // using * as any host will do for the distributed shell app - // set the priority for the request - Priority pri = Priority.newInstance(requestPriority); - - // Set up resource type requirements - // For now, memory and CPU are supported so we set memory and cpu requirements - Resource capability = Resource.newInstance(containerMemory, - containerVirtualCores); - - ContainerRequest request = new ContainerRequest(capability, null, null, - pri); - LOG.info("Requested container ask: " + request.toString()); - return request; - } -+---+ - - * After container allocation requests have been sent by the application - manager, contailers will be launched asynchronously, by the event handler of - the <<<AMRMClientAsync>>> client. The handler should implement - <<<AMRMClientAsync.CallbackHandler>>> interface. - - * When there are containers allocated, the handler sets up a thread that runs - the code to launch containers. Here we use the name - <<<LaunchContainerRunnable>>> to demonstrate. We will talk about the - <<<LaunchContainerRunnable>>> class in the following part of this article. - -+---+ - @Override - public void onContainersAllocated(List<Container> allocatedContainers) { - LOG.info("Got response from RM for container ask, allocatedCnt=" - + allocatedContainers.size()); - numAllocatedContainers.addAndGet(allocatedContainers.size()); - for (Container allocatedContainer : allocatedContainers) { - LaunchContainerRunnable runnableLaunchContainer = - new LaunchContainerRunnable(allocatedContainer, containerListener); - Thread launchThread = new Thread(runnableLaunchContainer); - - // launch and start the container on a separate thread to keep - // the main thread unblocked - // as all containers may not be allocated at one go. - launchThreads.add(launchThread); - launchThread.start(); - } - } -+---+ - - * On heart beat, the event handler reports the progress of the application. - -+---+ - @Override - public float getProgress() { - // set progress to deliver to RM on next heartbeat - float progress = (float) numCompletedContainers.get() - / numTotalContainers; - return progress; - } -+---+ - - [] - - * The container launch thread actually launches the containers on NMs. After a - container has been allocated to the AM, it needs to follow a similar process - that the client followed in setting up the <<<ContainerLaunchContext>>> for - the eventual task that is going to be running on the allocated Container. - Once the <<<ContainerLaunchContext>>> is defined, the AM can start it through - the <<<NMClientAsync>>>. - -+---+ - // Set the necessary command to execute on the allocated container - Vector<CharSequence> vargs = new Vector<CharSequence>(5); - - // Set executable command - vargs.add(shellCommand); - // Set shell script path - if (!scriptPath.isEmpty()) { - vargs.add(Shell.WINDOWS ? 
ExecBatScripStringtPath - : ExecShellStringPath); - } - - // Set args for the shell command if any - vargs.add(shellArgs); - // Add log redirect params - vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"); - vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"); - - // Get final commmand - StringBuilder command = new StringBuilder(); - for (CharSequence str : vargs) { - command.append(str).append(" "); - } - - List<String> commands = new ArrayList<String>(); - commands.add(command.toString()); - - // Set up ContainerLaunchContext, setting local resource, environment, - // command and token for constructor. - - // Note for tokens: Set up tokens for the container too. Today, for normal - // shell commands, the container in distribute-shell doesn't need any - // tokens. We are populating them mainly for NodeManagers to be able to - // download anyfiles in the distributed file-system. The tokens are - // otherwise also useful in cases, for e.g., when one is running a - // "hadoop dfs" command inside the distributed shell. - ContainerLaunchContext ctx = ContainerLaunchContext.newInstance( - localResources, shellEnv, commands, null, allTokens.duplicate(), null); - containerListener.addContainer(container.getId(), container); - nmClientAsync.startContainerAsync(container, ctx); -+---+ - - * The <<<NMClientAsync>>> object, together with its event handler, handles container events. Including container start, stop, status update, and occurs an error. - - * After the ApplicationMaster determines the work is done, it needs to unregister itself through the AM-RM client, and then stops the client. - -+---+ - try { - amRMClient.unregisterApplicationMaster(appStatus, appMessage, null); - } catch (YarnException ex) { - LOG.error("Failed to unregister application", ex); - } catch (IOException e) { - LOG.error("Failed to unregister application", e); - } - - amRMClient.stop(); -+---+ - -~~** Defining the context in which your code runs - -~~*** Container Resource Requests - -~~*** Local Resources - -~~*** Environment - -~~**** Managing the CLASSPATH - -~~** Security - -* FAQ - -** How can I distribute my application's jars to all of the nodes in the YARN - cluster that need it? - - * You can use the LocalResource to add resources to your application request. - This will cause YARN to distribute the resource to the ApplicationMaster - node. If the resource is a tgz, zip, or jar - you can have YARN unzip it. - Then, all you need to do is add the unzipped folder to your classpath. 
For example, when creating your application request:
-
-+---+
-  // packageResource (a LocalResource), resource (a Resource), containerCtx and
-  // appCtx are assumed to have been created earlier, e.g. via
-  // Records.newRecord(...).
-  File packageFile = new File(packagePath);
-  URL packageUrl = ConverterUtils.getYarnUrlFromPath(
-      FileContext.getFileContext().makeQualified(new Path(packagePath)));
-
-  packageResource.setResource(packageUrl);
-  packageResource.setSize(packageFile.length());
-  packageResource.setTimestamp(packageFile.lastModified());
-  packageResource.setType(LocalResourceType.ARCHIVE);
-  packageResource.setVisibility(LocalResourceVisibility.APPLICATION);
-
-  resource.setMemory(memory);
-  containerCtx.setResource(resource);
-  containerCtx.setCommands(ImmutableList.of(
-      "java -cp './package/*' some.class.to.Run "
-      + "1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout "
-      + "2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
-  containerCtx.setLocalResources(
-      Collections.singletonMap("package", packageResource));
-  appCtx.setApplicationId(appId);
-  appCtx.setUser(user.getShortUserName());
-  appCtx.setAMContainerSpec(containerCtx);
-  yarnClient.submitApplication(appCtx);
-+---+
-
-  As you can see, the <<<setLocalResources>>> command takes a map of names to
-  resources. The name becomes a symlink in your application's cwd, so you can
-  just refer to the artifacts inside by using ./package/*.
-
-  Note: Java's classpath (cp) argument is VERY sensitive.
-  Make sure you get the syntax EXACTLY correct.
-
-  Once your package is distributed to your AM, you'll need to follow the same
-  process whenever your AM starts a new container (assuming you want the
-  resources to be sent to your container). The code for this is the same. You
-  just need to make sure that you give your AM the package path (either HDFS or
-  local), so that it can send the resource URL along with the container launch
-  context.
-
-** How do I get the ApplicationMaster's <<<ApplicationAttemptId>>>?
-
-  * The <<<ApplicationAttemptId>>> will be passed to the AM via the environment,
-    and the value from the environment can be converted into an
-    <<<ApplicationAttemptId>>> object via the <<<ConverterUtils>>> helper function.
-
-** Why is my container killed by the NodeManager?
-
-  * This is likely due to high memory usage exceeding your requested container
-    memory size. There are a number of reasons that can cause this. First, look
-    at the process tree that the NodeManager dumps when it kills your container.
-    The two things you're interested in are physical memory and virtual memory.
-    If you have exceeded the physical memory limit, your app is using too much
-    physical memory. If you're running a Java app, you can use -hprof to look at
-    what is taking up space in the heap. If you have exceeded the virtual memory
-    limit, you may need to increase the value of the cluster-wide configuration
-    variable <<<yarn.nodemanager.vmem-pmem-ratio>>>.
-
-** How do I include native libraries?
-
-  * Setting <<<-Djava.library.path>>> on the command line while launching a
-    container can cause native libraries used by Hadoop to not be loaded
-    correctly and can result in errors. It is cleaner to use
-    <<<LD_LIBRARY_PATH>>> instead.
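-
-  For example (a hedged sketch - the <<<native>>> directory name and the helper
-  method below are placeholders, not part of the YARN API), the library path can
-  be carried in the environment of the <<<ContainerLaunchContext>>>:
-
-+---+
-  import java.io.File;
-  import java.util.HashMap;
-  import java.util.List;
-  import java.util.Map;
-
-  import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
-  import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
-  import org.apache.hadoop.yarn.api.records.LocalResource;
-
-  public class NativeLibLaunchSketch {
-    // Builds a launch context whose environment points LD_LIBRARY_PATH at a
-    // directory of localized native libraries. "native" stands for whatever
-    // key the libraries were added under in localResources.
-    static ContainerLaunchContext buildContext(
-        Map<String, LocalResource> localResources, List<String> commands) {
-      Map<String, String> env = new HashMap<String, String>();
-      env.put("LD_LIBRARY_PATH",
-          Environment.PWD.$$() + File.separator + "native");
-      return ContainerLaunchContext.newInstance(
-          localResources, env, commands, null, null, null);
-    }
-  }
-+---+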
- -* Useful Links - - * {{{http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html}YARN Architecture}} - - * {{{http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html}YARN Capacity Scheduler}} - - * {{{http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html}YARN Fair Scheduler}} - -* Sample code - - * Yarn distributed shell: in <<<hadoop-yarn-applications-distributedshell>>> - project after you set up your development environment. -
http://git-wip-us.apache.org/repos/asf/hadoop/blob/aafe5713/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YARN.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YARN.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YARN.apt.vm deleted file mode 100644 index 465c5d1..0000000 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YARN.apt.vm +++ /dev/null @@ -1,77 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - YARN - --- - --- - ${maven.build.timestamp} - -Apache Hadoop NextGen MapReduce (YARN) - - MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, - what we call, MapReduce 2.0 (MRv2) or YARN. - - The fundamental idea of MRv2 is to split up the two major functionalities of - the JobTracker, resource management and job scheduling/monitoring, into - separate daemons. The idea is to have a global ResourceManager (<RM>) and - per-application ApplicationMaster (<AM>). An application is either a single - job in the classical sense of Map-Reduce jobs or a DAG of jobs. - - The ResourceManager and per-node slave, the NodeManager (<NM>), form the - data-computation framework. The ResourceManager is the ultimate authority that - arbitrates resources among all the applications in the system. - - The per-application ApplicationMaster is, in effect, a framework specific - library and is tasked with negotiating resources from the ResourceManager and - working with the NodeManager(s) to execute and monitor the tasks. - -[./yarn_architecture.gif] MapReduce NextGen Architecture - - The ResourceManager has two main components: Scheduler and - ApplicationsManager. - - The Scheduler is responsible for allocating resources to the various running - applications subject to familiar constraints of capacities, queues etc. The - Scheduler is pure scheduler in the sense that it performs no monitoring or - tracking of status for the application. Also, it offers no guarantees about - restarting failed tasks either due to application failure or hardware - failures. The Scheduler performs its scheduling function based the resource - requirements of the applications; it does so based on the abstract notion of - a resource <Container> which incorporates elements such as memory, cpu, disk, - network etc. In the first version, only <<<memory>>> is supported. - - The Scheduler has a pluggable policy plug-in, which is responsible for - partitioning the cluster resources among the various queues, applications etc. - The current Map-Reduce schedulers such as the CapacityScheduler and the - FairScheduler would be some examples of the plug-in. 
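-
-  For illustration, the plug-in is selected with the
-  <<<yarn.resourcemanager.scheduler.class>>> property in <<<conf/yarn-site.xml>>>;
-  the snippet below (shown with the CapacityScheduler) is a sketch, not a
-  required setting:
-
-+---+
-<property>
-  <name>yarn.resourcemanager.scheduler.class</name>
-  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
-</property>
-+---+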
- - The CapacityScheduler supports <<<hierarchical queues>>> to allow for more - predictable sharing of cluster resources - - The ApplicationsManager is responsible for accepting job-submissions, - negotiating the first container for executing the application specific - ApplicationMaster and provides the service for restarting the - ApplicationMaster container on failure. - - The NodeManager is the per-machine framework agent who is responsible for - containers, monitoring their resource usage (cpu, memory, disk, network) and - reporting the same to the ResourceManager/Scheduler. - - The per-application ApplicationMaster has the responsibility of negotiating - appropriate resource containers from the Scheduler, tracking their status and - monitoring for progress. - - MRV2 maintains <<API compatibility>> with previous stable release - (hadoop-1.x). This means that all Map-Reduce jobs should still run - unchanged on top of MRv2 with just a recompile. - http://git-wip-us.apache.org/repos/asf/hadoop/blob/aafe5713/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm deleted file mode 100644 index b5e5a25..0000000 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm +++ /dev/null @@ -1,286 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Yarn Commands - --- - --- - ${maven.build.timestamp} - -Yarn Commands - -%{toc|section=1|fromDepth=0} - -* Overview - - Yarn commands are invoked by the bin/yarn script. Running the yarn script - without any arguments prints the description for all commands. - ------- -Usage: yarn [--config confdir] [--loglevel loglevel] COMMAND ------- - - Yarn has an option parsing framework that employs parsing generic options as - well as running classes. - -*---------------+--------------+ -|| COMMAND_OPTIONS || Description | -*---------------+--------------+ -| --config confdir | Overwrites the default Configuration directory. Default -| | is $\{HADOOP_PREFIX\}/conf. -*---------------+--------------+ -| --loglevel loglevel | Overwrites the log level. Valid log levels are FATAL, -| | ERROR, WARN, INFO, DEBUG, and TRACE. Default is INFO. -*---------------+--------------+ -| COMMAND COMMAND_OPTIONS | Various commands with their options are described -| | in the following sections. The commands have been -| | grouped into {{User Commands}} and -| | {{Administration Commands}}. -*---------------+--------------+ - -* {User Commands} - - Commands useful for users of a Hadoop cluster. - -** jar - - Runs a jar file. Users can bundle their Yarn code in a jar file and execute - it using this command. - -------- - Usage: yarn jar <jar> [mainClass] args... 
-------- - -** application - - Prints application(s) report/kill application - -------- - Usage: yarn application <options> -------- - -*---------------+--------------+ -|| COMMAND_OPTIONS || Description | -*---------------+--------------+ -| -list | Lists applications from the RM. Supports optional use of -appTypes -| | to filter applications based on application type, and -appStates to -| | filter applications based on application state. -*---------------+--------------+ -| -appStates States | Works with -list to filter applications based on input -| | comma-separated list of application states. The valid -| | application state can be one of the following: \ -| | ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, -| | FINISHED, FAILED, KILLED -*---------------+--------------+ -| -appTypes Types | Works with -list to filter applications based on input -| | comma-separated list of application types. -*---------------+--------------+ -| -status ApplicationId | Prints the status of the application. -*---------------+--------------+ -| -kill ApplicationId | Kills the application. -*---------------+--------------+ - -** node - - Prints node report(s) - -------- - Usage: yarn node <options> -------- - -*---------------+--------------+ -|| COMMAND_OPTIONS || Description | -*---------------+--------------+ -| -list | Lists all running nodes. Supports optional use of -states to filter -| | nodes based on node state, and -all to list all nodes. -*---------------+--------------+ -| -states States | Works with -list to filter nodes based on input -| | comma-separated list of node states. -*---------------+--------------+ -| -all | Works with -list to list all nodes. -*---------------+--------------+ -| -status NodeId | Prints the status report of the node. -*---------------+--------------+ - -** logs - - Dump the container logs - -------- - Usage: yarn logs -applicationId <application ID> <options> -------- - -*---------------+--------------+ -|| COMMAND_OPTIONS || Description | -*---------------+--------------+ -| -applicationId \<application ID\> | Specifies an application id | -*---------------+--------------+ -| -appOwner AppOwner | AppOwner (assumed to be current user if not -| | specified) -*---------------+--------------+ -| -containerId ContainerId | ContainerId (must be specified if node address is -| | specified) -*---------------+--------------+ -| -nodeAddress NodeAddress | NodeAddress in the format nodename:port (must be -| | specified if container id is specified) -*---------------+--------------+ - -** classpath - - Prints the class path needed to get the Hadoop jar and the required libraries - -------- - Usage: yarn classpath -------- - -** version - - Prints the version. - -------- - Usage: yarn version -------- - - -* {Administration Commands} - - Commands useful for administrators of a Hadoop cluster. - -** resourcemanager - - Start the ResourceManager - -------- - Usage: yarn resourcemanager [-format-state-store] -------- - -*---------------+--------------+ -|| COMMAND_OPTIONS || Description | -*---------------+--------------+ -| -format-state-store | Formats the RMStateStore. This will clear the -| | RMStateStore and is useful if past applications are no -| | longer needed. This should be run only when the -| | ResourceManager is not running. 
-*---------------+--------------+ - -** nodemanager - - Start the NodeManager - -------- - Usage: yarn nodemanager -------- - -** proxyserver - - Start the web proxy server - -------- - Usage: yarn proxyserver -------- - -** rmadmin - - Runs ResourceManager admin client - ----- - yarn rmadmin [-refreshQueues] - [-refreshNodes] - [-refreshUserToGroupsMapping] - [-refreshSuperUserGroupsConfiguration] - [-refreshAdminAcls] - [-refreshServiceAcl] - [-getGroups [username]] - [-transitionToActive [--forceactive] [--forcemanual] <serviceId>] - [-transitionToStandby [--forcemanual] <serviceId>] - [-failover [--forcefence] [--forceactive] <serviceId1> <serviceId2>] - [-getServiceState <serviceId>] - [-checkHealth <serviceId>] - [-help [cmd]] ----- - -*---------------+--------------+ -|| COMMAND_OPTIONS || Description | -*---------------+--------------+ -| -refreshQueues | Reload the queues' acls, states and scheduler specific -| | properties. ResourceManager will reload the mapred-queues -| | configuration file. -*---------------+--------------+ -| -refreshNodes | Refresh the hosts information at the ResourceManager. | -*---------------+--------------+ -| -refreshUserToGroupsMappings| Refresh user-to-groups mappings. | -*---------------+--------------+ -| -refreshSuperUserGroupsConfiguration | Refresh superuser proxy groups -| | mappings. -*---------------+--------------+ -| -refreshAdminAcls | Refresh acls for administration of ResourceManager | -*---------------+--------------+ -| -refreshServiceAcl | Reload the service-level authorization policy file -| | ResourceManager will reload the authorization policy -| | file. -*---------------+--------------+ -| -getGroups [username] | Get groups the specified user belongs to. -*---------------+--------------+ -| -transitionToActive [--forceactive] [--forcemanual] \<serviceId\> | -| | Transitions the service into Active state. -| | Try to make the target active -| | without checking that there is no active node -| | if the --forceactive option is used. -| | This command can not be used if automatic failover is enabled. -| | Though you can override this by --forcemanual option, -| | you need caution. -*---------------+--------------+ -| -transitionToStandby [--forcemanual] \<serviceId\> | -| | Transitions the service into Standby state. -| | This command can not be used if automatic failover is enabled. -| | Though you can override this by --forcemanual option, -| | you need caution. -*---------------+--------------+ -| -failover [--forceactive] \<serviceId1\> \<serviceId2\> | -| | Initiate a failover from serviceId1 to serviceId2. -| | Try to failover to the target service even if it is not ready -| | if the --forceactive option is used. -| | This command can not be used if automatic failover is enabled. -*---------------+--------------+ -| -getServiceState \<serviceId\> | Returns the state of the service. -*---------------+--------------+ -| -checkHealth \<serviceId\> | Requests that the service perform a health -| | check. The RMAdmin tool will exit with a -| | non-zero exit code if the check fails. -*---------------+--------------+ -| -help [cmd] | Displays help for the given command or all commands if none is -| | specified. -*---------------+--------------+ - - -** daemonlog - - Get/Set the log level for each daemon. 
- -------- - Usage: yarn daemonlog -getlevel <host:port> <name> - Usage: yarn daemonlog -setlevel <host:port> <name> <level> -------- - -*---------------+--------------+ -|| COMMAND_OPTIONS || Description | -*---------------+--------------+ -| -getlevel \<host:port\> \<name\> | Prints the log level of the daemon running -| | at \<host:port\>. This command internally connects to -| | http://\<host:port\>/logLevel?log=\<name\> -*---------------+--------------+ -| -setlevel \<host:port\> \<name\> \<level\> | Sets the log level of the daemon -| | running at \<host:port\>. This command internally connects to -| | http://\<host:port\>/logLevel?log=\<name\> -*---------------+--------------+ - - http://git-wip-us.apache.org/repos/asf/hadoop/blob/aafe5713/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm deleted file mode 100644 index 43e5b02..0000000 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm +++ /dev/null @@ -1,82 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Apache Hadoop NextGen MapReduce - --- - --- - ${maven.build.timestamp} - -MapReduce NextGen aka YARN aka MRv2 - - The new architecture introduced in hadoop-0.23, divides the two major - functions of the JobTracker: resource management and job life-cycle management - into separate components. - - The new ResourceManager manages the global assignment of compute resources to - applications and the per-application ApplicationMaster manages the - applicationâs scheduling and coordination. - - An application is either a single job in the sense of classic MapReduce jobs - or a DAG of such jobs. - - The ResourceManager and per-machine NodeManager daemon, which manages the - user processes on that machine, form the computation fabric. - - The per-application ApplicationMaster is, in effect, a framework specific - library and is tasked with negotiating resources from the ResourceManager and - working with the NodeManager(s) to execute and monitor the tasks. - - More details are available in the {{{./YARN.html}Architecture}} document. 
- - -Documentation Index - -* YARN - - * {{{./YARN.html}YARN Architecture}} - - * {{{./CapacityScheduler.html}Capacity Scheduler}} - - * {{{./FairScheduler.html}Fair Scheduler}} - - * {{{./ResourceManagerRestart.htaml}ResourceManager Restart}} - - * {{{./ResourceManagerHA.html}ResourceManager HA}} - - * {{{./WebApplicationProxy.html}Web Application Proxy}} - - * {{{./TimelineServer.html}YARN Timeline Server}} - - * {{{./WritingYarnApplications.html}Writing YARN Applications}} - - * {{{./YarnCommands.html}YARN Commands}} - - * {{{hadoop-sls/SchedulerLoadSimulator.html}Scheduler Load Simulator}} - - * {{{./NodeManagerRestart.html}NodeManager Restart}} - - * {{{./DockerContainerExecutor.html}DockerContainerExecutor}} - - * {{{./NodeManagerCGroups.html}Using CGroups}} - - * {{{./SecureContainer.html}Secure Containers}} - - * {{{./registry/index.html}Registry}} - -* YARN REST APIs - - * {{{./WebServicesIntro.html}Introduction}} - - * {{{./ResourceManagerRest.html}Resource Manager}} - - * {{{./NodeManagerRest.html}Node Manager}} http://git-wip-us.apache.org/repos/asf/hadoop/blob/aafe5713/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md ---------------------------------------------------------------------- diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md new file mode 100644 index 0000000..3c32cdd --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md @@ -0,0 +1,186 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +Hadoop: Capacity Scheduler +========================== + +* [Purpose](#Purpose) +* [Overview](#Overview) +* [Features](#Features) +* [Configuration](#Configuration) + * [Setting up `ResourceManager` to use `CapacityScheduler`](#Setting_up_ResourceManager_to_use_CapacityScheduler`) + * [Setting up queues](#Setting_up_queues) + * [Queue Properties](#Queue_Properties) + * [Other Properties](#Other_Properties) + * [Reviewing the configuration of the CapacityScheduler](#Reviewing_the_configuration_of_the_CapacityScheduler) +* [Changing Queue Configuration](#Changing_Queue_Configuration) + +Purpose +------- + +This document describes the `CapacityScheduler`, a pluggable scheduler for Hadoop which allows for multiple-tenants to securely share a large cluster such that their applications are allocated resources in a timely manner under constraints of allocated capacities. + +Overview +-------- + +The `CapacityScheduler` is designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster. + +Traditionally each organization has it own private set of compute resources that have sufficient capacity to meet the organization's SLA under peak or near peak conditions. 
This generally leads to poor average utilization and overhead of managing multiple independent clusters, one per each organization. Sharing clusters between organizations is a cost-effective manner of running large Hadoop installations since this allows them to reap benefits of economies of scale without creating private clusters. However, organizations are concerned about sharing a cluster because they are worried about others using the resources that are critical for their SLAs. + +The `CapacityScheduler` is designed to allow sharing a large cluster while giving each organization capacity guarantees. The central idea is that the available resources in the Hadoop cluster are shared among multiple organizations who collectively fund the cluster based on their computing needs. There is an added benefit that an organization can access any excess capacity not being used by others. This provides elasticity for the organizations in a cost-effective manner. + +Sharing clusters across organizations necessitates strong support for multi-tenancy since each organization must be guaranteed capacity and safe-guards to ensure the shared cluster is impervious to single rouge application or user or sets thereof. The `CapacityScheduler` provides a stringent set of limits to ensure that a single application or user or queue cannot consume disproportionate amount of resources in the cluster. Also, the `CapacityScheduler` provides limits on initialized/pending applications from a single user and queue to ensure fairness and stability of the cluster. + +The primary abstraction provided by the `CapacityScheduler` is the concept of *queues*. These queues are typically setup by administrators to reflect the economics of the shared cluster. + +To provide further control and predictability on sharing of resources, the `CapacityScheduler` supports *hierarchical queues* to ensure resources are shared among the sub-queues of an organization before other queues are allowed to use free resources, there-by providing *affinity* for sharing free resources among applications of a given organization. + +Features +-------- + +The `CapacityScheduler` supports the following features: + +* **Hierarchical Queues** - Hierarchy of queues is supported to ensure resources are shared among the sub-queues of an organization before other queues are allowed to use free resources, there-by providing more control and predictability. + +* **Capacity Guarantees** - Queues are allocated a fraction of the capacity of the grid in the sense that a certain capacity of resources will be at their disposal. All applications submitted to a queue will have access to the capacity allocated to the queue. Adminstrators can configure soft limits and optional hard limits on the capacity allocated to each queue. + +* **Security** - Each queue has strict ACLs which controls which users can submit applications to individual queues. Also, there are safe-guards to ensure that users cannot view and/or modify applications from other users. Also, per-queue and system administrator roles are supported. + +* **Elasticity** - Free resources can be allocated to any queue beyond it's capacity. When there is demand for these resources from queues running below capacity at a future point in time, as tasks scheduled on these resources complete, they will be assigned to applications on queues running below the capacity (pre-emption is not supported). 
This ensures that resources are available in a predictable and elastic manner to queues, thus preventing artifical silos of resources in the cluster which helps utilization. + +* **Multi-tenancy** - Comprehensive set of limits are provided to prevent a single application, user and queue from monopolizing resources of the queue or the cluster as a whole to ensure that the cluster isn't overwhelmed. + +* **Operability** + + * Runtime Configuration - The queue definitions and properties such as capacity, ACLs can be changed, at runtime, by administrators in a secure manner to minimize disruption to users. Also, a console is provided for users and administrators to view current allocation of resources to various queues in the system. Administrators can *add additional queues* at runtime, but queues cannot be *deleted* at runtime. + + * Drain applications - Administrators can *stop* queues at runtime to ensure that while existing applications run to completion, no new applications can be submitted. If a queue is in `STOPPED` state, new applications cannot be submitted to *itself* or *any of its child queueus*. Existing applications continue to completion, thus the queue can be *drained* gracefully. Administrators can also *start* the stopped queues. + +* **Resource-based Scheduling** - Support for resource-intensive applications, where-in a application can optionally specify higher resource-requirements than the default, there-by accomodating applications with differing resource requirements. Currently, *memory* is the the resource requirement supported. + +Configuration +------------- + +###Setting up `ResourceManager` to use `CapacityScheduler` + + To configure the `ResourceManager` to use the `CapacityScheduler`, set the following property in the **conf/yarn-site.xml**: + +| Property | Value | +|:---- |:---- | +| `yarn.resourcemanager.scheduler.class` | `org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler` | + +###Setting up queues + + `etc/hadoop/capacity-scheduler.xml` is the configuration file for the `CapacityScheduler`. + + The `CapacityScheduler` has a pre-defined queue called *root*. All queueus in the system are children of the root queue. + + Further queues can be setup by configuring `yarn.scheduler.capacity.root.queues` with a list of comma-separated child queues. + + The configuration for `CapacityScheduler` uses a concept called *queue path* to configure the hierarchy of queues. The *queue path* is the full path of the queue's hierarchy, starting at *root*, with . (dot) as the delimiter. + + A given queue's children can be defined with the configuration knob: `yarn.scheduler.capacity.<queue-path>.queues`. Children do not inherit properties directly from the parent unless otherwise noted. + + Here is an example with three top-level child-queues `a`, `b` and `c` and some sub-queues for `a` and `b`: + +```xml +<property> + <name>yarn.scheduler.capacity.root.queues</name> + <value>a,b,c</value> + <description>The queues at the this level (root is the root queue). + </description> +</property> + +<property> + <name>yarn.scheduler.capacity.root.a.queues</name> + <value>a1,a2</value> + <description>The queues at the this level (root is the root queue). + </description> +</property> + +<property> + <name>yarn.scheduler.capacity.root.b.queues</name> + <value>b1,b2,b3</value> + <description>The queues at the this level (root is the root queue). 
+  </description>
+</property>
+```
+
+###Queue Properties
+
+ * Resource Allocation
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.scheduler.capacity.<queue-path>.capacity` | Queue *capacity* in percentage (%) as a float (e.g. 12.5). The sum of capacities for all queues, at each level, must be equal to 100. Applications in the queue may consume more resources than the queue's capacity if there are free resources, providing elasticity. |
+| `yarn.scheduler.capacity.<queue-path>.maximum-capacity` | Maximum queue capacity in percentage (%) as a float. This limits the *elasticity* for applications in the queue. Defaults to -1, which disables it. |
+| `yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent` | Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value. The former (the minimum value) is set to this property value and the latter (the maximum value) depends on the number of users who have submitted applications. For example, suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queue's resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as an integer. |
+| `yarn.scheduler.capacity.<queue-path>.user-limit-factor` | The multiple of the queue capacity which can be configured to allow a single user to acquire more resources. By default this is set to 1, which ensures that a single user can never take more than the queue's configured capacity irrespective of how idle the cluster is. Value is specified as a float. |
+| `yarn.scheduler.capacity.<queue-path>.maximum-allocation-mb` | The per-queue maximum limit of memory to allocate to each container request at the ResourceManager. This setting overrides the cluster configuration `yarn.scheduler.maximum-allocation-mb`. This value must be smaller than or equal to the cluster maximum. |
+| `yarn.scheduler.capacity.<queue-path>.maximum-allocation-vcores` | The per-queue maximum limit of virtual cores to allocate to each container request at the ResourceManager. This setting overrides the cluster configuration `yarn.scheduler.maximum-allocation-vcores`. This value must be smaller than or equal to the cluster maximum. |
+
+ * Running and Pending Application Limits
+
+ The `CapacityScheduler` supports the following parameters to control the running and pending applications:
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.scheduler.capacity.maximum-applications` / `yarn.scheduler.capacity.<queue-path>.maximum-applications` | Maximum number of applications in the system which can be concurrently active, both running and pending. Limits on each queue are directly proportional to their queue capacities and user limits. This is a hard limit and any applications submitted when this limit is reached will be rejected. Default is 10000. This can be set for all queues with `yarn.scheduler.capacity.maximum-applications` and can also be overridden on a per-queue basis by setting `yarn.scheduler.capacity.<queue-path>.maximum-applications`. Integer value expected. |
+| `yarn.scheduler.capacity.maximum-am-resource-percent` / `yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent` | Maximum percent of resources in the cluster which can be used to run application masters - controls the number of concurrently active applications. Limits on each queue are directly proportional to their queue capacities and user limits. Specified as a float, i.e. 0.5 = 50%. Default is 10%. This can be set for all queues with `yarn.scheduler.capacity.maximum-am-resource-percent` and can also be overridden on a per-queue basis by setting `yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent`. |
+
+ * Queue Administration & Permissions
+
+ The `CapacityScheduler` supports the following parameters to administer the queues:
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.scheduler.capacity.<queue-path>.state` | The *state* of the queue. Can be one of `RUNNING` or `STOPPED`. If a queue is in the `STOPPED` state, new applications cannot be submitted to *itself* or *any of its child queues*. Thus, if the *root* queue is `STOPPED`, no applications can be submitted to the entire cluster. Existing applications continue to completion, thus the queue can be *drained* gracefully. Value is specified as an enumeration. |
+| `yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications` | The *ACL* which controls who can *submit* applications to the given queue. If the given user/group has the necessary ACLs on the given queue or *one of the parent queues in the hierarchy*, they can submit applications. *ACLs* for this property *are* inherited from the parent queue if not specified. |
+| `yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue` | The *ACL* which controls who can *administer* applications on the given queue. If the given user/group has the necessary ACLs on the given queue or *one of the parent queues in the hierarchy*, they can administer applications. *ACLs* for this property *are* inherited from the parent queue if not specified. |
+
+**Note:** An *ACL* is of the form `user1,user2 group1,group2`, i.e. a comma-separated list of users, followed by a space, followed by a comma-separated list of groups. The special value of * implies *anyone*. The special value of *space* implies *no one*. The default is * for the root queue if not specified.
+
+###Other Properties
+
+ * Resource Calculator
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.scheduler.capacity.resource-calculator` | The `ResourceCalculator` implementation to be used to compare Resources in the scheduler. The default, i.e. `org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator`, only uses Memory, while `DominantResourceCalculator` uses Dominant-resource to compare multi-dimensional resources such as Memory, CPU etc. A Java ResourceCalculator class name is expected. |
+
+ * Data Locality
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.scheduler.capacity.node-locality-delay` | Number of missed scheduling opportunities after which the `CapacityScheduler` attempts to schedule rack-local containers. Typically, this should be set to the number of nodes in the cluster. By default it is set to approximately the number of nodes in one rack, which is 40. A positive integer value is expected. |
+
+###Reviewing the configuration of the CapacityScheduler
+
+ Once the installation and configuration is complete, you can review it from the web UI after starting the YARN cluster:
+
+ * Start the YARN cluster in the normal manner.
+
+ * Open the `ResourceManager` web UI.
+
+ * The */scheduler* web-page should show the resource usages of individual queues.
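To tie the property tables above to the queue layout defined earlier, the following is a minimal, illustrative sketch (not part of the original configuration) of how capacities might be assigned in `capacity-scheduler.xml` to the example queues `a`, `b`, `c`, `a1` and `a2`. The property names follow the `yarn.scheduler.capacity.<queue-path>.capacity` pattern from the table above; the percentage values are assumptions chosen only so that the capacities at each level sum to 100:

```xml
<!-- Illustrative sketch: queue names follow the earlier example; the values are
     arbitrary assumptions, not recommendations. Capacities at each level
     (root: 40+40+20, a: 60+40) sum to 100 as required. -->
<property>
  <name>yarn.scheduler.capacity.root.a.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.c.capacity</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.a1.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.a2.capacity</name>
  <value>40</value>
</property>
<!-- Optionally cap queue a's elasticity at 60% of the cluster. -->
<property>
  <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
  <value>60</value>
</property>
```

After editing, such a change can be applied with the `yarn rmadmin -refreshQueues` command described in the next section.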
+
+Changing Queue Configuration
+----------------------------
+
+Changing queue properties and adding new queues is very simple. You need to edit **conf/capacity-scheduler.xml** and run *yarn rmadmin -refreshQueues*:
+
+    $ vi $HADOOP_CONF_DIR/capacity-scheduler.xml
+    $ $HADOOP_YARN_HOME/bin/yarn rmadmin -refreshQueues
+
+**Note:** Queues cannot be *deleted*; only the addition of new queues is supported, and the updated queue configuration must be valid, i.e. the queue capacities at each *level* must sum to 100%.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/aafe5713/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainerExecutor.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainerExecutor.md.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainerExecutor.md.vm
new file mode 100644
index 0000000..fbfe04b
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainerExecutor.md.vm
@@ -0,0 +1,154 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Docker Container Executor
+=========================
+
+* [Overview](#Overview)
+* [Cluster Configuration](#Cluster_Configuration)
+* [Tips for connecting to a secure docker repository](#Tips_for_connecting_to_a_secure_docker_repository)
+* [Job Configuration](#Job_Configuration)
+* [Docker Image Requirements](#Docker_Image_Requirements)
+* [Working example of yarn launched docker containers](#Working_example_of_yarn_launched_docker_containers)
+
+Overview
+--------
+
+[Docker](https://www.docker.io/) combines an easy-to-use interface to Linux containers with easy-to-construct image files for those containers. In short, Docker launches very lightweight virtual machines.
+
+The Docker Container Executor (DCE) allows the YARN NodeManager to launch YARN containers into Docker containers. Users can specify the Docker images they want for their YARN containers. These containers provide a custom software environment in which the user's code runs, isolated from the software environment of the NodeManager. These containers can include special libraries needed by the application, and they can have different versions of Perl, Python, and even Java than what is installed on the NodeManager. Indeed, these containers can run a different flavor of Linux than what is running on the NodeManager -- although the YARN container must define all of the environment and libraries needed to run the job, nothing will be shared with the NodeManager.
+
+Docker for YARN provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine).
+
+Cluster Configuration
+---------------------
+
+Docker Container Executor runs in the non-secure mode of HDFS and YARN. It will not run in secure mode, and will exit if it detects secure mode.
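As a quick, optional sanity check (an illustration added here, not a requirement stated by the DCE documentation itself), a cluster running in the non-secure mode that DCE expects typically uses simple authentication, which looks like this in `core-site.xml` (and is also the default when the property is unset):

```xml
<!-- Illustrative check only: DCE refuses to run when Kerberos ("kerberos")
     authentication is configured; "simple" is the non-secure default. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>
```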
+
+The DockerContainerExecutor requires the Docker daemon to be running on the NodeManagers, and the Docker client to be installed and able to start Docker containers. To prevent timeouts while starting jobs, the Docker images to be used by a job should already be downloaded on the NodeManagers. Here's an example of how this can be done:
+
+    sudo docker pull sequenceiq/hadoop-docker:2.4.1
+
+This should be done as part of the NodeManager startup.
+
+The following properties must be set in yarn-site.xml:
+
+```xml
+<property>
+  <name>yarn.nodemanager.docker-container-executor.exec-name</name>
+  <value>/usr/bin/docker</value>
+  <description>
+    Name or path to the Docker client. This is a required parameter. If this is empty,
+    the user must pass an image name as part of the job invocation (see below).
+  </description>
+</property>
+
+<property>
+  <name>yarn.nodemanager.container-executor.class</name>
+  <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
+  <description>
+    This is the container executor setting that ensures that all
+    jobs are started with the DockerContainerExecutor.
+  </description>
+</property>
+```
+
+Administrators should be aware that DCE doesn't currently provide user namespace isolation. This means, in particular, that software running as root in the YARN container will have root privileges in the underlying NodeManager. Put differently, DCE currently provides no better security guarantees than YARN's Default Container Executor. In fact, DockerContainerExecutor will exit if it detects secure YARN.
+
+Tips for connecting to a secure docker repository
+-------------------------------------------------
+
+By default, Docker images are pulled from the Docker public repository. The format of a Docker image URL is *username*/*image\_name*. For example, sequenceiq/hadoop-docker:2.4.1 is an image in the Docker public repository that contains Java and Hadoop.
+
+If you want to use your own private repository, you provide the repository URL instead of your username. Therefore, the image URL becomes *private\_repo\_url*/*image\_name*. For example, if your repository is on localhost:8080, your image URL would look like localhost:8080/hadoop-docker.
+
+To connect to a secure Docker repository, you can use the following invocation:
+
+```
+    docker login [OPTIONS] [SERVER]
+
+    Register or log in to a Docker registry server, if no server is specified
+    "https://index.docker.io/v1/" is the default.
+
+    -e, --email=""       Email
+    -p, --password=""    Password
+    -u, --username=""    Username
+```
+
+If you want to log in to a self-hosted registry, you can specify this by adding the server name:
+
+    docker login <private_repo_url>
+
+This needs to be run as part of the NodeManager startup, or as a cron job if the login session expires periodically. You can log in to multiple Docker repositories from the same NodeManager, but all your users will have access to all your repositories, as the DockerContainerExecutor does not currently support per-job Docker login.
+
+Job Configuration
+-----------------
+
+Currently you cannot configure any of the Docker settings with the job configuration. You can provide Mapper, Reducer, and ApplicationMaster environment overrides for the Docker images, using the following 3 JVM properties respectively (only for MR jobs):
+
+* `mapreduce.map.env`: You can override the mapper's image by passing `yarn.nodemanager.docker-container-executor.image-name`=*your_image_name* to this JVM property.
+
+* `mapreduce.reduce.env`: You can override the reducer's image by passing `yarn.nodemanager.docker-container-executor.image-name`=*your_image_name* to this JVM property.
+
+* `yarn.app.mapreduce.am.env`: You can override the ApplicationMaster's image by passing `yarn.nodemanager.docker-container-executor.image-name`=*your_image_name* to this JVM property.
+
+Docker Image Requirements
+-------------------------
+
+The Docker images used for YARN containers must meet the following requirements:
+
+The distro and version of Linux in your Docker image can be quite different from that of your NodeManager. (Docker does have a few limitations in this regard, but you're not likely to hit them.) However, if you're using the MapReduce framework, then your image will need to be configured for running Hadoop. Java must be installed in the container, and the following environment variables must be defined in the image: JAVA_HOME, HADOOP_COMMON_PATH, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME, and HADOOP_CONF_DIR.
+
+Working example of yarn launched docker containers
+--------------------------------------------------
+
+The following example shows how to run teragen using DockerContainerExecutor.
+
+Step 1. First ensure that YARN is properly configured with DockerContainerExecutor (see above):
+
+```xml
+<property>
+  <name>yarn.nodemanager.docker-container-executor.exec-name</name>
+  <value>docker -H=tcp://0.0.0.0:4243</value>
+  <description>
+    Name or path to the Docker client. The tcp socket must be
+    where the docker daemon is listening.
+  </description>
+</property>
+
+<property>
+  <name>yarn.nodemanager.container-executor.class</name>
+  <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
+  <description>
+    This is the container executor setting that ensures that all
+    jobs are started with the DockerContainerExecutor.
+  </description>
+</property>
+```
+
+Step 2. Pick a custom Docker image if you want. In this example, we'll use sequenceiq/hadoop-docker:2.4.1 from the Docker Hub repository. It has a JDK, Hadoop, and all the previously mentioned environment variables configured.
+
+Step 3. Run:
+
+```bash
+hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-${project.version}.jar \
+  teragen \
+  -Dmapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" \
+  -Dyarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" \
+  1000 \
+  teragen_out_dir
+```
+
+Once it succeeds, you can check the YARN debug logs to verify that Docker has indeed launched containers.
+
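For example, one quick way to cross-check (assuming shell access to a NodeManager host; this is an illustrative suggestion, not part of the original walkthrough) is to list running containers with the Docker client while the job is executing:

```bash
# Run on a NodeManager host while the teragen job is active.
# YARN-launched containers should appear here using the configured image,
# e.g. sequenceiq/hadoop-docker:2.4.1.
sudo docker ps
```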