[GitHub] [flink] rmetzger commented on a change in pull request #14238: [FLINK-20347] Rework YARN documentation page

GitBox Fri, 27 Nov 2020 09:29:33 -0800


rmetzger commented on a change in pull request #14238:
URL: https://github.com/apache/flink/pull/14238#discussion_r531709904




##########
File path: docs/deployment/resource-providers/yarn.md
##########
@@ -0,0 +1,200 @@
+---
+title:  "Apache Hadoop YARN"
+nav-title: YARN
+nav-parent_id: resource_providers
+nav-pos: 4
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Getting Started
+
+This *Getting Started* section guides you through setting up a fully 
functional Flink Cluster on YARN.
+
+### Introduction
+
+[Apache Hadoop 
YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
 is a resource provider popular with many data processing frameworks.
+Flink services are submitted to YARN's ResourceManager, which spawns 
containers on machines managed by YARN NodeManagers. Flink deploys its 
JobManager and TaskManager instances into such containers.
+
+Flink can dynamically allocate and de-allocate TaskManager resources depending 
on the number of processing slots required by the job(s) running on the 
JobManager.
+
+### Preparation
+
+This *Getting Started* section assumes a functional YARN environment, starting 
from version 2.4.1. YARN environments are provided most conveniently through 
services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. 
[Manually setting up a YARN environment 
locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html)
 or [on a 
cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html)
 is not recommended for following through this *Getting Started* tutorial. 
+
+
+- Make sure your YARN cluster is ready for accepting Flink applications by 
running `yarn top`. It should show no error messages.
+- Download a recent Flink distribution from the [download page]({{ 
site.download_url }}) and unpack it.
+- **Important** Make sure that the `HADOOP_CLASSPATH` environment variable is 
set up (it can be checked by running `echo $HADOOP_CLASSPATH`). If not, set it 
up using 
+
+{% highlight bash %}
+export HADOOP_CLASSPATH=`hadoop classpath`
+{% endhighlight %}
+
+
+### Starting a Flink Session on YARN

Review comment:
       I think it only makes sense to mention `HADOOP_CONF_DIR`. Afaik 
`HADOOP_HOME` is from Hadoop 1.x times, and deprecated.
   
   I will mention  `HADOOP_CONF_DIR` in the reference section about the 
configuration, to not confuse new users in the beginners section.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] rmetzger commented on a change in pull request #14238: [FLINK-20347] Rework YARN documentation page

Reply via email to