Github user uce commented on a diff in the pull request:
https://github.com/apache/incubator-flink/pull/246#discussion_r21106379
--- Diff: docs/config.md ---
@@ -266,3 +272,79 @@ So if `yarn.am.rpc.port` is configured to `10245` and the session's application
- `yarn.am.rpc.port`: The port that is being opened by the Application Master (AM) to
let the YARN client connect for an RPC service. (DEFAULT: Port 10245)
+
+
+## Background
+
+### Configuring the Network Buffers
+
+Network buffers are a critical resource for the communication layers. They are
+used to buffer records before transmission over a network, and to buffer
+incoming data before dissecting it into records and handing them to the
+application. A sufficient number of network buffers is critical to achieving
+good throughput.
+
+In general, configure the task manager to have so many buffers that each
+logical network connection you expect to be open at the same time has a
+dedicated buffer. A logical network connection exists for each point-to-point
+exchange of data over the network, which typically happens at repartitioning
+or broadcasting steps. In those steps, each parallel task inside the TaskManager
+has to be able to talk to all other parallel tasks. Hence, the required number
+of buffers on a task manager is *total-degree-of-parallelism* (number of
+targets) \* *intra-node-parallelism* (number of sources in one task manager)
+\* *n*. Here, *n* is a constant that defines how many repartitioning or
+broadcasting steps you expect to be active at the same time.
+
+Since the *intra-node-parallelism* is typically the number of cores, and more
+than 4 repartitioning or broadcasting channels are rarely active in parallel, it
+frequently boils down to *\#cores\^2\^* \* *\#machines* \* 4. To support, for
+example, a cluster of 20 8-core machines, you should use roughly 5000 network
+buffers for optimal throughput.
+
+Each network buffer has a default size of 64 KiBytes. In the above example, the
+system would allocate roughly 300 MiBytes for network buffers.
+
+The number and size of network buffers can be configured with the following
+parameters:
+
+- `taskmanager.network.numberOfBuffers`, and
+- `taskmanager.network.bufferSizeInBytes`.
+
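+For the 20-machine example above, the corresponding entries in `flink-conf.yaml`
+could look as follows (a sketch only; adapt the buffer count to your cluster, and
+the buffer size is shown at its 64 KiByte default):
+
+```
+# 8 cores^2 * 20 machines * 4 concurrent repartitioning/broadcasting steps = 5120 buffers
+taskmanager.network.numberOfBuffers: 5120
+# default buffer size of 64 KiBytes (5120 * 64 KiB = 320 MiBytes in total)
+taskmanager.network.bufferSizeInBytes: 65536
+```
+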
+### Configuring Temporary I/O Directories
+
+Although Flink aims to process as much data in main memory as possible,
+it is not uncommon that more data needs to be processed than fits into
+the available memory. Flink's runtime is designed to write temporary data
+to disk to handle these situations.
+
+The `taskmanager.tmp.dirs` parameter specifies a list of directories into which
+Flink writes temporary files. The paths of the directories need to be
+separated by ':' (colon character). Flink will concurrently write (or
+read) one temporary file to (from) each configured directory. This way,
+temporary I/O can be evenly distributed over multiple independent I/O devices
+such as hard disks to improve performance. To leverage fast I/O devices (e.g.,
+SSD, RAID, NAS), it is possible to specify a directory multiple times.
+
+If the `taskmanager.tmp.dirs` parameter is not explicitly specified,
+Flink writes temporary data to the temporary directory of the operating
+system, such as */tmp* on Linux systems.
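+
+As an illustration, a `flink-conf.yaml` entry that spreads temporary files over
+three separate disks could look like this (the paths are hypothetical examples):
+
+```
+# three colon-separated directories, ideally on independent physical disks
+taskmanager.tmp.dirs: /disk1/flink-tmp:/disk2/flink-tmp:/disk3/flink-tmp
+```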
+
+
+### Configuring TaskManager Processing Slots
+
+A processing slot allows Flink to execute a distributed DataSet transformation,
+such as a data source or a map transformation.
+
+Each Flink TaskManager provides processing slots in the cluster. The number of slots
+is typically proportional to the number of available CPU cores __of each__ TaskManager.
+As a general recommendation, the number of available CPU cores is a good default for
+`taskmanager.numberOfTaskSlots`.
+
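+For example, on a cluster of 8-core machines, the recommendation above translates
+into the following `flink-conf.yaml` entry (a sketch; use your actual core count):
+
+```
+# one processing slot per available CPU core (here: 8 cores per TaskManager)
+taskmanager.numberOfTaskSlots: 8
+```
+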
+When starting a Flink application, users can supply the default number of slots
+to use for that job. The corresponding command line option is `-p` (for
+parallelism). In addition, it is possible to [set the number of slots in the
+programming APIs](programming_guide.html#parallel-execution) for the whole
+application and for individual operators.
+
+Flink currently schedules an application to slots by "filling" them up.
+If the cluster has 20 machines with 2 slots each (40 slots in total) but the
+application is running with a parallelism of 20, only 10 machines are processing data.
--- End diff ---
are processing => will process