Hi Sebastian!

I think this is the right place to ask.

In principle, there are no strong hardware requirements (of course, more
main memory and higher I/O bandwidth always help).

The memory requirement does not grow with the data size, since the
system spills to disk if needed.

The most important point is the one you already touched on: the number of
network buffers. Since the current version can only do streaming exchanges,
you need enough buffers to cover all streams. The rough formula for that
is #slots * parallelism * 2 * N, where N is the number of concurrent
shuffles you plan to have. Typically, an N of 4 is enough.

(Slots are the scheduling unit starting in 0.6. In 0.5 and earlier, you
can think of #cores instead of #slots.)
(Explanation: when shuffling, each task slot needs two buffers, one on the
send side and one on the receive side, for each of its parallelism-many
targets.)
In future versions, we plan to distribute memory to the network stack
automatically, but right now this is a parameter you have to adjust manually.
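To make the rough formula concrete, here is a minimal Python sketch that
computes #slots * parallelism * 2 * N and converts the result into memory.
The function names are hypothetical, reading #slots as the slots of a single
TaskManager is an assumption, and the 32 KiB buffer size is also an
assumption; check it against the buffer size your setup actually uses.

```python
def estimate_network_buffers(slots: int, parallelism: int,
                             concurrent_shuffles: int = 4) -> int:
    """Rough buffer count: #slots * parallelism * 2 * N."""
    # Each slot needs two buffers (send side and receive side) per target,
    # there are `parallelism` targets, and this holds for each of the
    # N concurrent shuffles.
    return slots * parallelism * 2 * concurrent_shuffles


def buffers_to_mib(num_buffers: int, buffer_size_bytes: int = 32 * 1024) -> float:
    # ASSUMPTION: 32 KiB per network buffer; adjust to your configuration.
    return num_buffers * buffer_size_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical setup: 4 slots, parallelism 32, the suggested N of 4.
    n = estimate_network_buffers(slots=4, parallelism=32)
    print(n, buffers_to_mib(n))  # 1024 32.0
```

With these example numbers, the network stack needs 1024 buffers, i.e.,
about 32 MiB under the assumed buffer size, which has to fit next to the
rest of the heap.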

NOTE: There is currently a shortcoming that makes the memory requirement
grow with the length of the processing pipeline. This is on our list to
solve soon.

Let me know if you have further questions!

Stephan

On Fri, Jul 4, 2014 at 3:46 PM, Kruse, Sebastian <sebastian.kr...@hpi.de>
wrote:

> Hi everyone,
>
> I apologize in advance if that is not the right mailing list for my
> question. If there is a better place for it, please let me know.
>
> Basically, I wanted to ask whether you have any statement about the hardware
> requirements of Flink for processing larger amounts of data, starting at,
> say, 20 GB. Currently, I am facing issues in my jobs, e.g., there are not
> enough buffers for safe execution of some operations. Since the machines
> that run my TaskTrackers unfortunately have very limited main memory, I
> cannot increase the number of buffers (and heap space in general) too much.
> Currently, I have assigned them 1.5 GB.
>
> So, the exact questions are:
>
> *         Do you have experience with a suitable HW setup for crunching
> larger amounts of data, maybe from the TU cluster?
>
> *         Are there any configuration tips you can provide, e.g.,
> pertaining to the buffer configuration?
>
> *         Are there any general statements on the growth of Flink's memory
> requirements wrt. the size of the input data?
>
> Thanks for your help!
> Sebastian
>
