Hi Andre,

Thanks for reaching out to the Flink community!

I am not sure your analysis is based on correct assumptions about Flink's
delta iterations.
Flink's delta iterations do support
- working and solution sets of different types
- worksets that grow, shrink, or are completely replaced in each iteration
- solution sets that grow (up to the point of the "too few memory segments"
error)
- termination via max number of iterations, empty workset, or a custom
termination criterion (implemented via a custom aggregator)
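To illustrate what that contract looks like, here is a plain-Java sketch of the delta-iteration semantics. It is only a conceptual model, not Flink's actual runtime, and all class and method names are made up: the workset (Integer) and solution set (String entries) have different types, the workset may grow, shrink, or be replaced, the solution set only grows, and the loop terminates on an empty workset or after a maximum number of iterations.

```java
import java.util.*;

public class DeltaIterationSketch {
    // Conceptual model of a delta iteration. The workset (Integer) and the
    // solution set (String) deliberately have different element types.
    public static Set<String> run(List<Integer> initialWorkset, int maxIterations) {
        Set<String> solutionSet = new HashSet<>();        // grows monotonically
        List<Integer> workset = new ArrayList<>(initialWorkset);
        int iteration = 0;
        // Terminates on empty workset or after maxIterations,
        // mirroring Flink's built-in criteria.
        while (!workset.isEmpty() && iteration < maxIterations) {
            List<Integer> nextWorkset = new ArrayList<>(); // may grow, shrink, or be replaced
            for (int value : workset) {
                solutionSet.add("seen:" + value);          // solution set only grows
                if (value > 1) {
                    nextWorkset.add(value / 2);            // produce new work
                }
            }
            workset = nextWorkset;                         // completely replaced each round
            iteration++;
        }
        return solutionSet;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(8, 3), 100));
    }
}
```

The step function here (halving values until they reach 1) is arbitrary; the point is the loop structure around it.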

The "too few memory segments" error occurs because the solution set is held
in an in-memory hash table.
Spilling that table to disk would slow iterations down significantly. One
solution is to throw more resources at the problem (more nodes, more RAM).
Another workaround is to reduce the amount of managed memory and switch
the solution set hash table to "unmanaged". This keeps the solution set
in a regular Java HashMap, but it might cause an OutOfMemoryError and kill
the JVM if it grows too large.
AFAIK, there is no way to shrink the solution set at the moment. It might
be worth investigating in that direction.
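The reason the solution set lives in a hash table is the access pattern: each workset element does a keyed lookup-and-update against it. The following sketch shows what the "unmanaged" variant conceptually boils down to, a plain on-heap map keyed by record key. The names are invented for illustration and this is not Flink's internal implementation:

```java
import java.util.*;

public class UnmanagedSolutionSet {
    // Conceptual sketch: an "unmanaged" solution set is a plain on-heap
    // HashMap keyed by the record key. Lookups and updates stay O(1), but
    // the map competes with user code for JVM heap, so growing too large
    // risks an OutOfMemoryError instead of the managed-memory
    // "too few memory segments" error.
    private final Map<Long, String> table = new HashMap<>();

    // Join a workset element against the solution set and update in place;
    // existing entries are combined rather than replaced.
    public void update(long key, String value) {
        table.merge(key, value, (oldVal, newVal) -> oldVal + "," + newVal);
    }

    public int size() { return table.size(); }

    public String get(long key) { return table.get(key); }
}
```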

Regarding your question about the while-not-empty loop: what exactly is
the difference from the default delta iteration? It does stop when the
workset is empty (in addition to the custom termination criterion).
I am not sure I got all your points. Please let me know if I got
something wrong.

Thanks,
Fabian

2015-11-03 13:50 GMT+01:00 André Petermann <
peterm...@informatik.uni-leipzig.de>:

> Sorry for redundant post ...
>
>
> Hi all,
>
> We are working on an implementation of Frequent Subgraph Mining using
> Flink. Here, the "too few memory segments" error prevents the most
> promising solution. The problem is not specific to graphs, but applies to
> all iterative problems where
>
> - working and solution sets contain data of different types
> - the working set may grow, shrink, or be replaced in each iteration
> - the solution set grows in each iteration
> - the termination criterion is based on data set metrics,
>   e.g. while working set not empty
>
> An illustration of our problem workflow, generalized to graph unspecific
> frequent pattern mining, can be found here:
>
> https://github.com/dbs-leipzig/gradoop/blob/master/dev-support/loopWithIntermediateResultMemory.pdf
>
> Page 1 shows the most promising solution. We started implementing it
> using a for loop. However, the "too few memory segments" error makes it
> untestable. As the iteration body is itself a complex workflow and the
> number of iterations is arbitrary, unrolling the loop while reserving
> operator memory will remain a permanent limitation. Increasing limits or
> physical memory would only delay the problem. The resulting question is:
>
> Would it be possible to implement a while-not-empty loop, or at least a
> for loop, that is not unrolled and can be executed more memory-efficiently?
>
> Page 2 shows an alternative solution to our problem using the concept of
> delta iteration. However, Flink's delta iteration supports neither
> broadcasting nor workset-independent intermediate results. Page 3
> shows our working solution using two workarounds for those restrictions.
> However, these workarounds lead to unnecessary memory consumption and
> redundant expensive computations. So, in case the answer to the first
> question is no, a second question:
>
> Would it be possible to extend the delta iteration to support rich map
> functions with broadcast sets and the caching of intermediate results?
>
> We think that a while-not-empty loop might be useful for other
> algorithms too, e.g. variable-length path search in graphs. Did we miss
> Flink features meeting our requirements? Do you think it is worth
> creating an improvement issue? If so, we would of course be willing to
> contribute in the form of development.
>
> Best Regards
> Andre
>
> --
> -------------------------------------------
> PhD Student
> University of Leipzig
> Department of Computer Science
> Database Research Group
>
> email: peterm...@informatik.uni-leipzig.de
> web:   dbs.uni-leipzig.de
> -------------------------------------------
>
