[ https://issues.apache.org/jira/browse/FLINK-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704151#comment-14704151 ]
ASF GitHub Bot commented on FLINK-2540:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/1036

[FLINK-2540] [optimizer] [runtime] Propagate union batch exchanges to union inputs

The DataExchangeMode of union nodes was not respected when translating an OptimizedPlan to a JobGraph. This could result in deadlocks when a branched data flow was closed. Union nodes with a batch exchange will propagate their exchange mode to all of their inputs when the JobGraph is generated.

This PR adds a test for this and propagates the data exchange mode of union nodes to their inputs in the JobGraphGenerator.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink union_exchange-2540

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1036.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1036

----
commit bac21bf5d77c8e15c608ecbf006d29e7af1dd68a
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2015-07-23T13:12:38Z

    [FLINK-2398][api-breaking] Introduce StreamGraphGenerator

    This decouples the building of the StreamGraph from the API methods. Before, the API methods built the StreamGraph as they went. Now the API methods build a hierarchy of StreamTransformation nodes, from which a StreamGraph is generated upon execution.

    This also introduces some API breaking changes:

    - The result of methods that create sinks is now DataStreamSink instead of DataStream.
    - Iterations cannot have feedback edges with differing parallelism.
    - "Preserve partitioning" is not the default for feedback edges. The previous option for this is removed.
    - You can close an iteration several times; there is no need for a union.
    - Strict checking of whether partitioning and parallelism work together, i.e. if upstream and downstream parallelism don't match, forward partitioning is no longer legal.

    The old behaviour was not very transparent: when going from low to high parallelism, some downstream operators would never get any input; when going from high to low parallelism, the downstream operators would see skew because all elements that would have been forwarded to an operator that is not "there" go to another operator instead. The stricter check requires inserting global() or rebalance() in some places, for example after most sources, which have a parallelism of one.

    This also makes StreamExecutionEnvironment.execute() behave consistently across different execution environments (local, remote, ...): the list of operators to be executed is cleared after execute() is called.

commit cbaccac630d579a9d7d7237aba5a40010e5c1814
Author: Ufuk Celebi <u...@apache.org>
Date:   2015-08-19T22:38:37Z

    [FLINK-2540] [optimizer] [runtime] Propagate union batch exchanges to union inputs

    The DataExchangeMode of union nodes was not respected when translating an OptimizedPlan to a JobGraph. This could result in deadlocks when a branched data flow was closed. Union nodes with a batch exchange will propagate their exchange mode to all of their inputs when the JobGraph is generated.
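To make the fix easier to picture, here is a minimal sketch of the propagation idea in plain Java. The types and method below are illustrative stand-ins, not the actual optimizer or JobGraphGenerator classes: a union node whose result is exchanged in BATCH mode pushes that mode down to its own inputs while the plan is translated, so no branch of a closed branching flow keeps a pipelined producer blocked.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only -- hypothetical types, not Flink's optimizer classes.
class UnionExchangeSketch {

    enum DataExchangeMode { PIPELINED, BATCH }

    static class PlanNode {
        final boolean isUnion;
        DataExchangeMode exchangeMode = DataExchangeMode.PIPELINED;
        final List<PlanNode> inputs = new ArrayList<>();

        PlanNode(boolean isUnion) {
            this.isUnion = isUnion;
        }
    }

    /** Depth-first pass over the plan: a BATCH union pushes BATCH down to its own inputs. */
    static void propagateUnionExchange(PlanNode node) {
        if (node.isUnion && node.exchangeMode == DataExchangeMode.BATCH) {
            for (PlanNode input : node.inputs) {
                input.exchangeMode = DataExchangeMode.BATCH;
            }
        }
        for (PlanNode input : node.inputs) {
            propagateUnionExchange(input);
        }
    }
}
{code}

The FLINK-2398 commit above also notes that the stricter forward-partitioning check may require inserting global() or rebalance(), for example after a parallelism-1 source. A small, hedged usage example follows; the class and job names are made up, while rebalance(), map() and print() are the regular DataStream API calls.

{code:java}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Illustrative job: a parallelism-1 source feeding a parallelism-4 operator can no
// longer use forward partitioning, so rebalance() redistributes the elements.
public class RebalanceAfterSource {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        env.fromElements("a", "b", "c", "d")      // fromElements runs with parallelism 1
            .rebalance()                          // explicit redistribution instead of forward
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value.toUpperCase();   // runs with parallelism 4
                }
            })
            .print();

        env.execute("rebalance-after-source");
    }
}
{code}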
----

> LocalBufferPool.requestBuffer gets into infinite loop
> -----------------------------------------------------
>
>                 Key: FLINK-2540
>                 URL: https://issues.apache.org/jira/browse/FLINK-2540
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>            Reporter: Gabor Gevay
>            Assignee: Ufuk Celebi
>            Priority: Blocker
>             Fix For: 0.10, 0.9.1
>
> I'm trying to run a complicated computation that looks like this: [1].
> One of the DataSource->Filter->Map chains finishes fine, but the other one freezes. Debugging shows that it is spinning in the while loop in LocalBufferPool.requestBuffer.
> askToRecycle is false. Both numberOfRequestedMemorySegments and currentPoolSize are 128, so it never goes into that if, either.
> This is a stack trace: [2]
> And here is the code, if you would like to run it: [3]. Unfortunately, I can't make it more minimal, because if I remove some operators, the problem disappears. The class to start is malom.Solver. (On the first run, it calculates some lookup tables for a few minutes and puts them into /tmp/movegen.)
>
> [1] http://compalg.inf.elte.hu/~ggevay/flink/plan.txt
> [2] http://compalg.inf.elte.hu/~ggevay/flink/stacktrace.txt
> [3] https://github.com/ggevay/flink/tree/deadlock-malom
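For readers unfamiliar with the buffer pool internals, the spin described in the report can be pictured with the hedged sketch below. It is a paraphrase, not the actual LocalBufferPool code; only the field names (askToRecycle, numberOfRequestedMemorySegments, currentPoolSize) are taken from the report, and the surrounding types are illustrative stand-ins.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified paraphrase of the wait loop described above -- NOT the actual
// Flink LocalBufferPool source.
class BufferPoolSpinSketch {

    private final Queue<byte[]> availableMemorySegments = new ArrayDeque<>();
    private final int currentPoolSize = 128;
    private int numberOfRequestedMemorySegments = 128;
    private boolean askToRecycle = false;

    byte[] requestBuffer() throws InterruptedException {
        synchronized (availableMemorySegments) {
            while (availableMemorySegments.isEmpty()) {
                if (numberOfRequestedMemorySegments < currentPoolSize) {
                    // The local pool may still grow: fetch one more segment.
                    availableMemorySegments.add(new byte[32 * 1024]);
                    numberOfRequestedMemorySegments++;
                    continue;
                }
                if (askToRecycle) {
                    // The pool owner would be asked to release a buffer here.
                }
                // In the reported state (128 == 128, askToRecycle == false) neither
                // branch above can produce a segment, so the call keeps waiting for
                // another task to recycle a buffer -- which never happens once the
                // branched flow has deadlocked.
                availableMemorySegments.wait(2000);
            }
            return availableMemorySegments.poll();
        }
    }
}
{code}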