GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/858

    [FLINK-2097] Implement job session management

    This is a joint effort by @StephanEwen and me to introduce a session 
management in Flink. Session are used to keep a copy of the ExecutionGraph in 
the job manager for the session lifetime. It is important that the 
ExecutionGraph is not kept around longer because it consumes a lot of memory. 
Its intermediate results can also be freed. To integrate sessions properly into 
Flink, some refactoring was necessary. In particular these are:
    
    - JobId is created through the ExecutionEnvironment and passed through
    - Sessions can be termined by the ExecutionEnvironment or directly through 
the executor
    - Session are cancelled implicitly through "reapers" or shutdown hooks in 
the ExecutionEnvironment, otherwise they time out
    - LocalExecutor and RemoteExecutor manage sessions
    - The Client only deals with the communication with the job manager and is 
agnostic of session management
    
    With the session management, we will be able to properly support 
backtracking of produced intermediate results. This makes calls to 
count()/collect()/print() efficient and enables to write 
incremental/interactive jobs. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink session-dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/858.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #858
    
----
commit 9852d392bfe69b056596acfba001ab0a574f0ac0
Author: Maximilian Michels <m...@apache.org>
Date:   2015-05-13T15:06:47Z

    [FLINK-2097] [core] Implement job session management.
    
    Sessions make sure that the JobManager does not immediately discard a 
JobGraph after execution, but keeps it
    around for further operations to be attached to the graph. That is the 
basis if interactive sessions.
    
    This pull request implements a rudimentary session management. Together 
with the backtracking #640,
    this will enable users to submit jobs to the cluster and access 
intermediate results. Session handling ensures that the results are cleared 
eventually.
    
    ExecutionGraphs are kept as long as
    - no timeout occurred or
    - the session has not been explicitly ended

commit 65464ad19d39a29d41d071b2a4524b414e297147
Author: Stephan Ewen <se...@apache.org>
Date:   2015-05-29T12:35:33Z

    [FLINK-2097] [core] Improve session management.
    
     - The Client manages only connections to the JobManager, it is not job 
specific
     - Executors provide a more explicit life cycle and methods to start new 
sessions
     - Sessions are handled by the environments
     - The environments use reapers (local) and shutdown hooks (remote) to 
ensure session termination
       when the environment runs out of scope

commit 6d89edd4a63fa3971c0246f46c7b8c98f3fc6c30
Author: Maximilian Michels <m...@apache.org>
Date:   2015-06-18T14:38:09Z

    [FLINK-2097] [core] Finalize session management

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to