+1
On Fri, Sep 5, 2014 at 2:25 PM, Kostas Tzoumas <ktzou...@apache.org> wrote: > +1 for refactoring using Akka, the arguments are overwhelming. > > > On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <rmetz...@apache.org> > wrote: > > > I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to have a > > big community [1] and users [2] that depend on the system. > > > > The YARN client is also using the old RPC service. I would like to > rewrite > > it with Akka once we have added it into the other parts of the system, to > > learn it. > > > > > > [1] https://github.com/akka/akka/pulls > > [2] > > http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html > > > > > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <se...@apache.org> wrote: > > > > > This proposes to refactor the RPC service and the coordination between > > > Client, JobManager, and TaskManager to use the Akka actor library. > > > > > > Even though Akka is written in Scala, it offers a Java interface and we > > can > > > use Akka completely from Java. > > > > > > Below are a list of arguments why this would help the system: > > > > > > > > > Problems with the current RPC service: > > > -------------------------------------------------------- > > > > > > - No asynchronous calls with callbacks. This is the reason why > several > > > parts of the runtime poll the status, introducing unnecessary latency. > > > > > > - No exception forwarding (many exceptions are simply swallowed), > > making > > > debugging and operation in flaky environments very hard > > > > > > - Limited number of handler threads. The RPC can only handle a fix > > number > > > of concurrent requests, forcing you to maintain separate thread pools > to > > > delegate actions to > > > > > > - No support for primitive data types (or boxed primitives) as > > arguments, > > > everything has to be a specially serializable type > > > > > > - Problematic threading model. The RPC continuously spawns and > > terminates > > > threads > > > > > > > > > > > > Benefits of switching to the Akka actor model: > > > > > > > > > ------------------------------------------------------------------------------- > > > > > > - Akka solves all of the above issues out of the box > > > > > > - The supervisor model allows you to do failure detection of actors. > > That > > > provides a unified way of detecting and handling failures (missing > > > heartbeats, failed calls, ...) > > > > > > - Akka has tools to make stateful actors persistent and restart them > on > > > other machines in cases of failure. That would greatly help in > > implementing > > > "master fail-over", which will become important > > > > > > - You can define many "call targets" (actors). Tasks (on > taskmanagers) > > > can directly call their ExecutionVertex on the JobManager, rather than > > > calling the JobManager, creating a Runnable that looks up the execution > > > vertex, and so on... > > > > > > - The actor model's approach to queue actions on an actor and run the > > one > > > after another makes the concurrency model of the state machine very > > simple > > > and robust > > > > > > - We "outsource" our own concerns about maintaining and improving > that > > > part of the system > > > > > > Greetings, > > > Stephan > > > > > >