[jira] [Commented] (SAMZA-516) Support standalone Samza jobs

Jay Kreps (JIRA) Thu, 22 Jan 2015 16:28:30 -0800

    [ https://issues.apache.org/jira/browse/SAMZA-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288494#comment-14288494 ]


Jay Kreps commented on SAMZA-516:
---------------------------------

This is great.

One gotcha I wasn't sure this covered is providing mutual 
exclusion for a partition. If a new process is started it needs to get an 
assignment. It is important that whoever currently has the assignment has 
ceased consuming its prior assignment, checkpointed offsets, and flushed all 
results prior to the new process taking over that partition.
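
To make the ordering concrete, here is a minimal in-memory sketch of that handoff. Everything in it (the PartitionHandoff class, tryAcquire, release, the process ids) is made up for illustration and is not Samza's API; a real version would presumably keep the ownership record in something like ZooKeeper rather than in a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of ownership for ONE partition; not Samza code.
class PartitionHandoff {
    private String owner; // id of the process holding the partition, or null
    private final List<String> events = new ArrayList<>();

    /** A process may take the partition only when nobody holds it. */
    synchronized boolean tryAcquire(String processId) {
        if (owner != null) {
            return false; // current owner has not finished releasing
        }
        owner = processId;
        events.add(processId + ":acquired");
        return true;
    }

    /** The owner completes all three steps before ownership is cleared. */
    synchronized void release(String processId) {
        if (!processId.equals(owner)) {
            throw new IllegalStateException(processId + " does not own the partition");
        }
        events.add(processId + ":stopped-consuming");
        events.add(processId + ":checkpointed-offsets");
        events.add(processId + ":flushed-results");
        owner = null; // only now can another process acquire
    }

    /** Copy of the recorded events, in the order they happened. */
    synchronized List<String> events() {
        return new ArrayList<>(events);
    }
}
```

The point of the sketch is only that the new process's acquire cannot succeed until the old owner's stop, checkpoint, and flush have all happened.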

Some random thoughts:
0. I really think that a simple main method stub that runs the job is 
what you want. Don't try to make a daemon that runs multiple jobs--that is what 
Mesos/YARN are for.
1. Not sure why you need any new layers here? I see this as just a wrapper 
around the container that handles partition changes. You could still think of 
this as "the container".
2. I suspect separating the coordinator and the job into separate processes will 
isolate the coordinator from job GC issues but it will also add a lot of 
complexity--e.g. now one can fail but not the other. For stream processing I 
suspect long default ZooKeeper session timeouts would solve the problem just as 
well and be much simpler.
3. Why would you ever have more than one job for a SQL query? Even if it 
repartitions many times can't you do that all in one job that just has a big 
switch statement over the inputs? Operationally I think one job per query would 
make things much easier. Separating into multiple jobs "isolates" them, but you 
can't really isolate a query from itself. Does this introduce scheduling issues 
or something?
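
For point 3, a rough sketch of what I mean by one job with a switch over the inputs. The class name, the stream names ("page-views", "views-by-user", "query-output") and the toy per-stage logic are all assumptions for illustration; this is plain Java, not the Samza task API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-job query: each stage of the query reads a different
// input stream, and all stages live in the same process.
class SingleJobQuery {
    // Stands in for sending to an output stream; records "stream <- key=value".
    final List<String> sent = new ArrayList<>();

    void process(String inputStream, String key, String value) {
        switch (inputStream) {
            case "page-views":
                // Stage 1: raw input, re-emitted to the repartitioned stream.
                sent.add("views-by-user <- " + key + "=" + value);
                break;
            case "views-by-user":
                // Stage 2: repartitioned input, reduced to a toy result
                // (the length of the value) and emitted as query output.
                sent.add("query-output <- " + key + "=" + value.length());
                break;
            default:
                throw new IllegalArgumentException("unexpected input stream: " + inputStream);
        }
    }
}
```

However many times the query repartitions, that just adds another case; operationally it is still one job.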

> Support standalone Samza jobs
> -----------------------------
>
>                 Key: SAMZA-516
>                 URL: https://issues.apache.org/jira/browse/SAMZA-516
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.9.0
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>
> Samza currently supports two modes of operation out of the box: local and 
> YARN. With local mode, a single Java process starts the JobCoordinator, 
> creates a single container, and executes it locally. All partitions are 
> processed within this container.  With YARN, a YARN grid is required to 
> execute the Samza job. In addition, SAMZA-375 introduces a patch to run Samza 
> in Mesos.
> There have been several requests lately to be able to run Samza jobs without 
> any resource manager (YARN, Mesos, etc), but still run it in a distributed 
> fashion.
> The goal of this ticket is to design and implement a samza-standalone module, 
> which will:
> # Support executing a single Samza job in one or more containers.
> # Support failover, in cases where a machine is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
