[ https://issues.apache.org/jira/browse/KAFKA-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241143#comment-14241143 ]
Joe Stein commented on KAFKA-1207:
----------------------------------
Hey [~jayson.minard], over the last year we have gone back and forth between
"build a scheduler just for Kafka" and "build an executor layer that works in
Marathon/Aurora". First we gave Aurora a shot, since it already has an
executor (Thermos), to see about getting Kafka to run there. The script we
used is here: https://github.com/stealthly/borealis/blob/master/scripts/kafka.aurora
It relied on an undocumented Aurora feature, which Bill Farner talked about
when I spoke with him on a podcast:
http://allthingshadoop.com/2014/10/26/resource-scheduling-and-task-launching-with-apache-mesos-and-apache-aurora-at-twitter/
Anyway, there were (and still are) issues with that implementation, so we
then decided to give Marathon (https://mesosphere.github.io/marathon/docs/) a
try. We started off with https://github.com/brndnmtthws/kafka-on-marathon as a
pattern, and so far it is working out great. It definitely added more work on
our side, but it is running and doing exactly what we expect.
We have been speaking with others about this too and think we could come up
with a standalone scheduler that would work out of the box. I don't know if it
makes sense for that to be a JVM process, though; we were thinking of writing
it in Go. One *VERY* important reason to have another shell launching Kafka is
that you want to be able to change scripts and bounce brokers (you kind of
have to do this), and if you do a rolling restart (or something similar) of
your tasks, Mesos will schedule them wherever it wants. Some upcoming Kafka
improvements mitigate this somewhat
(https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Command+Line+and+Related+Improvements),
but I don't think it would ever be 100% (Kafka does not run like Storm or
Spark). On the Mesos side you can manage this with roles and constraints, but
at the end of the day you are dealing with a *persistent* server. We have
gotten around this by using the shell script as an agent that fetches updated
configs, restarts the process, and so on. There is a new feature coming in
Mesos (https://issues.apache.org/jira/browse/MESOS-1554) that will make this
better; however, I still like the supervisor shell script strategy. We could
absolutely morph it into a custom scheduler/executor (framework) for Kafka,
but I am not sure whether the project would accept Go code for this feature.
I would be +1 on it going in and have a few engineers available to work on it
over the next 1-2 months. We could also write the whole thing in Java or
Scala, though I still don't know whether that would make it any easier or
better for the community to support than Go.
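The supervisor shell script strategy described above (an agent that fetches
updated configs and bounces the broker only when something actually changed)
can be sketched in Go, the language floated for a standalone scheduler. This
is a minimal sketch under stated assumptions: the `Supervisor` type, its
`Apply` method and the injected `start`/`stop` hooks are hypothetical, not
part of Kafka, Mesos, or the scripts linked above. Config fetching is left
out, and a real agent would wire `start` to something like
`exec.Command("bin/kafka-server-start.sh", "config/server.properties")`.

```go
package main

import (
	"bytes"
	"fmt"
)

// Supervisor is a HYPOTHETICAL agent that owns a single broker process and
// restarts it only when the fetched config differs from the one it applied.
type Supervisor struct {
	running  bool         // whether start() has succeeded since the last stop()
	applied  []byte       // config the running broker was started with
	start    func() error // launches the broker; injected so it can be faked
	stop     func() error // stops the running broker
	Launches int          // number of successful start() calls
}

// Apply (re)starts the broker if cfg differs from the last applied config.
// It returns true when the broker was started or bounced, and false when the
// config was unchanged and the running process was left alone.
func (s *Supervisor) Apply(cfg []byte) (bool, error) {
	if s.running && bytes.Equal(s.applied, cfg) {
		return false, nil // unchanged config: do not bounce the broker
	}
	if s.running {
		if err := s.stop(); err != nil {
			return false, fmt.Errorf("stopping broker: %w", err)
		}
		s.running = false
	}
	if err := s.start(); err != nil {
		return false, fmt.Errorf("starting broker: %w", err)
	}
	s.running = true
	// Copy so later mutation of the caller's slice cannot change our record.
	s.applied = append([]byte(nil), cfg...)
	s.Launches++
	return true, nil
}

func main() {
	// Fake hooks that only log, standing in for a real process launch.
	s := &Supervisor{
		start: func() error { fmt.Println("start broker"); return nil },
		stop:  func() error { fmt.Println("stop broker"); return nil },
	}
	configs := []string{"broker.id=1\n", "broker.id=1\n", "broker.id=1\nlog.dirs=/data\n"}
	for _, cfg := range configs {
		changed, err := s.Apply([]byte(cfg))
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("restarted:", changed)
	}
}
```

With the fake hooks, the second (identical) config leaves the broker alone and
only the third (changed) config triggers a stop followed by a start, which is
the "bounce only when needed" behavior the shell agent provides.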
Would love more thoughts and discussions on this here.
> Launch Kafka from within Apache Mesos
> -------------------------------------
>
> Key: KAFKA-1207
> URL: https://issues.apache.org/jira/browse/KAFKA-1207
> Project: Kafka
> Issue Type: Bug
> Reporter: Joe Stein
> Labels: mesos
> Fix For: 0.9.0
>
> Attachments: KAFKA-1207.patch, KAFKA-1207_2014-01-19_00:04:58.patch,
> KAFKA-1207_2014-01-19_00:48:49.patch
>
>
> There are a few components to this.
> 1) The Framework: This is going to be responsible for starting up and
> managing the fail over of brokers within the mesos cluster. This will have
> to get some Kafka focused paramaters for launching new replica brokers,
> moving topics and partitions around based on what is happening in the grid
> through time.
> 2) The Scheduler: This is what is going to ask for resources for Kafka
> brokers (new ones, replacement ones, commissioned ones) and handle other
> operations such as stopping tasks (decommissioning brokers). I think this
> should also expose a user interface (or at least a REST API) for producers
> and consumers, so we can have producers and consumers run inside the Mesos
> cluster if folks want (just add the jar).
> 3) The Executor: This is the task launcher. It launches tasks and kills them
> off.
> 4) Sharing data between Scheduler and Executor: I looked at a few
> implementations of this. I like parts of the Storm implementation but think
> using the environment variables in
> ExecutorInfo.CommandInfo.Environment.Variables[] is the best shot. We can
> have a command-line script, bin/kafka-mesos-scheduler-start.sh, that would
> build the contrib project if it is not already built and accept
> conf/server.properties to start.
> The Framework and operating Scheduler would run on an administrative node.
> I am probably going to hook Apache Curator into it so it can do its own
> failover to another follower. Running more than 2 should be sufficient as
> long as it can bring back its state (e.g. from ZooKeeper). I think we can add
> this in once everything is working.
> Additional detail can be found on the Wiki page
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38570672
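Item 4's idea of passing broker settings from the scheduler to the executor
through ExecutorInfo.CommandInfo.Environment.Variables[] can be sketched in Go
as below. Assumptions, all hypothetical: the local `Variable` struct only
mirrors the name/value pair of the Mesos `Environment.Variable` message (no
Mesos client library is used), the `KAFKA_` prefix and dot-to-underscore
naming are invented conventions, and `parseProperties` handles only simple
`key=value` lines from a server.properties-style file, not the full Java
properties format (no `:` separators, escapes, or continuation lines).

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// Variable is a local stand-in for the name/value pair carried by the Mesos
// Environment.Variable message; it is NOT a Mesos API type.
type Variable struct {
	Name  string
	Value string
}

// envPrefix is a HYPOTHETICAL namespace for broker settings.
const envPrefix = "KAFKA_"

// parseProperties reads simple key=value lines, skipping blank lines and
// lines starting with '#' or '!' (properties-file comments).
func parseProperties(text string) map[string]string {
	props := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, "!") {
			continue
		}
		// Split on the first '=' only, so values may themselves contain '='.
		if k, v, ok := strings.Cut(line, "="); ok {
			props[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return props
}

// toEnv is the scheduler side: each property becomes one variable,
// e.g. broker.id -> KAFKA_BROKER_ID. Keys are sorted for stable output.
func toEnv(props map[string]string) []Variable {
	keys := make([]string, 0, len(props))
	for k := range props {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	vars := make([]Variable, 0, len(keys))
	for _, k := range keys {
		name := envPrefix + strings.ToUpper(strings.ReplaceAll(k, ".", "_"))
		vars = append(vars, Variable{Name: name, Value: props[k]})
	}
	return vars
}

// fromEnv is the executor side: rebuild properties from prefixed variables.
// The mapping is lossy for keys that already contain '_' or uppercase letters.
func fromEnv(vars []Variable) map[string]string {
	props := make(map[string]string)
	for _, v := range vars {
		if !strings.HasPrefix(v.Name, envPrefix) {
			continue
		}
		key := strings.TrimPrefix(v.Name, envPrefix)
		props[strings.ToLower(strings.ReplaceAll(key, "_", "."))] = v.Value
	}
	return props
}

func main() {
	props := parseProperties("# broker settings\nbroker.id=1\nlog.dirs=/var/kafka\nport=9092\n")
	vars := toEnv(props)
	for _, v := range vars {
		fmt.Printf("%s=%s\n", v.Name, v.Value)
	}
	fmt.Println(fromEnv(vars)["log.dirs"])
}
```

Running it prints `KAFKA_BROKER_ID=1`, `KAFKA_LOG_DIRS=/var/kafka` and
`KAFKA_PORT=9092`, then `/var/kafka`. On a real executor the variables would
be read back from the process environment (for example via `os.Environ()`)
before writing a server.properties for the broker. Passing the whole file as a
single variable would avoid the lossy name mapping, at the cost of less
readable task metadata.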