[jira] [Commented] (SAMZA-348) Configure Samza jobs through a stream

David Chen (JIRA) Wed, 17 Sep 2014 15:55:43 -0700

    [ 
https://issues.apache.org/jira/browse/SAMZA-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138156#comment-14138156
 ]


David Chen commented on SAMZA-348:
----------------------------------

Another idea would be to implement a configuration DSL using a scripting 
language like Python, which is both easy to implement and allows you to 
embed Python code in your configuration script.

I would prefer a declarative DSL such as the one used by [Google's build 
system, 
Blaze|http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html].
 A more detailed example can be found in [this GitHub 
Gist|https://gist.github.com/wiseman/3834928].

This would not be difficult to implement, since these statements are simply 
Python function calls. Because the DSL is valid Python code, it is also 
possible to have regular Python code in your configuration script. After each 
statement is evaluated, the Samza client program can either compile it into 
JProperties (as a stop-gap solution) or turn it into a Kafka message and 
publish it to the configuration stream.
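
To make the idea concrete, here is a minimal sketch of how such a DSL could be 
evaluated. Everything in it is an assumption made for illustration: the 
statement name samza_job, the helpers evaluate, to_properties and to_message, 
and the JSON payload format are hypothetical and not part of Samza. It only 
shows that a declarative statement can be an ordinary Python function call 
whose result is rendered either as properties text or as a message payload.

```python
import json

def evaluate(script_text):
    """Run a config script and return the statements it declared, in order."""
    statements = []

    def samza_job(name, config=None):
        # Hypothetical declarative statement: records flat key/value pairs.
        entry = {"job.name": name}
        for key, value in (config or {}).items():
            entry[key] = str(value)
        statements.append(entry)

    # The script is plain Python; only the DSL function is placed in scope.
    exec(script_text, {"samza_job": samza_job})
    return statements

def to_properties(entry):
    """Render one statement as properties-style text (the stop-gap path)."""
    return "\n".join(f"{k}={v}" for k, v in sorted(entry.items()))

def to_message(entry):
    """Render one statement as bytes that a client could publish to a stream.
    The JSON encoding is an assumption; no Kafka client is involved here."""
    return json.dumps(entry, sort_keys=True).encode("utf-8")

# A config script mixing a declarative statement with regular Python code.
EXAMPLE_SCRIPT = """
streams = ["kafka.page-views", "kafka.clicks"]
samza_job(
    name="example-job",
    config={"task.class": "example.MyTask", "task.inputs": ",".join(streams)},
)
"""
```

Calling evaluate(EXAMPLE_SCRIPT) returns one dictionary per statement, and the 
client can then choose between to_properties and to_message for each of them. 
Note that exec runs arbitrary code, so a real client would only evaluate 
scripts it trusts.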

Of course, we can have both a command line program and a DSL, and I am pretty 
sure that as Samza takes off, people will want to start writing DSLs and 
clients for other languages as well. The key is to make sure that the 
common interface the DSLs and tools talk to is solid.

I have opened SAMZA-416 to discuss the DSL further.

> Configure Samza jobs through a stream
> -------------------------------------
>
>                 Key: SAMZA-348
>                 URL: https://issues.apache.org/jira/browse/SAMZA-348
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Chris Riccomini
>              Labels: project
>         Attachments: DESIGN-SAMZA-348-0.md, DESIGN-SAMZA-348-0.pdf
>
>
> Samza's existing config setup is problematic for a number of reasons:
> # It's completely immutable once a job starts. This prevents any dynamic 
> reconfiguration and auto-scaling. It is debatable whether we want these 
> features or not, but our existing implementation actively prevents it. See 
> SAMZA-334 for discussion.
> # We pass existing configuration through environment variables. YARN exports 
> environment variables in a shell script, which limits the size to the varargs 
> length on the machine. This is usually ~128KB. See SAMZA-333 and SAMZA-337 
> for details.
> # User-defined configuration (the Config object) and programmatic 
> configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are 
> handled differently. It's debatable whether this makes sense.
> In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log 
> would replace both the checkpoint topic and the existing config environment 
> variables in SamzaContainer and Samza's YARN AM.
> I'd like to keep this ticket's scope limited to just the implementation of 
> the ConfigLog, and not re-designing how Samza's config is used in the code 
> (SAMZA-40). We should, however, discuss how this feature would affect dynamic 
> reconfiguration/auto-scaling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
