[jira] [Commented] (SAMZA-348) Configure Samza jobs through a stream

Chinmay Soman (JIRA) Mon, 15 Sep 2014 10:06:56 -0700

    [ 
https://issues.apache.org/jira/browse/SAMZA-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134128#comment-14134128
 ]


Chinmay Soman commented on SAMZA-348:
-------------------------------------

This looks pretty good. +1

The HTTP JSON interface with a pull model sounds good to me! It also makes 
it easy for the user to see what config is actually being used (not knowing 
this is a common problem in distributed systems).
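
To make the pull model concrete, here is a minimal sketch in Python (not 
Samza code). It assumes the config is served as a flat JSON object of 
key/value strings; the class name ConfigPuller, the fetch_json helper and the 
URL in the usage note are hypothetical and are not taken from the design doc.

```python
import json
from typing import Callable, Dict, Optional, Tuple
from urllib.request import urlopen

Config = Dict[str, str]
Change = Tuple[Optional[str], Optional[str]]  # (old value, new value)


def fetch_json(url: str, timeout: float = 5.0) -> Config:
    """Fetch a JSON object over HTTP (assumed shape: flat key/value map)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


class ConfigPuller:
    """Hypothetical container-side helper that pulls config from one endpoint."""

    def __init__(self, url: str, fetch: Callable[[str], Config] = fetch_json):
        self.url = url
        self._fetch = fetch  # injectable so the logic can run without a server
        self.config: Config = {}

    def poll(self) -> Dict[str, Change]:
        """Pull the latest config and return the keys that changed."""
        latest = self._fetch(self.url)
        changes: Dict[str, Change] = {
            key: (self.config.get(key), value)
            for key, value in latest.items()
            if self.config.get(key) != value
        }
        # Keys that disappeared from the served config are reported as removed.
        for key in self.config.keys() - latest.keys():
            changes[key] = (self.config[key], None)
        self.config = dict(latest)
        return changes
```

A container would call poll() on a timer, for example 
ConfigPuller("http://am-host:8080/config").poll(), and the same URL is what a 
user would open to see the config actually in use.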

My comments on the open questions:
* Multi-writer problem: I think we can make auto-configuration by the Samza AM 
a tunable property, to be enabled when the user does not want to keep tuning 
the config by hand. In addition, it may be better for the user to make all 
config-related changes through a web-based endpoint (maybe hosted in the AM). 
This way, the config hosted by the AM becomes the source of truth, not cfg2 
(similar to what Azkaban does).

* Config stream naming:
Maybe we can still standardize this. The configure-job.sh script can take the 
name of the job whose config stream is to be written. To solve the problem of 
resetting the config, we can simply wait for Kafka topic deletion to become 
available.
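
As a rough illustration of the standardized naming idea, the sketch below 
derives a config stream name from a job name. The prefix "__samza_config", 
the job_id parameter and the function name are assumptions made up for this 
example; the only fixed constraints used are Kafka's own topic naming rules 
(letters, digits, '.', '_' and '-', at most 249 characters).

```python
import re

# Characters Kafka does not allow in topic names.
_ILLEGAL_TOPIC_CHARS = re.compile(r"[^a-zA-Z0-9._-]")
MAX_TOPIC_LENGTH = 249


def config_stream_name(job_name: str, job_id: str = "1",
                       prefix: str = "__samza_config") -> str:
    """Build a deterministic config topic name (hypothetical convention)."""
    if not job_name:
        raise ValueError("job_name must not be empty")
    # Replace illegal characters so any job name maps to a legal topic name.
    parts = [_ILLEGAL_TOPIC_CHARS.sub("-", part)
             for part in (prefix, job_name, job_id)]
    topic = "_".join(parts)
    if len(topic) > MAX_TOPIC_LENGTH:
        raise ValueError("config stream name exceeds Kafka's topic length limit")
    return topic
```

With a scheme like this, a script that is given only the job name can compute 
the stream to write to, so the user never has to configure the stream name 
separately.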



> Configure Samza jobs through a stream
> -------------------------------------
>
>                 Key: SAMZA-348
>                 URL: https://issues.apache.org/jira/browse/SAMZA-348
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Chris Riccomini
>              Labels: project
>         Attachments: DESIGN-SAMZA-348-0.md, DESIGN-SAMZA-348-0.pdf
>
>
> Samza's existing config setup is problematic for a number of reasons:
> # It's completely immutable once a job starts. This prevents any dynamic 
> reconfiguration and auto-scaling. It is debatable whether we want these 
> features or not, but our existing implementation actively prevents them. See 
> SAMZA-334 for discussion.
> # We pass existing configuration through environment variables. YARN exports 
> environment variables in a shell script, which limits the size to the varargs 
> length on the machine. This is usually ~128KB. See SAMZA-333 and SAMZA-337 
> for details.
> # User-defined configuration (the Config object) and programmatic 
> configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are 
> handled differently. It's debatable whether this makes sense.
> In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log 
> would replace both the checkpoint topic and the existing config environment 
> variables in SamzaContainer and Samza's YARN AM.
> I'd like to keep this ticket's scope limited to just the implementation of 
> the ConfigLog, and not re-designing how Samza's config is used in the code 
> (SAMZA-40). We should, however, discuss how this feature would affect dynamic 
> reconfiguration/auto-scaling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
