[jira] [Created] (SAMZA-348) Configure Samza jobs through a stream

Chris Riccomini (JIRA) Fri, 18 Jul 2014 09:12:12 -0700

Chris Riccomini created SAMZA-348:
-------------------------------------

             Summary: Configure Samza jobs through a stream
                 Key: SAMZA-348
                 URL: https://issues.apache.org/jira/browse/SAMZA-348
             Project: Samza
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Chris Riccomini



Samza's existing config setup is problematic for a number of reasons:

# It's completely immutable once a job starts. This prevents any dynamic 
reconfiguration and auto-scaling. It is debatable whether we want these features 
or not, but our existing implementation actively prevents them. See SAMZA-334 for 
discussion.
# We pass existing configuration through environment variables. YARN exports 
environment variables in a shell script, which limits the config size to the 
maximum argument length on the machine. This is usually ~128KB. See SAMZA-333 
and SAMZA-337 for details.
# User-defined configuration (the Config object) and programmatic configuration 
(checkpoints and TaskName:State mappings; see SAMZA-123) are handled 
differently. It's debatable whether this makes sense.
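
To make the size limit in the second point concrete, here is a rough sketch of checking whether a config would fit. It assumes the whole config is serialized as a single JSON string before being exported; that encoding, the helper names, and the exact 128KB threshold are assumptions for illustration only, and real limits vary by machine.

```python
import json

# Assumed limit, taken from the ~128KB figure above; actual limits vary by machine.
ASSUMED_LIMIT_BYTES = 128 * 1024


def env_payload_size(config: dict) -> int:
    """Size in bytes of the config when serialized as one JSON string."""
    return len(json.dumps(config).encode("utf-8"))


def fits_in_env(config: dict, limit: int = ASSUMED_LIMIT_BYTES) -> bool:
    """True if the serialized config stays under the assumed limit."""
    return env_payload_size(config) < limit


# Hypothetical configs: a tiny one, and one with 2000 entries of 100 bytes each.
small = {"job.name": "example"}
large = {f"key.{i}": "x" * 100 for i in range(2000)}
```

With these inputs, fits_in_env(small) is true and fits_in_env(large) is false, since the larger config serializes to well over 200KB.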

In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log would 
replace both the checkpoint topic and the existing config environment variables 
in SamzaContainer and Samza's YARN AM.
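
As a minimal sketch of the idea (not the design from SAMZA-123), a ConfigLog could be treated as an ordered stream of key/value messages that a container replays to rebuild its current config, with the last write for each key winning. The message shape, the use of None as a delete marker, and the example keys below are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical message shape: (key, value). A value of None deletes the key.
ConfigMessage = Tuple[str, Optional[str]]


def materialize_config(log: Iterable[ConfigMessage]) -> Dict[str, str]:
    """Replay the log in order; the last write for each key wins."""
    config: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            config.pop(key, None)  # delete marker: drop the key if present
        else:
            config[key] = value
    return config


# Illustrative log mixing user-defined config and checkpoint-style updates.
config_log = [
    ("job.name", "wikipedia-feed"),
    ("task.checkpoint.offset", "100"),
    ("task.checkpoint.offset", "250"),  # later write overrides the earlier one
    ("job.debug", "true"),
    ("job.debug", None),                # removed again
]

current = materialize_config(config_log)
```

Because the state is derived purely by replaying messages, appending a new message is enough to change the config of a running job, which is what makes dynamic reconfiguration possible in principle.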

I'd like to keep this ticket's scope limited to just the implementation of the 
ConfigLog, and not re-designing how Samza's config is used in the code 
(SAMZA-40). We should, however, discuss how this feature would affect dynamic 
reconfiguration/auto-scaling.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
