Chris Riccomini created SAMZA-348:
-------------------------------------
Summary: Configure Samza jobs through a stream
Key: SAMZA-348
URL: https://issues.apache.org/jira/browse/SAMZA-348
Project: Samza
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Chris Riccomini
Samza's existing config setup is problematic for a number of reasons:
# It's completely immutable once a job starts. This prevents any dynamic
reconfiguration and auto-scaling. It is debatable whether we want these features
or not, but our existing implementation actively prevents them. See SAMZA-334 for
discussion.
# We pass existing configuration through environment variables. YARN exports
environment variables in a shell script, which limits their total size to the
machine's maximum argument length. This is usually ~128KB. See SAMZA-333 and
SAMZA-337 for details.
# User-defined configuration (the Config object) and programmatic configuration
(checkpoints and TaskName:State mappings (see SAMZA-123)) are handled
differently. It's debatable whether this makes sense.
In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log would
replace both the checkpoint topic and the existing config environment variables
in SamzaContainer and Samza's YARN AM.
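To make the idea concrete, here is a minimal in-memory sketch of what replaying such a log could look like. It assumes the ConfigLog is an append-only sequence of key/value entries where the latest entry for a key wins and a null value deletes the key. The class name ConfigLogSketch, its methods, and the "checkpoint.example" key are all hypothetical, chosen only for illustration; none of this is an existing Samza API or the design agreed on in SAMZA-123.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Samza. */
final class ConfigLogSketch {
    /** One log entry. A null value is treated as "delete this key" (an assumption). */
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for a stream/topic: entries are only ever appended.
    private final List<Entry> entries = new ArrayList<>();

    /** Appends a config change to the end of the log. */
    void append(String key, String value) {
        entries.add(new Entry(key, value));
    }

    /** Replays the log from the start; later entries override earlier ones. */
    Map<String, String> replay() {
        Map<String, String> state = new LinkedHashMap<>();
        for (Entry e : entries) {
            if (e.value == null) {
                state.remove(e.key);
            } else {
                state.put(e.key, e.value);
            }
        }
        return Collections.unmodifiableMap(state);
    }
}
```

In this sketch, user-defined settings and programmatic state such as checkpoints would both be written with append(), and a container or the AM would call replay() at startup instead of reading environment variables. Because the log is never rewritten, a later append() could also represent a configuration change made while the job is running.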
I'd like to keep this ticket's scope limited to just the implementation of the
ConfigLog, and not re-designing how Samza's config is used in the code
(SAMZA-40). We should, however, discuss how this feature would affect dynamic
reconfiguration/auto-scaling.