[
https://issues.apache.org/jira/browse/SAMZA-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Riccomini updated SAMZA-348:
----------------------------------
Attachment: DESIGN-SAMZA-348-1.pdf
DESIGN-SAMZA-348-1.md
I've updated the design document.
* Updated job coordinator section.
* Added `control-job.sh` CLI examples.
* Added ConfigStream implementation and protocol sections.
* Added scalability estimations for configuration size, write throughput, and
container heartbeats.
* Added multi-writer race condition and transactionality design section.
* Added Mesos impact section.
* Added YARN AM work-preserving restart impact section.
* Eliminated 'Open Questions' section.
The largest change was in the 'Design Proposal' section, which became much more
detailed and complete. If you only have time to read one section, make it the
'Design Proposal' pages.
> Configure Samza jobs through a stream
> -------------------------------------
>
> Key: SAMZA-348
> URL: https://issues.apache.org/jira/browse/SAMZA-348
> Project: Samza
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Chris Riccomini
> Labels: project
> Attachments: DESIGN-SAMZA-348-0.md, DESIGN-SAMZA-348-0.pdf,
> DESIGN-SAMZA-348-1.md, DESIGN-SAMZA-348-1.pdf
>
>
> Samza's existing config setup is problematic for a number of reasons:
> # It's completely immutable once a job starts. This prevents any dynamic
> reconfiguration and auto-scaling. It is debatable whether we want these
> feature or not, but our existing implementation actively prevents it. See
> SAMZA-334 for discussion.
> # We pass existing configuration through environment variables. YARN exports
> environment variables in a shell script, which limits the size to the varargs
> length on the machine. This is usually ~128KB. See SAMZA-333 and SAMZA-337
> for details.
> # User-defined configuration (the Config object) and programmatic
> configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are
> handled differently. It's debatable whether this makes sense.
> In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log
> would replace both the checkpoint topic and the existing config environment
> variables in SamzaContainer and Samza's YARN AM.
> I'd like to keep this ticket's scope limited to just the implementation of
> the ConfigLog, and not re-designing how Samza's config is used in the code
> (SAMZA-40). We should, however, discuss how this feature would affect dynamic
> reconfiguration/auto-scaling.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)