[ 
https://issues.apache.org/jira/browse/SAMZA-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135809#comment-14135809
 ] 

Chris Riccomini commented on SAMZA-348:
---------------------------------------

bq. It would be useful to do a rough math on the potential size that this log 
can grow assuming a worst case scenario for cluster setup.

Agreed. We should also work out how many containers we can run if each one 
polls the job coordinator's HTTP server once every N seconds.
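
A rough sketch of that polling math, assuming each container polls independently once per interval and the polls are spread evenly over time. The function names and the capacity figure are hypothetical inputs for illustration, not measurements of the job coordinator:

```python
def polling_load_rps(containers: int, poll_interval_s: float) -> float:
    """Average requests/sec the job coordinator's HTTP server would see."""
    if poll_interval_s <= 0:
        raise ValueError("poll interval must be positive")
    # Each container contributes one request per interval.
    return containers / poll_interval_s


def max_containers(server_capacity_rps: float, poll_interval_s: float) -> int:
    """Largest container count that stays within an assumed server capacity."""
    if poll_interval_s <= 0:
        raise ValueError("poll interval must be positive")
    # Invert the formula above: containers = capacity * interval.
    return int(server_capacity_rps * poll_interval_s)
```

For example, polling_load_rps(1000, 10) works out to 100 requests per second on average. Bursts would be higher if containers start at the same time and poll in lockstep, so this is a lower bound on peak load.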

bq.  However, if the container fails when the AM is down, they would not be 
able to start since they cannot get the offsets from the AM.

True. I've been thinking about this a bit more. I think this should be fine. In 
both Mesos and YARN, if the container fails, it actually won't be restarted 
anyway (since restarting the container requires a job coordinator to decide 
which partitions are assigned, which box the container should be on, etc). Even 
if the distributed execution framework were to restart the container, I think 
the desired behavior is either to block until the AM comes back, or to exit 
permanently and let the AM restart it properly once it is back.
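
A minimal sketch of those two behaviors on the container side, assuming the container fetches its offsets through some callable that raises ConnectionError while the AM is unreachable. The function name, the `fetch` callable, and the parameters are hypothetical and not part of Samza:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def block_until_available(
    fetch: Callable[[], T],
    retry_interval_s: float = 5.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds.

    max_attempts=None blocks until the AM comes back. A finite value
    re-raises after that many failures, so the caller can exit and leave
    the restart to the AM.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return fetch()
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Wait before asking the AM again.
            sleep(retry_interval_s)
```

The `sleep` parameter is only there so the wait can be replaced in tests.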

bq. The AM would need to use a transactionally aware consumer to ensure it 
reads the data in a consistent state.

Good point. Related to this, if we implement the protocol in the ConfigStream 
as key-value (vs. an entire config blob as one value), then you might wish to 
use transactions to atomically write multiple key-value pairs together 
(all-or-nothing) into the ConfigStream. Again, this would require a 
transactional consumer.
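
To make the all-or-nothing idea concrete, here is a small in-memory sketch of a transaction-aware reader. The message shape (kind, txn_id, key, value) and the "set"/"commit"/"abort" markers are assumptions made up for illustration. They are not the ConfigStream protocol and not a Kafka API:

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical message shape: (kind, txn_id, key, value).
# key and value are None for "commit" and "abort" markers.
Message = Tuple[str, str, Optional[str], Optional[str]]


def read_committed(messages: Iterable[Message]) -> Dict[str, str]:
    """Build the config from a log, applying only committed transactions."""
    config: Dict[str, str] = {}
    pending: Dict[str, List[Tuple[str, str]]] = {}
    for kind, txn_id, key, value in messages:
        if kind == "set":
            # Buffer the write until we know how its transaction ends.
            pending.setdefault(txn_id, []).append((key, value))
        elif kind == "commit":
            # Apply all buffered writes for this transaction together.
            for k, v in pending.pop(txn_id, []):
                config[k] = v
        elif kind == "abort":
            # Discard the buffered writes.
            pending.pop(txn_id, None)
        else:
            raise ValueError(f"unknown message kind: {kind!r}")
    # Writes from transactions that never committed are never applied.
    return config
```

A reader built this way never exposes a partially written group of key-value pairs, at the cost of buffering writes until their commit marker arrives.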

I'll update the design docs with this feedback.

> Configure Samza jobs through a stream
> -------------------------------------
>
>                 Key: SAMZA-348
>                 URL: https://issues.apache.org/jira/browse/SAMZA-348
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Chris Riccomini
>              Labels: project
>         Attachments: DESIGN-SAMZA-348-0.md, DESIGN-SAMZA-348-0.pdf
>
>
> Samza's existing config setup is problematic for a number of reasons:
> # It's completely immutable once a job starts. This prevents any dynamic 
> reconfiguration and auto-scaling. It is debatable whether we want these 
> features or not, but our existing implementation actively prevents them. See 
> SAMZA-334 for discussion.
> # We pass existing configuration through environment variables. YARN exports 
> environment variables in a shell script, which limits the size to the varargs 
> length on the machine. This is usually ~128KB. See SAMZA-333 and SAMZA-337 
> for details.
> # User-defined configuration (the Config object) and programmatic 
> configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are 
> handled differently. It's debatable whether this makes sense.
> In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log 
> would replace both the checkpoint topic and the existing config environment 
> variables in SamzaContainer and Samza's YARN AM.
> I'd like to keep this ticket's scope limited to just the implementation of 
> the ConfigLog, and not re-designing how Samza's config is used in the code 
> (SAMZA-40). We should, however, discuss how this feature would affect dynamic 
> reconfiguration/auto-scaling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
