[ https://issues.apache.org/jira/browse/SAMZA-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167459#comment-14167459 ]

Chris Riccomini commented on SAMZA-40:
--------------------------------------

Samza's deployment model closely mirrors Hadoop's: it only knows about jobs. 
It is up to other tooling built on top of Samza to provide topology 
abstractions (just as Oozie, Azkaban, etc. do for Hadoop). This assumption has 
been baked in from the beginning. The reason for not wanting topologies is that 
they don't model how things really work.

# In theory, a bunch of jobs are wired together in a topology, and they all 
know about each other. In practice, the jobs are connected by multi-subscriber 
streams. Anyone may consume from or produce to these streams (including 
non-Samza jobs). So, even if you have a topology defined, it doesn't always 
reflect reality.
# Many topologies have jobs owned by different developers or teams. This is 
problematic, as it forces a shared code base (and usually a shared deployment 
schedule), which might not be desirable.
# Topologies tend to force a deployment model where multiple jobs are deployed 
at once, which is not desirable.

Anecdotally, I've spoken to more than one person who has used another stream 
processing framework that uses topologies, and they've ended up writing just 
one job per topology, to circumvent the problems I described above.

As for preventing/catching mismatches in errors/partitioning, I think this 
is one of the things that a layer on top of Samza should provide (e.g. a SQL 
layer). There is probably also some opportunity to address this within a single 
job's config (e.g. defining a join job, and validating that all partitions for 
all input streams match), but I haven't thought much about that part of it.
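For the single-job case, a partition check for a join job could be as simple as the Java sketch below. It assumes the partition count of each input stream has already been looked up (in practice that would come from the input system's stream metadata; here it is just a map), and the class name, method name, and stream names are all hypothetical, not Samza APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of Samza. Validates, before the job starts,
// that every input stream of a join job has the same number of partitions.
class JoinPartitionCheck {

    /** Throws if there are no inputs, or if any input's partition count differs from the first. */
    static void validatePartitionCounts(Map<String, Integer> partitionCounts) {
        if (partitionCounts.isEmpty()) {
            throw new IllegalArgumentException("join job has no input streams");
        }
        String firstStream = null;
        Integer expected = null;
        for (Map.Entry<String, Integer> e : partitionCounts.entrySet()) {
            if (expected == null) {
                // The first stream sets the expected partition count.
                firstStream = e.getKey();
                expected = e.getValue();
            } else if (!expected.equals(e.getValue())) {
                throw new IllegalStateException(String.format(
                        "partition count mismatch: %s has %d, %s has %d",
                        firstStream, expected, e.getKey(), e.getValue()));
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical stream names and partition counts.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("kafka.PageViewEvent", 8);
        counts.put("kafka.UserProfile", 8);
        validatePartitionCounts(counts); // passes: both inputs have 8 partitions

        counts.put("kafka.AdClickEvent", 4);
        try {
            validatePartitionCounts(counts);
        } catch (IllegalStateException ex) {
            // prints: partition count mismatch: kafka.PageViewEvent has 8, kafka.AdClickEvent has 4
            System.out.println(ex.getMessage());
        }
    }
}
```

Failing at submission time like this is the point: a mismatched join otherwise tends to surface later as silently wrong results rather than as an error.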

> Refactor Samza configuration
> ----------------------------
>
>                 Key: SAMZA-40
>                 URL: https://issues.apache.org/jira/browse/SAMZA-40
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>              Labels: project
>
> Samza's configuration system has several problems that we need to resolve.
> * Want to auto-generate documentation based off of configuration.
> * Should support global defaults for a config property. Right now, we do 
> config.getFoo.getOrElse() everywhere.
> * Should validate config up front, rather than throwing runtime exceptions 
> at random points throughout the code.
> * We are mixing wiring and configuration together. How do other systems 
> handle this?
> * We have fragmented configuration (anybody can define configuration). How do 
> other systems handle this?
> * How to handle undefined configuration? How to make this interoperable with 
> both Java and Scala (i.e. should we support Option in Scala)?
> * Should remain immutable.
> * Should remove implicits. It's just confusing.
> * Do we want to support complex types (list, map) for values, not just String?
> We need a design proposal for this.
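Several of the items above (global defaults instead of getOrElse() calls, up-front validation, immutability, and generated documentation) can share one source of truth: a list of declared keys. The Java sketch below only illustrates that idea; ConfigSketch, Key, Config, and the example.* property names are hypothetical and are not Samza's configuration API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical design sketch, not Samza's API.
class ConfigSketch {

    /** A declared property: name, default (null = required), description, parser. */
    static final class Key<T> {
        final String name;
        final T defaultValue;
        final String description;
        final Function<String, T> parser;

        Key(String name, T defaultValue, String description, Function<String, T> parser) {
            this.name = name;
            this.defaultValue = defaultValue;
            this.description = description;
            this.parser = parser;
        }
    }

    // Hypothetical keys; declaring the default here replaces getOrElse() at call sites.
    static final Key<Integer> POLL_MS = new Key<>(
            "example.poll.interval.ms", 50, "How often to poll, in ms.", Integer::parseInt);
    static final Key<String> JOB_NAME = new Key<>(
            "example.job.name", null, "Name of the job (required).", s -> s);
    static final List<Key<?>> ALL_KEYS = List.of(POLL_MS, JOB_NAME);

    /** Immutable: copies the raw properties, so later changes to the source map are not seen. */
    static final class Config {
        private final Map<String, String> raw;

        Config(Map<String, String> raw) {
            this.raw = Collections.unmodifiableMap(new HashMap<>(raw));
        }

        /** Parses the value if present, otherwise falls back to the declared default. */
        <T> T get(Key<T> key) {
            String value = raw.get(key.name);
            return value == null ? key.defaultValue : key.parser.apply(value);
        }
    }

    /** Checks every declared key once, up front, collecting all problems. */
    static List<String> validate(Config config) {
        List<String> errors = new ArrayList<>();
        for (Key<?> key : ALL_KEYS) {
            try {
                if (config.get(key) == null) {
                    errors.add(key.name + " is required");
                }
            } catch (RuntimeException e) {
                errors.add(key.name + " is invalid: " + e.getMessage());
            }
        }
        return errors;
    }

    /** Generates plain-text documentation from the same declarations. */
    static String docs() {
        StringBuilder sb = new StringBuilder();
        for (Key<?> key : ALL_KEYS) {
            sb.append(key.name)
              .append(" (default: ")
              .append(key.defaultValue == null ? "none" : key.defaultValue)
              .append("): ")
              .append(key.description)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("example.job.name", "page-view-counter");
        Config config = new Config(props);
        System.out.println(validate(config));    // []
        System.out.println(config.get(POLL_MS)); // 50 (the declared default)
        System.out.print(docs());
    }
}
```

Because validation and documentation walk the same declarations, a registered property cannot be documented without also being checked; this does not by itself solve the "anybody can define configuration" problem for keys that are never registered.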



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
