Re: Architectural reason to split in 4 topologies / impact on the Kafka resources

James Sirota Mon, 25 Jun 2018 15:38:55 -0700

There is a way to wire the system to bypass enrichment and profiling, but you 
would then bypass a lot of key features of the system.  It would be unwise to 
do that.
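
For a sense of what skipping a topic would actually save, here is a rough 
back-of-the-envelope sketch in Python. It is only an illustration: the 
function name, the constant message size across stages, the "each topic is 
written once and read once" model, and the replication factor of 3 are all 
assumptions, not figures taken from Metron or from any real cluster.

```python
def estimate_kafka_traffic(ingest_tb_per_day, topics, replication_factor=3):
    """Rough daily Kafka traffic (in TB) for a pipeline chained through `topics` topics.

    Assumptions (illustrative only, not measured on Metron):
      - message size stays constant from stage to stage
      - each topic is written once and consumed once by the next stage
      - every write is copied to (replication_factor - 1) follower replicas
    """
    produced = ingest_tb_per_day * topics             # producer writes into each topic
    replicated = produced * (replication_factor - 1)  # copies made for follower replicas
    consumed = ingest_tb_per_day * topics             # one downstream read per topic
    return {
        "produced": produced,
        "replicated": replicated,
        "consumed": consumed,
        "total": produced + replicated + consumed,
    }


if __name__ == "__main__":
    # Hypothetical comparison: three topics in the chain versus two (one stage bypassed).
    full = estimate_kafka_traffic(1.0, topics=3)
    bypassed = estimate_kafka_traffic(1.0, topics=2)
    print("full pipeline :", full)
    print("one topic less:", bypassed)
    print("saved TB/day  :", full["total"] - bypassed["total"])
```

Under these assumptions the traffic grows linearly with the number of topics, 
so dropping one topic saves a fixed fraction of Kafka load; whether that is 
worth losing the features of the bypassed stage is the trade-off above.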


25.06.2018, 15:13, "Michel Sumbul" <michelsum...@gmail.com>:
> Hi Casey,
>
> That makes complete sense.
> Short question: if there is no enrichment or profiling, does the message
> still pass through the enrichment/profiling topic?
>
> If yes, do you think there could be a way for messages that don't need
> enrichment or profiling to skip that topic and go directly to the next
> one? Again, this is to avoid extra reads and writes in Kafka.
>
> Thanks for the explanation,
> Michel
>
> 2018-06-23 3:58 GMT+01:00 Casey Stella <ceste...@gmail.com>:
>
>>  Hey Michel,
>>
>>  Those are good questions and there were some reasons surrounding that. In
>>  fact, historically, we had fewer topologies (e.g. indexing and enrichment
>>  were merged). Even earlier on, we had just one giant topology per parser
>>  that enriched and indexed. Long story short, we moved this way
>>  because we saw how people were using Metron and we gained more insight
>>  from tuning it. That led us down this architectural path.
>>
>>  Some of the reasons that we went this way:
>>
>>     - Fewer large topologies were a nightmare to tune
>>        - Enrichment would have different memory requirements than, say,
>>        parsers or indexing
>>        - You can adjust the Kafka topic params per topology to adjust the
>>        number of partitions, etc.
>>     - Having the separate topologies gives a natural set of extension points
>>     for customization and enhancement (e.g. you want a phase between parsing
>>     and enrichment).
>>     - Decoupling the topologies lets us spin up and down parts of Metron
>>     without affecting others (e.g. you don't have to take down enrichments
>>     to add a parser, even for a moment)
>>     - The movement to Flux meant we were limited in how much we could adjust
>>     the topology at runtime (e.g. colocating parsers and enrichment would
>>     mean moving away from Flux essentially as the topology changes its
>>     structure)
>>
>>  Best,
>>
>>  Casey
>>
>>  On Fri, Jun 22, 2018 at 5:25 PM Michel Sumbul <michelsum...@gmail.com>
>>  wrote:
>>
>>  > Hi Everyone,
>>  >
>>  > I was asking myself what the architectural reason was for splitting
>>  > ingestion in Metron into 4 different topologies that all read from and
>>  > write to Kafka.
>>  >
>>  > For example, why have the parsing and enrichment topologies not been
>>  > merged? Would it not be possible to enrich the message directly when
>>  > you parse it?
>>  >
>>  > I'm asking because splitting into several topologies means that all of
>>  > the topologies read/write to Kafka, which produces a bigger load on the
>>  > Kafka cluster and then a need for far more infrastructure/servers. The
>>  > cost is especially significant when we speak about TBs of data ingested
>>  > every day.
>>  >
>>  > I'm sure there was a very good reason; I was just curious.
>>  >
>>  > Thanks,
>>  > Michel
>>  >

------------------- 
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org
