Re: Architectural reason to split in 4 topologies / impact on the kafka ressources

Michel Sumbul Mon, 25 Jun 2018 15:44:55 -0700

Depending on the data source, it might be useful to bypass a step that
the user considers unnecessary.
For example, you may have a data source that doesn't need profiling but
that you still want ingested like the other sources, so that SOC analysts
can use it in their analysis and have everything in one place.


How can we bypass it for a specific sensor?

2018-06-25 23:38 GMT+01:00 James Sirota <[email protected]>:

> There is a way to wire the system to bypass enrichment and profiling, but
> you would then bypass a lot of key features of the system.  It would be
> unwise to do that.
>
> 25.06.2018, 15:13, "Michel Sumbul" <[email protected]>:
> > Hi Casey,
> >
> > That makes complete sense.
> > Short question: if there is no enrichment or no profiling, does the
> > message still pass through the enrichment/profiling topic?
> >
> > If yes, do you think it would be possible for messages that don't need
> > enrichment or profiling to skip that topic and go directly to the next
> > one? This is again to avoid in/out in Kafka.
> >
> > Thanks for the explanation,
> > Michel
> >
> > 2018-06-23 3:58 GMT+01:00 Casey Stella <[email protected]>:
> >
> >>  Hey Michel,
> >>
> >>  Those are good questions and there were some reasons surrounding that.
> >>  In fact, historically, we had fewer topologies (e.g. indexing and
> >>  enrichment were merged). Even earlier on, we had just one giant topology
> >>  per parser that enriched and indexed. The long story short is that we
> >>  moved this way because we saw how people were using Metron and we gained
> >>  more insight tuning Metron. That led us down this architectural path.
> >>
> >>  Some of the reasons that we went this way:
> >>
> >>     - Fewer large topologies were a nightmare to tune
> >>        - Enrichment would have different memory requirements than, say,
> >>        parsers or indexing
> >>        - You can adjust the kafka topic params per topology to adjust
> >>        the number of partitions, etc.
> >>     - Having the separate topologies gives a natural set of extension
> >>     points for customization and enhancement (e.g. you want a phase
> >>     between parsing and enrichment).
> >>     - Decoupling the topologies lets us spin up and down parts of Metron
> >>     without affecting others (e.g. you don't have to take down
> >>     enrichments to add a parser, even for a moment)
> >>     - The movement to Flux meant we were limited in how much we could
> >>     adjust the topology at runtime (e.g. colocating parsers and
> >>     enrichment would mean moving away from Flux essentially as the
> >>     topology changes its structure)
> >>
> >>  Best,
> >>
> >>  Casey
> >>
> >>  On Fri, Jun 22, 2018 at 5:25 PM Michel Sumbul <[email protected]>
> >>  wrote:
> >>
> >>  > Hi Everyone,
> >>  >
> >>  > I was wondering what the architectural reason was for splitting
> >>  > ingestion in Metron into 4 different topologies that all read/write
> >>  > to Kafka.
> >>  >
> >>  > For example, why have the parsing and enrichment topologies not been
> >>  > merged? Would it not be possible to enrich the message directly when
> >>  > you parse it?
> >>  >
> >>  > I'm asking because splitting into several topologies means that all
> >>  > of the topologies read/write to Kafka, which produces a bigger load
> >>  > on the Kafka cluster and thus a need for far more
> >>  > infrastructure/servers. The cost is especially significant when we
> >>  > speak about TBs of data ingested every day.
> >>  >
> >>  > I'm sure there was a very good reason, I was just curious.
> >>  >
> >>  > Thanks,
> >>  > Michel
> >>  >
>
> -------------------
> Thank you,
>
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org
>
>
