I have experienced issues with ES and HDFS indexing in production and have 
previously split the indexing topology into two separate topologies.  As you 
state, the benefits of this approach are (a) tuning each topology separately, 
(b) the ability to attribute problems to a specific topology (why is something 
slow?), and (c) graceful degradation.  When ES, for example, fails partially or 
catastrophically and your ES topology starts misbehaving, the HDFS topology 
keeps humming along unaffected.  Once METRON-1205 is in, you will be able to 
re-index into ES (or potentially other stores) from HDFS at will.  The major 
con of this architecture is that your data stores are more likely to get out 
of sync, because they index/store data at different rates.  But even given 
that, I would vote +1 on splitting out the topologies.
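
To make the out-of-sync concern concrete, here is a rough sketch of how the 
lag of each topology's consumer group could be compared.  It is only an 
illustration: the group names, the offsets and the helper functions are all 
made up, and in practice the end offsets and committed offsets would come from 
Kafka itself rather than from hard-coded dictionaries.

```python
from typing import Dict

Offsets = Dict[int, int]  # partition id -> offset


def partition_lag(end_offsets: Offsets, committed: Offsets) -> Offsets:
    """Per-partition lag: messages written to the topic but not yet consumed.

    A partition with no committed offset is treated as fully unconsumed
    (an assumption made for this sketch).
    """
    return {p: max(end - committed.get(p, 0), 0) for p, end in end_offsets.items()}


def average_lag(end_offsets: Offsets, committed: Offsets) -> float:
    """Average lag across partitions for one consumer group."""
    lags = partition_lag(end_offsets, committed)
    return sum(lags.values()) / len(lags) if lags else 0.0


def drift(end_offsets: Offsets, committed_by_group: Dict[str, Offsets]) -> int:
    """Difference in total lag between the slowest and the fastest group."""
    totals = [
        sum(partition_lag(end_offsets, committed).values())
        for committed in committed_by_group.values()
    ]
    return max(totals) - min(totals) if totals else 0


# Illustrative numbers only; "indexing-es" and "indexing-hdfs" are
# hypothetical consumer group names for the two split topologies.
end_offsets = {0: 1000, 1: 1200}
committed_by_group = {
    "indexing-es": {0: 900, 1: 1100},
    "indexing-hdfs": {0: 1000, 1: 1190},
}

for group, committed in committed_by_group.items():
    print(group, average_lag(end_offsets, committed))
print("drift", drift(end_offsets, committed_by_group))
```

If the average lag of either group keeps growing, that writer is not keeping 
up with reading; if the drift keeps growing, one index is getting ahead of the 
other.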

25.09.2017, 09:37, "Casey Stella" <[email protected]>:
> One of the lessons that have bubbled up in doing some performance analysis
> is that having the indexing topology share both the ES and the HDFS writer
> in the same topology can be problematic from a tuning perspective.
> Specifically, it's hard to square that circle and make both perform fast
> enough to not cause significant back-pressure in kafka (and often Commit
> Exceptions in the kafka spout).
>
> I wanted to get the community's opinion about the possibility of separating
> the two current writers into separate topologies which could be tuned
> separately.
>
> Pros:
>
>    - Practically speaking, tuning separately is often a lot easier than
>    trying to tune together
>    - This opens us up with the beginnings of an abstraction that may be
>    reusable to expose new indexers to Metron
>
> Cons:
>
>    - It has the potential to mask a problem. We may want to ensure that
>    the writers write at the same rate and don't get far ahead of one another.
>    In the current setup, this is inherent in the design. If we separate them,
>    they may be reading at different rates and one index may get ahead of the
>    other.
>    - The management pack section around indexing would need to be
>    reconsidered if we split them up
>
> Personally, I'm strongly in favor of splitting them up, but I want to make
> sure that we don't miss an important nuance here. The first con is
> concerning to me, but I'd argue that another lesson from performance tuning
> is that we need to monitor the average partition lag over time in the
> management UI for the various consumer groups and ensure that writing keeps
> up with reading. If we insist on this assertion being true for all healthy
> metron installations, the primary con goes away in my mind.
>
> Anyway, I'm sure I've missed some pros and cons, so it'd be great to hear
> community feedback here. Thoughts?

------------------- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
