One of the lessons that has bubbled up from some performance analysis is that having the ES and HDFS writers share the same indexing topology can be problematic from a tuning perspective. Specifically, it's hard to square that circle and make both writers perform fast enough to avoid significant back-pressure in Kafka (and, often, commit exceptions in the Kafka spout).
I wanted to get the community's opinion on separating the two current writers into separate topologies that could be tuned independently.

Pros:
- Practically speaking, tuning separately is often a lot easier than trying to tune together
- This gives us the beginnings of an abstraction that may be reusable for exposing new indexers to Metron

Cons:
- It has the potential to mask a problem. We may want to ensure that the writers write at the same rate and don't get far ahead of one another. In the current setup, this is inherent in the design. If we separate them, they may read at different rates and one index may get ahead of the other.
- The management pack section around indexing would need to be reconsidered if we split them up

Personally, I'm strongly in favor of splitting them up, but I want to make sure that we don't miss an important nuance here. The first con concerns me, but I'd argue that another lesson from performance tuning is that we need to monitor the average partition lag over time in the management UI for the various consumer groups and ensure that writing keeps up with reading. If we insist that this holds for all healthy Metron installations, the primary con goes away in my mind.

Anyway, I'm sure I've missed some pros and cons, so it'd be great to hear community feedback here.

Thoughts?
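
P.S. To make the lag-monitoring idea a bit more concrete, here is a rough sketch of the check I have in mind. It is plain Python and not tied to any Metron or Kafka API: the offsets are assumed to have been fetched already by whatever tooling you use, and the function names, the two consumer groups ("ES" and "HDFS"), and the thresholds are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def partition_lags(end_offsets: Dict[int, int],
                   committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log end offset minus the group's committed offset.

    A partition with no committed offset is treated as fully lagging
    (committed = 0); that is an assumption made for this sketch.
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def average_lag(end_offsets: Dict[int, int],
                committed_offsets: Dict[int, int]) -> float:
    """Average lag across all partitions of a topic for one consumer group."""
    lags = partition_lags(end_offsets, committed_offsets)
    return sum(lags.values()) / len(lags) if lags else 0.0


def keeps_up(avg_lag_samples: List[float], max_avg_lag: float) -> bool:
    """True if the average lag over the sampled window stays within a threshold."""
    if not avg_lag_samples:
        return True
    return sum(avg_lag_samples) / len(avg_lag_samples) <= max_avg_lag


def writers_in_step(es_avg_lag: float, hdfs_avg_lag: float,
                    max_gap: float) -> bool:
    """True if neither (hypothetical) writer group is far ahead of the other."""
    return abs(es_avg_lag - hdfs_avg_lag) <= max_gap


if __name__ == "__main__":
    # Made-up offsets for a two-partition topic.
    end = {0: 100, 1: 200}
    es_committed = {0: 90, 1: 180}
    hdfs_committed = {0: 50, 1: 100}

    es_lag = average_lag(end, es_committed)
    hdfs_lag = average_lag(end, hdfs_committed)
    print("ES average lag:", es_lag)
    print("HDFS average lag:", hdfs_lag)
    print("writers in step:", writers_in_step(es_lag, hdfs_lag, max_gap=25))
```

The point is only that both checks (each group keeping up with the topic, and the two groups staying close to each other) reduce to simple arithmetic on offsets we would already be collecting, so a split topology would not leave the first con unobservable.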