+1 to the split. I also feel it's much easier to dissect problems when these actions are separated. It's also easier to fine-tune each topology independently, which may have additional performance benefits.
M

On Mon, Sep 25, 2017 at 5:31 PM, James Sirota <[email protected]> wrote:

> I have experienced issues with ES and HDFS indexing in production and have
> previously split out the topologies into two separate topologies. As you
> state the benefits of this approach are (a) tuning each topology
> separately, (b) ability to attribute problems to a specific topology (why
> is something slow?) and (c) graceful degradation. When ES, for example,
> fails partially or catastrophically and your ES topology goes all kinds of
> crazy, HDFS topology keeps humming along unaffected. Once Metron-1205 is
> in you will be able to re-index into ES (or potentially other sources) from
> HDFS at will. The major con for this architecture is that there is a
> greater chance that all your data sources will get out of sync because they
> index/store data at different rates. But even given that I would vote +1
> on splitting out the topologies.
>
> 25.09.2017, 09:37, "Casey Stella" <[email protected]>:
> > One of the lessons that have bubbled up in doing some performance analysis
> > is that having the indexing topology share both the ES and the HDFS writer
> > in the same topology can be problematic from a tuning perspective.
> > Specifically, it's hard to square that circle and make both perform fast
> > enough to not cause significant back-pressure in kafka (and often Commit
> > Exceptions in the kafka spout).
> >
> > I wanted to get the community's opinion about the possibility of separating
> > the two current writers into separate topologies which could be tuned
> > separately.
> >
> > Pros:
> >
> > - Practically speaking, tuning separately is often a lot easier than
> >   trying to tune together
> > - This opens us up with the beginnings of an abstraction that may be
> >   reusable to expose new indexers to Metron
> >
> > Cons:
> >
> > - It has the potential to mask a problem. We may want to ensure that
> >   the writers write at the same rate and don't get far ahead of one another.
> >   In the current setup, this is inherent in the design. If we separate them,
> >   they may be reading at different rates and one index may get ahead of the
> >   other.
> > - The management pack section around indexing would need to be
> >   reconsidered if we split them up
> >
> > Personally, I'm strongly in favor of splitting them up, but I want to make
> > sure that we don't miss an important nuance here. The first con is
> > concerning to me, but I'd argue that another lesson from performance tuning
> > is that we need to monitor the average partition lag over time in the
> > management UI for the various consumer groups and ensure that writing keeps
> > up with reading. If we insist on this assertion being true for all healthy
> > metron installations, the primary con goes away in my mind.
> >
> > Anyway, I'm sure I've missed some pros and cons, so it'd be great to hear
> > community feedback here. Thoughts?
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
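
To make the lag-monitoring idea from the thread concrete, here is a rough sketch of the check, not anything that exists in Metron today. It assumes each writer topology has its own consumer group and that the log-end and committed offsets per partition have already been collected (for example from Kafka's consumer-group tooling). The function names, group names, offsets, and the threshold are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def partition_lag(end_offsets, committed_offsets):
    """Lag per partition: log-end offset minus the group's committed offset.

    A partition with no committed offset is treated as fully unread
    (an assumption made for this sketch).
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def average_lag(end_offsets, committed_offsets):
    """Average lag across all partitions of the topic for one consumer group."""
    lags = partition_lag(end_offsets, committed_offsets)
    return mean(lags.values()) if lags else 0


def check_writers(end_offsets, committed_by_group, max_avg_lag):
    """Return the average lag per group and whether every group is within bounds."""
    averages = {
        group: average_lag(end_offsets, committed)
        for group, committed in committed_by_group.items()
    }
    healthy = all(avg <= max_avg_lag for avg in averages.values())
    return averages, healthy


# Made-up offsets for a two-partition indexing topic.
end_offsets = {0: 1000, 1: 1200}
committed_by_group = {
    "indexing-es": {0: 400, 1: 500},      # hypothetical ES writer group, falling behind
    "indexing-hdfs": {0: 990, 1: 1190},   # hypothetical HDFS writer group, keeping up
}

averages, healthy = check_writers(end_offsets, committed_by_group, max_avg_lag=100)
print(averages, healthy)
```

Tracking the per-group average over time, rather than a single snapshot, is what would show one index drifting ahead of the other.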
