[DISCUSS] Upgrading HBase and Kafka support

Ryan Merriman Fri, 08 Mar 2019 06:48:17 -0800

I have been researching the effort involved in upgrading to HDP 3.  Along the
way I've found a couple of challenging issues that we will need to solve, both
involving our integration testing strategy.


The first issue is Kafka.  We are moving from 0.10.0 to 2.0.0 and there
have been significant changes to the API.  This creates an issue in the
KafkaComponent class, which we use as an in-memory Kafka server in
integration tests.  Most of the classes that were previously used have gone
away, and to the best of my knowledge, were not supported as public APIs.
I also don't see any publicly documented APIs to replace them.

The second issue is HBase.  We are moving from 1.1.2 to 2.0.2, which is
another significant change.  This creates an issue in the MockHTable class
because the HTableInterface interface has been replaced by Table, essentially
requiring that MockHTable be rewritten to conform to the new interface.
It's my opinion that this class is already complicated and difficult to
maintain as it is.

These two issues have the potential to add a significant amount of work to
upgrading Metron to HDP 3.  I want to take a step back and review our
options before we move forward.  Here are some initial thoughts I had on
how to approach this.  For HBase:

   1. Update MockHTable to work with the new HBase API.  We would continue
   using a mock server approach for HBase.
   2. Research replacing MockHTable with an in-memory HBase server.
   3. Replace MockHTable with a Docker container running HBase.
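
To make the mock approach in option 1 concrete, here is a minimal sketch of
a map-backed table.  SimpleTable and InMemoryTable are hypothetical names
made up for illustration; this is not Metron's MockHTable and not the HBase
Table interface.  The real Table interface also covers scans, batches, and
conditional mutations, which is where most of the maintenance cost of a
mock would come from.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, deliberately tiny interface; NOT the HBase Table interface.
interface SimpleTable {
  void put(String rowKey, String column, byte[] value);
  Optional<byte[]> get(String rowKey, String column);
  void delete(String rowKey);
}

// Map-backed mock: rows sorted by key, each row a sorted map of column -> value.
class InMemoryTable implements SimpleTable {
  private final ConcurrentSkipListMap<String, ConcurrentSkipListMap<String, byte[]>> rows =
      new ConcurrentSkipListMap<>();

  @Override
  public void put(String rowKey, String column, byte[] value) {
    // Copy the value so later changes by the caller do not leak into the table.
    rows.computeIfAbsent(rowKey, k -> new ConcurrentSkipListMap<>())
        .put(column, value.clone());
  }

  @Override
  public Optional<byte[]> get(String rowKey, String column) {
    Map<String, byte[]> row = rows.get(rowKey);
    if (row == null) {
      return Optional.empty();
    }
    byte[] value = row.get(column);
    return value == null ? Optional.empty() : Optional.of(value.clone());
  }

  @Override
  public void delete(String rowKey) {
    // Removes the whole row; a missing row is silently ignored.
    rows.remove(rowKey);
  }
}
```

Even this small version has to make its own decisions about copying,
ordering, and missing rows, and a mock is only useful as long as those
decisions match what the real server does.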

For Kafka:

   1. Replace KafkaComponent with a mock server implementation.
   2. Update KafkaComponent to work with the new API.  We would probably
   need to leverage some internal Kafka classes.  I do not see a testing API
   documented publicly.
   3. Replace KafkaComponent with a Docker container running Kafka.

What other options are there?  Whatever we choose, I think we should follow
a similar approach for both (mock servers, in-memory servers, Docker, other
options I'm not thinking of).

This will not shock anyone, but I would be in favor of Docker containers.
They have the advantage of classpath isolation, easy upgrades, and accurate
integration testing.  The downside is we will have to adjust our tests and
Travis script to incorporate these Docker containers into our build
process.  We have discussed this at length in the past and it has generally
stalled for various reasons.  Maybe if we move a few services at a time it
might be more palatable?  As for the other two approaches, I think if either
worked well we wouldn't be having this discussion.  Mock servers are hard
to maintain and I don't see in-memory testing classes documented in
javadocs for either service.
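
One adjustment the tests would likely need with Docker is waiting for a
containerized service to accept connections before the first test runs.
Below is a rough sketch of such a check using only the JDK.  PortWaiter and
waitForPort are hypothetical names, not existing Metron code, and an open
port only shows that something is listening, not that the service has
finished starting up.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: polls a TCP port until it accepts a connection.
final class PortWaiter {
  private PortWaiter() {}

  // Returns true once a connection succeeds, false if the timeout elapses first.
  static boolean waitForPort(String host, int port, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (deadline - System.nanoTime() > 0) {
      try (Socket socket = new Socket()) {
        // Bound each attempt so an unresponsive host cannot block the loop.
        socket.connect(new InetSocketAddress(host, port), 500);
        return true;
      } catch (IOException e) {
        // Not reachable yet; pause briefly and try again.
        Thread.sleep(200);
      }
    }
    return false;
  }
}
```

A test setup method could call, for example,
PortWaiter.waitForPort("localhost", 9092, 60000) before creating Kafka
clients, assuming the container publishes the default broker port on the
local host.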

Thoughts?
