[DISCUSS] Modularizing new WAN TX Batching (and Modularization Efforts in General)

Jacob Barrett Mon, 20 Sep 2021 13:48:50 -0700

Devs,

We need to be doing a better job with implementing new features in a modular 
and plugable way. We have had discussions in the past but we haven’t been the 
best at sticking to it. Most recently we began work on a modified version of 
WAN that supports transactional batching. Rather than implement it in a 
plugable model we modified the existing implementation with a new property that 
changes the internal behavior of the implementation. This is a clear smell that 
what we have should be plugable and modularized. I am not suggesting that we 
run out and define clear public SPIs for everything or come up with some 
complicated plan to re-architect everything into modules but rather that we 
take small steps. I will argue that when we are adding functionality to a core 
service that is the point we should consider steps to break it up into clear 
module components. Think to yourself, what would it take for me to implement 
this new functionality as its own module, meaning its own jar and Gradle 
sub-project. As you begin to develop the solution you may find you need some 
clean interfaces for it to extend from the core, that might be the start of an 
internal SPI. You may find that some concrete classes you would have normally 
modified just need to be extended with a few protected methods to implement 
alternative logic.

I think we should focus efforts on extracting an interface to plug in different
WAN gateway implementations so that existing implementations aren’t modified
with new behavior. With proper interface extraction we can more easily unit
test around WAN gateways. By keeping implementations small we can more easily
test them in isolation. Making them all plugable allows distributions of Geode
to deliver specific implementations they would like to support without
impacting the existing implementations. It also frees Geode to release new
experimental or beta implementations of WAN gateways without impacting the
existing implementations rather than delaying releases waiting for modified WAN
gateways to be production ready and fully tested.

In looking at the geode-wan module one might notice that it was already
designed to be plugable. Unfortunately it isn’t that easy. This SPI was
originally there to provide a way for Geode to be donated initially without WAN
support. It turns out that most of the core to WAN is actually still in
geode-core and only some of the “secret sauce” was moved into geode-wan. The
bulk of the geode-core code for WAN is actually in support of the region queues
for WAN and AEQ, so moving it into geod-wan would have cut off AEQ. While it
would be possible to utilize this SPI for providing alternative gateway
implementations it is very high level,so much of the alternative
implementations would be duplications, and it doesn’t allow for both
implementations to sit side by side at runtime. I would actually suggest we
eliminate this public SPI in favor of just the geode-wan core module that it is
and eventually migrate the region queue code into its own module as well, but
these are for another day.

Looking closer at the WAN gateways themselves there is mostly a pluggable
interface already there with the existing interfaces. I spent a little time
pulling apart an internal SPI and it was quite easy. With a small modification
to gfsh and cache xml to specify that alternative implementation by name is all
that needs to be done immediately to configure an alternative. Without
extracting too many of the common implementation out into its own module just a
few of the classes in geode-core can be modified to provide empty
implementation of key overridable plug-in points for the TX batching
implementation. The result is that the TX batching module can be added to the
classpath and be configured as a gateway sender without injecting any of the TX
batching specific code into geode-core or geode-wan. Deeper details can be
found at the end of this discussion.

This solution isn’t perfect but it is a step in the right direction. The more
we continue to extract and extend the closer we will get to a true pluggable
architecture. The key to being successful will be to keep the changes to the
core components small and the interfaces between modules internal. If we all
make an effort towards this, both in our code and in our reviews of others
code, we can keep the efforts small and manageable.

I would love feedback and thoughts on this suggestion.

-Jake

Details on the POC

Before you dig in at the links below to the whole deal you should probably read
the narrative below. Since this was a POC the exploration into already modified
code, it took a few different turns, which may make the diff hard to read as a
whole. Nonetheless here are the links.
All the POC work can be found at:
https://github.com/pivotal-jbarrett/geode/tree/wip/wan-spi
A massive diff can be viewed at:
https://github.com/apache/geode/compare/develop...pivotal-jbarrett:wip/wan-spi

https://github.com/pivotal-jbarrett/geode/tree/wip/wan-spi/geode-wan-spi/src/main/java/org/apache/geode/cache/wan/internal/spi
The key component to all this is the internal SPI, which makes this thing
discoverable and the gateway senders constructable. The main SPI is the
GatwaySenderTypeFactory service interface. Implementations are all discovered
using Java’s service loader pattern. The implementation is responsible for
constructing the alternative GatewaySender implementations. The GatewaySender
interface just extends the InternalGatewaySender interface right now. The goal
would be to eventually flip that relationship, but remember this is baby steps
and a POC.

https://github.com/pivotal-jbarrett/geode/tree/wip/wan-spi/geode-wan-default/src/main
As an example, the current default implementation is extracted out into
geode-wan-default, which is discovered just the same as an alternative
implementation, via
MATA-INF/services/org.apache.geode.cache.wan.internal.spi.GatewaySenderTypeFactory.

https://github.com/pivotal-jbarrett/geode/tree/wip/wan-spi/geode-wan-txbatch/src
The TX batch, for lack of a better short name, is in geode-wan-txbatch. It
largely extends the implementations of the geode-wan-default and geode-core,
though those commonalities could be extracted into a common module later. To
extend the defaults though a few protected methods were added to some of the
classes to allow for extended behavior.

https://github.com/pivotal-jbarrett/geode/tree/wip/wan-spi/geode-wan-txbatch/src/main/java/org/apache/geode/cache/wan/internal/txbatch/parallel
Focusing on the TxBatchParallelGatewaySender, the implementation extends the
default ParallelGatwaySenderImpl and overrides the new createEventProcessor
method to inject its new
TxBatchRemoteConcurrentParallelGatewaySenderEventProcessor. The
TxBatchRemoteConcurrentParallelGatewaySenderEventProcessor extends the default
RemoteConcurrentParallelGatewaySenderEventProcessor and overrides the new
createProcessors method to inject its new
TxBatchRemoteParallelGatewaySenderEventProcessor. The
TxBatchRemoteParallelGatewaySenderEventProcessor extends the default
RemoteParallelGatewaySenderEventProcessor and extends the new
createParallelGatewaySenderQueue and createGatewaySenderEvent methods. The
createGatewaySenderEvent method allows this implementation to add metadata
specific to TX batching to the GatewaySenderEventImpl, like the id and last
event flags. The createParallelGatewaySenderQueue method allows it to inject
its new TxBatchParallelGatewaySenderQueue implementation. The
TxBatchParallelGatewaySenderQueue extends the default
ParallelGatewaySenderQueue but overrides the new postProcessBatch method. The
postProcessBatch is used to group the batches by transaction, which was the
core change made originally in geode-core’s ParallelGatewaySenderQueue class.

The gfsh and cache xml configuration changes are largely the same. It
deprecates the original implementations event batching flag, as well as the
parallel flag, in favor of a new type property. The type can either be the
fully qualified class name or a well known short name for the implementations
discovered through the service loader. For example, where you would previously
have configured a parallel gateway sender as “create gateway-sender …
--parallel …” you would know execute “create gateway-sender …
--type=ParallelGatewaySender …”. For TX batching in parallel you would specify
“create gateway-sender … --type=TxBatchParallelGatewaySender …”. We could
consider more friendly short names too, like simple “serial” and “parallel” for
default implementations and “tx-batch-parallel” and “tx-batch-serial” for the
TX batching versions.

If anyone wants a more detailed live walkthrough I would be happy to arrange
one or two.

[DISCUSS] Modularizing new WAN TX Batching (and Modularization Efforts in General)

Reply via email to