I have a simple trident transactional topology that does something like the
following:

kafka transactional spout (~3000 rec/sec, 6 partitions thus paraHint=6) -->
aggregate with reducerAggregator (paraHint=20) -->
transactional state (I tried MemoryMapState, MemcachedMapState and
CassandraMapState) -->
new Stream -->
print new values

I tried to tune the topology by firstly setting maxSpoutPending=1 and
batchEmitIntervals to a large value (1 sec), and then iteratively improve
those values.
I ended up with maxSpoutPending=20 batchEmitInterval=150ms

However I observed 2 things

1/ Delay in the topology keeps increasing
Even with those "fine-tune" values, or smaller values, it seems that some
transactions fail and that trident replay them (transactional state).
However this replaying process seems to delay the processing of new
incoming data, and storm seems to never catch up after replaying.
The result is that after a few minutes processing is clearly not "real
time" anymore (the aggregate printed in the logs are those from a few min
before, and it increases); even though I don't meet a particular bottleneck
for the calculation (bolt capacity and latency are ok).
Is this behavior normal ? Does it come from KafkaTransactionalSpout ? From
trident transactional mechanism ?

2/ There is an unavoidable bottleneck on $spoutcoord-spout0
Because small failures keeps accumulating, tridents replay more and more
transactions.
"spout0" performances are impacted (more work), but this can be scaled with
more kafka partitions.
However $spoutcoord-spout0 is always a unique thread in trident, whatever
spout we provide, and I clearly observed that $spoutcoord-spout0 goes above
1 after some minutes (and latency is above 10 sec or something).
Is there a way to improve this ? Or is this an unavoidable consequence of
trident's transactional logic that can't be addressed ?

Reply via email to