Github user arunmahadevan commented on a diff in the pull request:
https://github.com/apache/storm/pull/2241#discussion_r130523483
--- Diff: conf/defaults.yaml ---
@@ -253,11 +244,15 @@ topology.trident.batch.emit.interval.millis: 500
topology.testing.always.try.serialize: false
topology.classpath: null
topology.environment: null
-topology.bolts.outgoing.overflow.buffer.enable: false
-topology.disruptor.wait.timeout.millis: 1000
-topology.disruptor.batch.size: 100
-topology.disruptor.batch.timeout.millis: 1
-topology.disable.loadaware.messaging: false
+topology.disruptor.wait.timeout.millis: 1000 # TODO: Roshan: not used, but we may/not want this behavior
+topology.transfer.buffer.size: 50000
+topology.transfer.batch.size: 10
+topology.executor.receive.buffer.size: 50000
+topology.producer.batch.size: 1000
+topology.flush.tuple.freq.millis: 100
--- End diff --
@roshannaik, why is it 100 ms? Is it based on some benchmarks?
As per the [design
doc](https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit#heading=h.gjdgxs)
posted in the JIRA, the JCTools MPSCArrayQ provides a throughput of 68
million/sec with 20 producers, and the performance doesn't seem to degrade much
as the number of producers increases. If so, why do we need to batch and flush
the tuples into the consumer queue? If the producers enqueued the events directly
into the receiver's queue, it would simplify the design and address the latency
concerns.
Also, I assume that if the batch size is set to 1, the events are enqueued
directly and the flush threads are not started?
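To make the question concrete, here is a minimal, hypothetical sketch of the batching pattern under discussion (class and method names are invented, not Storm's actual implementation): a producer buffers tuples up to `topology.producer.batch.size` and hands batches to the consumer queue, with a batch size of 1 degenerating to direct enqueue so no flusher thread is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical producer-side batcher. A plain ConcurrentLinkedQueue stands in
// for the MPSC queue; periodic flushing (topology.flush.tuple.freq.millis)
// would be driven by a separate flusher thread calling flush().
class BatchingProducer<T> {
    private final Queue<List<T>> consumerQueue;
    private final int batchSize;
    private List<T> currentBatch = new ArrayList<>();

    BatchingProducer(Queue<List<T>> consumerQueue, int batchSize) {
        this.consumerQueue = consumerQueue;
        this.batchSize = batchSize;
    }

    void publish(T tuple) {
        if (batchSize <= 1) {
            // Batch size 1: enqueue directly; no buffering, no flush thread.
            consumerQueue.offer(List.of(tuple));
            return;
        }
        currentBatch.add(tuple);
        if (currentBatch.size() >= batchSize) {
            flush();
        }
    }

    // Invoked either when the batch fills or by a periodic flusher thread,
    // so partially filled batches are not held back indefinitely.
    void flush() {
        if (!currentBatch.isEmpty()) {
            consumerQueue.offer(currentBatch);
            currentBatch = new ArrayList<>();
        }
    }
}
```

The latency/throughput tradeoff is visible here: larger batches amortize per-enqueue cost but hold tuples until the size or time threshold is hit, which is exactly why the flush interval matters.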