Thanks for the tips, Chi.
I'm a little confused about the partitioning. I had thought that the number
of partitions was determined by the amount of parallelism in the topology.
For example, if I said .parallelismHint(4), then I would have 4 different
partitions. Is this not the case?
Is there a set
Raphael,
The number of partitions is defined in your Kafka configuration -
http://kafka.apache.org/documentation.html#brokerconfigs (num.partitions) -
or when you create the topic. The behavior differs between Kafka versions,
so check the documentation for the version you run. Your topology needs to
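For reference, the broker-level default Chi links to lives in the broker config file; a minimal sketch (the value 4 here is only an example, not a recommendation):

```properties
# server.properties (broker config)
# Default number of partitions assigned to newly created topics.
num.partitions=4
```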
Oh ok. Thanks Chi!
Do you have any ideas about why my batch size never seems to get any bigger
than 83K tuples?
Currently I'm just using a barebones topology that looks like this:
Stream spout = topology.newStream(..., ...)
    .parallelismHint(4)
    .groupBy(new Fields("time"))
    .aggregate(new
Raphael,
You can try tuning your parallelism (and num workers).
For Kafka 0.7, your spout parallelism could max out at: # brokers x #
partitions (for the topic). If you have 4 Kafka brokers, and your topic
has 5 partitions, then you could set the spout parallelism to 20 to
maximize the
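Chi's sizing rule is just multiplication, but it can be handy to write down when comparing cluster layouts. A quick sketch using the numbers from this thread (the class and method names here are mine, not part of the Storm or Kafka API):

```java
// Back-of-the-envelope helper for the Kafka 0.7 sizing rule quoted above:
// useful spout parallelism maxes out at (# brokers) x (# partitions per topic).
public class SpoutParallelism {
    static int maxSpoutParallelism(int brokers, int partitionsPerTopic) {
        return brokers * partitionsPerTopic;
    }

    public static void main(String[] args) {
        // 4 brokers x 5 partitions -> at most 20 useful spout tasks
        System.out.println(maxSpoutParallelism(4, 5)); // prints 20
    }
}
```

Setting the spout parallelism higher than this product just leaves tasks with no broker/partition pair to consume from.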
I am in the process of optimizing my stream. Currently I expect 5,000,000
tuples to come out of my spout per minute. I am trying to beef up my
topology in order to process this in real time without falling behind.
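As a back-of-the-envelope check (my own arithmetic, not something confirmed in the thread): 5,000,000 tuples per minute works out to roughly 83,333 tuples per second, which is suspiciously close to the 83K batch cap, as if batches were being emitted about once per second.

```java
// Rough throughput target: 5,000,000 tuples/minute expressed per second.
public class ThroughputTarget {
    static long tuplesPerSecond(long tuplesPerMinute) {
        return tuplesPerMinute / 60; // integer division is close enough for sizing
    }

    public static void main(String[] args) {
        System.out.println(tuplesPerSecond(5_000_000L)); // prints 83333
    }
}
```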
For some reason my batch size is capping out at 83 thousand tuples. I can't
seem to