Hi -
I have a couple of questions about partitioning. I’m trying to run multiple
task instances, each processing a separate partition, but it appears that
only a single task instance runs and processes all partitions. Or else my
partitions are not being created properly. This is based on a modified version
of hello-samza, so I’m not sure exactly which config and code steps are needed
to enable partitioning of messages across multiple instances of the same task.
To route to a partition I use: messageCollector.send(new
OutgoingMessageEnvelope(OUTPUT_STREAM, msgKey, partitionKey, outgoingMap));
- Question here: the example in the docs, collector.send(new
OutgoingMessageEnvelope(new SystemStream("kafka",
"SomeTopicPartitionedByUserId"), msg.get("user_id"), msg)), seems confusing,
because the OutgoingMessageEnvelope constructor has a key AND a partitionKey. I
assume the partitionKey should be msg.get("user_id") in this case, but what
should the key be? Just a unique value for this message?
I tried the three-parameter constructor as well and have a similar problem: my
single task instance is used regardless of the partitionKey specified in the
OutgoingMessageEnvelope.
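To make my expectation concrete, here is a minimal sketch of my mental model.
It assumes (this is only my assumption, I have not confirmed it in the Samza or
Kafka code) that the partition is chosen by hashing the partitionKey modulo the
number of partitions in the output topic. PartitionSketch, partitionFor and the
partition count of 4 are hypothetical names and values for illustration, not
Samza APIs or my real config.

```java
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Assumption: partition = non-negative hash of the partition key, modulo
    // the partition count. Not taken from Samza or Kafka source code.
    static int partitionFor(Object partitionKey, int numPartitions) {
        return Math.floorMod(partitionKey.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4; // hypothetical partition count of the output topic
        String[] userIds = {"user-1", "user-2", "user-3", "user-1"};

        // Count how many messages land on each partition under this model.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String userId : userIds) {
            int partition = partitionFor(userId, numPartitions);
            counts.merge(partition, 1, Integer::sum);
            System.out.println(userId + " -> partition " + partition);
        }
        System.out.println("messages per partition: " + counts);
    }
}
```

Under this model the same partitionKey always lands on the same partition, and
if the topic has only one partition every key maps to partition 0, which would
look exactly like what I am seeing.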
Do I need to specify a partition manager and yarn.container.count to get
multiple instances of my task servicing separate partitions?
I’m not sure how to tell whether my messages are routed to the correct
partition in Kafka, or whether the problem is a partition-handling config in
Samza.
Any advice is appreciated!
Thanks
Tyson