Possibly related - I cannot seem to find the source for KafkaPartitionManager - can someone point me to it?
Thanks Tyson On Apr 11, 2014, at 12:15 PM, Tyson Norris <[email protected]> wrote: > Hi - > I have a couple questions about partitioning - I’m trying to have multiple > tasks instances run, each processing a separate partition, and it appears > that only a single task instance runs, processing all partitions. Or else my > partitions are not created properly. This is based on a modified version of > hello-samza, so I’m not sure exactly which config+code steps to take to > enable partitioning of message to multiple instances of the same task. > > To route to a partition i use: messageCollector.send(new > OutgoingMessageEnvelope(OUTPUT_STREAM, msgKey, partitionKey, outgoingMap)); > - question here: the example in docs of collector.send(new > OutgoingMessageEnvelope(new SystemStream("kafka", > "SomeTopicPartitionedByUserId"), msg.get("user_id"), msg)) seems confusing, > because the OutgoingMessageEnvelope constructor has a key AND partitionKey - > I assume the partitionKey should be msg.get(“user_id”) in this case, but what > should key be? Just a unique value for this message? > > I tried the 3 parameter constructor as well and have similar problems, where > my single task instance is used regardless of partitionKey specified in the > OutgoingMessageEnvelope. > > Do I need to specify partition manager and yarn.container.count to get > multiple instances of my task working to service separate partitions? > > I’m not sure how to tell if my messages are routed to the correct partition > in kafka, or whether the problem is a partition handling config in samza. > > Any advice is appreciated! > > Thanks > Tyson
