Hello,

If I have a Kinesis stream split into multiple shards (say 10), can I have, say 
3 executors, subscribed to those shards? I assume that automatic re-balancing 
will be enabled as we add/remove executors for scale up/down or simply failures 
…

If so, can I specify a partition key? If I specify a partition key on the 
Kinesis producer, it will always send (Key=A) to say Shard 4 and (Key=B) to 
Shard 8 and this will be consistent I assume so long as the executors are up 
and no rebalancing occurs.

How can I map the payloads in the first Spark stage/task that receives the 
payload from Kinesis? What I would want to finally achieve is that the 
flatMapGroupWithState() that I would call later in the pipeline should have the 
same (partition) key internally for key lookups in the (RocksDB) state so that 
data locality can be achieved.

Is this redundant or implicit or not possible or am I missing something? Your 
response would be greatly helpful. If there is some documentation or examples 
around this, that would be good too!

Thanks,
Sandip Khanzode

Reply via email to