[jira] [Commented] (KAFKA-12328) Find out partition of a store iterator

Matthias J. Sax (Jira) Wed, 17 Feb 2021 10:16:06 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-12328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286047#comment-17286047
 ]


Matthias J. Sax commented on KAFKA-12328:
-----------------------------------------

It's correct that a task may process multiple partitions from different topics; 
however, if you have two topics A and B with 10 partitions each, task-0 
processed tA-p0 and tB-p0, and task-1 process tA-p1 and tB-p1 and so on, ie, 
there is a 1:1 mapping from partition number to task-id over all input topics. 
A single task would never process multiple partitions with different partition 
numbers.

Also, if you have one task that processes multiple partitions (note, if a task 
processes multiple partitions, it's always partitions of different topics) and 
you have a state store, the task has one state-store shard – you don't have a 
shard per partition within a task. Thus, for an iterator over a state-store, 
it's tied to the task itself, but not tied to a single partition within the 
task (of course it's still tied to a single partition per topic, but it could 
be multiple partitions from different topics, with all those partitions having 
the same partition number).

> Find out partition of a store iterator
> --------------------------------------
>
>                 Key: KAFKA-12328
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12328
>             Project: Kafka
>          Issue Type: Wish
>          Components: streams
>            Reporter: fml2
>            Priority: Major
>
> This question was posted [on 
> stakoverflow|https://stackoverflow.com/questions/66032099/kafka-streams-how-to-get-the-partition-an-iterartor-is-iterating-over]
>  and got an answer but the solution is quite complicated hence this ticket.
>  
> In my Kafka Streams application, I have a task that sets up a scheduled (by 
> the wall time) punctuator. The punctuator iterates over the entries of a 
> store and does something with them. Like this:
> {code:java}
> var store = context().getStateStore("MyStore");
> var iter = store.all();
> while (iter.hasNext()) {
>    var entry = iter.next();
>    // ... do something with the entry
> }
> // Print a summary (now): N entries processed
> // Print a summary (wish): N entries processed in partition P
> {code}
> Is it possible to find out which partition the punctuator operates on? The 
> java docs for {{ProcessorContext.partition()}} states that this method 
> returns {{-1}} within punctuators.
> I've read [Kafka Streams: Punctuate vs 
> Process|https://stackoverflow.com/questions/50776987/kafka-streams-punctuate-vs-process]
>  and the answers there. I can understand that a task is, in general, not tied 
> to a particular partition. But an iterator should be tied IMO.
> How can I find out the partition?
> Or is my assumption that a particular instance of a store iterator is tied to 
> a partion wrong?
> What I need it for: I'd like to include the partition number in some log 
> messages. For now, I have several nearly identical log messages stating that 
> the punctuator does this and that. In order to make those messages "unique" 
> I'd like to include the partition number into them.
> Since I'm working with a single store here (which might be partitioned), I 
> assume that every single execution of the punctuator is bound to a single 
> partition of that store.
>  
> It would be cool if there were a method {{iterator.partition}} (or similar) 
> to get this information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (KAFKA-12328) Find out partition of a store iterator

Reply via email to