re sophie:

The intent here was to include all blocked time (not just `RUNNING`). The
caller can window the total blocked time themselves, and that can be
compared with a timeseries of the state to understand the ratio in
different states. I'll update the KIP to include `committed`. The admin API
calls should be accounted for by the admin client iotime/iowaittime
metrics.

On Tue, Jul 20, 2021 at 11:49 PM Rohan Desai <desai.p.ro...@gmail.com>
wrote:

> > I remember now that we moved the round-trip PID's txn completion logic
> into
> init-transaction and commit/abort-transaction. So I think we'd count time
> as in StreamsProducer#initTransaction as well (admittedly it is in most
> cases a one-time thing).
>
> Makes sense - I'll update the KIP
>
> On Tue, Jul 20, 2021 at 11:48 PM Rohan Desai <desai.p.ro...@gmail.com>
> wrote:
>
>>
>> > I had a question - it seems like from the descriptionsof
>> `txn-commit-time-total` and `offset-commit-time-total` that they measure
>> similar processes for ALOS and EOS, but only `txn-commit-time-total` is
>> included in `blocked-time-total`. Why isn't `offset-commit-time-total` also
>> included?
>>
>> I've updated the KIP to include it.
>>
>> > Aside from `flush-time-total`, `txn-commit-time-total` and
>> `offset-commit-time-total`, which will be producer/consumer client
>> metrics,
>> the rest of the metrics will be streams metrics that will be thread level,
>> is that right?
>>
>> Based on the feedback from Guozhang, I've updated the KIP to reflect that
>> the lower-level metrics are all client metrics that are then summed to
>> compute the blocked time metric, which is a Streams metric.
>>
>> On Tue, Jul 20, 2021 at 11:58 AM Rohan Desai <desai.p.ro...@gmail.com>
>> wrote:
>>
>>> > Similarly, I think "txn-commit-time-total" and
>>> "offset-commit-time-total" may better be inside producer and consumer
>>> clients respectively.
>>>
>>> I agree for offset-commit-time-total. For txn-commit-time-total I'm
>>> proposing we measure `StreamsProducer.commitTransaction`, which wraps
>>> multiple producer calls (sendOffsets, commitTransaction)
>>>
>>> > > For "txn-commit-time-total" specifically, besides
>>> producer.commitTxn.
>>> other txn-related calls may also be blocking, including
>>> producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total"
>>> later in the doc, but did not include it as a separate metric, and
>>> similarly, should we have a `txn-abort-time-total` as well? If yes,
>>> could
>>> you update the KIP page accordingly.
>>>
>>> `beginTransaction` is not blocking - I meant to remove that from that
>>> doc. I'll add something for abort.
>>>
>>> On Mon, Jul 19, 2021 at 11:55 PM Rohan Desai <desai.p.ro...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the review Guozhang! responding to your feedback inline:
>>>>
>>>> > 1) I agree that the current ratio metrics is just "snapshot in
>>>> point", and
>>>> more flexible metrics that would allow reporters to calculate based on
>>>> window intervals are better. However, the current mechanism of the
>>>> proposed
>>>> metrics assumes the thread->clients mapping as of today, where each
>>>> thread
>>>> would own exclusively one main consumer, restore consumer, producer and
>>>> an
>>>> admin client. But this mapping may be subject to change in the future.
>>>> Have
>>>> you thought about how this metric can be extended when, e.g. the
>>>> embedded
>>>> clients and stream threads are de-coupled?
>>>>
>>>> Of course this depends on how exactly we refactor the runtime -
>>>> assuming that we plan to factor out consumers into an "I/O" layer that is
>>>> responsible for receiving records and enqueuing them to be processed by
>>>> processing threads, then I think it should be reasonable to count the time
>>>> we spend blocked on this internal queue(s) as blocked. The main concern
>>>> there to me is that the I/O layer would be doing something expensive like
>>>> decompression that shouldn't be counted as "blocked". But if that really is
>>>> so expensive that it starts to throw off our ratios then it's probably
>>>> indicative of a larger problem that the "i/o layer" is a bottleneck and it
>>>> would be worth refactoring so that decompression (or insert other expensive
>>>> thing here) can also be done on the processing threads.
>>>>
>>>> > 2) [This and all below are minor comments] The "flush-time-total" may
>>>> better be a producer client metric, as "flush-wait-time-total", than a
>>>> streams metric, though the streams-level "total-blocked" can still
>>>> leverage
>>>> it. Similarly, I think "txn-commit-time-total" and
>>>> "offset-commit-time-total" may better be inside producer and consumer
>>>> clients respectively.
>>>>
>>>> Good call - I'll update the KIP
>>>>
>>>> > 3) The doc was not very clear on how "thread-start-time" would be
>>>> needed
>>>> when calculating streams utilization along with total-blocked time,
>>>> could
>>>> you elaborate a bit more in the KIP?
>>>>
>>>> Yes, will do.
>>>>
>>>> > For "txn-commit-time-total" specifically, besides producer.commitTxn.
>>>> other txn-related calls may also be blocking, including
>>>> producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total"
>>>> later in the doc, but did not include it as a separate metric, and
>>>> similarly, should we have a `txn-abort-time-total` as well? If yes,
>>>> could
>>>> you update the KIP page accordingly.
>>>>
>>>> Ack.
>>>>
>>>> On Mon, Jul 12, 2021 at 11:29 PM Rohan Desai <desai.p.ro...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I'd like to start a discussion on the KIP linked above which proposes
>>>>> some metrics that we would find useful to help measure whether a Kafka
>>>>> Streams application is saturated. The motivation section in the KIP goes
>>>>> into some more detail on why we think this is a useful addition to the
>>>>> metrics already implemented. Thanks in advance for your feedback!
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Rohan
>>>>>
>>>>> On Mon, Jul 12, 2021 at 12:00 PM Rohan Desai <desai.p.ro...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
>>>>>>
>>>>>

Reply via email to