Hi Christo,
Sorry for the late reply.
> 3. I was thinking that the metric can be emitted while reading of those
records is happening i.e. if it takes a long time then it will just
gradually increase as we read. What do you think?
Yes, sounds good to me.
> Kamal and Luke,
I agree some of the
Hi Christo,
On RemoteDeleteBytesPerSec: I think for Delete operations bytes represent
the same as for Copy operations.
We call copyLogSegment, but the bytes written can be different from the log
segment size.
We could have the same for Delete to get an idea of the amount of data
deleted from
Hi Christo,
I think we can start the vote thread on the KIP which is updated with the
finalized metrics. We can have followup KIPs with other metrics if needed
in future.
Thanks,
Satish.
On Fri, 17 Nov 2023 at 22:16, Christo Lolov wrote:
> Heya all!
>
> I have updated the KIP so please have
Heya all!
I have updated the KIP so please have another read through when you have
the time. I know we are cutting it a bit close, but I would be grateful if
I could start a vote early next week in order to get this in 3.7.
re: Satish
104. I envision that ideally we will compare this metric
Thanks Christo for your reply.
101 and 102. We have reached a conclusion on them.
103. I am not strongly opinionated on this. I am fine if it is helpful
for your scenarios.
104. It seems you want to compare this metric with the number of
segments that are copied. Do you have such a metric now?
Kamal and
Heya everyone,
Apologies for the delay in my response and thank you very much for all your
comments! I will start answering in reverse:
*re: Satish*
101. I am happy to scope down this KIP and start off by emitting those
metrics on a topic level. I had a preference to emit them on a partition
Thanks Christo for the KIP and the interesting discussion.
101. Adding metrics at partition level may increase the cardinality of
these metrics. We should be cautious of that and see whether they are
really needed. RLM related operations do not generally affect based on
partition(s) but it is
Hi Christo,
I'd like to add another suggestion:
7. Adding on TS lag formulas, my understanding is that per partition:
- RemoteCopyLag: difference between: latest local segment candidate for
upload - latest remote segment
- Represents how Remote Log Manager task is handling backlog of segments.
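For illustration only, the RemoteCopyLag definition above could be sketched as follows. This is not code from the KIP or from Kafka: the class and method names are hypothetical, and it assumes each segment is identified by its end offset, so the lag comes out in offsets (the thread also discusses segment-count and byte-based units).

```java
/** Illustrative sketch only; names are hypothetical and not part of Kafka. */
final class RemoteCopyLagSketch {
    private RemoteCopyLagSketch() {}

    /**
     * Lag between the latest local segment that is a candidate for upload
     * and the latest segment already present in remote storage.
     * Assumption: both segments are represented by their end offsets.
     */
    static long remoteCopyLag(long latestCandidateEndOffset, long latestRemoteEndOffset) {
        // Clamp at zero so a fully caught-up partition never reports a negative lag.
        return Math.max(0L, latestCandidateEndOffset - latestRemoteEndOffset);
    }
}
```

A value that keeps growing would suggest the Remote Log Manager task is falling behind on its backlog of segments.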
Hi Christo,
Thanks for the KIP!
Some comments:
1. I agree with Kamal that a metric to cover the time taken to read data
from remote storage is helpful.
2. I can see that some metrics are only on topic level, but some are
on partition level.
Could you explain why some of them are only on
Hi Christo,
Thanks for expanding the scope of the KIP! We should also cover the time
taken to
read data from remote storage. This will give our users a fair idea about
the P99, P95,
and P50 Fetch latency to read data from remote storage.
The Fetch API request metrics currently provide a
Thanks, Christo!
1. Agree. Having a further look into how many latency metrics are included
on the broker side there are only a few of them (e.g. request lifecycle) —
but seems mostly delegated to clients, or plugin in this case, to measure
this.
3.2. Personally, I find the record-based lag less
Heya Jorge,
Thank you for the insightful comments!
1. I see a value in such latency metrics but in my opinion the correct
location for such metrics is in the plugins providing the underlying
functionality. What are your thoughts on the matter?
2. Okay, I will look for and adjust the formatting
Hi Christo,
Thanks for proposing the KIP; these metrics will certainly be useful to operate
Kafka Tiered Storage as it becomes production-ready.
1. Given that the scope of the KIP has grown to cover more metrics, what
do you think about introducing latency metrics for RSM operations?
Copy and delete
Hello all,
Now that 3.6 has been released, I would like to bring back attention to the
following KIP for adding metrics to tiered storage targeting 3.7 -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Add+more+metrics+to+Tiered+Storage
.
Let me know your thoughts about the list of
Heya Gantigmaa,
Apologies for the (very) late reply!
Now that 3.6 has been released and reviewers have a bit more time I will be
picking up this KIP again. I am more than happy to add useful new metrics
to the KIP, I would just ask for a couple of days to review your pull
request and I will come
Hi Christo,
Thank you for writing the KIP.
I recently raised a PR to add metrics for tracking remote segment deletions
(https://github.com/apache/kafka/pull/14375) but realised those metrics
were not mentioned in the original KIP-405 or KIP-930. Do you think these
would make sense to be added to
Heya Kamal,
Thank you for going through the KIP and for the question!
I have been thinking about this and as an operator I might find it the most
useful to know all three of them actually.
I would find knowing the size in bytes useful to determine how much disk I
might need to add temporarily
Hi Christo,
Thanks for the KIP!
The proposed tiered storage metrics are useful. The unit mentioned in the
KIP is the number of records.
Each topic can have varying amounts of records in a segment depending on
the record size.
Do you think having the tier-lag by number of segments (or) size of
Hello all!
I would like to start a discussion for KIP-963: Upload and delete lag
metrics in Tiered Storage (https://cwiki.apache.org/confluence/x/sZGzDw).
The purpose of this KIP is to introduce a couple of metrics to track lag
with respect to remote storage from the point of view of Kafka.