Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-20 Thread Luke Chen
Hi Christo, Sorry for the late reply. > 3. I was thinking that the metric can be emitted while reading of those records is happening i.e. if it takes a long time then it will just gradually increase as we read. What do you think? Yes, sounds good to me. > Kamal and Luke, I agree some of the

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-20 Thread Jorge Esteban Quilcate Otoya
Hi Christo, On RemoteDeleteBytesPerSec: I think for Delete operations bytes represent the same as for Copy operations. We call copyLogSegment, but the bytes written can be different from the log segment size. We could have the same for Delete to get an idea of the amount of data deleted from

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-20 Thread Satish Duggana
Hi Christo, I think we can start the vote thread on the KIP which is updated with the finalized metrics. We can have followup KIPs with other metrics if needed in future. Thanks, Satish. On Fri, 17 Nov 2023 at 22:16, Christo Lolov wrote: > Heya all! > > I have updated the KIP so please have

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-17 Thread Christo Lolov
Heya all! I have updated the KIP so please have another read through when you have the time. I know we are cutting it a bit close, but I would be grateful if I could start a vote early next week in order to get this in 3.7. re: Satish 104. I envision that ideally we will compare this metric

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-16 Thread Satish Duggana
Thanks Christo for your reply. 101 and 102. We have reached a conclusion on them. 103. I am not strongly opinionated on this. I am fine if it is helpful for your scenarios. 104. It seems you want to compare this metric with the number of segments that are copied. Do you have such a metric now? Kamal and

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-14 Thread Christo Lolov
Heya everyone, Apologies for the delay in my response and thank you very much for all your comments! I will start answering in reverse: *re: Satish* 101. I am happy to scope down this KIP and start off by emitting those metrics on a topic level. I had a preference to emit them on a partition

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-10 Thread Satish Duggana
Thanks Christo for the KIP and the interesting discussion. 101. Adding metrics at partition level may increase the cardinality of these metrics. We should be cautious of that and see whether they are really needed. RLM related operations do not generally affect based on partition(s) but it is

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-09 Thread Jorge Esteban Quilcate Otoya
Hi Christo, I'd like to add another suggestion: 7. Adding on TS lag formulas, my understanding is that per partition: - RemoteCopyLag: difference between: latest local segment candidate for upload - latest remote segment - Represents how Remote Log Manager task is handling backlog of segments.

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-09 Thread Luke Chen
Hi Christo, Thanks for the KIP! Some comments: 1. I agree with Kamal that a metric to cover the time taken to read data from remote storage is helpful. 2. I can see that some metrics are only on topic level, but some are on partition level. Could you explain why some of them are only on

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-02 Thread Kamal Chandraprakash
Hi Christo, Thanks for expanding the scope of the KIP! We should also cover the time taken to read data from remote storage. This will give our users a fair idea about the P99, P95, and P50 Fetch latency to read data from remote storage. The Fetch API request metrics currently provide a

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-01 Thread Jorge Esteban Quilcate Otoya
Thanks, Christo! 1. Agree. Having a further look into how many latency metrics are included on the broker side, there are only a few of them (e.g. request lifecycle), but this seems mostly delegated to clients, or the plugin in this case, to measure. 3.2. Personally, I find the record-based lag less

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-10-30 Thread Christo Lolov
Heya Jorge, Thank you for the insightful comments! 1. I see a value in such latency metrics but in my opinion the correct location for such metrics is in the plugins providing the underlying functionality. What are your thoughts on the matter? 2. Okay, I will look for and adjust the formatting

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-10-27 Thread Jorge Esteban Quilcate Otoya
Hi Christo, Thanks for proposing the KIP; these metrics will certainly be useful to operate Kafka Tiered Storage as it becomes production-ready. 1. Given that the scope of the KIP has grown to cover more metrics, what do you think about introducing latency metrics for RSM operations? Copy and delete

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-10-24 Thread Christo Lolov
Hello all, Now that 3.6 has been released, I would like to bring back attention to the following KIP for adding metrics to tiered storage targeting 3.7 - https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Add+more+metrics+to+Tiered+Storage . Let me know your thoughts about the list of

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-10-13 Thread Christo Lolov
Heya Gantigmaa, Apologies for the (very) late reply! Now that 3.6 has been released and reviewers have a bit more time I will be picking up this KIP again. I am more than happy to add useful new metrics to the KIP, I would just ask for a couple of days to review your pull request and I will come

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-09-25 Thread Gantigmaa Selenge
Hi Christo, Thank you for writing the KIP. I recently raised a PR to add metrics for tracking remote segment deletions (https://github.com/apache/kafka/pull/14375) but realised those metrics were not mentioned in the original KIP-405 or KIP-930. Do you think these would make sense to be added to

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-08-09 Thread Christo Lolov
Heya Kamal, Thank you for going through the KIP and for the question! I have been thinking about this and as an operator I might find it the most useful to know all three of them actually. I would find knowing the size in bytes useful to determine how much disk I might need to add temporarily

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-08-08 Thread Kamal Chandraprakash
Hi Christo, Thanks for the KIP! The proposed tiered storage metrics are useful. The unit mentioned in the KIP is the number of records. Each topic can have varying amounts of records in a segment depending on the record size. Do you think having the tier-lag by number of segments (or) size of

[DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-08-08 Thread Christo Lolov
Hello all! I would like to start a discussion for KIP-963: Upload and delete lag metrics in Tiered Storage (https://cwiki.apache.org/confluence/x/sZGzDw). The purpose of this KIP is to introduce a couple of metrics to track lag with respect to remote storage from the point of view of Kafka.