[jira] [Commented] (KAFKA-16414) Inconsistent active segment expiration behavior between retention.ms and retention.bytes

2024-08-22 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875799#comment-17875799
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16414:
--

Back on this one. I think now is a good chance to make a change for 4.0 
and remove this inconsistency between retention configurations.

I see there's some agreement on aligning these configs, but my guess is that 
any further change would require a KIP.

A proposal to align these should outline at least a couple of alternatives.
First:
 * retention.bytes will rotate the active segment if the condition is breached.
 * local.retention.bytes will inherit this behavior as well, and rotate the active segment.
 * with these changes, we could trigger active segment rotation either by using roll.ms/segment.bytes or (local.)retention.(ms|bytes).

and a second alternative:
 * remove active segment rotation from local.retention.ms
 * remove active segment rotation from retention.ms
 * let roll.ms and segment.bytes be the configurations that trigger active segment rotation

Both alternatives unfortunately force changing some existing behavior, potentially 
affecting users already counting on it.

Another alternative could be just leaving things as they are, but at least that 
should be explicitly agreed and properly documented.

Personally, I'd prefer the second option. Would these be a good starting point 
for a KIP discussion?
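To make the second alternative concrete, here is a minimal sketch (hypothetical topic name and values, not a committed design) of what topic creation would look like if only the roll configs controlled active segment rotation:

{code:java}
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class SecondAlternativeSketch {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            // Under the second alternative, only segment.ms/segment.bytes would
            // roll the active segment; retention.ms/retention.bytes would only
            // delete segments that have already been rolled.
            NewTopic topic = new NewTopic("t1", 1, (short) 1)
                    .configs(Map.of(
                            "segment.ms", "604800000",     // roll after 7 days...
                            "segment.bytes", "1073741824", // ...or after 1 GiB
                            "retention.ms", "86400000"));  // delete rolled segments after 1 day
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
{code}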

> Inconsistent active segment expiration behavior between retention.ms and 
> retention.bytes
> 
>
> Key: KAFKA-16414
> URL: https://issues.apache.org/jira/browse/KAFKA-16414
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.1
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
>
> This is a follow-up issue to KAFKA-16385.
> Currently, there's a difference between how retention.ms and retention.bytes handle active segment expiration:
> - retention.ms always expires the active segment when the max segment timestamp meets the condition.
> - retention.bytes only expires the active segment when retention.bytes is configured to zero.
> The behavior should be aligned: either rotate active segments for both retention configurations, or for neither.
> For more details, see
> https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17829682&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829682



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16414) Inconsistent active segment expiration behavior between retention.ms and retention.bytes

2024-08-22 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847881#comment-17847881
 ] 

Jorge Esteban Quilcate Otoya edited comment on KAFKA-16414 at 8/22/24 9:45 AM:
---

I agree with this concern; however, I'd argue it's the same for segment.ms and 
local.segment.ms.

If that's an issue, then we could consider not inheriting this behavior from 
global to local retention in the first place. So, local.retention.ms|bytes 
should _not_ rotate active segments.

Then we could discuss aligning retention.ms to retention.bytes without 
affecting local settings.


was (Author: jeqo):
I agree with this concern; however, I'd argue it's the same for segment.ms and 
local.segment.ms.

If that's an issue, then we could consider not inheriting this behavior from 
global to local retention in the first place. So, local.retention.ms|global 
should _not_ rotate active segments.

Then we could discuss aligning retention.ms to retention.bytes without 
affecting local settings.

> Inconsistent active segment expiration behavior between retention.ms and 
> retention.bytes
> 
>
> Key: KAFKA-16414
> URL: https://issues.apache.org/jira/browse/KAFKA-16414
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.1
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
>
> This is a follow-up issue to KAFKA-16385.
> Currently, there's a difference between how retention.ms and retention.bytes handle active segment expiration:
> - retention.ms always expires the active segment when the max segment timestamp meets the condition.
> - retention.bytes only expires the active segment when retention.bytes is configured to zero.
> The behavior should be aligned: either rotate active segments for both retention configurations, or for neither.
> For more details, see
> https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17829682&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829682



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16414) Inconsistent active segment expiration behavior between retention.ms and retention.bytes

2024-05-20 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847881#comment-17847881
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16414:
--

I agree with this concern; however, I'd argue it's the same for segment.ms and 
local.segment.ms.

If that's an issue, then we could consider not inheriting this behavior from 
global to local retention in the first place. So, local.retention.ms|global 
should _not_ rotate active segments.

Then we could discuss aligning retention.ms to retention.bytes without 
affecting local settings.

> Inconsistent active segment expiration behavior between retention.ms and 
> retention.bytes
> 
>
> Key: KAFKA-16414
> URL: https://issues.apache.org/jira/browse/KAFKA-16414
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.1
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
>
> This is a follow-up issue to KAFKA-16385.
> Currently, there's a difference between how retention.ms and retention.bytes handle active segment expiration:
> - retention.ms always expires the active segment when the max segment timestamp meets the condition.
> - retention.bytes only expires the active segment when retention.bytes is configured to zero.
> The behavior should be aligned: either rotate active segments for both retention configurations, or for neither.
> For more details, see
> https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17829682&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829682



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16414) Inconsistent active segment expiration behavior between retention.ms and retention.bytes

2024-05-20 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847822#comment-17847822
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16414:
--

> That is, the current behavior has been there for a long time; I think even if 
> we don't change it, users seem to accept it.

I think inheriting these behaviors on tiered-storage local retention configs is 
what is making them more visible to users.

In any real use case, setting retention.ms|bytes to 1 is not that common, but 
setting local.retention.ms|bytes to 1 can be used more frequently (i.e. to keep 
as little data locally as possible), so my guess is that more users will run 
into this inconsistent behavior as tiered storage gets more adopted.
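For illustration, a minimal AdminClient sketch (hypothetical topic name and values) of the kind of tiered-storage setup described above, where local retention is kept as small as possible:

{code:java}
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class MinimalLocalRetention {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "t1");
            // Keep as little data locally as possible; total retention stays large.
            List<AlterConfigOp> ops = List.of(
                    new AlterConfigOp(new ConfigEntry("local.retention.bytes", "1"),
                            AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("retention.bytes", "1073741824"),
                            AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
{code}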

> Inconsistent active segment expiration behavior between retention.ms and 
> retention.bytes
> 
>
> Key: KAFKA-16414
> URL: https://issues.apache.org/jira/browse/KAFKA-16414
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.1
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
>
> This is a follow-up issue to KAFKA-16385.
> Currently, there's a difference between how retention.ms and retention.bytes handle active segment expiration:
> - retention.ms always expires the active segment when the max segment timestamp meets the condition.
> - retention.bytes only expires the active segment when retention.bytes is configured to zero.
> The behavior should be aligned: either rotate active segments for both retention configurations, or for neither.
> For more details, see
> https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17829682&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829682



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16414) Inconsistent active segment expiration behavior between retention.ms and retention.bytes

2024-05-16 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847035#comment-17847035
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16414:
--

Just checking if there's agreement here and if there's any ongoing work 
towards aligning these retention configs.

cc [~ckamal] [~brandboat] 

> Inconsistent active segment expiration behavior between retention.ms and 
> retention.bytes
> 
>
> Key: KAFKA-16414
> URL: https://issues.apache.org/jira/browse/KAFKA-16414
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.1
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
>
> This is a follow-up issue to KAFKA-16385.
> Currently, there's a difference between how retention.ms and retention.bytes handle active segment expiration:
> - retention.ms always expires the active segment when the max segment timestamp meets the condition.
> - retention.bytes only expires the active segment when retention.bytes is configured to zero.
> The behavior should be aligned: either rotate active segments for both retention configurations, or for neither.
> For more details, see
> https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17829682&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829682



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16685) RLMTask warning logs do not include parent exception trace

2024-05-08 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-16685.
--
Resolution: Fixed

Merged https://github.com/apache/kafka/pull/15880

> RLMTask warning logs do not include parent exception trace
> --
>
> Key: KAFKA-16685
> URL: https://issues.apache.org/jira/browse/KAFKA-16685
> Project: Kafka
>  Issue Type: Improvement
>  Components: Tiered-Storage
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> When RLMTask warning exceptions happen and are logged, only the exception 
> message is included, and we lose the stack trace.
> See 
> [https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]
> This makes it difficult to troubleshoot issues.
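For context, the difference the ticket describes boils down to which SLF4J overload is used. A minimal sketch (illustrative message string, not the actual RemoteLogManager code):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogWithCause {
    private static final Logger LOGGER = LoggerFactory.getLogger(LogWithCause.class);

    static void handle(Exception ex) {
        // Message-only: the stack trace of ex is lost.
        LOGGER.warn("Error while processing segment: {}", ex.getMessage());
        // Passing the exception as the last argument keeps the full trace.
        LOGGER.warn("Error while processing segment", ex);
    }
}
{code}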



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16691) Support for nested structures: TimestampConverter

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya reassigned KAFKA-16691:


Assignee: Jorge Esteban Quilcate Otoya

> Support for nested structures: TimestampConverter
> -
>
> Key: KAFKA-16691
> URL: https://issues.apache.org/jira/browse/KAFKA-16691
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16690) Support for nested structures: HeaderFrom

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya reassigned KAFKA-16690:


Assignee: Jorge Esteban Quilcate Otoya

> Support for nested structures: HeaderFrom
> -
>
> Key: KAFKA-16690
> URL: https://issues.apache.org/jira/browse/KAFKA-16690
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16691) Support for nested structures: TimestampConverter

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16691:


 Summary: Support for nested structures: TimestampConverter
 Key: KAFKA-16691
 URL: https://issues.apache.org/jira/browse/KAFKA-16691
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16690) Support for nested structures: HeaderFrom

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16690:


 Summary: Support for nested structures: HeaderFrom
 Key: KAFKA-16690
 URL: https://issues.apache.org/jira/browse/KAFKA-16690
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14226) Introduce support for nested structures

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-14226:
-
Description: 
Abstraction for FieldPath and initial SMTs:
 * ExtractField

  was:
Abstraction for FieldPath and initial SMTs:
 * ExtractField
 * HeaderFrom
 * TimestampConverter


> Introduce support for nested structures
> ---
>
> Key: KAFKA-14226
> URL: https://issues.apache.org/jira/browse/KAFKA-14226
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Abstraction for FieldPath and initial SMTs:
>  * ExtractField



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14226) Introduce support for nested structures

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-14226.
--
Resolution: Fixed

Merged: https://github.com/apache/kafka/pull/15379

> Introduce support for nested structures
> ---
>
> Key: KAFKA-14226
> URL: https://issues.apache.org/jira/browse/KAFKA-14226
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Abstraction for FieldPath and initial SMTs:
>  * ExtractField
>  * HeaderFrom
>  * TimestampConverter



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16264) Expose `producer.id.expiration.check.interval.ms` as dynamic broker configuration

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844484#comment-17844484
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16264:
--

Thanks [~jolshan] ! Finally got some time to look into this one. Please have a 
look at [https://github.com/apache/kafka/pull/15888]

> Expose `producer.id.expiration.check.interval.ms` as dynamic broker 
> configuration
> -
>
> Key: KAFKA-16264
> URL: https://issues.apache.org/jira/browse/KAFKA-16264
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Dealing with a scenario where too many producer ids lead to issues (e.g. high CPU utilization, see KAFKA-16229) puts operators in need of flushing producer ids more promptly than usual.
> Currently, only the expiration timeout `producer.id.expiration.ms` is exposed as a dynamic config. This is helpful (e.g. by reducing the timeout, fewer producers are eventually kept in memory), but not enough if the evaluation frequency is not short enough to flush producer ids before they become an issue. Only by tuning both can the issue be worked around.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16685) RLMTask warning logs do not include parent exception trace

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-16685:
-
Summary: RLMTask warning logs do not include parent exception trace  (was: 
RLMTask warn logs do not include parent exception trace)

> RLMTask warning logs do not include parent exception trace
> --
>
> Key: KAFKA-16685
> URL: https://issues.apache.org/jira/browse/KAFKA-16685
> Project: Kafka
>  Issue Type: Improvement
>  Components: Tiered-Storage
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> When RSMTask exceptions happen and are logged, only the exception message is 
> included, and we lose the stack trace.
> See 
> [https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]
> This makes it difficult to troubleshoot issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16685) RLMTask warn logs do not include parent exception trace

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-16685:
-
Summary: RLMTask warn logs do not include parent exception trace  (was: RSM 
Task warn logs do not include parent exception trace)

> RLMTask warn logs do not include parent exception trace
> ---
>
> Key: KAFKA-16685
> URL: https://issues.apache.org/jira/browse/KAFKA-16685
> Project: Kafka
>  Issue Type: Improvement
>  Components: Tiered-Storage
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> When RSMTask exceptions happen and are logged, only the exception message is 
> included, and we lose the stack trace.
> See 
> [https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]
> This makes it difficult to troubleshoot issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16685) RLMTask warning logs do not include parent exception trace

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-16685:
-
Description: 
When RLMTask warning exceptions happen and are logged, only the exception 
message is included, and we lose the stack trace.

See 
[https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]

This makes it difficult to troubleshoot issues.

  was:
When RSMTask exceptions happen and are logged, only the exception message is 
included, and we lose the stack trace.

See 
[https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]

This makes it difficult to troubleshoot issues.


> RLMTask warning logs do not include parent exception trace
> --
>
> Key: KAFKA-16685
> URL: https://issues.apache.org/jira/browse/KAFKA-16685
> Project: Kafka
>  Issue Type: Improvement
>  Components: Tiered-Storage
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> When RLMTask warning exceptions happen and are logged, only the exception 
> message is included, and we lose the stack trace.
> See 
> [https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]
> This makes it difficult to troubleshoot issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16685) RSM Task warn logs do not include parent exception trace

2024-05-07 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16685:


 Summary: RSM Task warn logs do not include parent exception trace
 Key: KAFKA-16685
 URL: https://issues.apache.org/jira/browse/KAFKA-16685
 Project: Kafka
  Issue Type: Improvement
  Components: Tiered-Storage
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


When RSMTask exceptions happen and are logged, only the exception message is 
included, and we lose the stack trace.

See 
[https://github.com/apache/kafka/blob/70b8c5ae8e9336dbf4792e2d3cf100bbd70d6480/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L821-L831]

This makes it difficult to troubleshoot issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16414) Inconsistent active segment expiration behavior between retention.ms and retention.bytes

2024-03-27 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831327#comment-17831327
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16414:
--

Just got the new messages while posting my last comment.

As Kamal mentioned, Tiered Storage has been a feature where there were 
assumptions about how active segment rotation works, and aligning these 
behaviors will potentially break some integration tests. This is why I was 
hesitant to suggest a KIP for this change.

> Inconsistent active segment expiration behavior between retention.ms and 
> retention.bytes
> 
>
> Key: KAFKA-16414
> URL: https://issues.apache.org/jira/browse/KAFKA-16414
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.1
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
>
> This is a follow-up issue to KAFKA-16385.
> Currently, there's a difference between how retention.ms and retention.bytes handle active segment expiration:
> - retention.ms always expires the active segment when the max segment timestamp meets the condition.
> - retention.bytes only expires the active segment when retention.bytes is configured to zero.
> The behavior should be aligned: either rotate active segments for both retention configurations, or for neither.
> For more details, see
> https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17829682&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829682



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16414) Inconsistent active segment expiration behavior between retention.ms and retention.bytes

2024-03-27 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831324#comment-17831324
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16414:
--

I'm +1 as well on including the active segment in size-based retention.

I guess this change of behavior does not require a KIP (?), but let's make sure 
it's mentioned/acknowledged in the release notes, so that any user relying on 
how the mix of size-based retention and active segment rotation currently works 
will be aware when upgrading.

> Inconsistent active segment expiration behavior between retention.ms and 
> retention.bytes
> 
>
> Key: KAFKA-16414
> URL: https://issues.apache.org/jira/browse/KAFKA-16414
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.1
>Reporter: Kuan Po Tseng
>Assignee: Kuan Po Tseng
>Priority: Major
>
> This is a follow-up issue to KAFKA-16385.
> Currently, there's a difference between how retention.ms and retention.bytes handle active segment expiration:
> - retention.ms always expires the active segment when the max segment timestamp meets the condition.
> - retention.bytes only expires the active segment when retention.bytes is configured to zero.
> The behavior should be aligned: either rotate active segments for both retention configurations, or for neither.
> For more details, see
> https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17829682&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829682



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached

2024-03-21 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829682#comment-17829682
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16385:
--

> I admit my original understanding of `retention.ms` is that it only takes 
> effect on the inactive segments.

I'm happy that I'm in great company on this one and that I wasn't the only one 
who believed the active segment was not considered in retention cleanups :)

 

I started a discussion thread[1] earlier today on a related topic: yes, 
retention.ms takes precedence over segment.ms and log.roll.ms, and rolls a new 
(empty) active segment when the max segment timestamp matches the condition. But 
retention.bytes doesn't follow a similar path: given the current condition[2] 
for size-based rotation, it always forces keeping at least 1 (active) segment 
_unless_ (and this is something I discovered on this thread[3], mentioned by 
[~chia7712]) the segment size is equal to zero.

I agree as well that we should include the active segment in the retention 
checks, but I would also like to discuss whether we should align active segment 
rotation for size-based retention as well.

 

[1] [https://lists.apache.org/thread/s9xp17dpx21wqh9gp42kbvb4m93vvb23]

[2] 
[https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583]
 

[3] 
https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17828281&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17828281
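For illustration, a rough Java paraphrase (not the actual implementation) of the size-based condition linked at [2]: segments are walked oldest-first, and a segment is only deletable while the excess over retention.bytes still covers its size, which is why a non-empty active segment survives unless retention.bytes is zero.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class RetentionSizeCheckSketch {
    static List<Long> deletable(List<Long> segmentSizes, long retentionBytes) {
        long size = segmentSizes.stream().mapToLong(Long::longValue).sum();
        if (retentionBytes < 0 || size < retentionBytes) return List.of();
        long diff = size - retentionBytes; // excess bytes over the retention limit
        List<Long> result = new ArrayList<>();
        for (long segmentSize : segmentSizes) {
            if (diff - segmentSize >= 0) { // delete only while the excess covers it
                diff -= segmentSize;
                result.add(segmentSize);
            } else {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Last entry is the active segment. With retention.bytes=1 it survives:
        System.out.println(deletable(List.of(100L, 50L), 1)); // [100]
        // With retention.bytes=0 every segment, active included, passes the check:
        System.out.println(deletable(List.of(100L, 50L), 0)); // [100, 50]
    }
}
{code}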

> Segment is rolled before segment.ms or segment.bytes breached
> -
>
> Key: KAFKA-16385
> URL: https://issues.apache.org/jira/browse/KAFKA-16385
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.5.1, 3.7.0
>Reporter: Luke Chen
>Assignee: Kuan Po Tseng
>Priority: Major
>
> Steps to reproduce:
> 0. Start up a broker with `log.retention.check.interval.ms=1000` to speed up the test.
> 1. Create a topic with the configs segment.ms=7days, segment.bytes=1GB, retention.ms=1sec.
> 2. Send a record "aaa" to the topic.
> 3. Wait for 1 second.
> Will this segment be rolled? I thought not.
> But what I tested shows it will roll:
> {code:java}
> [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, 
> dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. 
> (kafka.log.LocalLog)
> [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote 
> producer snapshot at offset 1 with 1 producer ids in 1 ms. 
> (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, 
> dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, 
> lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to 
> log retention time 1000ms breach based on the largest record timestamp in the 
> segment (kafka.log.UnifiedLog)
> {code}
> The segment is rolled due to the log retention time of 1000ms being breached, 
> which is unexpected.
> Tested in v3.5.1, it has the same issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10409) Refactor Kafka Streams RocksDb iterators

2024-03-19 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828250#comment-17828250
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-10409:
--

[~dhawalkapil] thanks for reaching out, and sorry for the late reply.

I haven't followed the developments here, unfortunately. Note that the PR you 
mentioned is only a reference; it doesn't actually implement this ticket.

> Refactor Kafka Streams RocksDb iterators 
> -
>
> Key: KAFKA-10409
> URL: https://issues.apache.org/jira/browse/KAFKA-10409
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Minor
>  Labels: newbie
>
> From [https://github.com/apache/kafka/pull/9137#discussion_r470345513] :
> [~ableegoldman] : 
> > Kind of unrelated, but WDYT about renaming {{RocksDBDualCFIterator}} to 
> > {{RocksDBDualCFAllIterator}} or something on the side? I feel like these 
> > iterators could be cleaned up a bit in general to be more understandable – 
> > for example, it's weird that we do the {{iterator#seek}}-ing in the actual 
> > {{all()}} method but for range queries we do the seeking inside the 
> > iterator constructor.
> and [https://github.com/apache/kafka/pull/9137#discussion_r470361726] :
> > Personally I found the {{RocksDBDualCFIterator}} logic a bit difficult to 
> > follow even before the reverse iteration, so it would be nice to have some 
> > tests specifically covering reverse iterators over multi-column-family 
> > timestamped stores



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16264) Expose `producer.id.expiration.check.interval.ms` as dynamic broker configuration

2024-02-16 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817867#comment-17817867
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-16264:
--

cc [~jolshan] – as related to https://issues.apache.org/jira/browse/KAFKA-16229

> Expose `producer.id.expiration.check.interval.ms` as dynamic broker 
> configuration
> -
>
> Key: KAFKA-16264
> URL: https://issues.apache.org/jira/browse/KAFKA-16264
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Dealing with a scenario where too many producer ids lead to issues (e.g. high CPU utilization, see KAFKA-16229) puts operators in need of flushing producer ids more promptly than usual.
> Currently, only the expiration timeout `producer.id.expiration.ms` is exposed as a dynamic config. This is helpful (e.g. by reducing the timeout, fewer producers are eventually kept in memory), but not enough if the evaluation frequency is not short enough to flush producer ids before they become an issue. Only by tuning both can the issue be worked around.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16264) Expose `producer.id.expiration.check.interval.ms` as dynamic broker configuration

2024-02-16 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16264:


 Summary: Expose `producer.id.expiration.check.interval.ms` as 
dynamic broker configuration
 Key: KAFKA-16264
 URL: https://issues.apache.org/jira/browse/KAFKA-16264
 Project: Kafka
  Issue Type: Improvement
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Dealing with a scenario where too many producer ids lead to issues (e.g. high 
CPU utilization, see KAFKA-16229) puts operators in need of flushing producer 
ids more promptly than usual.

Currently, only the expiration timeout `producer.id.expiration.ms` is exposed 
as a dynamic config. This is helpful (e.g. by reducing the timeout, fewer 
producers are eventually kept in memory), but not enough if the evaluation 
frequency is not short enough to flush producer ids before they become an 
issue. Only by tuning both can the issue be worked around.
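As a sketch of the end state this ticket aims for (assuming the check interval becomes dynamic as proposed; values are illustrative):

{code:java}
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TuneProducerIdExpiration {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            // Empty resource name = cluster-wide default for all brokers.
            ConfigResource brokers = new ConfigResource(ConfigResource.Type.BROKER, "");
            List<AlterConfigOp> ops = List.of(
                    // Already dynamic today: shorten the expiration timeout.
                    new AlterConfigOp(new ConfigEntry("producer.id.expiration.ms", "300000"),
                            AlterConfigOp.OpType.SET),
                    // What this ticket proposes to also expose dynamically.
                    new AlterConfigOp(new ConfigEntry("producer.id.expiration.check.interval.ms", "60000"),
                            AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(brokers, ops)).all().get();
        }
    }
}
{code}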

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16229) Slow expiration of Producer IDs leading to high CPU usage

2024-02-06 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-16229:
-
Description: 
Expiration of ProducerIds is implemented with a slow removal of map keys:

```
        producers.keySet().removeAll(keys);
```

This unnecessarily goes through all producer ids to throw out the expired keys, 
which leads to quadratic time in the worst case when most/all keys need to be 
removed:

```
Benchmark                                        (numProducerIds)  Mode  Cnt           Score            Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3        9164.043 ±      10647.877  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3      341561.093 ±      20283.211  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3    44957983.550 ±    9389011.290  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  5683374164.167 ± 1446242131.466  ns/op
```

A simple fix is to use map#remove(key) instead, leading to linear growth:

```
Benchmark                                        (numProducerIds)  Mode  Cnt        Score         Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3     5779.056 ±     651.389  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3    61430.530 ±   21875.644  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3   643887.031 ±  600475.302  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  7741689.539 ± 3218317.079  ns/op
```

  was:
Expiration of ProducerIds is implemented with a slow removal of map keys:

```
        producers.keySet().removeAll(keys);
```
 
This unnecessarily goes through all producer ids to throw out the expired keys, 
which leads to quadratic time in the worst case when most/all keys need to be 
removed:

```
Benchmark                                        (numProducerIds)  Mode  Cnt           Score            Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3        9164.043 ±      10647.877  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3      341561.093 ±      20283.211  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3    44957983.550 ±    9389011.290  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  5683374164.167 ± 1446242131.466  ns/op
```

A simple fix is to use map#remove(key) instead, leading to linear growth:

```
Benchmark                                        (numProducerIds)  Mode  Cnt        Score         Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3     5779.056 ±     651.389  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3    61430.530 ±   21875.644  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3   643887.031 ±  600475.302  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  7741689.539 ± 3218317.079  ns/op
```
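To make the difference concrete, a minimal sketch of the two removal strategies (not the actual ProducerStateManager code):

{code:java}
import java.util.List;
import java.util.Map;

public class ExpireProducerIdsSketch {
    static void removeAllAtOnce(Map<Long, Object> producers, List<Long> expiredKeys) {
        // keySet().removeAll(list): when the map is not larger than the list,
        // every map key is checked with list.contains(key) -- O(n * m).
        producers.keySet().removeAll(expiredKeys);
    }

    static void removeOneByOne(Map<Long, Object> producers, List<Long> expiredKeys) {
        // One O(1) hash removal per expired key -- O(m) overall.
        expiredKeys.forEach(producers::remove);
    }
}
{code}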


> Slow expiration of Producer IDs leading to high CPU usage
> -
>
> Key: KAFKA-16229
> URL: https://issues.apache.org/jira/browse/KAFKA-16229
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Expiration of ProducerIds is implemented with a slow removal of map keys:
> ```
>         producers.keySet().removeAll(keys);
> ```
> This unnecessarily goes through all producer ids to throw out the expired 
> keys, which leads to quadratic time in the worst case when most/all keys need 
> to be removed:
> ```
> Benchmark                                        (numProducerIds)  Mode  Cnt           Score            Error  Units
> ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3        9164.043 ±      10647.877  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3      341561.093 ±      20283.211  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3    44957983.550 ±    9389011.290  ns/op
> ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  5683374164.167 ± 1446242131.466  ns/op
> ```
> A simple fix is to use map#remove(key) instead, leading to linear growth:
> ```
> Benchmark                                        (numProducerIds)  Mode  Cnt        Score         Error  Units
> ProducerStateManagerBench.testDeleteExpiringIds      

[jira] [Created] (KAFKA-16229) Slow expiration of Producer IDs leading to high CPU usage

2024-02-06 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-16229:


 Summary: Slow expiration of Producer IDs leading to high CPU usage
 Key: KAFKA-16229
 URL: https://issues.apache.org/jira/browse/KAFKA-16229
 Project: Kafka
  Issue Type: Bug
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Expiration of ProducerIds is implemented with a slow removal of map keys:

```
        producers.keySet().removeAll(keys);
```
 
This unnecessarily goes through all producer ids to throw out the expired keys, 
which leads to quadratic time in the worst case when most/all keys need to be 
removed:

```
Benchmark                                        (numProducerIds)  Mode  Cnt           Score            Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3        9164.043 ±      10647.877  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3      341561.093 ±      20283.211  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3    44957983.550 ±    9389011.290  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  5683374164.167 ± 1446242131.466  ns/op
```

A simple fix is to use map#remove(key) instead, leading to linear growth:

```
Benchmark                                        (numProducerIds)  Mode  Cnt        Score         Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3     5779.056 ±     651.389  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3    61430.530 ±   21875.644  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3   643887.031 ±  600475.302  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  7741689.539 ± 3218317.079  ns/op
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-4759) Add support for subnet masks in SimpleACLAuthorizer

2024-01-29 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812144#comment-17812144
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-4759:
-

Thanks [~theszym] ! I wasn't aware of the extensively discussed KIP backing this 
issue and PR.

I pretty much agree with Tom[1] on how to approach further discussion, though.

As the KIP has been dormant for a while, I'd suggest either checking whether the 
author (Sönke Liebau) intends to progress the original discussion, and/or 
creating a new KIP with a proposal that includes how to address the challenges 
left open on the original KIP (e.g. backward compatibility issues).


PS. It may be a good time to bump this discussion given the ongoing KRaft 
migration and ZooKeeper removal.

[1] https://lists.apache.org/thread/jp974bd05fsdwrws7kbprykdw9g65dt1

> Add support for subnet masks in SimpleACLAuthorizer
> ---
>
> Key: KAFKA-4759
> URL: https://issues.apache.org/jira/browse/KAFKA-4759
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Reporter: Shun Takebayashi
>Assignee: Shun Takebayashi
>Priority: Major
>
> SimpleACLAuthorizer currently accepts only single IP addresses.
> Supporting subnet masks with SimpleACLAuthorizer can make ACL configurations 
> simpler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15776) Update delay timeout for DelayedRemoteFetch request

2024-01-29 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812112#comment-17812112
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15776:
--

Totally agree – was just trying to highlight the implications of not 
interrupting the threads; but I'm also in favor of improving configurations to 
deal with remote fetches.

> Update delay timeout for DelayedRemoteFetch request
> ---
>
> Key: KAFKA-15776
> URL: https://issues.apache.org/jira/browse/KAFKA-15776
> Project: Kafka
>  Issue Type: Task
>Reporter: Kamal Chandraprakash
>Assignee: Kamal Chandraprakash
>Priority: Major
>
> We are reusing the {{fetch.max.wait.ms}} config as a delay timeout for 
> DelayedRemoteFetchPurgatory. {{fetch.max.wait.ms}} purpose is to wait for the 
> given amount of time when there is no data available to serve the FETCH 
> request.
> {code:java}
> The maximum amount of time the server will block before answering the fetch 
> request if there isn't sufficient data to immediately satisfy the requirement 
> given by fetch.min.bytes.
> {code}
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L41]
> Using the same timeout in the DelayedRemoteFetchPurgatory can confuse users about how to configure an optimal value for each purpose. Moreover, the config is of *LOW* importance, and most users won't configure it and will use the default value of 500 ms.
> Having the delay timeout of 500 ms in DelayedRemoteFetchPurgatory can lead to a higher number of expired delayed remote fetch requests when the remote storage has any degradation.
> We should introduce a {{fetch.remote.max.wait.ms}} config (preferably a server config) to define the delay timeout for DelayedRemoteFetch requests, or take it from the client similar to {{request.timeout.ms}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-4759) Add support for subnet masks in SimpleACLAuthorizer

2024-01-27 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811577#comment-17811577
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-4759:
-

[~rgo] thanks for continuing with this effort! I think this is a useful 
enhancement and should be considered to be merged.

Though, given that it's changing a public API, I'm starting to think a KIP 
would be required for this change to be accepted. cc [~showuon] [~divijvaidya] 
to confirm if this is the case.

A benefit of opening a KIP for this is that there will be a formal discussion 
and acceptance of the change and inclusion of a new dependency (which I think 
should be mentioned explicitly in the KIP).

If a KIP is needed, I'll be happy to review and give a (non-binding) vote.

> Add support for subnet masks in SimpleACLAuthorizer
> ---
>
> Key: KAFKA-4759
> URL: https://issues.apache.org/jira/browse/KAFKA-4759
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Reporter: Shun Takebayashi
>Assignee: Shun Takebayashi
>Priority: Major
>
> SimpleACLAuthorizer currently accepts only single IP addresses.
> Supporting subnet masks with SimpleACLAuthorizer can make ACL configurations 
> simpler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15776) Update delay timeout for DelayedRemoteFetch request

2024-01-25 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811071#comment-17811071
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15776:
--

Thanks [~showuon]!

Sure, I can prepare a KIP if there's an initial agreement on the path to 
follow. Will prepare something for next week.

On not interrupting the thread:

My understanding is that currently, on consumer remote fetch, requests are 
submitted to the thread pool and cancelled on timeout – only then retried. This 
means only 1 task is submitted per consumer-partition remote fetch at any time.

If we opt for not cancelling the tasks, then the future would be cancelled but 
the thread would still be running until completion. On timeout, the consumer 
will retry fetching, allocating yet another task on the thread pool. 
Potentially, we would have more than one task submitted per consumer-partition 
remote fetch, holding more resources than needed to deal with a single 
consumer-partition consumption from remote storage.

Let me know if it makes sense. This is mostly speculation, so I can dive further 
if some of my reasoning is incorrect.
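A minimal sketch of the Future semantics behind this (illustrative, not broker code): cancelling without interruption leaves the pool thread busy while the retry submits a new task.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelSemantics {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Future<?> fetch = pool.submit(() -> {
            try {
                TimeUnit.SECONDS.sleep(10); // stand-in for a slow remote fetch
            } catch (InterruptedException e) {
                // interrupted: the worker thread is freed immediately
            }
        });
        // cancel(false): the future reports cancelled, but the task keeps
        // running and holds its pool thread while the consumer retries.
        // cancel(true) would interrupt the thread and free it for the retry.
        fetch.cancel(false);
        pool.shutdown();
        pool.awaitTermination(15, TimeUnit.SECONDS);
    }
}
{code}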

> Update delay timeout for DelayedRemoteFetch request
> ---
>
> Key: KAFKA-15776
> URL: https://issues.apache.org/jira/browse/KAFKA-15776
> Project: Kafka
>  Issue Type: Task
>Reporter: Kamal Chandraprakash
>Assignee: Kamal Chandraprakash
>Priority: Major
>
> We are reusing the {{fetch.max.wait.ms}} config as a delay timeout for 
> DelayedRemoteFetchPurgatory. {{fetch.max.wait.ms}} purpose is to wait for the 
> given amount of time when there is no data available to serve the FETCH 
> request.
> {code:java}
> The maximum amount of time the server will block before answering the fetch 
> request if there isn't sufficient data to immediately satisfy the requirement 
> given by fetch.min.bytes.
> {code}
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L41]
> Using the same timeout in the DelayedRemoteFetchPurgatory can confuse users about how to configure an optimal value for each purpose. Moreover, the config is of *LOW* importance, and most users won't configure it and will use the default value of 500 ms.
> Having the delay timeout of 500 ms in DelayedRemoteFetchPurgatory can lead to a higher number of expired delayed remote fetch requests when the remote storage has any degradation.
> We should introduce a {{fetch.remote.max.wait.ms}} config (preferably a server config) to define the delay timeout for DelayedRemoteFetch requests, or take it from the client similar to {{request.timeout.ms}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15776) Update delay timeout for DelayedRemoteFetch request

2024-01-22 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809592#comment-17809592
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15776:
--

Agree with [~fvisconte] that tweaking an existing config on the consumer side 
is undesirable given that Tiered Storage aims to be transparent to clients.

An additional issue even when caching fetch requests is that a remote fetch 
doesn't only fetch the log segment but potentially also the offset index. 
Considering that RemoteIndexCache is a sync cache, interrupted fetches are not 
cached and potentially block the consumer's progress. This has [pushed 
us|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/pull/472] to 
build an additional async cache for indexes as a workaround.

 

Some additional thoughts on how to approach this issue:

On not interrupting the thread:

This would help remove the flooding exceptions, but will have the effect of 
piling up threads: retries could leave more than one thread per consumer 
fetching a partition, potentially running out of threads in the reader thread 
pool (default size = 10) and causing other issues.

By the way, I can see that the delayed remote fetch operation has a fixed purge 
interval of 1000 without a config. Should we have a config for this one? Or, 
because there is a thread pool size, is there no need for this configuration?

 

On the timeout configuration semantics:

Based on [https://github.com/apache/kafka/pull/14778#issuecomment-1820588080]

We should update our docs to make the expectations about `fetch.max.wait.ms` 
explicit: it should only apply to data available in the local log, and if 
topics are tiered, then larger latencies may apply.

We could also consider adding a new exception type for interrupted remote fetch 
operations; this way we can use it in the RLM to choose the proper logging 
level. We would need to document this on the RSM interface and ask 
implementations to report interrupted exceptions properly.

 

On the remote fetch timeout configuration:

An additional configuration certainly seems needed to redefine this delayed 
operation. But only having a different configuration with a larger default 
value would help up to a certain point.

Instead of a fixed timeout config, we could consider having backoff configs 
to set the boundaries on when to start interrupting remote fetches, while 
bumping (e.g. +100ms) these timeouts up to an upper bound where failures will 
start to be reported to consumers. This way operators have better 
configurations to tune, e.g. something around a lower bound of 2 seconds to 
start interrupting remote fetches and 10 seconds to start failing consumer requests.

Having both these configs and a new exception type can enable proper handling 
of these exceptions: reporting them at e.g. WARN/DEBUG level when below the max 
timeout, and failing consumer requests and logging at WARN/ERROR when hitting 
the upper bound.

Also, as degradation happens between broker and remote storage, this 
configuration should not be a consumer one – as consumers can't have all the 
context on how to update these values. Instead, these configurations can be on 
the broker side for operators to set them. 
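A minimal sketch of the backoff idea (hypothetical bounds, not real Kafka configs): interrupt and retry with a bumped timeout between the lower and upper bound, and fail the request past the upper bound.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RemoteFetchBackoffSketch {
    static final long INTERRUPT_LOWER_BOUND_MS = 2_000; // start interrupting here
    static final long FAIL_UPPER_BOUND_MS = 10_000;     // start failing here
    static final long BUMP_MS = 100;                    // bump per retry

    static <T> T fetchWithBackoff(ExecutorService pool, Callable<T> remoteFetch) throws Exception {
        long timeoutMs = INTERRUPT_LOWER_BOUND_MS;
        while (true) {
            Future<T> attempt = pool.submit(remoteFetch);
            try {
                return attempt.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                attempt.cancel(true); // interrupt this attempt
                timeoutMs += BUMP_MS; // give the next attempt more time
                if (timeoutMs > FAIL_UPPER_BOUND_MS) {
                    throw e;          // upper bound hit: fail the consumer request
                }
            }
        }
    }
}
{code}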

 

cc [~showuon] [~satishd] 

> Update delay timeout for DelayedRemoteFetch request
> ---
>
> Key: KAFKA-15776
> URL: https://issues.apache.org/jira/browse/KAFKA-15776
> Project: Kafka
>  Issue Type: Task
>Reporter: Kamal Chandraprakash
>Assignee: Kamal Chandraprakash
>Priority: Major
>
> We are reusing the {{fetch.max.wait.ms}} config as a delay timeout for 
> DelayedRemoteFetchPurgatory. {{fetch.max.wait.ms}} purpose is to wait for the 
> given amount of time when there is no data available to serve the FETCH 
> request.
> {code:java}
> The maximum amount of time the server will block before answering the fetch 
> request if there isn't sufficient data to immediately satisfy the requirement 
> given by fetch.min.bytes.
> {code}
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L41]
> Using the same timeout in the DelayedRemoteFetchPurgatory can confuse users about how to configure an optimal value for each purpose. Moreover, the config is of *LOW* importance, and most users won't configure it and will use the default value of 500 ms.
> Having the delay timeout of 500 ms in DelayedRemoteFetchPurgatory can lead to a higher number of expired delayed remote fetch requests when the remote storage has any degradation.
> We should introduce a {{fetch.remote.max.wait.ms}} config (preferably a server config) to define the delay timeout for DelayedRemoteFetch requests, or take it from the client similar to {{request.tim

[jira] [Commented] (KAFKA-15931) Cached transaction index gets closed if tiered storage read is interrupted

2024-01-21 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809148#comment-17809148
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15931:
--

[~satish.duggana] [~ckamal] I've opened a new PR to deal with this – have a 
look: https://github.com/apache/kafka/pull/15241

> Cached transaction index gets closed if tiered storage read is interrupted
> --
>
> Key: KAFKA-15931
> URL: https://issues.apache.org/jira/browse/KAFKA-15931
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Ivan Yurchenko
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Minor
>
> This reproduces when reading from remote storage with the default 
> {{fetch.max.wait.ms}} (500) or lower. {{isolation.level=read_committed}} is 
> needed to trigger this.
> It's not easy to reproduce on local-only setups, unfortunately, because reads 
> are fast and aren't interrupted.
> This error is logged
> {noformat}
> [2023-11-29 14:01:01,166] ERROR Error occurred while reading the remote data 
> for topic1-0 (kafka.log.remote.RemoteLogReader)
> org.apache.kafka.common.KafkaException: Failed read position from the 
> transaction index 
>     at 
> org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:235)
>     at 
> org.apache.kafka.storage.internals.log.TransactionIndex.collectAbortedTxns(TransactionIndex.java:171)
>     at 
> kafka.log.remote.RemoteLogManager.collectAbortedTransactions(RemoteLogManager.java:1359)
>     at 
> kafka.log.remote.RemoteLogManager.addAbortedTransactions(RemoteLogManager.java:1341)
>     at kafka.log.remote.RemoteLogManager.read(RemoteLogManager.java:1310)
>     at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:62)
>     at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:31)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.nio.channels.ClosedChannelException
>     at 
> java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150)
>     at java.base/sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:325)
>     at 
> org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:233)
>     ... 10 more
> {noformat}
> and after that this txn index becomes unusable until the process is restarted.
> I suspect it's caused by the reading thread being interrupted due to the 
> fetch timeout. At least [this 
> code|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/19fb8f93c59dfd791f62d41f332db9e306bc1422/src/java.base/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java#L159-L160]
>  in {{AbstractInterruptibleChannel}} is called.
> Fixing may be easy: reopen the channel in {{TransactionIndex}} if it's closed. 
> However, off the top of my head I can't say if there are some less obvious 
> implications of this change.
>  
>  
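For illustration, a minimal sketch of the "reopen if closed" idea from the description (hypothetical wrapper, not the actual TransactionIndex fix):

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class ReopenableIndexChannel {
    private final File file;
    private FileChannel channel;

    public ReopenableIndexChannel(File file) throws IOException {
        this.file = file;
        this.channel = open();
    }

    private FileChannel open() throws IOException {
        return FileChannel.open(file.toPath(),
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    // Reopen the channel if an interrupted read closed it, instead of leaving
    // the index unusable until the broker restarts.
    synchronized FileChannel channel() throws IOException {
        if (!channel.isOpen()) {
            channel = open();
        }
        return channel;
    }
}
{code}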



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15931) Cached transaction index gets closed if tiered storage read is interrupted

2024-01-21 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya reassigned KAFKA-15931:


Assignee: Jorge Esteban Quilcate Otoya

> Cached transaction index gets closed if tiered storage read is interrupted
> --
>
> Key: KAFKA-15931
> URL: https://issues.apache.org/jira/browse/KAFKA-15931
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Ivan Yurchenko
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Minor
>
> This reproduces when reading from remote storage with the default 
> {{fetch.max.wait.ms}} (500) or lower. {{isolation.level=read_committed}} is 
> needed to trigger this.
> It's not easy to reproduce on local-only setups, unfortunately, because reads 
> are fast and aren't interrupted.
> This error is logged
> {noformat}
> [2023-11-29 14:01:01,166] ERROR Error occurred while reading the remote data 
> for topic1-0 (kafka.log.remote.RemoteLogReader)
> org.apache.kafka.common.KafkaException: Failed read position from the 
> transaction index 
>     at 
> org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:235)
>     at 
> org.apache.kafka.storage.internals.log.TransactionIndex.collectAbortedTxns(TransactionIndex.java:171)
>     at 
> kafka.log.remote.RemoteLogManager.collectAbortedTransactions(RemoteLogManager.java:1359)
>     at 
> kafka.log.remote.RemoteLogManager.addAbortedTransactions(RemoteLogManager.java:1341)
>     at kafka.log.remote.RemoteLogManager.read(RemoteLogManager.java:1310)
>     at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:62)
>     at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:31)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.nio.channels.ClosedChannelException
>     at 
> java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150)
>     at java.base/sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:325)
>     at 
> org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:233)
>     ... 10 more
> {noformat}
> and after that this txn index becomes unusable until the process is restarted.
> I suspect it's caused by the reading thread being interrupted due to the 
> fetch timeout. At least [this 
> code|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/19fb8f93c59dfd791f62d41f332db9e306bc1422/src/java.base/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java#L159-L160]
>  in {{AbstractInterruptibleChannel}} is called.
> Fixing may be easy: reopen the channel in {{TransactionIndex}} if it's closed. 
> However, off the top of my head I can't say whether there are less obvious 
> implications of this change.
>  
>  
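
For illustration, a minimal sketch of the "reopen if closed" idea from the 
description above. The class and method names are hypothetical, not the actual 
Kafka patch; a real fix would also need to guard against concurrent interrupts.

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: lazily reopen a read channel that an interrupted
// fetch thread has closed, instead of failing until restart.
public class ReopenableChannel {
    private final Path file;
    private FileChannel channel;

    public ReopenableChannel(Path file) throws IOException {
        this.file = file;
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    // Returns an open channel, reopening it if a previous interrupt closed it.
    public synchronized FileChannel channelOrReopen() throws IOException {
        if (!channel.isOpen()) {
            channel = FileChannel.open(file, StandardOpenOption.READ);
        }
        return channel;
    }
}
{code}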



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15132) Implement disable & re-enablement for Tiered Storage

2023-11-23 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789243#comment-17789243
 ] 

Jorge Esteban Quilcate Otoya edited comment on KAFKA-15132 at 11/23/23 7:42 PM:


Just realized that the current behavior allows updating this configuration and 
silently logs a warning[1] that disabling is not supported.

Is this the expected behavior, or should it be treated as a bug until KIP-950 is 
supported? As a user, I'd expect the config update to actually fail if a 
feature is not supported, instead of getting into an unsupported state.

cc [~satishd] [~showuon] 

[1] 
[https://github.com/apache/kafka/blob/6c232b542cff9db3ecc1d0bdf6e9174960c93798/core/src/main/scala/kafka/server/ConfigHandler.scala#L73-L91]


was (Author: jeqo):
Just realized that the current behavior allows updating this configuration and 
silently logs a warning[1] that disabling is not supported.

Is this the expected behavior, or should it be treated as a bug? As a user, I'd 
expect the config update to actually fail if a feature is not supported, 
instead of getting into an unsupported state.

cc [~satishd] [~showuon] 

[1] 
https://github.com/apache/kafka/blob/6c232b542cff9db3ecc1d0bdf6e9174960c93798/core/src/main/scala/kafka/server/ConfigHandler.scala#L73-L91

> Implement disable & re-enablement for Tiered Storage
> 
>
> Key: KAFKA-15132
> URL: https://issues.apache.org/jira/browse/KAFKA-15132
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Divij Vaidya
>Assignee: Divij Vaidya
>Priority: Major
>  Labels: kip
> Fix For: 3.7.0
>
>
> KIP-405 [1] introduces the Tiered Storage feature in Apache Kafka. One of the 
> limitations mentioned in the KIP is inability to re-enable TS on a topic 
> after it has been disabled.
> {quote}Once tier storage is enabled for a topic, it can not be disabled. We 
> will add this feature in future versions. One possible workaround is to 
> create a new topic and copy the data from the desired offset and delete the 
> old topic. 
> {quote}
> This task will propose a new KIP which extends KIP-405 to describe the 
> behaviour on disablement and re-enablement of tiered storage for a topic. 
> The solution will apply to both Zk and KRaft variants.
> [1] KIP-405 - 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15132) Implement disable & re-enablement for Tiered Storage

2023-11-23 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789243#comment-17789243
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15132:
--

Just realized that the current behavior allows updating this configuration and 
silently logs a warning[1] that disabling is not supported.

Is this the expected behavior, or should it be treated as a bug? As a user, I'd 
expect the config update to actually fail if a feature is not supported, 
instead of getting into an unsupported state.

cc [~satishd] [~showuon] 

[1] 
https://github.com/apache/kafka/blob/6c232b542cff9db3ecc1d0bdf6e9174960c93798/core/src/main/scala/kafka/server/ConfigHandler.scala#L73-L91
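
For illustration, a hedged sketch of the fail-fast behavior suggested above 
(the actual handler lives in ConfigHandler.scala; this Java fragment and its 
method name are illustrative only, not broker code):

{code:java}
import org.apache.kafka.common.errors.InvalidConfigurationException;

// Illustrative only: reject the unsupported transition instead of logging
// a warning and silently keeping tiering enabled.
class RemoteStorageConfigValidator {
    static void validateRemoteStorageChange(boolean wasEnabled, boolean nowEnabled) {
        if (wasEnabled && !nowEnabled) {
            throw new InvalidConfigurationException(
                "Disabling remote storage on a topic is not supported");
        }
    }
}
{code}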

> Implement disable & re-enablement for Tiered Storage
> 
>
> Key: KAFKA-15132
> URL: https://issues.apache.org/jira/browse/KAFKA-15132
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Divij Vaidya
>Assignee: Divij Vaidya
>Priority: Major
>  Labels: kip
> Fix For: 3.7.0
>
>
> KIP-405 [1] introduces the Tiered Storage feature in Apache Kafka. One of the 
> limitations mentioned in the KIP is inability to re-enable TS on a topic 
> after it has been disabled.
> {quote}Once tier storage is enabled for a topic, it can not be disabled. We 
> will add this feature in future versions. One possible workaround is to 
> create a new topic and copy the data from the desired offset and delete the 
> old topic. 
> {quote}
> This task will propose a new KIP which extends KIP-405 to describe the 
> behaviour on disablement and re-enablement of tiered storage for a topic. 
> The solution will apply to both Zk and KRaft variants.
> [1] KIP-405 - 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15805) Fetch Remote Indexes at once

2023-11-10 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-15805:
-
Description: 
Reduce Tiered Storage latency when fetching indexes by allowing many indexes to 
be fetched at once.

KIP: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once]

  was:
Reduce Tiered Storage latency when fetching indexes by allowing many indexes to 
be fetched at once.

 


> Fetch Remote Indexes at once
> 
>
> Key: KAFKA-15805
> URL: https://issues.apache.org/jira/browse/KAFKA-15805
> Project: Kafka
>  Issue Type: Improvement
>  Components: Tiered-Storage
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: kip
>
> Reduce Tiered Storage latency when fetching indexes by allowing many indexes 
> to be fetched at once.
> KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once]
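
As a rough illustration of the batching idea (the interface, method, and enum 
below are assumptions made for this sketch, not the KIP-1002 API; see the KIP 
page for the authoritative definition):

{code:java}
import java.io.InputStream;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: one remote call returns all requested indexes for a
// segment, instead of one round trip per index type.
interface BatchedIndexFetcher {
    enum IndexType { OFFSET, TIMESTAMP, TRANSACTION, LEADER_EPOCH, PRODUCER_SNAPSHOT }

    Map<IndexType, InputStream> fetchIndexes(String segmentId, Set<IndexType> indexTypes);
}
{code}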



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15806) Signal next segment when remote fetching

2023-11-10 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-15806:
-
Description: 
Improve remote fetching performance when fetching across segments by signaling 
the next segment, allowing Remote Storage Manager implementations to optimize 
their pre-fetching.

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-1003%3A+Signal+next+segment+when+remote+fetching]

  was:Improve remote fetching performance when fetching across segments by 
signaling the next segment, allowing Remote Storage Manager implementations to 
optimize their pre-fetching.


> Signal next segment when remote fetching 
> -
>
> Key: KAFKA-15806
> URL: https://issues.apache.org/jira/browse/KAFKA-15806
> Project: Kafka
>  Issue Type: Improvement
>  Components: Tiered-Storage
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: kip
>
> Improve remote fetching performance when fetching across segments by signaling 
> the next segment, allowing Remote Storage Manager implementations to optimize 
> their pre-fetching.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-1003%3A+Signal+next+segment+when+remote+fetching]
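
A rough sketch of the signaling idea (the record and field names below are 
assumptions, not the KIP-1003 API): the fetch for one segment carries a hint 
about the next segment so the RSM implementation can pre-fetch it.

{code:java}
import java.util.Optional;

// Hypothetical sketch: the hint is optional because the fetched segment may
// be the last one available in remote storage.
record RemoteFetchWithHint(String segmentId,
                           long startOffset,
                           Optional<String> nextSegmentIdHint) { }
{code}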



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15806) Signal next segment when remote fetching

2023-11-10 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15806:


 Summary: Signal next segment when remote fetching 
 Key: KAFKA-15806
 URL: https://issues.apache.org/jira/browse/KAFKA-15806
 Project: Kafka
  Issue Type: Improvement
  Components: Tiered-Storage
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Improve remote fetching performance when fetching across segments by signaling 
the next segment, allowing Remote Storage Manager implementations to optimize 
their pre-fetching.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15805) Fetch Remote Indexes at once

2023-11-10 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15805:


 Summary: Fetch Remote Indexes at once
 Key: KAFKA-15805
 URL: https://issues.apache.org/jira/browse/KAFKA-15805
 Project: Kafka
  Issue Type: Improvement
  Components: Tiered-Storage
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Reduce Tiered Storage latency when fetching indexes by allowing many indexes to 
be fetched at once.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15802) Trying to access uncopied segments metadata on listOffsets

2023-11-09 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784606#comment-17784606
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15802:
--

Hi [~divijvaidya] , see [https://github.com/apache/kafka/pull/14727] for a test 
case and a quick implementation of option 1.

We may consider the option 1 fix for 3.6.1 and work on a KIP to extend the RLMM 
API for 3.7. What do you think?

> Trying to access uncopied segments metadata on listOffsets
> --
>
> Key: KAFKA-15802
> URL: https://issues.apache.org/jira/browse/KAFKA-15802
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Francois Visconte
>Priority: Major
>
> We have a tiered storage cluster running with the Aiven S3 plugin. 
> On our cluster, we have a process doing regular listOffsets requests. 
> This triggers the following exception:
> {code:java}
> org.apache.kafka.common.KafkaException: 
> org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: 
> Requested remote resource was not found
> at 
> org.apache.kafka.storage.internals.log.RemoteIndexCache.lambda$createCacheEntry$6(RemoteIndexCache.java:355)
> at 
> org.apache.kafka.storage.internals.log.RemoteIndexCache.loadIndexFile(RemoteIndexCache.java:318)
> Nov 09, 2023 1:42:01 PM com.github.benmanes.caffeine.cache.LocalAsyncCache 
> lambda$handleCompletion$7
> WARNING: Exception thrown during asynchronous load
> java.util.concurrent.CompletionException: 
> io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key 
> cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest
>  does not exists in storage S3Storage{bucketName='bucket', partSize=16777216}
> at 
> com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:107)
> at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
> at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760)
> at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
> at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
> at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
> at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
> at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> Caused by: io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key 
> cluster/topic-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest
>  does not exists in storage S3Storage{bucketName='bucket', partSize=16777216}
> at io.aiven.kafka.tieredstorage.storage.s3.S3Storage.fetch(S3Storage.java:80)
> at 
> io.aiven.kafka.tieredstorage.manifest.SegmentManifestProvider.lambda$new$1(SegmentManifestProvider.java:59)
> at 
> com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:103)
> ... 7 more
> Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The 
> specified key does not exist. (Service: S3, Status Code: 404, Request ID: 
> CFMP27PVC9V2NNEM, Extended Request ID: 
> F5qqlV06qQJ5qCuWl91oueBaha0QLMBURJudnOnFDQk+YbgFcAg70JBATcARDxN44DGo+PpfZHAsum+ioYMoOw==)
> at 
> software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
> at 
> software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
> at 
> software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
> at 
> software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41)
> at 
> software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
> at 
> software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
> at 
> software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
> at 
> software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72)
> at 
> software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
> at 
> software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
> at 
> software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute

[jira] [Created] (KAFKA-15314) No Quota applied if client-id is null or empty

2023-08-08 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15314:


 Summary: No Quota applied if client-id is null or empty
 Key: KAFKA-15314
 URL: https://issues.apache.org/jira/browse/KAFKA-15314
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


When quotas were proposed, KIP-13[1] stated:

>  In addition, there will be a quota reserved for clients not presenting a 
>client id (for e.g. simple consumers not setting the id). This will default to 
>an empty client id ("") and all such clients will share the quota for that 
>empty id (which should be the default quota).

However, it seems that when client-id is null or empty and a default quota for 
client-id is present, no quota is applied.

Even though Java clients set a default value [2][3], the protocol accepts a null 
client-id[4], and other client implementations could send a null value to 
bypass a quota.

The related code[5][6] shows the quota metric pair being prepared with a 
(potentially null) client-id, and the quota being set to null when both 
client-id and the (sanitized) user are null.

Adding some tests to showcase this: 
[https://github.com/apache/kafka/pull/14165] 

 

Is it expected for client-id=null to bypass quotas? If it is, then the KIP or 
documentation should clarify this; otherwise we should fix this behavior bug, 
e.g. we could "sanitize" client-id, similar to the user name, to an empty 
string when the input is null or empty.
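
A minimal sketch of that sanitization (the helper name is hypothetical):

{code:java}
// Illustrative only: a null client-id maps to "", so such clients fall under
// the default client-id quota instead of bypassing quotas entirely.
class ClientIdSanitizer {
    static String sanitizeClientId(String clientId) {
        return clientId == null ? "" : clientId; // empty input is already ""
    }
}
{code}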


 

As a side note, similar behavior could happen with the user, I guess. Even 
though the value defaults to ANONYMOUS, if a client implementation sends an 
empty value, it may bypass the default quota as well; though I need to test 
this further once this is considered a bug.

 

[1]: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas] 


[2]: 
[https://github.com/apache/kafka/blob/e98508747acc8972ac5ceb921e0fd3a7d7bd5e9c/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java#L498-L508]
 


[3]: 
[https://github.com/apache/kafka/blob/ab71c56973518bac8e1868eccdc40b17d7da35c1/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java#L616-L628]

[4]: 
[https://github.com/apache/kafka/blob/9f26906fcc2fd095b7d27c504e342b9a8d619b4b/clients/src/main/resources/common/message/RequestHeader.json#L34-L40]
 


[5]: 
[https://github.com/apache/kafka/blob/322ac86ba282f35373382854cc9e790e4b7fb5fc/core/src/main/scala/kafka/server/ClientQuotaManager.scala#L588-L628]
 


[6]: 
[https://github.com/apache/kafka/blob/322ac86ba282f35373382854cc9e790e4b7fb5fc/core/src/main/scala/kafka/server/ClientQuotaManager.scala#L651-L652]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15231) Add ability to pause/resume Remote Log Manager tasks

2023-07-21 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745620#comment-17745620
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15231:
--

Thanks [~divijvaidya]. Of course, if the remote tier becomes unavailable, 
pausing won't be of much help.

I'm thinking more of scenarios where uploading should be paused but fetching 
should continue. For instance: network/storage saturation, where reads should 
be prioritized over writes; or storage migration, where writes should be 
paused until the new storage is available.

I may be jumping too far ahead here, but this seems like a potentially useful 
feature to consider.

> Add ability to pause/resume Remote Log Manager tasks 
> -
>
> Key: KAFKA-15231
> URL: https://issues.apache.org/jira/browse/KAFKA-15231
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> Once Tiered Storage is enabled, there may be situations where uploading tasks 
> to a remote tier need to be paused, e.g. remote storage maintenance, 
> troubleshooting, etc.
> An RSM implementation may not be able to do this by itself without throwing 
> exceptions, polluting the logs, etc.
> Could we consider adding this ability to the Tiered Storage framework? Remote 
> Log Manager seems like a good candidate place for this, though I'm wondering 
> how to expose it.
> Would be interested to hear if this sounds like a good idea, and what options 
> we have to include it.
> We have been considering extending RLM tasks with a pause flag, and having an 
> MBean to switch them on demand. Another option may be to extend the Kafka 
> protocol to expose this, but that seems much more involved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15231) Add ability to pause/resume Remote Log Manager tasks

2023-07-21 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745600#comment-17745600
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15231:
--

cc [~satishd] [~showuon] [~divijvaidya]

> Add ability to pause/resume Remote Log Manager tasks 
> -
>
> Key: KAFKA-15231
> URL: https://issues.apache.org/jira/browse/KAFKA-15231
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> Once Tiered Storage is enabled, there may be situations where uploading tasks 
> to a remote tier need to be paused, e.g. remote storage maintenance, 
> troubleshooting, etc.
> An RSM implementation may not be able to do this by itself without throwing 
> exceptions, polluting the logs, etc.
> Could we consider adding this ability to the Tiered Storage framework? Remote 
> Log Manager seems like a good candidate place for this, though I'm wondering 
> how to expose it.
> Would be interested to hear if this sounds like a good idea, and what options 
> we have to include it.
> We have been considering extending RLM tasks with a pause flag, and having an 
> MBean to switch them on demand. Another option may be to extend the Kafka 
> protocol to expose this, but that seems much more involved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15231) Add ability to pause/resume Remote Log Manager tasks

2023-07-21 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15231:


 Summary: Add ability to pause/resume Remote Log Manager tasks 
 Key: KAFKA-15231
 URL: https://issues.apache.org/jira/browse/KAFKA-15231
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


Once Tiered Storage is enabled, there may be situations where uploading tasks 
to a remote tier need to be paused, e.g. remote storage maintenance, 
troubleshooting, etc.

An RSM implementation may not be able to do this by itself without throwing 
exceptions, polluting the logs, etc.

Could we consider adding this ability to the Tiered Storage framework? Remote 
Log Manager seems like a good candidate place for this, though I'm wondering 
how to expose it.

Would be interested to hear if this sounds like a good idea, and what options 
we have to include it.

We have been considering extending RLM tasks with a pause flag, and having an 
MBean to switch them on demand. Another option may be to extend the Kafka 
protocol to expose this, but that seems much more involved.
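
A rough sketch of the pause-flag/MBean idea, with all names hypothetical (this 
is not Kafka code):

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: an upload task checks the flag before each copy
// iteration; operators flip it on demand through JMX.
interface RlmTaskPauseControlMBean {
    void pause();
    void resume();
    boolean isPaused();
}

class RlmTaskPauseControl implements RlmTaskPauseControlMBean {
    private final AtomicBoolean paused = new AtomicBoolean(false);

    @Override public void pause() { paused.set(true); }
    @Override public void resume() { paused.set(false); }
    @Override public boolean isPaused() { return paused.get(); }
}
{code}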



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15158) Add metrics for RemoteRequestsPerSec

2023-07-17 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-15158:
-
Description: 
Add the following metrics for better observability into the RemoteLog related 
activities inside the broker.

1. RemoteWriteRequestsPerSec

2. RemoteDeleteRequestsPerSec

3. BuildRemoteLogAuxStateRequestsPerSec

 

These metrics will be calculated at topic level (we can add them at 
brokerTopicStats)

-*RemoteWriteRequestsPerSec* will be marked on every call to RemoteLogManager#-
-copyLogSegmentsToRemote()-  already covered by KAFKA-14953
 
*RemoteDeleteRequestsPerSec* will be marked on every call to 
RemoteLogManager#cleanupExpiredRemoteLogSegments(). This method is introduced 
in [https://github.com/apache/kafka/pull/13561] 

*BuildRemoteLogAuxStateRequestsPerSec* will be marked on every call to 
ReplicaFetcherTierStateMachine#buildRemoteLogAuxState()

 

 

(Note that this requires a change in KIP-405 and hence, must be approved by KIP 
author [~satishd] )
 

  was:
Add the following metrics for better observability into the RemoteLog related 
activities inside the broker.

1. RemoteWriteRequestsPerSec

2. RemoteDeleteRequestsPerSec

3. BuildRemoteLogAuxStateRequestsPerSec

 

These metrics will be calculated at topic level (we can add them at 
brokerTopicStats)

*RemoteWriteRequestsPerSec* will be marked on every call to RemoteLogManager#
copyLogSegmentsToRemote()
 
*RemoteDeleteRequestsPerSec* will be marked on every call to 
RemoteLogManager#cleanupExpiredRemoteLogSegments(). This method is introduced 
in [https://github.com/apache/kafka/pull/13561] 

*BuildRemoteLogAuxStateRequestsPerSec* will be marked on every call to 
ReplicaFetcherTierStateMachine#buildRemoteLogAuxState()

 

 

(Note that this requires a change in KIP-405 and hence, must be approved by KIP 
author [~satishd] )
 


> Add metrics for RemoteRequestsPerSec
> 
>
> Key: KAFKA-15158
> URL: https://issues.apache.org/jira/browse/KAFKA-15158
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Priority: Major
> Fix For: 3.6.0
>
>
> Add the following metrics for better observability into the RemoteLog related 
> activities inside the broker.
> 1. RemoteWriteRequestsPerSec
> 2. RemoteDeleteRequestsPerSec
> 3. BuildRemoteLogAuxStateRequestsPerSec
>  
> These metrics will be calculated at topic level (we can add them at 
> brokerTopicStats)
> -*RemoteWriteRequestsPerSec* will be marked on every call to 
> RemoteLogManager#-
> -copyLogSegmentsToRemote()-  already covered by KAFKA-14953
>  
> *RemoteDeleteRequestsPerSec* will be marked on every call to 
> RemoteLogManager#cleanupExpiredRemoteLogSegments(). This method is introduced 
> in [https://github.com/apache/kafka/pull/13561] 
> *BuildRemoteLogAuxStateRequestsPerSec* will be marked on every call to 
> ReplicaFetcherTierStateMachine#buildRemoteLogAuxState()
>  
>  
> (Note that this requires a change in KIP-405 and hence, must be approved by 
> KIP author [~satishd] )
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15181) Race condition on partition assigned to TopicBasedRemoteLogMetadataManager

2023-07-17 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743662#comment-17743662
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15181:
--

Sure. The TBRLMM is not the one not recovering, but the Replica Fetcher. 

My understanding is that this issue happens when a replica is recovering its 
state after being offline. TBRLMM gets partitions assigned, starts managing 
them, and is marked as initialized and open to receive requests; however, the 
consumer may not have caught up yet and the cache is not ready.

As these calls to the cache happen on the replica fetcher, not the consumer 
fetch, exceptions/retries may not cover this.

The suggested approach is to respond to partition assignment the same way 
writing events is handled: by waiting until the consumer has caught up 
(and therefore the cache is ready), as in the sketch below.
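
A simplified sketch of that wait (a hypothetical helper, not the TBRLMM code; a 
real version would need a timeout and would track the end offset as it moves):

{code:java}
import java.time.Duration;
import java.util.Set;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

class CatchUpWaiter {
    // Block until the metadata consumer has read up to the end offset observed
    // at assignment time; only then mark the partition as ready to serve.
    static void awaitCatchUp(Consumer<?, ?> consumer, TopicPartition tp) {
        long endOffset = consumer.endOffsets(Set.of(tp)).get(tp);
        while (consumer.position(tp) < endOffset) {
            consumer.poll(Duration.ofMillis(100)); // advances position as records arrive
        }
    }
}
{code}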

> Race condition on partition assigned to TopicBasedRemoteLogMetadataManager 
> ---
>
> Key: KAFKA-15181
> URL: https://issues.apache.org/jira/browse/KAFKA-15181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> TopicBasedRemoteLogMetadataManager (TBRLMM) uses a cache to be prepared 
> whenever partitions are assigned.
> When partitions are assigned to the TBRLMM instance, a consumer is started to 
> keep the cache up to date.
> If the cache hasn't finished building, TBRLMM fails to return remote 
> metadata about partitions that are stored on the backing topic. TBRLMM may not 
> recover from this failing state.
> A proposal to fix this issue would be to wait, after a partition is assigned, 
> for the consumer to catch up. A similar logic is used at the moment when 
> TBRLMM writes to the topic, and uses the send callback to wait for the 
> consumer to catch up. This logic can be reused whenever a partition is 
> assigned, so when TBRLMM is marked as initialized, the cache is ready to 
> serve requests.
> Reference: https://github.com/aiven/kafka/issues/33



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15181) Race condition on partition assigned to TopicBasedRemoteLogMetadataManager

2023-07-14 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-15181:
-
Parent: KAFKA-7739
Issue Type: Sub-task  (was: Bug)

> Race condition on partition assigned to TopicBasedRemoteLogMetadataManager 
> ---
>
> Key: KAFKA-15181
> URL: https://issues.apache.org/jira/browse/KAFKA-15181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> TopicBasedRemoteLogMetadataManager (TBRLMM) uses a cache to be prepared 
> whenever partitions are assigned.
> When partitions are assigned to the TBRLMM instance, a consumer is started to 
> keep the cache up to date.
> If the cache hasn't finished building, TBRLMM fails to return remote 
> metadata about partitions that are stored on the backing topic. TBRLMM may not 
> recover from this failing state.
> A proposal to fix this issue would be to wait, after a partition is assigned, 
> for the consumer to catch up. A similar logic is used at the moment when 
> TBRLMM writes to the topic, and uses the send callback to wait for the 
> consumer to catch up. This logic can be reused whenever a partition is 
> assigned, so when TBRLMM is marked as initialized, the cache is ready to 
> serve requests.
> Reference: https://github.com/aiven/kafka/issues/33



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15181) Race condition on partition assigned to TopicBasedRemoteLogMetadataManager

2023-07-12 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15181:


 Summary: Race condition on partition assigned to 
TopicBasedRemoteLogMetadataManager 
 Key: KAFKA-15181
 URL: https://issues.apache.org/jira/browse/KAFKA-15181
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


TopicBasedRemoteLogMetadataManager (TBRLMM) uses a cache to be prepared whenever 
partitions are assigned.

When partitions are assigned to the TBRLMM instance, a consumer is started to 
keep the cache up to date.

If the cache hasn't finished building, TBRLMM fails to return remote metadata 
about partitions that are stored on the backing topic. TBRLMM may not recover 
from this failing state.

A proposal to fix this issue would be to wait, after a partition is assigned, 
for the consumer to catch up. A similar logic is used at the moment when TBRLMM 
writes to the topic, and uses the send callback to wait for the consumer to 
catch up. This logic can be reused whenever a partition is assigned, so when 
TBRLMM is marked as initialized, the cache is ready to serve requests.


Reference: https://github.com/aiven/kafka/issues/33



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15147) Measure pending and outstanding Remote Segment operations

2023-07-07 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740952#comment-17740952
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15147:
--

[~satish.duggana] thanks! Yes, something like tier-lag could help us achieve 
something similar, though it would only account for outstanding uploads: if the 
lag doesn't decrease over a period of time, then trigger an alert.

Would it make sense to have a tier-retention-lag to account for outstanding 
remote delete operations?

Is the tier-lag metric documented in the KIP? I can't find it in the current 
version or the Jira tickets. It would be ideal to have it tagged by 
topic-partition.

> Measure pending and outstanding Remote Segment operations
> -
>
> Key: KAFKA-15147
> URL: https://issues.apache.org/jira/browse/KAFKA-15147
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> Remote Log Segment operations (copy/delete) are executed by the Remote 
> Storage Manager, and recorded by the Remote Log Metadata Manager (e.g. the 
> default TopicBasedRLMM writes state changes on remote log segments to an 
> internal Kafka topic).
> As executions run, fail, and retry, it will be important to know how many 
> operations are pending and outstanding over time to alert operators.
> Pending operations are not enough to alert on, as values can oscillate close 
> to zero. An additional condition needs to apply (running time > threshold) to 
> consider an operation outstanding.
> Proposal:
> RemoteLogManager could be extended with 2 concurrent maps 
> (pendingSegmentCopies, pendingSegmentDeletes) `Map[Uuid, Long]` to record, 
> per segment id, the time when the operation started, and based on this expose 
> 2 metrics per operation:
>  * pendingSegmentCopies: gauge of the pendingSegmentCopies map
>  * outstandingSegmentCopies: loop over pending ops, and if now - startedTime 
> > timeout, then outstanding++ (maybe on debug level?)
> Is this a valuable metric to add to Tiered Storage, or is it better solved in 
> a custom RLMM implementation?
> Also, does it require a KIP?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15158) Add metrics for RemoteRequestsPerSec

2023-07-06 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740890#comment-17740890
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15158:
--

Thanks [~divijvaidya] , this covers most of the metrics proposed in the latest 
comments of https://issues.apache.org/jira/browse/KAFKA-14953.

Could we also consider adding `RemoteDeleteErrorPerSec` here?

As for the naming issues with `RemoteBytesIn|Out`, I can create a new ticket to 
discuss them. wdyt?

> Add metrics for RemoteRequestsPerSec
> 
>
> Key: KAFKA-15158
> URL: https://issues.apache.org/jira/browse/KAFKA-15158
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Priority: Major
> Fix For: 3.6.0
>
>
> Add the following metrics for better observability into the RemoteLog related 
> activities inside the broker.
> 1. RemoteWriteRequestsPerSec
> 2. RemoteDeleteRequestsPerSec
> 3. BuildRemoteLogAuxStateRequestsPerSec
>  
> These metrics will be calculated at topic level (we can add them at 
> brokerTopicStats)
> *RemoteWriteRequestsPerSec* will be marked on every call to RemoteLogManager#
> copyLogSegmentsToRemote()
>  
> *RemoteDeleteRequestsPerSec* will be marked on every call to 
> RemoteLogManager#cleanupExpiredRemoteLogSegments(). This method is introduced 
> in [https://github.com/apache/kafka/pull/13561] 
> *BuildRemoteLogAuxStateRequestsPerSec* will be marked on every call to 
> ReplicaFetcherTierStateMachine#buildRemoteLogAuxState()
>  
>  
> (Note that this requires a change in KIP-405 and hence, must be approved by 
> KIP author [~satishd] )
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15147) Measure pending and outstanding Remote Segment operations

2023-07-05 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740169#comment-17740169
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15147:
--

cc [~showuon] [~divijvaidya] [~satishd] , would appreciate your feedback. Many 
thanks!

> Measure pending and outstanding Remote Segment operations
> -
>
> Key: KAFKA-15147
> URL: https://issues.apache.org/jira/browse/KAFKA-15147
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> Remote Log Segment operations (copy/delete) are executed by the Remote 
> Storage Manager, and recorded by the Remote Log Metadata Manager (e.g. the 
> default TopicBasedRLMM writes state changes on remote log segments to an 
> internal Kafka topic).
> As executions run, fail, and retry, it will be important to know how many 
> operations are pending and outstanding over time to alert operators.
> Pending operations are not enough to alert on, as values can oscillate close 
> to zero. An additional condition needs to apply (running time > threshold) to 
> consider an operation outstanding.
> Proposal:
> RemoteLogManager could be extended with 2 concurrent maps 
> (pendingSegmentCopies, pendingSegmentDeletes) `Map[Uuid, Long]` to record, 
> per segment id, the time when the operation started, and based on this expose 
> 2 metrics per operation:
>  * pendingSegmentCopies: gauge of the pendingSegmentCopies map
>  * outstandingSegmentCopies: loop over pending ops, and if now - startedTime 
> > timeout, then outstanding++ (maybe on debug level?)
> Is this a valuable metric to add to Tiered Storage, or is it better solved in 
> a custom RLMM implementation?
> Also, does it require a KIP?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15147) Measure pending and outstanding Remote Segment operations

2023-07-05 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15147:


 Summary: Measure pending and outstanding Remote Segment operations
 Key: KAFKA-15147
 URL: https://issues.apache.org/jira/browse/KAFKA-15147
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


Remote Log Segment operations (copy/delete) are executed by the Remote Storage 
Manager, and recorded by the Remote Log Metadata Manager (e.g. the default 
TopicBasedRLMM writes state changes on remote log segments to an internal Kafka 
topic).

As executions run, fail, and retry, it will be important to know how many 
operations are pending and outstanding over time to alert operators.

Pending operations are not enough to alert on, as values can oscillate close to 
zero. An additional condition needs to apply (running time > threshold) to 
consider an operation outstanding.

Proposal:

RemoteLogManager could be extended with 2 concurrent maps 
(pendingSegmentCopies, pendingSegmentDeletes) `Map[Uuid, Long]` to record, per 
segment id, the time when the operation started, and based on this expose 2 
metrics per operation (see the sketch below):
 * pendingSegmentCopies: gauge of the pendingSegmentCopies map
 * outstandingSegmentCopies: loop over pending ops, and if now - startedTime > 
timeout, then outstanding++ (maybe on debug level?)

Is this a valuable metric to add to Tiered Storage, or is it better solved in a 
custom RLMM implementation?

Also, does it require a KIP?

Thanks!
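
For illustration, a rough sketch of the two maps and metrics (hypothetical 
names, String keys instead of Uuid; this is not Kafka code):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PendingSegmentOps {
    // segmentId -> wall-clock time when the copy started
    private final Map<String, Long> pendingSegmentCopies = new ConcurrentHashMap<>();

    void copyStarted(String segmentId) {
        pendingSegmentCopies.put(segmentId, System.currentTimeMillis());
    }

    void copyFinished(String segmentId) {
        pendingSegmentCopies.remove(segmentId);
    }

    // Gauge value: how many copies are currently in flight.
    int pendingSegmentCopies() {
        return pendingSegmentCopies.size();
    }

    // Copies running longer than the threshold count as outstanding.
    long outstandingSegmentCopies(long timeoutMs) {
        long now = System.currentTimeMillis();
        return pendingSegmentCopies.values().stream()
                .filter(startedAt -> now - startedAt > timeoutMs)
                .count();
    }
}
{code}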



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14953) Add metrics for tiered storage

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739661#comment-17739661
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-14953:
--

Finally, it may be worth adding metrics for the retention-related actions as 
well: `RemoteDeleteRequestsPerSec` and `RemoteDeleteErrorPerSec`, though we may 
add these within or after https://issues.apache.org/jira/browse/KAFKA-14888.

> Add metrics for tiered storage
> --
>
> Key: KAFKA-14953
> URL: https://issues.apache.org/jira/browse/KAFKA-14953
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Abhijeet Kumar
>Priority: Minor
>
> Not just for expired fetch. We also need to add all the metrics described in 
> KIP-405
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-NewMetrics
>  
> ref: [https://github.com/apache/kafka/pull/13535#discussion_r1180286031] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14953) Add metrics for tiered storage

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739655#comment-17739655
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-14953:
--

I also find the semantics of `RemoteBytesIn|Out` confusing:

In regular Kafka, BytesIn means bytes written to the log, and BytesOut means 
bytes read from the log; but with TS:
 * RemoteBytesIn means "Number of bytes read from remote storage per second." 
and
 * RemoteBytesOut means "Number of bytes copied to remote storage per second."

which is the opposite.

Would it make sense to align these metric names?

Another option may be to not try to reuse the names, and instead use more 
specific metric names: "RemoteCopyBytes", "RemoteFetchBytes"?

> Add metrics for tiered storage
> --
>
> Key: KAFKA-14953
> URL: https://issues.apache.org/jira/browse/KAFKA-14953
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Abhijeet Kumar
>Priority: Minor
>
> Not just for expired fetch. We also need to add all the metrics described in 
> KIP-405
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-NewMetrics
>  
> ref: [https://github.com/apache/kafka/pull/13535#discussion_r1180286031] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14953) Add metrics for tiered storage

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739647#comment-17739647
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-14953:
--

[~abhijeetkumar] thanks for working on this PR!

Looking into the metrics available, it seems that we are missing 
`RemoteWriteRequestsPerSec` (there are `RemoteWriteErrorPerSec` and 
`RemoteReadRequestsPerSec`, so I don't see why we shouldn't add it). I think we 
can make a small amendment to the KIP and ship it on the same PR, wdyt? cc 
[~showuon] 

> Add metrics for tiered storage
> --
>
> Key: KAFKA-14953
> URL: https://issues.apache.org/jira/browse/KAFKA-14953
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Abhijeet Kumar
>Priority: Minor
>
> Not just for expired fetch. We also need to add all the metrics described in 
> KIP-405
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-NewMetrics
>  
> ref: [https://github.com/apache/kafka/pull/13535#discussion_r1180286031] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15142) Add Client Metadata to RemoteStorageFetchInfo

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739600#comment-17739600
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15142:
--

cc [~showuon] [~divijvaidya] [~satishd] 

> Add Client Metadata to RemoteStorageFetchInfo
> -
>
> Key: KAFKA-15142
> URL: https://issues.apache.org/jira/browse/KAFKA-15142
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> Once Tiered Storage is deployed, it will be important to understand how 
> remote data is accessed and what consumption patterns emerge on each 
> deployment.
> To do this, tiered storage logs/metrics could provide more context about 
> which client is fetching which partition/offset range and when.
> At the moment, client metadata is not propagated to the tiered-storage 
> framework. To fix this, {{RemoteStorageFetchInfo}} can be extended with the 
> {{Optional[ClientMetadata]}} available on {{{}FetchParams{}}}, making these 
> bits of data available to improve the logging/metrics when fetching.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15142) Add Client Metadata to RemoteStorageFetchInfo

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15142:


 Summary: Add Client Metadata to RemoteStorageFetchInfo
 Key: KAFKA-15142
 URL: https://issues.apache.org/jira/browse/KAFKA-15142
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Once Tiered Storage is deployed, it will be important to understand how remote 
data is accessed and what consumption patterns emerge on each deployment.

To do this, tiered storage logs/metrics could provide more context about which 
client is fetching which partition/offset range and when.

At the moment, client metadata is not propagated to the tiered-storage 
framework. To fix this, {{RemoteStorageFetchInfo}} can be extended with the 
{{Optional[ClientMetadata]}} available on {{{}FetchParams{}}}, making these 
bits of data available to improve the logging/metrics when fetching.
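
As a rough illustration of the shape this could take (the field names and types 
here are assumptions, not the actual classes):

{code:java}
import java.util.Optional;

// Hypothetical sketch: the fetch info carries the client metadata, when
// available, so remote reads can be attributed to a client in logs/metrics.
record RemoteStorageFetchInfoSketch(String topicPartition,
                                    long fetchOffset,
                                    Optional<String> clientIdAndAddress) { }
{code}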



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15131) Improve RemoteStorageManager exception handling documentation

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-15131.
--
Resolution: Fixed

> Improve RemoteStorageManager exception handling documentation
> -
>
> Key: KAFKA-15131
> URL: https://issues.apache.org/jira/browse/KAFKA-15131
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> As discussed here[1], the RemoteStorageManager javadocs require clarification 
> regarding error handling:
>  * Remove ambiguity on `RemoteResourceNotFoundException` description
>  * Describe when `RemoteResourceNotFoundException` can/should be thrown
>  * Describe expectations for idempotent operations when copying/deleting
>  
> [1] 
> https://issues.apache.org/jira/browse/KAFKA-7739?focusedCommentId=17720936&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17720936



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15135) RLM listener configurations passed but ignored by RLMM

2023-06-30 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15135:


 Summary: RLM listener configurations passed but ignored by RLMM
 Key: KAFKA-15135
 URL: https://issues.apache.org/jira/browse/KAFKA-15135
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


As described here [1], properties captured from the listener are passed but 
ignored by TopicBasedRLMM.

 

[1] https://github.com/apache/kafka/pull/13828#issuecomment-1611155345



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15135) RLM listener configurations passed but ignored by RLMM

2023-06-30 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya reassigned KAFKA-15135:


Assignee: Jorge Esteban Quilcate Otoya

> RLM listener configurations passed but ignored by RLMM
> --
>
> Key: KAFKA-15135
> URL: https://issues.apache.org/jira/browse/KAFKA-15135
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> As described here [1], properties captured from the listener are passed but 
> ignored by TopicBasedRLMM.
>  
> [1] https://github.com/apache/kafka/pull/13828#issuecomment-1611155345



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15083) Passing "remote.log.metadata.*" configs into RLMM

2023-06-30 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738945#comment-17738945
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-15083:
--

[~showuon] wondering if this issue[1] is meant to be resolved by this ticket.

[1] https://github.com/apache/kafka/pull/13828#issuecomment-1611155345

> Passing "remote.log.metadata.*" configs into RLMM
> -
>
> Key: KAFKA-15083
> URL: https://issues.apache.org/jira/browse/KAFKA-15083
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Major
>
> Based on the 
> [KIP-405|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-Configs]:
> |_remote.log.metadata.*_|Default RLMM 
> implementation creates producer and consumer instances. Common client 
> properties can be configured with the `remote.log.metadata.common.client.` 
> prefix. Users can also pass properties specific to 
> producer/consumer with the `remote.log.metadata.producer.` 
> and `remote.log.metadata.consumer.` prefixes. These will override properties 
> with the `remote.log.metadata.common.client.` prefix.
> Any other properties should be prefixed with 
> "remote.log.metadata." and these will be passed to 
> RemoteLogMetadataManager#configure(Map props).
> For ex: Security configurations to connect to the local broker 
> for the configured listener name are passed with props.|
>  
> This is missing from the current implementation.
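
For reference, a hedged sketch of the prefix precedence described above (the 
helper is illustrative only, not the broker's implementation):

{code:java}
import java.util.HashMap;
import java.util.Map;

class RlmmClientProps {
    // Resolve producer properties: `remote.log.metadata.producer.` entries
    // override `remote.log.metadata.common.client.` entries.
    static Map<String, Object> producerProps(Map<String, Object> all) {
        Map<String, Object> out = new HashMap<>();
        copyWithPrefix(all, "remote.log.metadata.common.client.", out);
        copyWithPrefix(all, "remote.log.metadata.producer.", out); // overrides common
        return out;
    }

    private static void copyWithPrefix(Map<String, Object> in, String prefix,
                                       Map<String, Object> out) {
        in.forEach((k, v) -> {
            if (k.startsWith(prefix)) {
                out.put(k.substring(prefix.length()), v);
            }
        });
    }
}
{code}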



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15131) Improve RemoteStorageManager exception handling documentation

2023-06-28 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-15131:
-
Summary: Improve RemoteStorageManager exception handling documentation  
(was: Improve RemoteStorageManager exception handling)

> Improve RemoteStorageManager exception handling documentation
> -
>
> Key: KAFKA-15131
> URL: https://issues.apache.org/jira/browse/KAFKA-15131
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> As discussed here[1], the RemoteStorageManager javadocs require clarification 
> regarding error handling:
>  * Remove ambiguity on `RemoteResourceNotFoundException` description
>  * Describe when `RemoteResourceNotFoundException` can/should be thrown
>  * Describe expectations for idempotent operations when copying/deleting
>  
> [1] 
> https://issues.apache.org/jira/browse/KAFKA-7739?focusedCommentId=17720936&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17720936



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14993) Improve TransactionIndex instance handling while copying to and fetching from RSM.

2023-06-28 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738033#comment-17738033
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-14993:
--

[~ckamal] just checking if you have made any progress on this one already. If 
it's still open, I'd like to help contribute this fix. 
Let me know, cheers!

> Improve TransactionIndex instance handling while copying to and fetching from 
> RSM.
> --
>
> Key: KAFKA-14993
> URL: https://issues.apache.org/jira/browse/KAFKA-14993
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: Satish Duggana
>Assignee: Kamal Chandraprakash
>Priority: Major
>
> RSM should throw a ResourceNotFoundException if it does not have 
> TransactionIndex. Currently, it expects an empty InputStream and creates an 
> unnecessary file in the cache. This can be avoided by catching 
> ResourceNotFoundException and not creating an instance. There are minor 
> cleanups needed in RemoteIndexCache and other TransactionIndex usages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15131) Improve RemoteStorageManager exception handling

2023-06-28 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15131:


 Summary: Improve RemoteStorageManager exception handling
 Key: KAFKA-15131
 URL: https://issues.apache.org/jira/browse/KAFKA-15131
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


As discussed here[1], the RemoteStorageManager javadocs require clarification 
regarding error handling:
 * Remove ambiguity on `RemoteResourceNotFoundException` description
 * Describe when `RemoteResourceNotFoundException` can/should be thrown
 * Describe expectations for idempotent operations when copying/deleting

 

[1] 
https://issues.apache.org/jira/browse/KAFKA-7739?focusedCommentId=17720936&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17720936
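
To illustrate the idempotency bullet above, a hedged sketch (RemoteStore and 
its exception are made-up stand-ins, not the RSM API):

{code:java}
import java.util.NoSuchElementException;

interface RemoteStore {
    void delete(String key) throws NoSuchElementException;
}

class IdempotentDeleter {
    private final RemoteStore store;

    IdempotentDeleter(RemoteStore store) {
        this.store = store;
    }

    // Deleting an already-deleted segment succeeds: repeated deletes are safe.
    void deleteSegment(String key) {
        try {
            store.delete(key);
        } catch (NoSuchElementException alreadyDeleted) {
            // Idempotent: a second delete of the same segment is not an error.
        }
    }
}
{code}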



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-7739) Kafka Tiered Storage

2023-06-21 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735962#comment-17735962
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-7739:
-

Awesome, will keep an eye on that one. Thanks [~satish.duggana] !

> Kafka Tiered Storage
> 
>
> Key: KAFKA-7739
> URL: https://issues.apache.org/jira/browse/KAFKA-7739
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Harsha
>Assignee: Satish Duggana
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.6.0
>
>
> KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-7739) Kafka Tiered Storage

2023-06-20 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735299#comment-17735299
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-7739:
-

Thanks [~showuon]. My impression was that there's some work already planned for 
this. Will wait for [~satish.duggana]'s feedback to confirm. Cheers!

> Kafka Tiered Storage
> 
>
> Key: KAFKA-7739
> URL: https://issues.apache.org/jira/browse/KAFKA-7739
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Harsha
>Assignee: Satish Duggana
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.6.0
>
>
> KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14953) Add metrics for tiered storage

2023-06-20 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735298#comment-17735298
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-14953:
--

[~showuon] I pressed add too soon; I got confused by the PR referenced in the 
ticket description, which is already merged, but it's only a reference.

Anyway, thanks [~abhijeetkumar] for the update!

> Add metrics for tiered storage
> --
>
> Key: KAFKA-14953
> URL: https://issues.apache.org/jira/browse/KAFKA-14953
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Abhijeet Kumar
>Priority: Minor
>
> Not just for expired fetch. We also need to add all the metrics described in 
> KIP-405
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-NewMetrics
>  
> ref: [https://github.com/apache/kafka/pull/13535#discussion_r1180286031] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14953) Add metrics for tiered storage

2023-06-20 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735200#comment-17735200
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-14953:
--

I think this one is merged; should we close this ticket?

> Add metrics for tiered storage
> --
>
> Key: KAFKA-14953
> URL: https://issues.apache.org/jira/browse/KAFKA-14953
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Abhijeet Kumar
>Priority: Minor
>
> Not just for expired fetch. We also need to add all the metrics described in 
> KIP-405
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-NewMetrics
>  
> ref: [https://github.com/apache/kafka/pull/13535#discussion_r1180286031] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-7739) Kafka Tiered Storage

2023-06-20 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735197#comment-17735197
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-7739:
-

[~satish.duggana] I remember a follow-up PR related to remote fetching: 
[https://github.com/apache/kafka/pull/13535#discussion_r1171250580]

Thinking about the 3.6 release, are these improvements going to be considered? 
Also, do we have a Jira ticket to track them?

Many thanks!

> Kafka Tiered Storage
> 
>
> Key: KAFKA-7739
> URL: https://issues.apache.org/jira/browse/KAFKA-7739
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Harsha
>Assignee: Satish Duggana
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.6.0
>
>
> KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15051) docs: add missing connector plugin endpoint to documentation

2023-06-02 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15051:


 Summary: docs: add missing connector plugin endpoint to 
documentation
 Key: KAFKA-15051
 URL: https://issues.apache.org/jira/browse/KAFKA-15051
 Project: Kafka
  Issue Type: Task
  Components: docs, documentation
Reporter: Jorge Esteban Quilcate Otoya


The connector plugin GET config endpoint (GET /connector-plugins/{plugin}/config) 
added in 
[KIP-769|https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions]
 is not included in the Connect documentation page: 
https://kafka.apache.org/documentation/#connect_rest



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15013) KIP-934: Add DeleteTopicPolicy

2023-05-22 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-15013:
-
Labels: kip  (was: )

> KIP-934: Add DeleteTopicPolicy
> --
>
> Key: KAFKA-15013
> URL: https://issues.apache.org/jira/browse/KAFKA-15013
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: kip
>
> KIP: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-934%3A+Add+DeleteTopicPolicy



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15014) KIP-935: Extend AlterConfigPolicy with existing configurations

2023-05-22 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-15014:
-
Labels: kip  (was: )

>  KIP-935: Extend AlterConfigPolicy with existing configurations 
> 
>
> Key: KAFKA-15014
> URL: https://issues.apache.org/jira/browse/KAFKA-15014
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: kip
>
> KIP: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15014) KIP-935: Extend AlterConfigPolicy with existing configurations

2023-05-22 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15014:


 Summary:  KIP-935: Extend AlterConfigPolicy with existing 
configurations 
 Key: KAFKA-15014
 URL: https://issues.apache.org/jira/browse/KAFKA-15014
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-935%3A+Extend+AlterConfigPolicy+with+existing+configurations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15013) KIP-934: Add DeleteTopicPolicy

2023-05-22 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15013:


 Summary: KIP-934: Add DeleteTopicPolicy
 Key: KAFKA-15013
 URL: https://issues.apache.org/jira/browse/KAFKA-15013
 Project: Kafka
  Issue Type: New Feature
  Components: core
Reporter: Jorge Esteban Quilcate Otoya


KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-934%3A+Add+DeleteTopicPolicy



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-7739) Kafka Tiered Storage

2023-05-15 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722747#comment-17722747
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-7739:
-

Thanks all!

Another question regarding the copy/delete RSM operations: I couldn't find any 
stated expectations on idempotency for these operations.

`copyLogSegmentData` mentions something related:

> Invoker of this API should always send a unique id as part of {@link 
> RemoteLogSegmentMetadata#remoteLogSegmentId()} even when it retries to invoke 
> this method for the same log segment data.

Though this only confirms that retries will contain the same id.

`deleteLogSegmentData` is more ambiguous:

> Deletion is considered as successful if this call returns successfully 
> without any errors.

but

> RemoteResourceNotFoundException if the requested resource is not found

So, if retried, should it fail?

My guess is that both should be idempotent, but it could be worth calling out 
these expectations explicitly in the KIP/API.
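
To make the idempotent reading concrete, a minimal sketch, assuming a 
hypothetical `ObjectStore` backend and illustrative object keys, where 
"already gone" is treated as success on retry:

{code:java}
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteStorageException;

public final class IdempotentDeleteSketch {

    /** Hypothetical stand-in for the plugin's backing store. */
    interface ObjectStore {
        boolean delete(String key); // returns false when the object was already gone
    }

    static void deleteLogSegmentData(ObjectStore store, RemoteLogSegmentMetadata metadata)
            throws RemoteStorageException {
        // Illustrative object keys; a real plugin derives these from its own layout.
        for (String suffix : new String[] {"segment", "offset-index", "time-index", "txn-index"}) {
            // Idempotent reading: a retry after a partial failure may find some objects
            // already removed; that still counts as a successful deletion, so the
            // "already gone" result is deliberately not surfaced as an error.
            store.delete(metadata.remoteLogSegmentId() + "/" + suffix);
        }
    }
}
{code}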

> Kafka Tiered Storage
> 
>
> Key: KAFKA-7739
> URL: https://issues.apache.org/jira/browse/KAFKA-7739
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Harsha
>Assignee: Satish Duggana
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.6.0
>
>
> KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-7739) Kafka Tiered Storage

2023-05-09 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720955#comment-17720955
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-7739:
-

Yes, we know the transactional index is optional when copying but not when 
fetching. I agree that throwing the exception should be correct, but the javadoc 
is ambiguous, as it refers to "no resources associated with the given 
remoteLogSegmentMetadata", which may mean _all_ resources, instead of "the 
requested resource is not found in the remote storage" or similar.

> Kafka Tiered Storage
> 
>
> Key: KAFKA-7739
> URL: https://issues.apache.org/jira/browse/KAFKA-7739
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Harsha
>Assignee: Satish Duggana
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.6.0
>
>
> KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-7739) Kafka Tiered Storage

2023-05-09 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720936#comment-17720936
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-7739:
-

[~satish.duggana] could you help clarify the following:

RemoteStorageManager#fetchIndex and RemoteStorageManager#fetchLogSegment state:

>      * @throws RemoteResourceNotFoundException when there are no resources 
> associated with the given remoteLogSegmentMetadata.

Though when fetching indexes, the transaction index is optional. So when 
returning a response, should it be null, or should it throw 
RemoteResourceNotFoundException?

From the javadoc it is unclear whether "resources" means a specific object/index 
or the whole segment log + metadata.
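
Under the narrower reading, a caller-side sketch of how the broker could treat 
absence of the optional transaction index as a valid empty result (the helper 
is illustrative, not existing broker code):

{code:java}
import java.io.InputStream;
import java.util.Optional;

import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException;
import org.apache.kafka.server.log.remote.storage.RemoteStorageException;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager;

final class OptionalIndexSketch {

    static Optional<InputStream> fetchTransactionIndex(RemoteStorageManager rsm,
                                                       RemoteLogSegmentMetadata metadata)
            throws RemoteStorageException {
        try {
            return Optional.of(rsm.fetchIndex(metadata, RemoteStorageManager.IndexType.TRANSACTION));
        } catch (RemoteResourceNotFoundException e) {
            // Only the transaction index is optional: absence is a valid, empty
            // result here, while for mandatory indexes the exception would propagate.
            return Optional.empty();
        }
    }
}
{code}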

> Kafka Tiered Storage
> 
>
> Key: KAFKA-7739
> URL: https://issues.apache.org/jira/browse/KAFKA-7739
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Harsha
>Assignee: Satish Duggana
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.6.0
>
>
> KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14843) Connector plugins config endpoint does not include Common configs

2023-03-23 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14843:


 Summary: Connector plugins config endpoint does not include Common 
configs
 Key: KAFKA-14843
 URL: https://issues.apache.org/jira/browse/KAFKA-14843
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 3.2.0
Reporter: Jorge Esteban Quilcate Otoya


The connector plugins GET config endpoint introduced in 
[KIP-769|https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions]
 allows getting a plugin's configuration from the REST endpoint.

This configuration only includes the plugin configuration, but not the base 
configuration of the Sink/Source Connector.

For instance, when validating the configuration of a plugin, _all_ configs are 
returned:

```
curl -s $CONNECT_URL/connector-plugins/io.aiven.kafka.connect.http.HttpSinkConnector/config \
  | jq -r '.[].name' | sort -u | wc -l
21

curl -s $CONNECT_URL/connector-plugins/io.aiven.kafka.connect.http.HttpSinkConnector/config/validate \
  -XPUT -H 'Content-type: application/json' \
  --data '{"connector.class": "io.aiven.kafka.connect.http.HttpSinkConnector", "topics": "example-topic-name"}' \
  | jq -r '.configs[].definition.name' | sort -u | wc -l
39
```

and the missing configs are all from the base config:

```

diff validate.txt config.txt
6,14d5
< config.action.reload
< connector.class
< errors.deadletterqueue.context.headers.enable
< errors.deadletterqueue.topic.name
< errors.deadletterqueue.topic.replication.factor
< errors.log.enable
< errors.log.include.messages
< errors.retry.delay.max.ms
< errors.retry.timeout
16d6
< header.converter
24d13
< key.converter
26d14
< name
33d20
< predicates
35,39d21
< tasks.max
< topics
< topics.regex
< transforms
< value.converter

```

It would be great to get the base configs from the same endpoint as well, so we 
could rely on it instead of using the validate endpoint to get all configs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14815) Move Kafka documentation to Markdown/Hugo

2023-03-16 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14815:


 Summary: Move Kafka documentation to Markdown/Hugo 
 Key: KAFKA-14815
 URL: https://issues.apache.org/jira/browse/KAFKA-14815
 Project: Kafka
  Issue Type: Task
  Components: documentation
Reporter: Jorge Esteban Quilcate Otoya


Follow up from https://issues.apache.org/jira/browse/KAFKA-2967

Creating this task to discuss the adoption of Markdown and Hugo to replace the 
HTML code.

The reasons to move away from HTML are outlined in KAFKA-2967.

Markdown and Asciidoc are both alternatives, but Markdown has been used here, as 
I found some blockers when trying to migrate to Asciidoc:
 * Hugo requires Asciidoctor as an additional binary (Markdown is supported out 
of the box)
 * To use Asciidoctor, some security policies need to be opened: 
[https://stackoverflow.com/questions/71058236/hugo-with-asciidoctor]
 * I haven't managed to use shortcodes in Asciidoctor to inject versions, 
tables, etc.

Given the lower friction of Markdown and its default support in Hugo, I would 
like to propose using these. Though I'm open to collaborating on the migration 
if someone has experience with Asciidoc and Hugo, as there is existing interest 
in using Asciidoc if possible.

Draft repo: https://github.com/jeqo/ak-docs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-2967) Move Kafka documentation to ReStructuredText

2023-03-09 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698597#comment-17698597
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-2967:
-

[~tombentley], to alleviate the broken-links issue a bit, we could reproduce the 
anchors for some of the main headers on the index page and link to each section 
below them. Something like this:

!https://raw.githubusercontent.com/jeqo/ak-docs/main/index_redirects.png!

> Move Kafka documentation to ReStructuredText
> 
>
> Key: KAFKA-2967
> URL: https://issues.apache.org/jira/browse/KAFKA-2967
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>Priority: Major
>  Labels: documentation
>
> Storing documentation as HTML is kind of BS :)
> * Formatting is a pain, and making it look good is even worse
> * Its just HTML, can't generate PDFs
> * Reading and editting is painful
> * Validating changes is hard because our formatting relies on all kinds of 
> Apache Server features.
> I suggest:
> * Move to RST
> * Generate HTML and PDF during build using Sphinx plugin for Gradle.
> Lots of Apache projects are doing this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-2967) Move Kafka documentation to ReStructuredText

2023-03-08 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698118#comment-17698118
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-2967:
-

thanks Tom!

Agree. The fragmentation comes at the cost of breaking the old references to 
`/documentation#...`. Some redirection will probably be possible (e.g. to H2 
headers), but some of these will get complex to map (e.g. 
docs#brokerconfigs_bootstrap.servers to 
docs/configuration#brokerconfigs_bootstrap.servers).

It is outside my experience how the Apache server we use to host the 
documentation can cope with these redirections. If someone from the 
community has experience tweaking this for our website, I'd be happy to pair 
and see how we can make this work as close as possible to the current links.

Another consideration is that the static site will have links to the specific 
version (e.g. /35/documentation), so as soon as you get into /documentation all 
internal links will refer to /<version>/documentation – something 
similar to the Flink docs [https://nightlies.apache.org/flink/flink-docs-stable/]

 

About Markdown and Asciidoc: I did a quick try migrating the docs from Markdown 
to Asciidoc using Pandoc and it worked fine, though the website with Hugo had 
some small glitches -- all in all it seems easy to migrate. Starting with 
Markdown and leaving the door open for Asciidoc, if we find enough reasons to 
move, sounds good to me as well.

> Move Kafka documentation to ReStructuredText
> 
>
> Key: KAFKA-2967
> URL: https://issues.apache.org/jira/browse/KAFKA-2967
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>Priority: Major
>  Labels: documentation
>
> Storing documentation as HTML is kind of BS :)
> * Formatting is a pain, and making it look good is even worse
> * Its just HTML, can't generate PDFs
> * Reading and editting is painful
> * Validating changes is hard because our formatting relies on all kinds of 
> Apache Server features.
> I suggest:
> * Move to RST
> * Generate HTML and PDF during build using Sphinx plugin for Gradle.
> Lots of Apache projects are doing this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-2967) Move Kafka documentation to ReStructuredText

2023-02-27 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694156#comment-17694156
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-2967:
-

Thanks [~mimaison]! Haven't tried it, but now that the docs are in Markdown it 
should be straightforward to use something like Pandoc to migrate them to 
Asciidoc.

I picked Markdown based on the latest comments, which seemed to agree on using 
it; interest in Asciidoc sparked in 2018 without much follow-up. Now that there 
is a tool that supports both, it may be worth revisiting this decision.

We already use Markdown within the project for readmes and such, so adding 
another markup language would need to be weighed against the value it really 
adds.

> Move Kafka documentation to ReStructuredText
> 
>
> Key: KAFKA-2967
> URL: https://issues.apache.org/jira/browse/KAFKA-2967
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>Priority: Major
>  Labels: documentation
>
> Storing documentation as HTML is kind of BS :)
> * Formatting is a pain, and making it look good is even worse
> * Its just HTML, can't generate PDFs
> * Reading and editting is painful
> * Validating changes is hard because our formatting relies on all kinds of 
> Apache Server features.
> I suggest:
> * Move to RST
> * Generate HTML and PDF during build using Sphinx plugin for Gradle.
> Lots of Apache projects are doing this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-2967) Move Kafka documentation to ReStructuredText

2023-02-25 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693546#comment-17693546
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-2967:
-

Hi everyone. I'd love to see this migration happen to improve contributions and 
the user experience around the Kafka documentation.

I took James's idea of using Pandoc a bit further and put it together with Hugo 
(similar to what Flink does), and migrated most of the documentation under 
`docs/` to Markdown: [https://github.com/jeqo/ak-docs]

Hopefully the readme is good enough for anyone who wants to test it. Did some 
initial integration with kafka-site to check how that would look.

There are some minor style details (e.g. URLs, image sizing, Kafka Streams 
navigation, generating JSON instead of HTML for Config/Metrics, etc.) that could 
be discussed in a follow-up issue, but I'd like to get feedback at this point to 
see whether the migration seems promising and what the next steps could be.

> Move Kafka documentation to ReStructuredText
> 
>
> Key: KAFKA-2967
> URL: https://issues.apache.org/jira/browse/KAFKA-2967
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
>Priority: Major
>  Labels: documentation
>
> Storing documentation as HTML is kind of BS :)
> * Formatting is a pain, and making it look good is even worse
> * Its just HTML, can't generate PDFs
> * Reading and editting is painful
> * Validating changes is hard because our formatting relies on all kinds of 
> Apache Server features.
> I suggest:
> * Move to RST
> * Generate HTML and PDF during build using Sphinx plugin for Gradle.
> Lots of Apache projects are doing this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14441) Benchmark performance impact of metrics library

2022-12-04 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14441:


 Summary: Benchmark performance impact of metrics library
 Key: KAFKA-14441
 URL: https://issues.apache.org/jira/browse/KAFKA-14441
 Project: Kafka
  Issue Type: Task
  Components: metrics
Reporter: Jorge Esteban Quilcate Otoya


While discussing KIP-864 
([https://lists.apache.org/thread/k6rh2mr7pg94935fgpqw8b5fj308f2n7]), a concern 
was raised about how much impact sampling metric values has, particularly when 
adding metrics that record values per record instead of per batch.

By adding benchmarks for sampling values, there will be more confidence about 
whether to expose metrics at DEBUG or INFO level, depending on their impact.
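
As an illustration of what such a benchmark could look like, here is a hedged 
JMH sketch measuring the per-record cost of recording to a Kafka metrics 
Sensor; the class, group, and metric names are made up for the example:

{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Max;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class SensorRecordBenchmark {

    private Metrics metrics;
    private Sensor sensor;

    @Setup
    public void setup() {
        metrics = new Metrics();
        sensor = metrics.sensor("per-record-latency");
        sensor.add(metrics.metricName("latency-avg", "benchmark-group"), new Avg());
        sensor.add(metrics.metricName("latency-max", "benchmark-group"), new Max());
    }

    @TearDown
    public void tearDown() {
        metrics.close();
    }

    @Benchmark
    public void recordOneValue() {
        // This is the per-record cost under discussion: one record() call per
        // processed record, against a sensor with two registered stats.
        sensor.record(42.0);
    }
}
{code}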



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14409) Clean ProcessorParameters from casting

2022-11-19 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14409:


 Summary: Clean ProcessorParameters from casting
 Key: KAFKA-14409
 URL: https://issues.apache.org/jira/browse/KAFKA-14409
 Project: Kafka
  Issue Type: Improvement
Reporter: Jorge Esteban Quilcate Otoya


ProcessorParameters currently includes a set of methods to cast to specific 
supplier types:
 * kTableSourceSupplier
 * kTableProcessorSupplier
 * kTableKTableJoinMergerProcessorSupplier

As most of these are used by specific classes, and the usage assumptions vary 
(some expect nulls and others don't), this ticket proposes removing these 
methods and moving the casting into the classes that use them.
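
A minimal sketch of the proposed shape, using deliberately simplified, 
hypothetical types (the real ones are the internal Kafka Streams supplier 
interfaces):

{code:java}
// Generic holder with no per-type cast helpers.
final class Params<S> {
    private final S processorSupplier;
    Params(S processorSupplier) { this.processorSupplier = processorSupplier; }
    S processorSupplier() { return processorSupplier; }
}

interface ExampleSupplier { }

final class TableSourceSupplier implements ExampleSupplier {
    void configureAsSource() { /* source-specific behavior */ }
}

final class TableSourceNode {
    void configure(Params<ExampleSupplier> params) {
        ExampleSupplier s = params.processorSupplier();
        // The cast now lives in the single class that relies on it, making its
        // assumptions (non-null, specific subtype) explicit at the call site.
        if (s instanceof TableSourceSupplier) {
            ((TableSourceSupplier) s).configureAsSource();
        } else {
            throw new IllegalStateException("Expected a TableSourceSupplier, got: " + s);
        }
    }
}
{code}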



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14408) Consider enabling DEBUG log level on tests for Streams

2022-11-19 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14408:


 Summary: Consider enabling DEBUG log level on tests for Streams
 Key: KAFKA-14408
 URL: https://issues.apache.org/jira/browse/KAFKA-14408
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Jorge Esteban Quilcate Otoya


Ideally logging should not trigger any side effects, though we have found that 
it did for https://issues.apache.org/jira/browse/KAFKA-14325.

This ticket asks whether we should consider enabling more verbose logging 
levels during tests (they currently run at INFO) to exercise these paths.

There may be additional costs in log file sizes and verbosity, so it's open to 
discussion whether this is worth it, and whether to expand this to other 
components as well.

Additional discussion: 
https://github.com/apache/kafka/pull/12859#discussion_r1027007714



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14325) NullPointer in ProcessorParameters.toString

2022-11-15 Thread Jorge Esteban Quilcate Otoya (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634529#comment-17634529
 ] 

Jorge Esteban Quilcate Otoya commented on KAFKA-14325:
--

Hi there. Of course, looking at this now.

Looking at why this wasn't triggered before:
 * Logging is at DEBUG level: 
[https://github.com/apache/kafka/blob/21a15c6b1f1ee80f163633ba617ad381f5edc0c1/streams/src/main/java/org/apache/kafka/streams/kstream/internals/InternalStreamsBuilder.java#L292-L294]
 ** but logging in tests is only at INFO level: 
[https://github.com/apache/kafka/blob/46bee5bcf3e6877079111cbfd91a2fbaf3975a98/streams/src/test/resources/log4j.properties#L30]
 

Turning this config to DEBUG causes a bunch of tests to fail: 6216 tests 
completed, 256 failed, 1 skipped (locally)

MockApiProcessorSupplier is causing some strange issues, e.g. 
org.apache.kafka.streams.StreamsBuilderTest#shouldMergeStreams: 
java.lang.AssertionError: expected:<1> but was:<5>

Will have to debug further.

Also, I added a couple of additional casts to this class: 
https://github.com/apache/kafka/pull/12859/files
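
For reference, a minimal sketch of the null-safe direction for the fix, with 
simplified stand-in fields (the real class is 
org.apache.kafka.streams.kstream.internals.graph.ProcessorParameters):

{code:java}
// Fields here are simplified stand-ins for the two supplier variants.
final class ProcessorParametersSketch {
    private final Object processorSupplier;         // set on the classic PAPI path
    private final Object fixedKeyProcessorSupplier; // set on the KIP-820 fixed-key path
    private final String processorName;

    ProcessorParametersSketch(Object processorSupplier,
                              Object fixedKeyProcessorSupplier,
                              String processorName) {
        this.processorSupplier = processorSupplier;
        this.fixedKeyProcessorSupplier = fixedKeyProcessorSupplier;
        this.processorName = processorName;
    }

    @Override
    public String toString() {
        // Dereference whichever supplier is actually set instead of assuming
        // processorSupplier is non-null, which is what caused the NPE.
        Object supplier = processorSupplier != null ? processorSupplier : fixedKeyProcessorSupplier;
        return "ProcessorParameters{supplier=" + supplier + ", name='" + processorName + "'}";
    }
}
{code}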

> NullPointer in ProcessorParameters.toString
> ---
>
> Key: KAFKA-14325
> URL: https://issues.apache.org/jira/browse/KAFKA-14325
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Martin Hyldgaard
>Assignee: A. Sophie Blee-Goldman
>Priority: Minor
>
> After switching from {{Transformer}} to using {{FixedKeyProcessor}} I get 
> some NullPointer exceptions logged. It seems to be due to 
> {{ProcessorParameters::toString}} using its field {{processorSupplier}} 
> [without a null 
> check|https://github.com/apache/kafka/blob/0a045d4ef7d67dbe35b8fd2e1c51df87af19a0ab/streams/src/main/java/org/apache/kafka/streams/kstream/internals/graph/ProcessorParameters.java#L133].
>  After KAFKA-13654: Extend KStream process with new Processor API 
> ([#11993|https://github.com/apache/kafka/pull/11993]) this field can be null 
> and instead the field {{fixedKeyProcessorSupplier}} is set.
>  
> Stack trace
> {code:java}
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.kafka.streams.processor.api.ProcessorSupplier.get()" because 
> "this.processorSupplier" is null
>     at 
> org.apache.kafka.streams.kstream.internals.graph.ProcessorParameters.toString(ProcessorParameters.java:133)
>     at java.base/java.lang.String.valueOf(String.java:4218)
>     at java.base/java.lang.StringBuilder.append(StringBuilder.java:173)
>     at 
> org.apache.kafka.streams.kstream.internals.graph.ProcessorGraphNode.toString(ProcessorGraphNode.java:52)
>     at 
> org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:277)
>     at 
> org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:249)
>     at 
> org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:211)
>     at 
> org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:161)
>     at 
> ch.qos.logback.classic.spi.LoggingEvent.getFormattedMessage(LoggingEvent.java:293)
>     at 
> ch.qos.logback.classic.spi.LoggingEvent.prepareForDeferredProcessing(LoggingEvent.java:206)
>     at 
> ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:223)
>     at 
> ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
>     at 
> ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
>     at 
> ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
>     at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
>     at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
>     at 
> ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
>     at ch.qos.logback.classic.Logger.filterAndLog_2(Logger.java:414)
>     at ch.qos.logback.classic.Logger.debug(Logger.java:490)
>     at 
> org.apache.kafka.streams.kstream.internals.InternalStreamsBuilder.buildAndOptimizeTopology(InternalStreamsBuilder.java:293)
>     at org.apache.kafka.streams.StreamsBuilder.build(StreamsBuilder.java:628) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-13656) Connect Transforms support for nested structures

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya reassigned KAFKA-13656:


Assignee: Jorge Esteban Quilcate Otoya

> Connect Transforms support for nested structures
> 
>
> Key: KAFKA-13656
> URL: https://issues.apache.org/jira/browse/KAFKA-13656
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: connect-transformation, needs-kip
>
> Single Message Transforms (SMT), 
> [KIP-66|https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect],
> have greatly improved Connectors' usability by enabling processing 
> input/output data without the need for additional streaming applications. 
> These benefits have been limited as most SMT implementations are limited to 
> fields available on the root structure:
>  * https://issues.apache.org/jira/browse/KAFKA-7624
>  * https://issues.apache.org/jira/browse/KAFKA-10640
> Therefore, this KIP aims to include support for nested structures in the 
> existing SMTs, where this makes sense, and to include the abstractions to 
> reuse this in future SMTs.
>  
> KIP: https://cwiki.apache.org/confluence/x/BafkCw



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14231) Support for nested structures: ReplaceField

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14231:


 Summary: Support for nested structures: ReplaceField
 Key: KAFKA-14231
 URL: https://issues.apache.org/jira/browse/KAFKA-14231
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14230) Support for nested structures: Cast

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14230:


 Summary: Support for nested structures: Cast
 Key: KAFKA-14230
 URL: https://issues.apache.org/jira/browse/KAFKA-14230
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14232) Support for nested structures: InsertField

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14232:


 Summary: Support for nested structures: InsertField
 Key: KAFKA-14232
 URL: https://issues.apache.org/jira/browse/KAFKA-14232
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14228) Support for nested structures: ValueToKey

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14228:


 Summary: Support for nested structures: ValueToKey
 Key: KAFKA-14228
 URL: https://issues.apache.org/jira/browse/KAFKA-14228
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14229) Support for nested structures: HoistValue

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14229:


 Summary: Support for nested structures: HoistValue
 Key: KAFKA-14229
 URL: https://issues.apache.org/jira/browse/KAFKA-14229
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14227) Support for nested structures: MaskField

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14227:


 Summary: Support for nested structures: MaskField
 Key: KAFKA-14227
 URL: https://issues.apache.org/jira/browse/KAFKA-14227
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14226) Introduce support for nested structures

2022-09-13 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14226:


 Summary: Introduce support for nested structures
 Key: KAFKA-14226
 URL: https://issues.apache.org/jira/browse/KAFKA-14226
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Abstraction for FieldPath and initial SMTs (a sketch of the path idea follows 
this list):
 * ExtractField
 * HeaderFrom
 * TimestampConverter



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14191) Add end-to-end latency metrics to Connectors

2022-08-30 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-14191:


 Summary: Add end-to-end latency metrics to Connectors
 Key: KAFKA-14191
 URL: https://issues.apache.org/jira/browse/KAFKA-14191
 Project: Kafka
  Issue Type: Improvement
  Components: metrics
Reporter: Jorge Esteban Quilcate Otoya


Request to add latency metrics to connectors to measure transformation latency 
and e2e latency on the sink side.

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
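
A minimal sketch of the sink-side measurement idea, assuming e2e latency is the 
gap between the record's timestamp and the wall clock when the sink task 
receives it; the DoubleConsumer standing in for the sensor is illustrative:

{code:java}
import java.util.Collection;
import java.util.function.DoubleConsumer;

import org.apache.kafka.connect.sink.SinkRecord;

final class E2eLatencySketch {

    // 'sensor' stands in for whatever metric sink the KIP ends up wiring in.
    static void recordE2eLatencies(Collection<SinkRecord> records, DoubleConsumer sensor) {
        long now = System.currentTimeMillis();
        for (SinkRecord record : records) {
            Long timestamp = record.timestamp();
            if (timestamp != null) {
                // e2e latency: time from the record's timestamp (create or append
                // time, depending on the topic config) to delivery at the sink task.
                sensor.accept((double) (now - timestamp));
            }
        }
    }
}
{code}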



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-13822) Update Kafka Streams Adjust Thread Count tests to new Processor API

2022-04-12 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-13822:


 Summary: Update Kafka Streams Adjust Thread Count tests to new 
Processor API
 Key: KAFKA-13822
 URL: https://issues.apache.org/jira/browse/KAFKA-13822
 Project: Kafka
  Issue Type: Task
Reporter: Jorge Esteban Quilcate Otoya


Once KIP-820 is merged and released, AdjustStreamThreadCountTest[1] will be 
using deprecated APIs: 
https://github.com/apache/kafka/pull/11993#discussion_r847769618

[1] 
https://github.com/apache/kafka/blob/0d518aaed158896ee9ee6949b8f38128d1d73634/streams/src/test/java/org/apache/kafka/streams/integration/AdjustStreamThreadCountTest.java



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13821) Update Kafka Streams WordCount demo to new Processor API

2022-04-12 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-13821:
-
Priority: Minor  (was: Major)

> Update Kafka Streams WordCount demo to new Processor API
> 
>
> Key: KAFKA-13821
> URL: https://issues.apache.org/jira/browse/KAFKA-13821
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Affects Versions: 3.3.0
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Minor
>
> Once KIP-820 is merged and released, the WordCount[1] demo will be using 
> deprecated APIs: 
> [https://github.com/apache/kafka/pull/11993#discussion_r847744046]
> [1] 
> https://github.com/apache/kafka/blob/0d518aaed158896ee9ee6949b8f38128d1d73634/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountTransformerDemo.java



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13821) Update Kafka Streams WordCount demo to new Processor API

2022-04-12 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-13821:
-
Summary: Update Kafka Streams WordCount demo to new Processor API  (was: 
Update Kafka Streams demo to new Processor API)

> Update Kafka Streams WordCount demo to new Processor API
> 
>
> Key: KAFKA-13821
> URL: https://issues.apache.org/jira/browse/KAFKA-13821
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Affects Versions: 3.3.0
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Once KIP-820 is merged and released, the WordCount[1] demo will be using 
> deprecated APIs: 
> [https://github.com/apache/kafka/pull/11993#discussion_r847744046]
> [1] 
> https://github.com/apache/kafka/blob/0d518aaed158896ee9ee6949b8f38128d1d73634/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountTransformerDemo.java



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13821) Update Kafka Streams demo to new Processor API

2022-04-12 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya updated KAFKA-13821:
-
Description: 
Once KIP-820 is merged and released, the WordCount[1] demo will be using deprecated 
APIs: [https://github.com/apache/kafka/pull/11993#discussion_r847744046]

[1] 
https://github.com/apache/kafka/blob/0d518aaed158896ee9ee6949b8f38128d1d73634/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountTransformerDemo.java

  was:Once KIP-820 is merged and release, demo will be using deprecated APIs: 
https://github.com/apache/kafka/pull/11993#discussion_r847744046


> Update Kafka Streams demo to new Processor API
> --
>
> Key: KAFKA-13821
> URL: https://issues.apache.org/jira/browse/KAFKA-13821
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Affects Versions: 3.3.0
>Reporter: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Once KIP-820 is merged and released, the WordCount[1] demo will be using 
> deprecated APIs: 
> [https://github.com/apache/kafka/pull/11993#discussion_r847744046]
> [1] 
> https://github.com/apache/kafka/blob/0d518aaed158896ee9ee6949b8f38128d1d73634/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountTransformerDemo.java



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13821) Update Kafka Streams demo to new Processor API

2022-04-12 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-13821:


 Summary: Update Kafka Streams demo to new Processor API
 Key: KAFKA-13821
 URL: https://issues.apache.org/jira/browse/KAFKA-13821
 Project: Kafka
  Issue Type: Task
  Components: streams
Affects Versions: 3.3.0
Reporter: Jorge Esteban Quilcate Otoya


Once KIP-820 is merged and released, the demo will be using deprecated APIs: 
https://github.com/apache/kafka/pull/11993#discussion_r847744046



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

