[jira] [Commented] (KAFKA-16538) UpdateFeatures for kraft.version

2024-08-23 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876305#comment-17876305
 ] 

zhengke zhou commented on KAFKA-16538:
--

Hi @José Armando García Sancio, I have been tracking this ticket for a while; it 
appears to fall under the supported features section of KIP-853 
([https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes#KIP853:KRaftControllerMembershipChanges-Supportedfeatures]).
If no one has started on it, I would like to pick it up, and I will share my approach soon.

> UpdateFeatures for kraft.version
> 
>
> Key: KAFKA-16538
> URL: https://issues.apache.org/jira/browse/KAFKA-16538
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: José Armando García Sancio
>Priority: Major
>
> Should:
>  # Route request to cluster metadata kraft client.
>  # KRaft leader should check the supported version of all voters and observers
>  ## voter information comes from VoterSet
>  ## observer information is pushed down to kraft by the metadata controller
>  # Persist both the kraft.version and voter set in one control batch
> We need to allow the kraft.version update to succeed while the metadata 
> controller changes may fail. This is needed because there will be two batches 
> for these updates: one control record batch, which includes the kraft.version 
> and voter set, and one metadata batch, which includes the feature records.
>  
> This change should also improve the handling of UpdateVoter to allow the 
> request when the kraft.version is 0.
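
To make the two-batch split above concrete, here is a minimal, self-contained sketch; the class and names are invented for illustration and are not the actual KRaft internals. It only shows how the kraft.version update would be separated from the feature updates that go into the metadata batch, so each path can succeed or fail independently:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: names here are illustrative, not actual Kafka internals.
// kraft.version (plus the voter set) would go into one control record batch handled
// by the raft client, while the remaining features become a separate metadata batch.
public final class FeatureUpdateSplit {
    public final Short kraftVersionUpdate;         // destined for the control record batch
    public final Map<String, Short> otherFeatures; // destined for the metadata batch

    private FeatureUpdateSplit(Short kraftVersionUpdate, Map<String, Short> otherFeatures) {
        this.kraftVersionUpdate = kraftVersionUpdate;
        this.otherFeatures = otherFeatures;
    }

    public static FeatureUpdateSplit of(Map<String, Short> requestedUpdates) {
        Map<String, Short> others = new HashMap<>(requestedUpdates);
        Short kraftVersion = others.remove("kraft.version");
        return new FeatureUpdateSplit(kraftVersion, others);
    }

    public static void main(String[] args) {
        Map<String, Short> request = new HashMap<>();
        request.put("kraft.version", (short) 1);
        request.put("metadata.version", (short) 21);

        FeatureUpdateSplit split = FeatureUpdateSplit.of(request);
        System.out.println("control batch (kraft.version): " + split.kraftVersionUpdate);
        System.out.println("metadata batch (feature records): " + split.otherFeatures);
    }
}
{code}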



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-17264) Audit validation of the new RPCs (AddVoter, RemoveVoter and UpdateVoter)

2024-08-13 Thread zhengke zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengke zhou reassigned KAFKA-17264:


Assignee: zhengke zhou

> Audit validation of the new RPCs (AddVoter, RemoveVoter and UpdateVoter)
> 
>
> Key: KAFKA-17264
> URL: https://issues.apache.org/jira/browse/KAFKA-17264
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: José Armando García Sancio
>Assignee: zhengke zhou
>Priority: Major
>
> It is possible that the Kafka raft client driver thread throws uncaught 
> exceptions for some inputs. KRaft should instead catch these validation 
> errors and return INVALID_REQUEST errors in the response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-17264) Audit validation of the new RPCs (AddVoter, RemoveVoter and UpdateVoter)

2024-08-12 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872852#comment-17872852
 ] 

zhengke zhou commented on KAFKA-17264:
--

[~jsancio],

In KIP-853, for the AddVoter#request, RemoveVoter#request, and 
UpdateVoter#request schemas, I've noticed that many fields do not currently 
exist in the Kafka design. For example:
{code:java}
// AddVoter#request
{
  "apiKey": 77,
  "type": "request",
  "listeners": ["controller", "broker"],
  "name": "RemoveVoterRequest",
  "validVersions": "0",
  "flexibleVersions": "0+",
  "fields": [
    ...
    { "name": "TopicName", "type": "string", "versions": "0+", "entityType": "topicName",
      "about": "The name of the topic" },
    { "name": "TopicId", "type": "uuid", "versions": "0+",
      "about": "The unique topic ID" },
    { "name": "Partition", "type": "int32", "versions": "0+",
      "about": "The partition index" },
    ...
  ]
}
{code}
I believe it would be prudent to add validation to ensure consistency, such as 
checking that the listeners specified in the add, remove, and update requests 
actually exist. Additionally, for fields like {{Listener#port}}, there should 
be validation to ensure values fall within valid ranges (e.g., {{Listener#port}} 
should be between 0 and 65535).
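
As a rough, self-contained illustration of those checks, here is a sketch; the types and method below are hypothetical and not the actual KRaft request-handling code, and the caller would map a non-null message to INVALID_REQUEST:
{code:java}
import java.util.Set;

// Hypothetical sketch: ListenerEndpoint and validate() are illustrative only,
// not the real AddVoter/RemoveVoter/UpdateVoter handling code.
public final class VoterRpcValidation {

    record ListenerEndpoint(String name, String host, int port) { }

    // Returns an error message for the caller to map to INVALID_REQUEST,
    // or null if the endpoint passes these basic checks.
    static String validate(ListenerEndpoint endpoint, Set<String> knownListeners) {
        if (endpoint.name() == null || endpoint.name().isEmpty()) {
            return "Listener name must not be empty";
        }
        if (!knownListeners.contains(endpoint.name())) {
            return "Unknown listener: " + endpoint.name();
        }
        if (endpoint.port() < 0 || endpoint.port() > 65535) {
            return "Port out of range: " + endpoint.port();
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> listeners = Set.of("CONTROLLER");
        System.out.println(validate(new ListenerEndpoint("CONTROLLER", "localhost", 9093), listeners)); // null
        System.out.println(validate(new ListenerEndpoint("CONTROLLER", "localhost", 70000), listeners)); // out of range
        System.out.println(validate(new ListenerEndpoint("PLAINTEXT", "localhost", 9092), listeners));   // unknown listener
    }
}
{code}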

Could you please assign this task to me? Thank you!

> Audit validation of the new RPCs (AddVoter, RemoveVoter and UpdateVoter)
> 
>
> Key: KAFKA-17264
> URL: https://issues.apache.org/jira/browse/KAFKA-17264
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: José Armando García Sancio
>Priority: Major
>
> It is possible that the Kafka raft client driver thread throws uncaught 
> exceptions for some inputs. KRaft should instead catch these validation 
> errors and return INVALID_REQUEST errors in the response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16263) Add Kafka Streams docs about available listeners/callback

2024-07-29 Thread zhengke zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengke zhou reassigned KAFKA-16263:


Assignee: zhengke zhou  (was: Kuan Po Tseng)

> Add Kafka Streams docs about available listeners/callback
> -
>
> Key: KAFKA-16263
> URL: https://issues.apache.org/jira/browse/KAFKA-16263
> Project: Kafka
>  Issue Type: Task
>  Components: docs, streams
>Reporter: Matthias J. Sax
>Assignee: zhengke zhou
>Priority: Minor
>  Labels: beginner, newbie
>
> Kafka Streams allows registering all kinds of listeners and callbacks (e.g., 
> uncaught-exception-handler, restore-listeners, etc.), but those are not in the 
> documentation.
> A good place might be 
> [https://kafka.apache.org/documentation/streams/developer-guide/running-app.html]
>  
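
For reference, a short sketch of the registrations such a docs page could cover; this is a best-effort example against the public KafkaStreams API and the exact signatures should be verified against the current Javadocs:
{code:java}
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler;
import org.apache.kafka.streams.processor.StateRestoreListener;

import java.util.Properties;

public class StreamsListenerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "listener-docs-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // React to client state transitions (e.g. REBALANCING -> RUNNING).
        streams.setStateListener((newState, oldState) ->
            System.out.println("State changed from " + oldState + " to " + newState));

        // Decide how the client reacts when a stream thread throws an unexpected exception.
        streams.setUncaughtExceptionHandler(exception ->
            StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD);

        // Observe progress while state stores are being restored.
        streams.setGlobalStateRestoreListener(new StateRestoreListener() {
            @Override
            public void onRestoreStart(TopicPartition tp, String store, long start, long end) {
                System.out.println("Restoring " + store + " (" + start + " to " + end + ")");
            }

            @Override
            public void onBatchRestored(TopicPartition tp, String store, long batchEndOffset, long numRestored) {
            }

            @Override
            public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
                System.out.println("Finished restoring " + store + " (" + totalRestored + " records)");
            }
        });

        streams.start(); // listeners must be registered before start()
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
{code}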



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-17103) MockClient tight loops when no metadata is present in KafkaProducerTest

2024-07-24 Thread zhengke zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengke zhou reassigned KAFKA-17103:


Assignee: zhengke zhou

> MockClient tight loops when no metadata is present in KafkaProducerTest
> ---
>
> Key: KAFKA-17103
> URL: https://issues.apache.org/jira/browse/KAFKA-17103
> Project: Kafka
>  Issue Type: Test
>  Components: clients
>Reporter: Greg Harris
>Assignee: zhengke zhou
>Priority: Minor
>  Labels: newbie
>
> MockClient can throw this exception:
> {noformat}
> [2024-07-09 13:19:33,574] ERROR [Producer clientId=producer-1] Uncaught error 
> in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender:253)
> java.lang.IllegalStateException: No previous metadata update to use
>     at 
> org.apache.kafka.clients.MockClient$DefaultMockMetadataUpdater.updateWithCurrentMetadata(MockClient.java:700)
>     at org.apache.kafka.clients.MockClient.poll(MockClient.java:323)
>     at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:349)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:251)
>     at java.base/java.lang.Thread.run(Thread.java:829){noformat}
> This happens whenever the MockClient.DefaultMockMetadataUpdater#lastUpdate 
> variable is null, which is the case whenever there has not been any mock 
> metadata update.
> In practice, this appears to happen in 
> KafkaProducerTest#testTopicExpiryInMetadata, 
> KafkaProducerTest#testTopicRefreshInMetadata, 
> KafkaProducerTest#testTopicNotExistingInMetadata. These three tests also use 
> busy-waiting to have another thread wait for the metadata update request and 
> then provide the metadata update.
> We should find some mechanism to allow mocking these metadata updates that 
> avoids this busy waiting/tight looping pattern, as it introduces an 
> opportunity for nondeterminism and wastes CPU cycles.
> The solution should remove any usage of Thread.sleep/Utils.sleep/Time.SYSTEM.
> Ideally the solution wouldn't require an additional test thread, but this is 
> optional.
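
As a generic illustration of why the busy-waiting pattern is problematic and what a blocking alternative looks like, here is a small self-contained sketch; it is not the actual KafkaProducerTest or MockClient code, just the pattern in isolation:
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Generic illustration of the pattern discussed above, not Kafka test code.
public class BusyWaitVsLatch {
    // Anti-pattern: spin until another thread flips the flag, burning CPU cycles
    // and leaving the timing nondeterministic.
    static void busyWait(AtomicBoolean requestSeen) {
        while (!requestSeen.get()) {
            // tight loop, possibly with Thread.sleep(...) sprinkled in
        }
    }

    // Preferred: block on a latch that the other side counts down exactly when
    // the event (e.g. a metadata update request) actually happens.
    static boolean awaitRequest(CountDownLatch requestSeen) throws InterruptedException {
        return requestSeen.await(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch requestSeen = new CountDownLatch(1);
        Thread other = new Thread(requestSeen::countDown);
        other.start();
        System.out.println("request observed: " + awaitRequest(requestSeen));
        other.join();
    }
}
{code}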



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-17103) MockClient tight loops when no metadata is present in KafkaProducerTest

2024-07-24 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868458#comment-17868458
 ] 

zhengke zhou commented on KAFKA-17103:
--

Hi [~gharris1727], I would like to try this.

> MockClient tight loops when no metadata is present in KafkaProducerTest
> ---
>
> Key: KAFKA-17103
> URL: https://issues.apache.org/jira/browse/KAFKA-17103
> Project: Kafka
>  Issue Type: Test
>  Components: clients
>Reporter: Greg Harris
>Priority: Minor
>  Labels: newbie
>
> MockClient can throw this exception:
> {noformat}
> [2024-07-09 13:19:33,574] ERROR [Producer clientId=producer-1] Uncaught error 
> in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender:253)
> java.lang.IllegalStateException: No previous metadata update to use
>     at 
> org.apache.kafka.clients.MockClient$DefaultMockMetadataUpdater.updateWithCurrentMetadata(MockClient.java:700)
>     at org.apache.kafka.clients.MockClient.poll(MockClient.java:323)
>     at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:349)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:251)
>     at java.base/java.lang.Thread.run(Thread.java:829){noformat}
> This happens whenever the MockClient.DefaultMockMetadataUpdater#lastUpdate 
> variable is null, which is the case whenever there has not been any mock 
> metadata update.
> In practice, this appears to happen in 
> KafkaProducerTest#testTopicExpiryInMetadata, 
> KafkaProducerTest#testTopicRefreshInMetadata, 
> KafkaProducerTest#testTopicNotExistingInMetadata. These three tests also use 
> busy-waiting to have another thread wait for the metadata update request and 
> then provide the metadata update.
> We should find some mechanism to allow mocking these metadata updates that 
> avoids this busy waiting/tight looping pattern, as it introduces an 
> opportunity for nondeterminism and wastes CPU cycles.
> The solution should remove any usage of Thread.sleep/Utils.sleep/Time.SYSTEM.
> Ideally the solution wouldn't require an additional test thread, but this is 
> optional.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16263) Add Kafka Streams docs about available listeners/callback

2024-07-24 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868370#comment-17868370
 ] 

zhengke zhou commented on KAFKA-16263:
--

Hi [~mjsax], can I pick up this ticket?

> Add Kafka Streams docs about available listeners/callback
> -
>
> Key: KAFKA-16263
> URL: https://issues.apache.org/jira/browse/KAFKA-16263
> Project: Kafka
>  Issue Type: Task
>  Components: docs, streams
>Reporter: Matthias J. Sax
>Assignee: Kuan Po Tseng
>Priority: Minor
>  Labels: beginner, newbie
>
> Kafka Streams allows registering all kinds of listeners and callbacks (e.g., 
> uncaught-exception-handler, restore-listeners, etc.), but those are not in the 
> documentation.
> A good place might be 
> [https://kafka.apache.org/documentation/streams/developer-guide/running-app.html]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16765) NioEchoServer leaks accepted SocketChannel instances due to race condition

2024-06-30 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860996#comment-17860996
 ] 

zhengke zhou commented on KAFKA-16765:
--

[Greg Harris|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gharris1727] 
In this ticket, we want to avoid the situation where the producer side (the 
acceptorThread) creates a channel and adds it to the socket channel list, where 
it will never be consumed by the main thread. To resolve this, we can clean up 
closeSocketChannels after the acceptorThread has exited.
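
A rough sketch of what that cleanup could look like is below; the field names (newChannels, acceptorThread) are made up for illustration rather than taken from the actual NioEchoServer:
{code:java}
import java.io.IOException;
import java.nio.channels.SocketChannel;
import java.util.List;

// Hypothetical sketch of the cleanup step; field names are illustrative,
// not the real NioEchoServer implementation.
class ServerShutdownSketch {
    private final List<SocketChannel> newChannels; // channels accepted but not yet drained
    private final Thread acceptorThread;

    ServerShutdownSketch(List<SocketChannel> newChannels, Thread acceptorThread) {
        this.newChannels = newChannels;
        this.acceptorThread = acceptorThread;
    }

    void close() throws InterruptedException, IOException {
        // Wait for the acceptor to finish so no more channels can be added.
        acceptorThread.join();

        // Any channel still sitting in newChannels was accepted after the main
        // loop exited; close it here so nothing leaks.
        synchronized (newChannels) {
            for (SocketChannel channel : newChannels) {
                channel.close();
            }
            newChannels.clear();
        }
    }
}
{code}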

> NioEchoServer leaks accepted SocketChannel instances due to race condition
> --
>
> Key: KAFKA-16765
> URL: https://issues.apache.org/jira/browse/KAFKA-16765
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.8.0
>Reporter: Greg Harris
>Assignee: zhengke zhou
>Priority: Minor
>  Labels: newbie
>
> The NioEchoServer has an AcceptorThread that calls accept() to open new 
> SocketChannel instances and insert them into the `newChannels` List, and a 
> main thread that drains the `newChannels` List and moves them to the 
> `socketChannels` List.
> During shutdown, the serverSocketChannel is closed, which causes both threads 
> to exit their while loops. It is possible for the NioEchoServer main thread 
> to sense the serverSocketChannel close and terminate before the Acceptor 
> thread does, and for the Acceptor thread to put a SocketChannel in 
> `newChannels` before terminating. This instance is never closed by either 
> thread, because it is never moved to `socketChannels`.
> A precise execution order that has this leak is:
> 1. NioEchoServer thread locks `newChannels`.
> 2. Acceptor thread accept() completes, and the SocketChannel is created
> 3. Acceptor thread blocks waiting for the `newChannels` lock
> 4. NioEchoServer thread releases the `newChannels` lock and does some 
> processing
> 5. NioEchoServer#close() is called, which closes the serverSocketChannel
> 6. NioEchoServer thread checks serverSocketChannel.isOpen() and then 
> terminates
> 7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel 
> to `newChannels`.
> 8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
> 9. NioEchoServer#close() stops blocking now that both other threads have 
> terminated.
> The end result is that the leaked socket is left open in the `newChannels` 
> list at the end of close(), which is incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (KAFKA-16765) NioEchoServer leaks accepted SocketChannel instances due to race condition

2024-06-30 Thread zhengke zhou (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16765 ]


zhengke zhou deleted comment on KAFKA-16765:
--

was (Author: JIRAUSER305940):
[~gharris1727]  We want to avoid the situation where the Producer: 
acceptorThread creates a channel and adds it to SocketChannels that will never 
be consumed by the main thread.
To resolve this, we can clear up closeSocketChannels after acceptorThread 
exited.

> NioEchoServer leaks accepted SocketChannel instances due to race condition
> --
>
> Key: KAFKA-16765
> URL: https://issues.apache.org/jira/browse/KAFKA-16765
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.8.0
>Reporter: Greg Harris
>Assignee: zhengke zhou
>Priority: Minor
>  Labels: newbie
>
> The NioEchoServer has an AcceptorThread that calls accept() to open new 
> SocketChannel instances and insert them into the `newChannels` List, and a 
> main thread that drains the `newChannels` List and moves them to the 
> `socketChannels` List.
> During shutdown, the serverSocketChannel is closed, which causes both threads 
> to exit their while loops. It is possible for the NioEchoServer main thread 
> to sense the serverSocketChannel close and terminate before the Acceptor 
> thread does, and for the Acceptor thread to put a SocketChannel in 
> `newChannels` before terminating. This instance is never closed by either 
> thread, because it is never moved to `socketChannels`.
> A precise execution order that has this leak is:
> 1. NioEchoServer thread locks `newChannels`.
> 2. Acceptor thread accept() completes, and the SocketChannel is created
> 3. Acceptor thread blocks waiting for the `newChannels` lock
> 4. NioEchoServer thread releases the `newChannels` lock and does some 
> processing
> 5. NioEchoServer#close() is called, which closes the serverSocketChannel
> 6. NioEchoServer thread checks serverSocketChannel.isOpen() and then 
> terminates
> 7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel 
> to `newChannels`.
> 8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
> 9. NioEchoServer#close() stops blocking now that both other threads have 
> terminated.
> The end result is that the leaked socket is left open in the `newChannels` 
> list at the end of close(), which is incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16765) NioEchoServer leaks accepted SocketChannel instances due to race condition

2024-06-30 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860952#comment-17860952
 ] 

zhengke zhou edited comment on KAFKA-16765 at 6/30/24 1:20 PM:
---

[~gharris1727] We want to avoid the situation where the producer side (the 
acceptorThread) creates a channel and adds it to the socket channel list, where 
it will never be consumed by the main thread.
To resolve this, we can clean up closeSocketChannels after the acceptorThread 
has exited.


was (Author: JIRAUSER305940):
[~gharris1727]  
We want to avoid the situation where the Producer: acceptorThread creates a 
channel and adds it to SocketChannels that will never be consumed by the main 
thread.
To resolve this, we can clear up closeSocketChannels after acceptorThread 
exited.

> NioEchoServer leaks accepted SocketChannel instances due to race condition
> --
>
> Key: KAFKA-16765
> URL: https://issues.apache.org/jira/browse/KAFKA-16765
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.8.0
>Reporter: Greg Harris
>Assignee: zhengke zhou
>Priority: Minor
>  Labels: newbie
>
> The NioEchoServer has an AcceptorThread that calls accept() to open new 
> SocketChannel instances and insert them into the `newChannels` List, and a 
> main thread that drains the `newChannels` List and moves them to the 
> `socketChannels` List.
> During shutdown, the serverSocketChannel is closed, which causes both threads 
> to exit their while loops. It is possible for the NioEchoServer main thread 
> to sense the serverSocketChannel close and terminate before the Acceptor 
> thread does, and for the Acceptor thread to put a SocketChannel in 
> `newChannels` before terminating. This instance is never closed by either 
> thread, because it is never moved to `socketChannels`.
> A precise execution order that has this leak is:
> 1. NioEchoServer thread locks `newChannels`.
> 2. Acceptor thread accept() completes, and the SocketChannel is created
> 3. Acceptor thread blocks waiting for the `newChannels` lock
> 4. NioEchoServer thread releases the `newChannels` lock and does some 
> processing
> 5. NioEchoServer#close() is called, which closes the serverSocketChannel
> 6. NioEchoServer thread checks serverSocketChannel.isOpen() and then 
> terminates
> 7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel 
> to `newChannels`.
> 8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
> 9. NioEchoServer#close() stops blocking now that both other threads have 
> terminated.
> The end result is that the leaked socket is left open in the `newChannels` 
> list at the end of close(), which is incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16765) NioEchoServer leaks accepted SocketChannel instances due to race condition

2024-06-30 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860952#comment-17860952
 ] 

zhengke zhou edited comment on KAFKA-16765 at 6/30/24 1:20 PM:
---

[~gharris1727]
We want to avoid the situation where the producer side (the acceptorThread) 
creates a channel and adds it to the socket channel list, where it will never 
be consumed by the main thread.
To resolve this, we can clean up closeSocketChannels after the acceptorThread 
has exited.


was (Author: JIRAUSER305940):
If we can make *main* thread always close after *AcceptorThread* that seems 
like can be fixed.

I will try to fix this.

> NioEchoServer leaks accepted SocketChannel instances due to race condition
> --
>
> Key: KAFKA-16765
> URL: https://issues.apache.org/jira/browse/KAFKA-16765
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.8.0
>Reporter: Greg Harris
>Assignee: zhengke zhou
>Priority: Minor
>  Labels: newbie
>
> The NioEchoServer has an AcceptorThread that calls accept() to open new 
> SocketChannel instances and insert them into the `newChannels` List, and a 
> main thread that drains the `newChannels` List and moves them to the 
> `socketChannels` List.
> During shutdown, the serverSocketChannel is closed, which causes both threads 
> to exit their while loops. It is possible for the NioEchoServer main thread 
> to sense the serverSocketChannel close and terminate before the Acceptor 
> thread does, and for the Acceptor thread to put a SocketChannel in 
> `newChannels` before terminating. This instance is never closed by either 
> thread, because it is never moved to `socketChannels`.
> A precise execution order that has this leak is:
> 1. NioEchoServer thread locks `newChannels`.
> 2. Acceptor thread accept() completes, and the SocketChannel is created
> 3. Acceptor thread blocks waiting for the `newChannels` lock
> 4. NioEchoServer thread releases the `newChannels` lock and does some 
> processing
> 5. NioEchoServer#close() is called, which closes the serverSocketChannel
> 6. NioEchoServer thread checks serverSocketChannel.isOpen() and then 
> terminates
> 7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel 
> to `newChannels`.
> 8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
> 9. NioEchoServer#close() stops blocking now that both other threads have 
> terminated.
> The end result is that the leaked socket is left open in the `newChannels` 
> list at the end of close(), which is incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16765) NioEchoServer leaks accepted SocketChannel instances due to race condition

2024-06-29 Thread zhengke zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengke zhou reassigned KAFKA-16765:


Assignee: zhengke zhou

> NioEchoServer leaks accepted SocketChannel instances due to race condition
> --
>
> Key: KAFKA-16765
> URL: https://issues.apache.org/jira/browse/KAFKA-16765
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.8.0
>Reporter: Greg Harris
>Assignee: zhengke zhou
>Priority: Minor
>  Labels: newbie
>
> The NioEchoServer has an AcceptorThread that calls accept() to open new 
> SocketChannel instances and insert them into the `newChannels` List, and a 
> main thread that drains the `newChannels` List and moves them to the 
> `socketChannels` List.
> During shutdown, the serverSocketChannel is closed, which causes both threads 
> to exit their while loops. It is possible for the NioEchoServer main thread 
> to sense the serverSocketChannel close and terminate before the Acceptor 
> thread does, and for the Acceptor thread to put a SocketChannel in 
> `newChannels` before terminating. This instance is never closed by either 
> thread, because it is never moved to `socketChannels`.
> A precise execution order that has this leak is:
> 1. NioEchoServer thread locks `newChannels`.
> 2. Acceptor thread accept() completes, and the SocketChannel is created
> 3. Acceptor thread blocks waiting for the `newChannels` lock
> 4. NioEchoServer thread releases the `newChannels` lock and does some 
> processing
> 5. NioEchoServer#close() is called, which closes the serverSocketChannel
> 6. NioEchoServer thread checks serverSocketChannel.isOpen() and then 
> terminates
> 7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel 
> to `newChannels`.
> 8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
> 9. NioEchoServer#close() stops blocking now that both other threads have 
> terminated.
> The end result is that the leaked socket is left open in the `newChannels` 
> list at the end of close(), which is incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16765) NioEchoServer leaks accepted SocketChannel instances due to race condition

2024-06-29 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860952#comment-17860952
 ] 

zhengke zhou edited comment on KAFKA-16765 at 6/29/24 4:51 PM:
---

If we can make the *main* thread always close after the *AcceptorThread*, it 
seems like this can be fixed.

I will try to fix this.


was (Author: JIRAUSER305940):
If we can make *main* thread always close after *AcceptorThread* that ** seems 
like can be fixed.

I will try to fix this.

> NioEchoServer leaks accepted SocketChannel instances due to race condition
> --
>
> Key: KAFKA-16765
> URL: https://issues.apache.org/jira/browse/KAFKA-16765
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.8.0
>Reporter: Greg Harris
>Priority: Minor
>  Labels: newbie
>
> The NioEchoServer has an AcceptorThread that calls accept() to open new 
> SocketChannel instances and insert them into the `newChannels` List, and a 
> main thread that drains the `newChannels` List and moves them to the 
> `socketChannels` List.
> During shutdown, the serverSocketChannel is closed, which causes both threads 
> to exit their while loops. It is possible for the NioEchoServer main thread 
> to sense the serverSocketChannel close and terminate before the Acceptor 
> thread does, and for the Acceptor thread to put a SocketChannel in 
> `newChannels` before terminating. This instance is never closed by either 
> thread, because it is never moved to `socketChannels`.
> A precise execution order that has this leak is:
> 1. NioEchoServer thread locks `newChannels`.
> 2. Acceptor thread accept() completes, and the SocketChannel is created
> 3. Acceptor thread blocks waiting for the `newChannels` lock
> 4. NioEchoServer thread releases the `newChannels` lock and does some 
> processing
> 5. NioEchoServer#close() is called, which closes the serverSocketChannel
> 6. NioEchoServer thread checks serverSocketChannel.isOpen() and then 
> terminates
> 7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel 
> to `newChannels`.
> 8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
> 9. NioEchoServer#close() stops blocking now that both other threads have 
> terminated.
> The end result is that the leaked socket is left open in the `newChannels` 
> list at the end of close(), which is incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16765) NioEchoServer leaks accepted SocketChannel instances due to race condition

2024-06-29 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860952#comment-17860952
 ] 

zhengke zhou commented on KAFKA-16765:
--

If we can make the *main* thread always close after the *AcceptorThread*, it 
seems like this can be fixed.

I will try to fix this.

> NioEchoServer leaks accepted SocketChannel instances due to race condition
> --
>
> Key: KAFKA-16765
> URL: https://issues.apache.org/jira/browse/KAFKA-16765
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.8.0
>Reporter: Greg Harris
>Priority: Minor
>  Labels: newbie
>
> The NioEchoServer has an AcceptorThread that calls accept() to open new 
> SocketChannel instances and insert them into the `newChannels` List, and a 
> main thread that drains the `newChannels` List and moves them to the 
> `socketChannels` List.
> During shutdown, the serverSocketChannel is closed, which causes both threads 
> to exit their while loops. It is possible for the NioEchoServer main thread 
> to sense the serverSocketChannel close and terminate before the Acceptor 
> thread does, and for the Acceptor thread to put a SocketChannel in 
> `newChannels` before terminating. This instance is never closed by either 
> thread, because it is never moved to `socketChannels`.
> A precise execution order that has this leak is:
> 1. NioEchoServer thread locks `newChannels`.
> 2. Acceptor thread accept() completes, and the SocketChannel is created
> 3. Acceptor thread blocks waiting for the `newChannels` lock
> 4. NioEchoServer thread releases the `newChannels` lock and does some 
> processing
> 5. NioEchoServer#close() is called, which closes the serverSocketChannel
> 6. NioEchoServer thread checks serverSocketChannel.isOpen() and then 
> terminates
> 7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel 
> to `newChannels`.
> 8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
> 9. NioEchoServer#close() stops blocking now that both other threads have 
> terminated.
> The end result is that the leaked socket is left open in the `newChannels` 
> list at the end of close(), which is incorrect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16943) Synchronously verify Connect worker startup failure in InternalTopicsIntegrationTest

2024-06-25 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859846#comment-17859846
 ] 

zhengke zhou commented on KAFKA-16943:
--

[~ChrisEgerton] Hi, I'm new to this community and hoping to take this on as my 
first contribution.

> Synchronously verify Connect worker startup failure in 
> InternalTopicsIntegrationTest
> 
>
> Key: KAFKA-16943
> URL: https://issues.apache.org/jira/browse/KAFKA-16943
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Reporter: Chris Egerton
>Priority: Minor
>  Labels: newbie
> Attachments: code-diff.png
>
>
> Created after PR discussion 
> [here|https://github.com/apache/kafka/pull/16288#discussion_r1636615220].
> In some of our integration tests, we want to verify that a Connect worker 
> cannot start under poor conditions (such as when its internal topics do not 
> yet exist and it is configured to create them with a higher replication 
> factor than the number of available brokers, or when its internal topics 
> already exist but they do not have the compaction cleanup policy).
> This is currently not possible, and presents a possible gap in testing 
> coverage, especially for the test cases 
> {{testFailToCreateInternalTopicsWithMoreReplicasThanBrokers}} and 
> {{{}testFailToStartWhenInternalTopicsAreNotCompacted{}}}. It'd be nice if we 
> could have some way of synchronously awaiting the completion or failure of 
> worker startup in our integration tests in order to guarantee that worker 
> startup fails under sufficiently adverse conditions.
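
One way to picture that synchronous await is a small tracker the worker completes either way; the class below is purely hypothetical and not part of the Connect runtime, it only illustrates replacing polling with a blocking wait that can surface the failure cause:
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch only: the Connect runtime does not expose this class.
class WorkerStartupTracker {
    private final CompletableFuture<Void> startup = new CompletableFuture<>();

    // Called by the worker when startup finishes successfully.
    void onStartupSucceeded() {
        startup.complete(null);
    }

    // Called by the worker when startup fails (e.g. internal topics are invalid).
    void onStartupFailed(Throwable cause) {
        startup.completeExceptionally(cause);
    }

    // Tests block here instead of polling, and can assert on the failure cause.
    void awaitStartup(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        startup.get(timeout, unit);
    }
}
{code}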



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16943) Synchronously verify Connect worker startup failure in InternalTopicsIntegrationTest

2024-06-25 Thread zhengke zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengke zhou reassigned KAFKA-16943:


Assignee: zhengke zhou

> Synchronously verify Connect worker startup failure in 
> InternalTopicsIntegrationTest
> 
>
> Key: KAFKA-16943
> URL: https://issues.apache.org/jira/browse/KAFKA-16943
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Reporter: Chris Egerton
>Assignee: zhengke zhou
>Priority: Minor
>  Labels: newbie
> Attachments: code-diff.png
>
>
> Created after PR discussion 
> [here|https://github.com/apache/kafka/pull/16288#discussion_r1636615220].
> In some of our integration tests, we want to verify that a Connect worker 
> cannot start under poor conditions (such as when its internal topics do not 
> yet exist and it is configured to create them with a higher replication 
> factor than the number of available brokers, or when its internal topics 
> already exist but they do not have the compaction cleanup policy).
> This is currently not possible, and presents a possible gap in testing 
> coverage, especially for the test cases 
> {{testFailToCreateInternalTopicsWithMoreReplicasThanBrokers}} and 
> {{{}testFailToStartWhenInternalTopicsAreNotCompacted{}}}. It'd be nice if we 
> could have some way of synchronously awaiting the completion or failure of 
> worker startup in our integration tests in order to guarantee that worker 
> startup fails under sufficiently adverse conditions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)