[jira] [Commented] (KAFKA-9173) StreamsPartitionAssignor assigns partitions to only one worker

2020-05-22 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113958#comment-17113958
 ] 

Bruno Cadonna commented on KAFKA-9173:
--

The issue that all 10 tasks are assigned to the same node is fixed with KIP-441 
and the following PR adds unit tests that verify it:
https://github.com/apache/kafka/pull/8689

For the other issues, new tickets should be created as [~mjsax] proposed.

I will close this ticket as fixed in 2.6.

> StreamsPartitionAssignor assigns partitions to only one worker
> --
>
> Key: KAFKA-9173
> URL: https://issues.apache.org/jira/browse/KAFKA-9173
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Oleg Muravskiy
>Priority: Critical
>  Labels: user-experience
> Attachments: StreamsPartitionAssignor.log
>
>
> I'm running a distributed KafkaStreams application on 10 worker nodes, 
> subscribed to 21 topics with 10 partitions in each. I'm only using a 
> Processor interface, and a persistent state store.
> However, only one worker gets assigned partitions, all other workers get 
> nothing. Restarting the application, or cleaning local state stores does not 
> help. StreamsPartitionAssignor migrates to other nodes, and eventually picks 
> up other node to assign partitions to, but still only one node.
> It's difficult to figure out where to look for the signs of problems, I'm 
> attaching the log messages from the StreamsPartitionAssignor. Let me know 
> what else I could provide to help resolve this.
> [^StreamsPartitionAssignor.log]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9173) StreamsPartitionAssignor assigns partitions to only one worker

2020-05-22 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-9173.
--
Fix Version/s: 2.6.0
   Resolution: Fixed

> StreamsPartitionAssignor assigns partitions to only one worker
> --
>
> Key: KAFKA-9173
> URL: https://issues.apache.org/jira/browse/KAFKA-9173
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Oleg Muravskiy
>Priority: Critical
>  Labels: user-experience
> Fix For: 2.6.0
>
> Attachments: StreamsPartitionAssignor.log
>
>
> I'm running a distributed KafkaStreams application on 10 worker nodes, 
> subscribed to 21 topics with 10 partitions in each. I'm only using a 
> Processor interface, and a persistent state store.
> However, only one worker gets assigned partitions, all other workers get 
> nothing. Restarting the application, or cleaning local state stores does not 
> help. StreamsPartitionAssignor migrates to other nodes, and eventually picks 
> up other node to assign partitions to, but still only one node.
> It's difficult to figure out where to look for the signs of problems, I'm 
> attaching the log messages from the StreamsPartitionAssignor. Let me know 
> what else I could provide to help resolve this.
> [^StreamsPartitionAssignor.log]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10555) Improve client state machine

2020-10-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210968#comment-17210968
 ] 

Bruno Cadonna commented on KAFKA-10555:
---

[~ableegoldman] Thank you for your detailed analysis!

First of all, I agree that transitioning to the ERROR state should not leave 
the Kafka Streams client in a zombie state. The client should be closed and all 
its threads should be shutdown.

I think, operators can achieve safety first also without an ERROR state. They 
could set the appropriate signals (e.g. do not restart this client) from the 
uncaught exception handler without relying on the state of the Kafka Streams 
client. I can imagine that taking the appropriate measures in the uncaught 
exception handler is even more flexible for the operators, because they can 
directly call existing APIs of their cluster environment. 

If we still decide to stick to the ERROR state, I agree with you on cases #1 
and #2. 

For case #3, the current proposal in KIP-663 is to stay in RUNNING. I would not 
go to NOT_RUNNING if we define NOT_RUNNING to be the state the client ends up 
after it is closed and all its threads -- including cleanup thread etc. -- are 
shutdown as implied by case #1. Maybe we need another state NOT_PROCESSING 
([~wcarlson5]'s idea). In that case I would propose to call the state 
NOT_POLLING to be more exact, because you could be not processing records also 
with 20 alive stream threads that continuously poll records.

Case #4 is a sign to me that we should probably get rid of the ERROR state and 
users should take care of this case in the uncaught exception handler.

Case #5 is similar to case #3. Should we even merge this two cases? 

For case #6, we first need to exactly define NOT_RUNNING. If it is the state 
the client ends up after calling close(), we cannot start a new stream thread 
anymore.
  
I think, we need to define the states in more detail before we discuss to which 
state we transit. I propose the following states:
CREATED: The Kafka Streams client is created but not started
REBALANCING: The stream threads of the Kafka Streams client are rebalancing
RUNNING: At least one stream thread is alive or only the global stream thread 
is alive and polls records.
NOT_POLLING: The client does not poll because no stream threads and no global 
thread is alive. A stream thread can be added. (Should we stop -- not shutdown 
-- the cleanup thread in this state and resume it when the client transits to 
RUNNING?)   
PENDING_SHUTDOWN: The Kafka Streams client is closing
NOT_RUNNING (terminal state): The Kafka Streams client is closed, i.e., all 
stream threads are shutdown and all maintenance threads (e.g. cleanup thread) 
are also shutdown.
(ERROR (terminal state): Same as NOT_RUNNING, but it also signals "restart at 
your own risk")




> Improve client state machine
> 
>
> Key: KAFKA-10555
> URL: https://issues.apache.org/jira/browse/KAFKA-10555
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: needs-kip
>
> The KafkaStreams client exposes its state to the user for monitoring purpose 
> (ie, RUNNING, REBALANCING etc). The state of the client depends on the 
> state(s) of the internal StreamThreads that have their own states.
> Furthermore, the client state has impact on what the user can do with the 
> client. For example, active task can only be queried in RUNNING state and 
> similar.
> With KIP-671 and KIP-663 we improved error handling capabilities and allow to 
> add/remove stream thread dynamically. We allow adding/removing threads only 
> in RUNNING and REBALANCING state. This puts us in a "weird" position, because 
> if we enter ERROR state (ie, if the last thread dies), we cannot add new 
> threads and longer. However, if we have multiple threads and one dies, we 
> don't enter ERROR state and do allow to recover the thread.
> Before the KIPs the definition of ERROR state was clear, however, with both 
> KIPs it seem that we should revisit its semantics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10631) ProducerFencedException is not Handled on Offest Commit

2020-10-22 Thread Bruno Cadonna (Jira)
Bruno Cadonna created KAFKA-10631:
-

 Summary: ProducerFencedException is not Handled on Offest Commit
 Key: KAFKA-10631
 URL: https://issues.apache.org/jira/browse/KAFKA-10631
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 2.7.0
Reporter: Bruno Cadonna


The transaction manager does currently not handle producer fenced errors 
returned from a offset commit request.

We found this bug because we saw the following exception in our soak cluster:

{code:java}
org.apache.kafka.streams.errors.StreamsException: Error encountered trying to 
commit a transaction [stream-thread [i-037c09b3c48522d8d-StreamThread-3] task 
[0_0]]
at 
org.apache.kafka.streams.processor.internals.StreamsProducer.commitTransaction(StreamsProducer.java:256)
at 
org.apache.kafka.streams.processor.internals.TaskManager.commitOffsetsOrTransaction(TaskManager.java:1050)
at 
org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:1013)
at 
org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:886)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:677)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:512)
[2020-10-22T04:09:54+02:00] 
(streams-soak-2-7-eos-alpha_soak_i-037c09b3c48522d8d_streamslog) Caused by: 
org.apache.kafka.common.KafkaException: Unexpected error in 
TxnOffsetCommitResponse: There is a newer producer with the same 
transactionalId which fences the current one.
at 
org.apache.kafka.clients.producer.internals.TransactionManager$TxnOffsetCommitHandler.handleResponse(TransactionManager.java:1726)
at 
org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1278)
at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:584)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:576)
at 
org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:415)
at 
org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:313)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240)
at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10631) ProducerFencedException is not Handled on Offest Commit

2020-10-22 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10631:
--
Description: 
The transaction manager does currently not handle producer fenced errors 
returned from a offset commit request.

We found this bug because we encountered the following exception in our soak 
cluster:
{code:java}
org.apache.kafka.streams.errors.StreamsException: Error encountered trying to 
commit a transaction [stream-thread [i-037c09b3c48522d8d-StreamThread-3] task 
[0_0]]
at 
org.apache.kafka.streams.processor.internals.StreamsProducer.commitTransaction(StreamsProducer.java:256)
at 
org.apache.kafka.streams.processor.internals.TaskManager.commitOffsetsOrTransaction(TaskManager.java:1050)
at 
org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:1013)
at 
org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:886)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:677)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:512)
[2020-10-22T04:09:54+02:00] 
(streams-soak-2-7-eos-alpha_soak_i-037c09b3c48522d8d_streamslog) Caused by: 
org.apache.kafka.common.KafkaException: Unexpected error in 
TxnOffsetCommitResponse: There is a newer producer with the same 
transactionalId which fences the current one.
at 
org.apache.kafka.clients.producer.internals.TransactionManager$TxnOffsetCommitHandler.handleResponse(TransactionManager.java:1726)
at 
org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1278)
at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:584)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:576)
at 
org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:415)
at 
org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:313)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240)
at java.lang.Thread.run(Thread.java:748)
{code}

  was:
The transaction manager does currently not handle producer fenced errors 
returned from a offset commit request.

We found this bug because we saw the following exception in our soak cluster:

{code:java}
org.apache.kafka.streams.errors.StreamsException: Error encountered trying to 
commit a transaction [stream-thread [i-037c09b3c48522d8d-StreamThread-3] task 
[0_0]]
at 
org.apache.kafka.streams.processor.internals.StreamsProducer.commitTransaction(StreamsProducer.java:256)
at 
org.apache.kafka.streams.processor.internals.TaskManager.commitOffsetsOrTransaction(TaskManager.java:1050)
at 
org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:1013)
at 
org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:886)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:677)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:512)
[2020-10-22T04:09:54+02:00] 
(streams-soak-2-7-eos-alpha_soak_i-037c09b3c48522d8d_streamslog) Caused by: 
org.apache.kafka.common.KafkaException: Unexpected error in 
TxnOffsetCommitResponse: There is a newer producer with the same 
transactionalId which fences the current one.
at 
org.apache.kafka.clients.producer.internals.TransactionManager$TxnOffsetCommitHandler.handleResponse(TransactionManager.java:1726)
at 
org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1278)
at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:584)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:576)
at 
org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:415)
at 
org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:313)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240)
at java.lang.Thread.run(Thread.java:748)
{code}


> ProducerFencedException is not Handled on Offest Commit
> ---
>
> Key: KAFKA-10631
> URL: https://issues.apache.org/jira/browse/KAFKA-106

[jira] [Updated] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2020-10-26 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10635:
--
Component/s: (was: streams)
 producer 
 core

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> processing append operation on partition 
> topic-name-Aggregation-repartition-52  
> (kafka.server.ReplicaManager)org.apache.kafka.common.errors.OutOfOrderSequenceException:
>  Out of order sequence number for producerId 2819098 at offset 1156041 in 
> partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), 
> -1 (current end sequence number)
> {code}
> We are able to reproduce this many times and it happens regardless of whether 
> the broker shutdown (at restart) is clean or unclean. However, when we 
> rollback the broker version to 2.3.1 from 2.5.1 and perform similar rolling 
> restarts, we don't see this error on the streams application at all. This is 
> blocking us from upgrading our broker version. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10633) Constant probing rebalances in Streams 2.6

2020-11-02 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224555#comment-17224555
 ] 

Bruno Cadonna commented on KAFKA-10633:
---

[~thebearmayor] [~eran-levy] FYI: The 2.6.1 release is currently ongoing. Here 
the plan: 

[https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.6.1]

It is planned to be released this month.

> Constant probing rebalances in Streams 2.6
> --
>
> Key: KAFKA-10633
> URL: https://issues.apache.org/jira/browse/KAFKA-10633
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Bradley Peterson
>Priority: Major
> Attachments: Discover 2020-10-21T23 34 03.867Z - 2020-10-21T23 44 
> 46.409Z.csv
>
>
> We are seeing a few issues with the new rebalancing behavior in Streams 2.6. 
> This ticket is for constant probing rebalances on one StreamThread, but I'll 
> mention the other issues, as they may be related.
> First, when we redeploy the application we see tasks being moved, even though 
> the task assignment was stable before redeploying. We would expect to see 
> tasks assigned back to the same instances and no movement. The application is 
> in EC2, with persistent EBS volumes, and we use static group membership to 
> avoid rebalancing. To redeploy the app we terminate all EC2 instances. The 
> new instances will reattach the EBS volumes and use the same group member id.
> After redeploying, we sometimes see the group leader go into a tight probing 
> rebalance loop. This doesn't happen immediately, it could be several hours 
> later. Because the redeploy caused task movement, we see expected probing 
> rebalances every 10 minutes. But, then one thread will go into a tight loop 
> logging messages like "Triggering the followup rebalance scheduled for 
> 1603323868771 ms.", handling the partition assignment (which doesn't change), 
> then "Requested to schedule probing rebalance for 1603323868771 ms." This 
> repeats several times a second until the app is restarted again. I'll attach 
> a log export from one such incident.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10614) Group coordinator onElection/onResignation should guard against leader epoch

2020-11-02 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224992#comment-17224992
 ] 

Bruno Cadonna commented on KAFKA-10614:
---

FYI: This bug makes system test 
{{StreamsBrokerBounceTest.test_broker_type_bounce}} fail once in a while when 
the group coordinator (i.e., leader of partition 2 of the {{__consumer_offset}} 
topic) is bounced. Unfortunately, I haven't been able to reproduce the failure 
locally.

> Group coordinator onElection/onResignation should guard against leader epoch
> 
>
> Key: KAFKA-10614
> URL: https://issues.apache.org/jira/browse/KAFKA-10614
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Guozhang Wang
>Assignee: Tom Bentley
>Priority: Major
>
> When there are a sequence of LeaderAndISR or StopReplica requests sent from 
> different controllers causing the group coordinator to elect / resign, we may 
> re-order the events due to race condition. For example:
> 1) First LeaderAndISR request received from old controller to resign as the 
> group coordinator.
> 2) Second LeaderAndISR request received from new controller to elect as the 
> group coordinator.
> 3) Although threads handling the 1/2) requests are synchronized on the 
> replica manager, their callback {{onLeadershipChange}} would trigger 
> {{onElection/onResignation}} which would schedule the loading / unloading on 
> background threads, and are not synchronized.
> 4) As a result, the {{onElection}} maybe triggered by the thread first, and 
> then {{onResignation}}. As a result, the coordinator would not recognize it 
> self as the coordinator and hence would respond any coordinator request with 
> {{NOT_COORDINATOR}}.
> Here are two proposals on top of my head:
> 1) Let the scheduled load / unload function to keep the passed in leader 
> epoch, and also materialize the epoch in memory. Then when execute the 
> unloading check against the leader epoch.
> 2) This may be a bit simpler: using a single background thread working on a 
> FIFO queue of loading / unloading jobs, since the caller are actually 
> synchronized on replica manager and order preserved, the enqueued loading / 
> unloading job would be correctly ordered as well. In that case we would avoid 
> the reordering. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10686) Pluggable standby tasks assignor for Kafka Streams

2020-11-05 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226725#comment-17226725
 ] 

Bruno Cadonna commented on KAFKA-10686:
---

Hi [~lkokhreidze], I guess this ticket makes sense. Looking forward to your KIP!

You might want to look into KIP-441 to see how a Kafka Streams client passes 
its task lags to the assignor. I can imagine that you want to use a similar 
mechanism to pass information about a Kafka Streams client to the assignor.

> Pluggable standby tasks assignor for Kafka Streams
> --
>
> Key: KAFKA-10686
> URL: https://issues.apache.org/jira/browse/KAFKA-10686
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Levani Kokhreidze
>Priority: Major
>
> In production, Kafka Streams instances often run across different clusters 
> and availability zones. In order to guarantee high availability of the Kafka 
> Streams deployments, users would need more granular control over which on 
> instances standby tasks can be created. 
> Idea of this ticket is to expose interface for Kafka Streams which can be 
> implemented by users to control where standby tasks can be created.
> Kafka Streams can have RackAware assignment as a default implementation that 
> will take into account `rack.id` of the application and make sure that 
> standby tasks are created on different racks. 
> Point of this ticket though is to give more flexibility to users on standby 
> task creation, in cases where just rack awareness is not enough. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-11-06 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227236#comment-17227236
 ] 

Bruno Cadonna commented on KAFKA-10688:
---

[~guozhang], Thank you for the proposal.

Shouldn't we not always throw a fatal error for an {{InvalidOffsetException}} 
on a repartition topic, since this should never happen? How do 1) and 2) 
differ? Could you please clarify? 

> Handle accidental truncation of repartition topics as exceptional failure
> -
>
> Key: KAFKA-10688
> URL: https://issues.apache.org/jira/browse/KAFKA-10688
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>
> Today we always handle InvalidOffsetException from the main consumer by the 
> resetting policy assuming they are for source topics. But repartition topics 
> are also source topics and should never be truncated and hence cause 
> InvalidOffsetException.
> We should differentiate these repartition topics from external source topics 
> and treat the InvalidOffsetException from repartition topics as fatal and 
> close the whole application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-11-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228465#comment-17228465
 ] 

Bruno Cadonna commented on KAFKA-10688:
---

Maybe I misunderstood your previous comment.

In your proposal in 1) and 2) aren't you  proposing to reset repartition topics 
by using the global policy?

When would a repartition topic not have a valid committed offset after an 
offset was committed for the first time (i.e. first commit after a fresh start 
of the Streams application)?

Is not the fact that an repartitition topic does not have a valid committed 
offset enough to throw a fatal error? Why should we reset the repartition 
topics in point  1) and 2) in your proposal? 

> Handle accidental truncation of repartition topics as exceptional failure
> -
>
> Key: KAFKA-10688
> URL: https://issues.apache.org/jira/browse/KAFKA-10688
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>
> Today we always handle InvalidOffsetException from the main consumer by the 
> resetting policy assuming they are for source topics. But repartition topics 
> are also source topics and should never be truncated and hence cause 
> InvalidOffsetException.
> We should differentiate these repartition topics from external source topics 
> and treat the InvalidOffsetException from repartition topics as fatal and 
> close the whole application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10062) Add a method to retrieve the current timestamp as known by the Streams app

2020-11-16 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232638#comment-17232638
 ] 

Bruno Cadonna commented on KAFKA-10062:
---

[~rohitdeshaws] I think [~wbottrell] and [~psmolinski] have already worked on 
it. See the corresponding KIP here:

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-622%3A+Add+currentSystemTimeMs+and+currentStreamTimeMs+to+ProcessorContext]

However, I do not know how much progress they have done on the implementation.

[~wbottrell] and [~psmolinski] could you update the ticket, please?

> Add a method to retrieve the current timestamp as known by the Streams app
> --
>
> Key: KAFKA-10062
> URL: https://issues.apache.org/jira/browse/KAFKA-10062
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Piotr Smolinski
>Assignee: William Bottrell
>Priority: Major
>  Labels: needs-kip, newbie
>
> Please add to the ProcessorContext a method to retrieve current timestamp 
> compatible with Punctuator#punctate(long) method.
> Proposal in ProcessorContext:
> long getTimestamp(PunctuationType type);
> The method should return time value as known by the Punctuator scheduler with 
> the respective PunctuationType.
> The use-case is tracking of a process with timeout-based escalation.
> A transformer receives process events and in case of missing an event execute 
> an action (emit message) after given escalation timeout (several stages). The 
> initial message may already arrive with reference timestamp in the past and 
> may trigger different action upon arrival depending on how far in the past it 
> is.
> If the timeout should be computed against some further time only, Punctuator 
> is perfectly sufficient. The problem is that I have to evaluate the current 
> time-related state once the message arrives.
> I am using wall-clock time. Normally accessing System.currentTimeMillis() is 
> sufficient, but it breaks in unit testing with TopologyTestDriver, where the 
> app wall clock time is different from the system-wide one.
> To access the mentioned clock I am using reflection to access 
> ProcessorContextImpl#task and then StreamTask#time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10643) Static membership - repetitive PreparingRebalance with updating metadata for member reason

2020-11-23 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237265#comment-17237265
 ] 

Bruno Cadonna commented on KAFKA-10643:
---

[~eran-levy] Just to avoid misunderstandings: What do you mean exactly with 
"the storage became very large around 50GB-70GB per stream pod"? Do you mean 
the size used locally by the RocksDB state store or the size of the changelog 
topic on the Kafka brokers?

The size of the changelog topic on the Kafka brokers is independent of the 
state store used. It depends solely on the data your application writes to the 
state store without any state store specific overhead. Also the RocksDB metrics 
are independent of the changelog topic.

To identify write stalls, you could look at {{write-stall-duration-[avg | 
total]}} in the RocksDB metrics. For more RocksDB metrics, see 
[https://docs.confluent.io/platform/current/streams/monitoring.html#rocksdb-metrics]
 .

>From 2.7 on, there will be even more RocksDB metrics. See 
>[https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Kafka+Streams+to+Report+Properties+of+RocksDB]
> with which you can also monitor the size of RocksDB's sst files and the 
>number of pending compactions.

> Static membership - repetitive PreparingRebalance with updating metadata for 
> member reason
> --
>
> Key: KAFKA-10643
> URL: https://issues.apache.org/jira/browse/KAFKA-10643
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Eran Levy
>Priority: Major
> Attachments: broker-4-11.csv, client-4-11.csv, 
> client-d-9-11-11-2020.csv
>
>
> Kafka streams 2.6.0, brokers version 2.6.0. Kafka nodes are healthy, kafka 
> streams app is healthy. 
> Configured with static membership. 
> Every 10 minutes (I assume cause of topic.metadata.refresh.interval.ms), I 
> see the following group coordinator log for different stream consumers: 
> INFO [GroupCoordinator 2]: Preparing to rebalance group **--**-stream in 
> state PreparingRebalance with old generation 12244 (__consumer_offsets-45) 
> (reason: Updating metadata for member 
> -stream-11-1-013edd56-ed93-4370-b07c-1c29fbe72c9a) 
> (kafka.coordinator.group.GroupCoordinator)
> and right after that the following log: 
> INFO [GroupCoordinator 2]: Assignment received from leader for group 
> **-**-stream for generation 12246 (kafka.coordinator.group.GroupCoordinator)
>  
> Looked a bit on the kafka code and Im not sure that I get why such a thing 
> happening - is this line described the situation that happens here re the 
> "reason:"?[https://github.com/apache/kafka/blob/7ca299b8c0f2f3256c40b694078e422350c20d19/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L311]
> I also dont see it happening too often in other kafka streams applications 
> that we have. 
> The only thing suspicious that I see around every hour that different pods of 
> that kafka streams application throw this exception: 
> {"timestamp":"2020-10-25T06:44:20.414Z","level":"INFO","thread":"**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1","logger":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer
>  
> clientId=**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1-restore-consumer,
>  groupId=null] Error sending fetch request (sessionId=34683236, epoch=2872) 
> to node 
> 3:","context":"default","exception":"org.apache.kafka.common.errors.DisconnectException:
>  null\n"}
> I came across this strange behaviour after stated to investigate a strange 
> stuck rebalancing state after one of the members left the group and caused 
> the rebalance to stuck - the only thing that I found is that maybe because 
> that too often preparing to rebalance states, the app might affected of this 
> bug - KAFKA-9752 ?
> I dont understand why it happens, it wasn't before I applied static 
> membership to that kafka streams application (since around 2 weeks ago). 
> Will be happy if you can help me
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10643) Static membership - repetitive PreparingRebalance with updating metadata for member reason

2020-11-23 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237265#comment-17237265
 ] 

Bruno Cadonna edited comment on KAFKA-10643 at 11/23/20, 10:08 AM:
---

[~eran-levy] Just to avoid misunderstandings: What do you mean exactly with 
"the storage became very large around 50GB-70GB per stream pod"? Do you mean 
the size used locally by the RocksDB state store or the size of the changelog 
topic on the Kafka brokers?

The size of the changelog topic on the Kafka brokers is independent of the 
state store used. It depends solely on the data your application writes to the 
state store without any state store specific overhead. Also the RocksDB metrics 
are independent of the changelog topic.

To identify write stalls, you could look at {{write-stall-duration-[avg | 
total]}} in the RocksDB metrics. For more RocksDB metrics, see 
[https://kafka.apache.org/documentation/#kafka_streams_rocksdb_monitoring] .

>From 2.7 on, there will be even more RocksDB metrics. See 
>[https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Kafka+Streams+to+Report+Properties+of+RocksDB]
> with which you can also monitor the size of RocksDB's sst files and the 
>number of pending compactions.


was (Author: cadonna):
[~eran-levy] Just to avoid misunderstandings: What do you mean exactly with 
"the storage became very large around 50GB-70GB per stream pod"? Do you mean 
the size used locally by the RocksDB state store or the size of the changelog 
topic on the Kafka brokers?

The size of the changelog topic on the Kafka brokers is independent of the 
state store used. It depends solely on the data your application writes to the 
state store without any state store specific overhead. Also the RocksDB metrics 
are independent of the changelog topic.

To identify write stalls, you could look at {{write-stall-duration-[avg | 
total]}} in the RocksDB metrics. For more RocksDB metrics, see 
[https://docs.confluent.io/platform/current/streams/monitoring.html#rocksdb-metrics]
 .

>From 2.7 on, there will be even more RocksDB metrics. See 
>[https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Kafka+Streams+to+Report+Properties+of+RocksDB]
> with which you can also monitor the size of RocksDB's sst files and the 
>number of pending compactions.

> Static membership - repetitive PreparingRebalance with updating metadata for 
> member reason
> --
>
> Key: KAFKA-10643
> URL: https://issues.apache.org/jira/browse/KAFKA-10643
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Eran Levy
>Priority: Major
> Attachments: broker-4-11.csv, client-4-11.csv, 
> client-d-9-11-11-2020.csv
>
>
> Kafka streams 2.6.0, brokers version 2.6.0. Kafka nodes are healthy, kafka 
> streams app is healthy. 
> Configured with static membership. 
> Every 10 minutes (I assume cause of topic.metadata.refresh.interval.ms), I 
> see the following group coordinator log for different stream consumers: 
> INFO [GroupCoordinator 2]: Preparing to rebalance group **--**-stream in 
> state PreparingRebalance with old generation 12244 (__consumer_offsets-45) 
> (reason: Updating metadata for member 
> -stream-11-1-013edd56-ed93-4370-b07c-1c29fbe72c9a) 
> (kafka.coordinator.group.GroupCoordinator)
> and right after that the following log: 
> INFO [GroupCoordinator 2]: Assignment received from leader for group 
> **-**-stream for generation 12246 (kafka.coordinator.group.GroupCoordinator)
>  
> Looked a bit on the kafka code and Im not sure that I get why such a thing 
> happening - is this line described the situation that happens here re the 
> "reason:"?[https://github.com/apache/kafka/blob/7ca299b8c0f2f3256c40b694078e422350c20d19/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L311]
> I also dont see it happening too often in other kafka streams applications 
> that we have. 
> The only thing suspicious that I see around every hour that different pods of 
> that kafka streams application throw this exception: 
> {"timestamp":"2020-10-25T06:44:20.414Z","level":"INFO","thread":"**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1","logger":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer
>  
> clientId=**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1-restore-consumer,
>  groupId=null] Error sending fetch request (sessionId=34683236, epoch=2872) 
> to node 
> 3:","context":"default","exception":"org.apache.kafka.common.errors.DisconnectException:
>  null\n"}
> I came across this strange behaviour after stated to investigate a strange 
> stuck rebalancing state after one of the members left the 

[jira] [Commented] (KAFKA-10758) Kafka Streams consuming from a pattern goes to PENDING_SHUTDOWN when adding a new topic

2020-11-23 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237270#comment-17237270
 ] 

Bruno Cadonna commented on KAFKA-10758:
---

[~davideicardi] Thank you for the report!

Could you please also post your call to {{StreamsBuilder#stream(Pattern)}} and 
your topology description that you can get with 
{{topology.describe().toString()}}?

> Kafka Streams consuming from a pattern goes to PENDING_SHUTDOWN when adding a 
> new topic
> ---
>
> Key: KAFKA-10758
> URL: https://issues.apache.org/jira/browse/KAFKA-10758
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Davide Icardi
>Priority: Major
>
> I have a simple Kafka Stream app that consumes from multiple input topics 
> using the _stream_ function that accepts a Pattern 
> ([link|https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/StreamsBuilder.html#stream-java.util.regex.Pattern-]).
>  
> Whenever I add a new topic that matches the pattern the kafka stream state 
> goes to REBALANCING -> ERROR -> PENDING_SHUTDOWN .
> If I restart the app it correctly starts reading again without problems.
> It is by design? Should I handle this and simply restart the app?
>  
> Kafka Stream version is 2.6.0.
> The error is the following:
> {code:java}
> ERROR o.a.k.s.p.i.ProcessorTopology - Set of source nodes do not match:
> sourceNodesByName = [KSTREAM-SOURCE-03, KSTREAM-SOURCE-02]
> sourceTopicsByName = [KSTREAM-SOURCE-00, KSTREAM-SOURCE-14, 
> KSTREAM-SOURCE-03, KSTREAM-SOURCE-02]
> org.apache.kafka.common.KafkaException: User rebalance callback throws an 
> error
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:436)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
>  Caused by: java.lang.IllegalStateException: Tried to update source topics 
> but source nodes did not match
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorTopology.updateSourceTopics(ProcessorTopology.java:151)
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.update(AbstractTask.java:109)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.update(StreamTask.java:514)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.updateInputPartitionsAndResume(TaskManager.java:397)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.handleAssignment(TaskManager.java:261)
>   at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.onAssignment(StreamsPartitionAssignor.java:1428)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokeOnAssignment(ConsumerCoordinator.java:279)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:421)
>   ... 10 common frames omitted
>  KafkaStream state is ERROR
>  17:28:53.200 [datalake-StreamThread-1] ERROR 
> o.apache.kafka.streams.KafkaStreams - stream-client [datalake] All stream 
> threads have died. The instance will be in error state and should be closed.
>  > User rebalance callback throws an error
>  KafkaStream state is PENDING_SHUTDOWN
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10766) Add Unit Test cases for RocksDbRangeIterator

2020-11-30 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10766:
--
Labels: newbie  (was: )

> Add Unit Test cases for RocksDbRangeIterator
> 
>
> Key: KAFKA-10766
> URL: https://issues.apache.org/jira/browse/KAFKA-10766
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Sagar Rao
>Assignee: Sagar Rao
>Priority: Major
>  Labels: newbie
>
> During the code review for KIP-614, it was noticed that RocksDbRangeIterator 
> does not have any unit test cases. Here is the github comment for referrence:
> [https://github.com/apache/kafka/pull/9508#discussion_r527612942]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10766) Add Unit Test cases for RocksDbRangeIterator

2020-11-30 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10766:
--
Component/s: unit tests
 streams

> Add Unit Test cases for RocksDbRangeIterator
> 
>
> Key: KAFKA-10766
> URL: https://issues.apache.org/jira/browse/KAFKA-10766
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Sagar Rao
>Assignee: Sagar Rao
>Priority: Major
>
> During the code review for KIP-614, it was noticed that RocksDbRangeIterator 
> does not have any unit test cases. Here is the github comment for referrence:
> [https://github.com/apache/kafka/pull/9508#discussion_r527612942]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10767) Add Unit Test cases for missing methods in ThreadCacheTest

2020-11-30 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10767:
--
Component/s: unit tests
 streams

> Add Unit Test cases for missing methods in ThreadCacheTest
> --
>
> Key: KAFKA-10767
> URL: https://issues.apache.org/jira/browse/KAFKA-10767
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Sagar Rao
>Assignee: Sagar Rao
>Priority: Major
>
> During the code review for KIP-614, it was noticed that some methods in 
> ThreadCache don't have unit tests. Need to identify them and add unit test 
> cases for them.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10767) Add Unit Test cases for missing methods in ThreadCacheTest

2020-11-30 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10767:
--
Labels: newbie  (was: )

> Add Unit Test cases for missing methods in ThreadCacheTest
> --
>
> Key: KAFKA-10767
> URL: https://issues.apache.org/jira/browse/KAFKA-10767
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Sagar Rao
>Assignee: Sagar Rao
>Priority: Major
>  Labels: newbie
>
> During the code review for KIP-614, it was noticed that some methods in 
> ThreadCache don't have unit tests. Need to identify them and add unit test 
> cases for them.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10789) Streamlining Tests in ChangeLoggingKeyValueBytesStoreTest

2020-12-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10789:
--
Component/s: streams

> Streamlining Tests in ChangeLoggingKeyValueBytesStoreTest
> -
>
> Key: KAFKA-10789
> URL: https://issues.apache.org/jira/browse/KAFKA-10789
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sagar Rao
>Priority: Minor
>
> While reviewing, kIP-614, it was decided that tests for 
> [CachingInMemoryKeyValueStoreTest.java|https://github.com/apache/kafka/pull/9508/files/899b79781d3412658293b918dce16709121accf1#diff-fdfe70d8fa0798642f0ed54785624aa9850d5d86afff2285acdf12f2775c3588]
>  need to be streamlined to use mocked underlyingStore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10789) Streamlining Tests in ChangeLoggingKeyValueBytesStoreTest

2020-12-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10789:
--
Priority: Minor  (was: Major)

> Streamlining Tests in ChangeLoggingKeyValueBytesStoreTest
> -
>
> Key: KAFKA-10789
> URL: https://issues.apache.org/jira/browse/KAFKA-10789
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Sagar Rao
>Priority: Minor
>
> While reviewing, kIP-614, it was decided that tests for 
> [CachingInMemoryKeyValueStoreTest.java|https://github.com/apache/kafka/pull/9508/files/899b79781d3412658293b918dce16709121accf1#diff-fdfe70d8fa0798642f0ed54785624aa9850d5d86afff2285acdf12f2775c3588]
>  need to be streamlined to use mocked underlyingStore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10789) Streamlining Tests in ChangeLoggingKeyValueBytesStoreTest

2020-12-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10789:
--
Labels: newbie  (was: )

> Streamlining Tests in ChangeLoggingKeyValueBytesStoreTest
> -
>
> Key: KAFKA-10789
> URL: https://issues.apache.org/jira/browse/KAFKA-10789
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sagar Rao
>Priority: Minor
>  Labels: newbie
>
> While reviewing, kIP-614, it was decided that tests for 
> [CachingInMemoryKeyValueStoreTest.java|https://github.com/apache/kafka/pull/9508/files/899b79781d3412658293b918dce16709121accf1#diff-fdfe70d8fa0798642f0ed54785624aa9850d5d86afff2285acdf12f2775c3588]
>  need to be streamlined to use mocked underlyingStore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10788) Streamlining Tests in CachingInMemoryKeyValueStoreTest

2020-12-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10788:
--
Component/s: unit tests
 streams

> Streamlining Tests in CachingInMemoryKeyValueStoreTest
> --
>
> Key: KAFKA-10788
> URL: https://issues.apache.org/jira/browse/KAFKA-10788
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Sagar Rao
>Priority: Major
>
> While reviewing, kIP-614, it was decided that tests for 
> [CachingInMemoryKeyValueStoreTest.java|https://github.com/apache/kafka/pull/9508/files/899b79781d3412658293b918dce16709121accf1#diff-fdfe70d8fa0798642f0ed54785624aa9850d5d86afff2285acdf12f2775c3588]
>  need to be streamlined to use mocked underlyingStore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10788) Streamlining Tests in CachingInMemoryKeyValueStoreTest

2020-12-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10788:
--
Labels: newbie  (was: )

> Streamlining Tests in CachingInMemoryKeyValueStoreTest
> --
>
> Key: KAFKA-10788
> URL: https://issues.apache.org/jira/browse/KAFKA-10788
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Sagar Rao
>Priority: Major
>  Labels: newbie
>
> While reviewing, kIP-614, it was decided that tests for 
> [CachingInMemoryKeyValueStoreTest.java|https://github.com/apache/kafka/pull/9508/files/899b79781d3412658293b918dce16709121accf1#diff-fdfe70d8fa0798642f0ed54785624aa9850d5d86afff2285acdf12f2775c3588]
>  need to be streamlined to use mocked underlyingStore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10789) Streamlining Tests in ChangeLoggingKeyValueBytesStoreTest

2020-12-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10789:
--
Component/s: unit tests

> Streamlining Tests in ChangeLoggingKeyValueBytesStoreTest
> -
>
> Key: KAFKA-10789
> URL: https://issues.apache.org/jira/browse/KAFKA-10789
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Sagar Rao
>Priority: Minor
>  Labels: newbie
>
> While reviewing, kIP-614, it was decided that tests for 
> [CachingInMemoryKeyValueStoreTest.java|https://github.com/apache/kafka/pull/9508/files/899b79781d3412658293b918dce16709121accf1#diff-fdfe70d8fa0798642f0ed54785624aa9850d5d86afff2285acdf12f2775c3588]
>  need to be streamlined to use mocked underlyingStore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10292) fix flaky streams/streams_broker_bounce_test.py

2020-12-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10292:
--
Description: 
{quote}
Module: kafkatest.tests.streams.streams_broker_bounce_test
Class:  StreamsBrokerBounceTest
Method: test_broker_type_bounce
Arguments:
\{
  "broker_type": "leader",
  "failure_mode": "clean_bounce",
  "num_threads": 1,
  "sleep_time_secs": 120
\}
{quote}

{quote}
Module: kafkatest.tests.streams.streams_broker_bounce_test
Class:  StreamsBrokerBounceTest
Method: test_broker_type_bounce
Arguments:
\{
  "broker_type": "controller",
  "failure_mode": "hard_shutdown",
  "num_threads": 3,
  "sleep_time_secs": 120
\}
{quote}

  was:
{quote}
Module: kafkatest.tests.streams.streams_broker_bounce_test
Class:  StreamsBrokerBounceTest
Method: test_broker_type_bounce
Arguments:
{
  "broker_type": "leader",
  "failure_mode": "clean_bounce",
  "num_threads": 1,
  "sleep_time_secs": 120
}
{quote}

{quote}
Module: kafkatest.tests.streams.streams_broker_bounce_test
Class:  StreamsBrokerBounceTest
Method: test_broker_type_bounce
Arguments:
{
  "broker_type": "controller",
  "failure_mode": "hard_shutdown",
  "num_threads": 3,
  "sleep_time_secs": 120
}
{quote}


> fix flaky streams/streams_broker_bounce_test.py
> ---
>
> Key: KAFKA-10292
> URL: https://issues.apache.org/jira/browse/KAFKA-10292
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, system tests
>Reporter: Chia-Ping Tsai
>Assignee: Bruno Cadonna
>Priority: Blocker
> Fix For: 2.8.0
>
>
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "leader",
>   "failure_mode": "clean_bounce",
>   "num_threads": 1,
>   "sleep_time_secs": 120
> \}
> {quote}
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "controller",
>   "failure_mode": "hard_shutdown",
>   "num_threads": 3,
>   "sleep_time_secs": 120
> \}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-08 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245848#comment-17245848
 ] 

Bruno Cadonna commented on KAFKA-10772:
---

[~thebearmayor], do you run 2.6.1 built from 
[1cbc4da|https://github.com/apache/kafka/commit/1cbc4da0c9d19b25ffeb04cb2b52d827fbe38684]
 also broker side?

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-08 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245927#comment-17245927
 ] 

Bruno Cadonna commented on KAFKA-10772:
---

The fix for KAFKA-10284 is a pure broker-side fix. So even if you use 2.6.1 you 
do not have the fix on your brokers. That means, [~ableegoldman] 's guess that 
it could be a symptom of KAFKA-10284 may still be correct.

I am currently looking into this issue. If you could also provide some client 
and maybe also broker logs that show the bug, that would be awesome.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-08 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245927#comment-17245927
 ] 

Bruno Cadonna edited comment on KAFKA-10772 at 12/8/20, 2:48 PM:
-

The fix for KAFKA-10284 is a pure broker-side fix. So even if you use 2.6.1 on 
client-side you do not have the fix on your brokers. That means, 
[~ableegoldman] 's guess that it could be a symptom of KAFKA-10284 may still be 
correct.

I am currently looking into this issue. If you could also provide some client 
and maybe also broker logs that show the bug, that would be awesome.


was (Author: cadonna):
The fix for KAFKA-10284 is a pure broker-side fix. So even if you use 2.6.1 you 
do not have the fix on your brokers. That means, [~ableegoldman] 's guess that 
it could be a symptom of KAFKA-10284 may still be correct.

I am currently looking into this issue. If you could also provide some client 
and maybe also broker logs that show the bug, that would be awesome.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-08 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna reassigned KAFKA-10772:
-

Assignee: Bruno Cadonna

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7540) Flaky Test ConsumerBounceTest#testClose

2020-12-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246404#comment-17246404
 ] 

Bruno Cadonna commented on KAFKA-7540:
--

https://github.com/apache/kafka/pull/9696/checks?check_run_id=1520813832
{code:java}
java.lang.AssertionError: Assignment did not complete on time
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at 
kafka.api.ConsumerBounceTest.checkClosedState(ConsumerBounceTest.scala:486)
at 
kafka.api.ConsumerBounceTest.checkCloseWithCoordinatorFailure(ConsumerBounceTest.scala:257)
at kafka.api.ConsumerBounceTest.testClose(ConsumerBounceTest.scala:220) 
{code}

> Flaky Test ConsumerBounceTest#testClose
> ---
>
> Key: KAFKA-7540
> URL: https://issues.apache.org/jira/browse/KAFKA-7540
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 2.2.0
>Reporter: John Roesler
>Assignee: Jason Gustafson
>Priority: Critical
>  Labels: flaky-test
>
> Observed on Java 8: 
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/17314/testReport/junit/kafka.api/ConsumerBounceTest/testClose/]
>  
> Stacktrace:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: -1
>   at 
> kafka.integration.KafkaServerTestHarness.killBroker(KafkaServerTestHarness.scala:146)
>   at 
> kafka.api.ConsumerBounceTest.checkCloseWithCoordinatorFailure(ConsumerBounceTest.scala:238)
>   at kafka.api.ConsumerBounceTest.testClose(ConsumerBounceTest.scala:211)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.Tes

[jira] [Commented] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246677#comment-17246677
 ] 

Bruno Cadonna commented on KAFKA-10772:
---

I think, I found the bug. If a static member joins the group it gets a member 
ID that is used to store the assignment for the static member broker side. A 
static member gets a new member ID each time it joins the group. That means, if 
a static member joins a group and gets a member ID that is used during the next 
assignment but before the assignment is broadcasted to the members in the 
SyncGroup response the same static member joins the group again (e.g. Streams 
client restart) and gets a new member ID, the group coordinator will not find 
the new member ID in the assignment and will store an empty assignment (i.e. an 
empty byte buffer) for the new member ID. Hence, the static member will get an 
empty assignment and throw the above {{IllegalStateException}}.

For example, assume the following static members:
* member A with group instance ID A and initial member ID 1
* member B with group instance ID B and initial member ID 2

The group leader will send the following assignment to the group coordinator:
1 -> some assigned partitions
2 -> some other assigned partitions

Before the group coordinator gets this assignment and broadcast it, member A 
rejoins and get member ID 3. The group coordinator will store the following 
assignment:
2 -> some other assigned partitions
3 -> empty buffer

Member A that now has member ID 3 will receive the empty buffer as its 
assignment.

I could reproduce the issue with 2.4.1 brokers, but not with 2.6 brokers.   

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals

[jira] [Commented] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246797#comment-17246797
 ] 

Bruno Cadonna commented on KAFKA-10772:
---

I think this issue is fixed for brokers 2.5 and above via 
https://issues.apache.org/jira/browse/KAFKA-9801.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246797#comment-17246797
 ] 

Bruno Cadonna edited comment on KAFKA-10772 at 12/9/20, 8:20 PM:
-

I think the issue reported in this ticket is fixed for brokers 2.5 and above 
via https://issues.apache.org/jira/browse/KAFKA-9801.


was (Author: cadonna):
I think this issue is fixed for brokers 2.5 and above via 
https://issues.apache.org/jira/browse/KAFKA-9801.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246797#comment-17246797
 ] 

Bruno Cadonna edited comment on KAFKA-10772 at 12/9/20, 8:22 PM:
-

I think the issue reported in this ticket is fixed for brokers 2.5 and above 
via KAFKA-9801. I would even dare to say that it is a duplicate of KAFKA-9801.


was (Author: cadonna):
I think the issue reported in this ticket is fixed for brokers 2.5 and above 
via https://issues.apache.org/jira/browse/KAFKA-9801. I would even dare to say 
that it is a duplicate of KAFKA-9801.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246797#comment-17246797
 ] 

Bruno Cadonna edited comment on KAFKA-10772 at 12/9/20, 8:22 PM:
-

I think the issue reported in this ticket is fixed for brokers 2.5 and above 
via https://issues.apache.org/jira/browse/KAFKA-9801. I would even dare to say 
that it is a duplicate of KAFKA-9801.


was (Author: cadonna):
I think the issue reported in this ticket is fixed for brokers 2.5 and above 
via https://issues.apache.org/jira/browse/KAFKA-9801.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246806#comment-17246806
 ] 

Bruno Cadonna commented on KAFKA-10772:
---

bq. It seems bad, and quite possibly a regression, for the client to actually 
fail to deserialize any broker response.

Since the faulty assignment in this case is a byte buffer of size zero, the 
client cannot even fail to deserialize it because there is nothing to 
deserialize.

However, the client could try to re-join in this case. But then it risks to 
start an infinite rejoin loop because there is not guarantee that next time it 
would get a non-empty assignment.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246806#comment-17246806
 ] 

Bruno Cadonna edited comment on KAFKA-10772 at 12/9/20, 8:35 PM:
-

bq. It seems bad, and quite possibly a regression, for the client to actually 
fail to deserialize any broker response.

Since the faulty assignment in this case is a byte buffer of size zero, the 
client cannot even fail to deserialize it because there is nothing to 
deserialize.

However, the client could try to re-join in this case. But then it risks 
starting an infinite rejoin loop because there is not guarantee that next time 
it would get a non-empty assignment.


was (Author: cadonna):
bq. It seems bad, and quite possibly a regression, for the client to actually 
fail to deserialize any broker response.

Since the faulty assignment in this case is a byte buffer of size zero, the 
client cannot even fail to deserialize it because there is nothing to 
deserialize.

However, the client could try to re-join in this case. But then it risks to 
start an infinite rejoin loop because there is not guarantee that next time it 
would get a non-empty assignment.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-10 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247142#comment-17247142
 ] 

Bruno Cadonna commented on KAFKA-10772:
---

[~lkokhreidze] [~thebearmayor]

To sum up: 
- This bug is fixed in 2.5 onwards via KAFKA-9801. 
- You get the {{IllegalStateException}} instead of the error reported in 
KAFKA-9801 because the {{IllegalStateException}} was introduced in the fix for 
KAFKA-9801 and you use Kafka Streams 2.6.
- You can fix this issue by upgrading your brokers to 2.5 or above.

I am going to close this issue as a duplicate of KAFKA-9801.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-10 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-10772.
---
Resolution: Duplicate

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-10772) java.lang.IllegalStateException: There are insufficient bytes available to read assignment from the sync-group response (actual byte size 0)

2020-12-10 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247142#comment-17247142
 ] 

Bruno Cadonna edited comment on KAFKA-10772 at 12/10/20, 10:13 AM:
---

[~lkokhreidze] [~thebearmayor]

To sum up: 
- This bug is fixed in 2.5 onwards via KAFKA-9801. 
- You get the {{IllegalStateException}} instead of the error reported in 
KAFKA-9801 because the {{IllegalStateException}} was introduced in the fix for 
KAFKA-9801 and you use Kafka Streams 2.6.
- You can fix this issue by upgrading your brokers to 2.5 or above.

I am going to close this issue as a duplicate of KAFKA-9801.

Let me know if you have any concerns.


was (Author: cadonna):
[~lkokhreidze] [~thebearmayor]

To sum up: 
- This bug is fixed in 2.5 onwards via KAFKA-9801. 
- You get the {{IllegalStateException}} instead of the error reported in 
KAFKA-9801 because the {{IllegalStateException}} was introduced in the fix for 
KAFKA-9801 and you use Kafka Streams 2.6.
- You can fix this issue by upgrading your brokers to 2.5 or above.

I am going to close this issue as a duplicate of KAFKA-9801.

> java.lang.IllegalStateException: There are insufficient bytes available to 
> read assignment from the sync-group response (actual byte size 0)
> 
>
> Key: KAFKA-10772
> URL: https://issues.apache.org/jira/browse/KAFKA-10772
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Levani Kokhreidze
>Assignee: Bruno Cadonna
>Priority: Blocker
> Attachments: KAFKA-10772.log
>
>
> From time to time we encounter the following exception that results in Kafka 
> Streams threads dying.
> Broker version 2.4.1, Client version 2.6.0
> {code:java}
> Nov 27 00:59:53.681 streaming-app service: prod | streaming-app-2 | 
> stream-client [cluster1-profile-stats-pipeline-client-id] State transition 
> from REBALANCING to ERROR Nov 27 00:59:53.681 streaming-app service: prod | 
> streaming-app-2 | stream-client [cluster1-profile-stats-pipeline-client-id] 
> State transition from REBALANCING to ERROR Nov 27 00:59:53.682 streaming-app 
> service: prod | streaming-app-2 | 2020-11-27 00:59:53.681 ERROR 105 --- 
> [-StreamThread-1] .KafkaStreamsBasedStreamProcessingEngine : Stream 
> processing pipeline: [profile-stats] encountered unrecoverable exception. 
> Thread: [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is 
> completely dead. If all worker threads die, Kafka Streams will be moved to 
> permanent ERROR state. Nov 27 00:59:53.682 streaming-app service: prod | 
> streaming-app-2 | Stream processing pipeline: [profile-stats] encountered 
> unrecoverable exception. Thread: 
> [cluster1-profile-stats-pipeline-client-id-StreamThread-1] is completely 
> dead. If all worker threads die, Kafka Streams will be moved to permanent 
> ERROR state. java.lang.IllegalStateException: There are insufficient bytes 
> available to read assignment from the sync-group response (actual byte size 
> 0) , this is not expected; it is possible that the leader's assign function 
> is buggy and did not return any assignment for this member, or because static 
> member is configured and the protocol is buggy hence did not get the 
> assignment for this member at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:367)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:440)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:513)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1268)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:766)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:624)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10843) Kafka Streams metadataForKey method returns null but allMetadata has the details

2020-12-11 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247886#comment-17247886
 ] 

Bruno Cadonna commented on KAFKA-10843:
---

[~maria_87] Thank you for the report!

{{allMetadata()}} returning a non-empty result and {{metadataForKey()}} 
returning {{null}} is indeed strange since they both read the information from 
the same member field. The result of that member field is updated during a 
rebalance. Could you try to call {{metadataForKey()}} after you observe that 
{{allMetadata()}} returns a non-empty result? Does it still return {{null}}? 

> Kafka Streams metadataForKey method returns null but allMetadata has the 
> details
> 
>
> Key: KAFKA-10843
> URL: https://issues.apache.org/jira/browse/KAFKA-10843
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.5.1
>Reporter: Maria Thomas
>Priority: Major
>
> Our application runs on multiple instances and to enable us to use get the 
> key information from the state store we use "metadataForKey" method to 
> retrieve the StreamMetadata and using the hostname do an RPC call to the host 
> to get the value associated with the key.
> This call was working fine in our DEV and TEST environments, however, it is 
> failing one our production clusters from the start. On further debugging, I 
> noticed the allMetadata() method was returning the state stores with the host 
> details as expected. However, it would not be feasible to go through each 
> store explicitly to get the key details.
> To note, the cluster I am using is a stretch cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10843) Kafka Streams metadataForKey method returns null but allMetadata has the details

2020-12-15 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249674#comment-17249674
 ] 

Bruno Cadonna commented on KAFKA-10843:
---

[~maria_87] What do you mean exactly with "stretch cluster"? Are source topics, 
sink topics, and internal topics on different Kafka clusters? Where does the 
bootstrap server that you specified in coinfiguration {{bootstrap.servers}} 
reside? 

> Kafka Streams metadataForKey method returns null but allMetadata has the 
> details
> 
>
> Key: KAFKA-10843
> URL: https://issues.apache.org/jira/browse/KAFKA-10843
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.5.1
>Reporter: Maria Thomas
>Priority: Major
>
> Our application runs on multiple instances and to enable us to use get the 
> key information from the state store we use "metadataForKey" method to 
> retrieve the StreamMetadata and using the hostname do an RPC call to the host 
> to get the value associated with the key.
> This call was working fine in our DEV and TEST environments, however, it is 
> failing one our production clusters from the start. On further debugging, I 
> noticed the allMetadata() method was returning the state stores with the host 
> details as expected. However, it would not be feasible to go through each 
> store explicitly to get the key details.
> To note, the cluster I am using is a stretch cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12824) Remove Deprecated method KStream#branch

2021-05-25 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350941#comment-17350941
 ] 

Bruno Cadonna commented on KAFKA-12824:
---

[~mjsax] Do we know for which date AK 4.0.0 is planned? I could not find 
anything on the wiki. Since we do not know the release date, we do not know if 
the deprecation period will be enough or not. I would add a note to the 
description of this ticket that says that it is not clear whether the 
deprecation period of the subtasks is long enough for 4.0.0 and that they 
should be re-evaluated once it is clear when 4.0.0 will be released.

> Remove Deprecated method KStream#branch
> ---
>
> Key: KAFKA-12824
> URL: https://issues.apache.org/jira/browse/KAFKA-12824
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Josep Prat
>Priority: Blocker
> Fix For: 4.0.0
>
>
> The method branch in both Java and Scala KStream class was deprecated in 
> version 2.8:
>  * org.apache.kafka.streams.scala.kstream.KStream#branch
>  * 
> org.apache.kafka.streams.kstream.KStream#branch(org.apache.kafka.streams.kstream.Predicate  super K,? super V>...)
>  * 
> org.apache.kafka.streams.kstream.KStream#branch(org.apache.kafka.streams.kstream.Named,
>  org.apache.kafka.streams.kstream.Predicate...) 
>  
> See KAFKA-5488 and KIP-418



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5676) MockStreamsMetrics should be in o.a.k.test

2021-05-25 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350960#comment-17350960
 ] 

Bruno Cadonna commented on KAFKA-5676:
--

[~marcolotz] Thank you for looking into this!
The classes that use {{StreamsMetricsImpl}} should be only internal classes, 
like {{ClientMetrics}} that contains {{addStateMetrics()}}. They are allowed to 
use the implementation instead of the interface. The interface 
{{StreamsMetrics}} should be the one that is exposed in the public API.
Currently, I do not recall why we need to move {{MockStreamsMetrics}} in the 
public API. 
IMO, we should get rid of {{MockStreamsMetrics}} and replace its usages with an 
EasyMock mock. However, I do not know whether this is straight forward.


> MockStreamsMetrics should be in o.a.k.test
> --
>
> Key: KAFKA-5676
> URL: https://issues.apache.org/jira/browse/KAFKA-5676
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Marco Lotz
>Priority: Major
>  Labels: newbie
>  Time Spent: 96h
>  Remaining Estimate: 0h
>
> {{MockStreamsMetrics}}'s package should be `o.a.k.test` not 
> `o.a.k.streams.processor.internals`. 
> In addition, it should not require a {{Metrics}} parameter in its constructor 
> as it is only needed for its extended base class; the right way of mocking 
> should be implementing {{StreamsMetrics}} with mock behavior than extended a 
> real implementaion of {{StreamsMetricsImpl}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-5676) MockStreamsMetrics should be in o.a.k.test

2021-05-25 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350960#comment-17350960
 ] 

Bruno Cadonna edited comment on KAFKA-5676 at 5/25/21, 10:42 AM:
-

[~marcolotz] Thank you for looking into this!
The classes that use {{StreamsMetricsImpl}} should be only internal classes, 
like {{ClientMetrics}} that contains {{addStateMetrics()}}. They are allowed to 
use the implementation instead of the interface. The interface 
{{StreamsMetrics}} should be the one that is exposed in the public API.
Currently, I do not recall why we need to move {{MockStreamsMetrics}} in the 
public API. 
IMO, we should get rid of {{MockStreamsMetrics}} and replace its usages with an 
EasyMock mock. However, I do not know whether this is straight forward.
See also KAFKA-8977.



was (Author: cadonna):
[~marcolotz] Thank you for looking into this!
The classes that use {{StreamsMetricsImpl}} should be only internal classes, 
like {{ClientMetrics}} that contains {{addStateMetrics()}}. They are allowed to 
use the implementation instead of the interface. The interface 
{{StreamsMetrics}} should be the one that is exposed in the public API.
Currently, I do not recall why we need to move {{MockStreamsMetrics}} in the 
public API. 
IMO, we should get rid of {{MockStreamsMetrics}} and replace its usages with an 
EasyMock mock. However, I do not know whether this is straight forward.


> MockStreamsMetrics should be in o.a.k.test
> --
>
> Key: KAFKA-5676
> URL: https://issues.apache.org/jira/browse/KAFKA-5676
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Marco Lotz
>Priority: Major
>  Labels: newbie
>  Time Spent: 96h
>  Remaining Estimate: 0h
>
> {{MockStreamsMetrics}}'s package should be `o.a.k.test` not 
> `o.a.k.streams.processor.internals`. 
> In addition, it should not require a {{Metrics}} parameter in its constructor 
> as it is only needed for its extended base class; the right way of mocking 
> should be implementing {{StreamsMetrics}} with mock behavior than extended a 
> real implementaion of {{StreamsMetricsImpl}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12629) Failing Test: RaftClusterTest

2021-05-27 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352353#comment-17352353
 ] 

Bruno Cadonna commented on KAFKA-12629:
---

Failed on: 
https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10710/2/testReport/

{code:java}
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request.
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
at 
kafka.server.RaftClusterTest.testCreateClusterAndCreateListDeleteTopic(RaftClusterTest.scala:94)
{code}

> Failing Test: RaftClusterTest
> -
>
> Key: KAFKA-12629
> URL: https://issues.apache.org/jira/browse/KAFKA-12629
> Project: Kafka
>  Issue Type: Test
>  Components: core, unit tests
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> {quote} {{java.util.concurrent.ExecutionException: 
> java.lang.ClassNotFoundException: 
> org.apache.kafka.controller.NoOpSnapshotWriterBuilder
>   at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> kafka.testkit.KafkaClusterTestKit.startup(KafkaClusterTestKit.java:364)
>   at 
> kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopicsWithManyPartitions(RaftClusterTest.scala:181)}}{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9168) Integrate JNI direct buffer support to RocksDBStore

2021-05-31 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354586#comment-17354586
 ] 

Bruno Cadonna commented on KAFKA-9168:
--

[~sagarrao] and [~ableegoldman] I took the liberty to start the internal 
benchmark on the draft PR.

> Integrate JNI direct buffer support to RocksDBStore
> ---
>
> Key: KAFKA-9168
> URL: https://issues.apache.org/jira/browse/KAFKA-9168
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sagar Rao
>Assignee: Sagar Rao
>Priority: Major
>  Labels: perfomance
>
> There has been a PR created on rocksdb Java client to support direct 
> ByteBuffers in Java. We can look at integrating it whenever it gets merged. 
> Link to PR: [https://github.com/facebook/rocksdb/pull/2283]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5676) MockStreamsMetrics should be in o.a.k.test

2021-06-01 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355048#comment-17355048
 ] 

Bruno Cadonna commented on KAFKA-5676:
--

[~marcolotz] Thank you for your analysis!
 I agree that some aspects of the code can be improved. For example, it is 
questionable why methods like {{StreamsMetricsImpl#taskLevelSensor(...)}} are 
final. I do not share your concern about the static methods like 
{{StreamsMetricsImpl#addValueMetricToSensor(...)}}. They are utility methods 
that do not need the state of a {{StreamsMetricsImpl}} object. Some 
refactorings would maybe improve the situation. 

IMO, we should close this ticket as Won't Do and remove {{MockStreamsMetrics}} 
via KAFKA-8977. What do you think [~guozhang]?

> MockStreamsMetrics should be in o.a.k.test
> --
>
> Key: KAFKA-5676
> URL: https://issues.apache.org/jira/browse/KAFKA-5676
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: newbie
>  Time Spent: 96h
>  Remaining Estimate: 0h
>
> {{MockStreamsMetrics}}'s package should be `o.a.k.test` not 
> `o.a.k.streams.processor.internals`. 
> In addition, it should not require a {{Metrics}} parameter in its constructor 
> as it is only needed for its extended base class; the right way of mocking 
> should be implementing {{StreamsMetrics}} with mock behavior than extended a 
> real implementaion of {{StreamsMetricsImpl}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12519) Consider Removing Streams Old Built-in Metrics Version

2021-06-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-12519.
---
Resolution: Fixed

> Consider Removing Streams Old Built-in Metrics Version 
> ---
>
> Key: KAFKA-12519
> URL: https://issues.apache.org/jira/browse/KAFKA-12519
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> We refactored the Streams' built-in metrics in KIP-444 and the new structure 
> was released in 2.5. We should consider removing the old structure in the 
> upcoming 3.0 release. This would give us the opportunity to simplify the code 
> around the built in metrics since we would not need to consider different 
> versions anymore.   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9168) Integrate JNI direct buffer support to RocksDBStore

2021-06-03 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356273#comment-17356273
 ] 

Bruno Cadonna commented on KAFKA-9168:
--

Hi [~sagarrao], our internal benchmark did not show any significant performance 
improvements (i.e.. throughput improvements). I left some comments on your PR. 
I think the best thing to move forward here would be to implement a simple 
Kafka Streams app with which you can experiment different usages of the direct 
buffer for different operations on the RocksDB state stores. Then if you have 
found some good use cases for the direct buffers that improve performance in 
Streams, we can run our internal benchmarks again and verify if we see the 
improvements also there. WDYT? 

> Integrate JNI direct buffer support to RocksDBStore
> ---
>
> Key: KAFKA-9168
> URL: https://issues.apache.org/jira/browse/KAFKA-9168
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sagar Rao
>Assignee: Sagar Rao
>Priority: Major
>  Labels: perfomance
>
> There has been a PR created on rocksdb Java client to support direct 
> ByteBuffers in Java. We can look at integrating it whenever it gets merged. 
> Link to PR: [https://github.com/facebook/rocksdb/pull/2283]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12905) Replace EasyMock and PowerMock with Mockito for NamedCacheMetricsTest

2021-06-08 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359162#comment-17359162
 ] 

Bruno Cadonna commented on KAFKA-12905:
---

[~a493172422] Could you convert this ticket to a sub-task of KAFKA-7438? 
I think it is better to track it there. 
You can do this on the top under "More". Let me know if you need any help.

> Replace EasyMock and PowerMock with Mockito for NamedCacheMetricsTest
> -
>
> Key: KAFKA-12905
> URL: https://issues.apache.org/jira/browse/KAFKA-12905
> Project: Kafka
>  Issue Type: Improvement
>Reporter: YI-CHEN WANG
>Assignee: YI-CHEN WANG
>Priority: Major
>
> For 
> [Kafka-7438|https://issues.apache.org/jira/browse/KAFKA-7438?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20assignee%20in%20(EMPTY)%20AND%20text%20~%20%22mockito%22]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12920) Consumer's cooperative sticky assignor need to clear generation / assignment data upon `onPartitionsLost`

2021-06-08 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359812#comment-17359812
 ] 

Bruno Cadonna commented on KAFKA-12920:
---

Is this regression? Should this be a blocker for 3.0?

> Consumer's cooperative sticky assignor need to clear generation / assignment 
> data upon `onPartitionsLost`
> -
>
> Key: KAFKA-12920
> URL: https://issues.apache.org/jira/browse/KAFKA-12920
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: bug, consumer
>
> Consumer's cooperative-sticky assignor does not track the owned partitions 
> inside the assignor --- i.e. when it reset its state in event of 
> ``onPartitionsLost``, the ``memberAssignment`` and ``generation`` inside the 
> assignor would not be cleared. This would cause a member to join with empty 
> generation on the protocol while with non-empty user-data encoding the old 
> assignment still (and hence pass the validation check on broker side during 
> JoinGroup), and eventually cause a single partition to be assigned to 
> multiple consumers within a generation.
> We should let the assignor to also clear its assignment/generation when 
> ``onPartitionsLost`` is triggered in order to avoid this scenario.
> Note that 1) for the regular sticky assignor the generation would still have 
> an older value, and this would cause the previously owned partitions to be 
> discarded during the assignment, and 2) for Streams' sticky assignor, it’s 
> encoding would indeed be cleared along with ``onPartitionsLost``. Hence only 
> Consumer's cooperative-sticky assignor have this issue to solve.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12892) InvalidACLException thrown in tests caused jenkins build unstable

2021-06-10 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360983#comment-17360983
 ] 

Bruno Cadonna commented on KAFKA-12892:
---

Is PR #10821 supposed to solve the issue?

I still see a lot of 
{code:java}
MultipleListenersWithAdditionalJaasContextTest > testProduceConsume() FAILED
[2021-06-10T11:11:52.209Z] 
org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = 
InvalidACL for /brokers/ids
[2021-06-10T11:11:52.209Z] at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:128)
[2021-06-10T11:11:52.209Z] at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
[2021-06-10T11:11:52.209Z] at 
kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:583)
[2021-06-10T11:11:52.209Z] at 
kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1729)
[2021-06-10T11:11:52.209Z] at 
kafka.zk.KafkaZkClient.makeSurePersistentPathExists(KafkaZkClient.scala:1627)
[2021-06-10T11:11:52.209Z] at 
kafka.zk.KafkaZkClient.$anonfun$createTopLevelPaths$1(KafkaZkClient.scala:1619)
[2021-06-10T11:11:52.209Z] at 
kafka.zk.KafkaZkClient.$anonfun$createTopLevelPaths$1$adapted(KafkaZkClient.scala:1619)
[2021-06-10T11:11:52.209Z] at 
scala.collection.immutable.List.foreach(List.scala:333)
[2021-06-10T11:11:52.209Z] at 
kafka.zk.KafkaZkClient.createTopLevelPaths(KafkaZkClient.scala:1619)
[2021-06-10T11:11:52.209Z] at 
kafka.server.KafkaServer.initZkClient(KafkaServer.scala:454)
[2021-06-10T11:11:52.209Z] at 
kafka.server.KafkaServer.startup(KafkaServer.scala:192)
[2021-06-10T11:11:52.209Z] at 
kafka.utils.TestUtils$.createServer(TestUtils.scala:166)
[2021-06-10T11:11:52.209Z] at 
kafka.server.MultipleListenersWithSameSecurityProtocolBaseTest.$anonfun$setUp$1(MultipleListenersWithSameSecurityProtocolBaseTest.scala:103)
[2021-06-10T11:11:52.210Z] at 
kafka.server.MultipleListenersWithSameSecurityProtocolBaseTest.$anonfun$setUp$1$adapted(MultipleListenersWithSameSecurityProtocolBaseTest.scala:76)
[2021-06-10T11:11:52.210Z] at 
scala.collection.immutable.Range.foreach(Range.scala:190)
{code}

Also on PRs that contain PR #10821. For example 
https://ci-builds.apache.org/blue/rest/organizations/jenkins/pipelines/Kafka/pipelines/kafka-pr/branches/PR-10856/runs/3/nodes/14/steps/121/log/?start=0

> InvalidACLException thrown in tests caused jenkins build unstable
> -
>
> Key: KAFKA-12892
> URL: https://issues.apache.org/jira/browse/KAFKA-12892
> Project: Kafka
>  Issue Type: Bug
>Reporter: Luke Chen
>Assignee: Igor Soarez
>Priority: Major
> Attachments: image-2021-06-04-21-05-57-222.png
>
>
> In KAFKA-12866, we fixed the issue that Kafka required ZK root access even 
> when using a chroot. But after the PR merged (build #183), trunk build keeps 
> failing at least one test group (mostly, JDK 15 and Scala 2.13). The build 
> result will said nothing useful:
> {code:java}
> > Task :core:integrationTest FAILED
> [2021-06-04T03:19:18.974Z] 
> [2021-06-04T03:19:18.974Z] FAILURE: Build failed with an exception.
> [2021-06-04T03:19:18.974Z] 
> [2021-06-04T03:19:18.974Z] * What went wrong:
> [2021-06-04T03:19:18.974Z] Execution failed for task ':core:integrationTest'.
> [2021-06-04T03:19:18.974Z] > Process 'Gradle Test Executor 128' finished with 
> non-zero exit value 1
> [2021-06-04T03:19:18.974Z]   This problem might be caused by incorrect test 
> process configuration.
> [2021-06-04T03:19:18.974Z]   Please refer to the test execution section in 
> the User Manual at 
> https://docs.gradle.org/7.0.2/userguide/java_testing.html#sec:test_execution
> {code}
>  
> After investigation, I found the failed tests is because there are many 
> `InvalidACLException` thrown during the tests, ex:
>  
> {code:java}
> GssapiAuthenticationTest > testServerNotFoundInKerberosDatabase() FAILED
> [2021-06-04T02:25:45.419Z] 
> org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = 
> InvalidACL for /config/topics/__consumer_offsets
> [2021-06-04T02:25:45.419Z] at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:128)
> [2021-06-04T02:25:45.419Z] at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> [2021-06-04T02:25:45.419Z] at 
> kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:583)
> [2021-06-04T02:25:45.419Z] at 
> kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1729)
> [2021-06-04T02:25:45.419Z] at 
> kafka.zk.KafkaZkClient.createOrSet$1(KafkaZkClient.scala:366)
> [2021-06-04T02:25:45.419Z] at 
> kafka.zk.KafkaZkClient.setOrCreateEntityConfigs(KafkaZkClient.scala:376

[jira] [Commented] (KAFKA-12629) Failing Test: RaftClusterTest

2021-06-22 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367235#comment-17367235
 ] 

Bruno Cadonna commented on KAFKA-12629:
---

Failed again:

https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10911/1/testReport/

> Failing Test: RaftClusterTest
> -
>
> Key: KAFKA-12629
> URL: https://issues.apache.org/jira/browse/KAFKA-12629
> Project: Kafka
>  Issue Type: Test
>  Components: core, unit tests
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> {quote} {{java.util.concurrent.ExecutionException: 
> java.lang.ClassNotFoundException: 
> org.apache.kafka.controller.NoOpSnapshotWriterBuilder
>   at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> kafka.testkit.KafkaClusterTestKit.startup(KafkaClusterTestKit.java:364)
>   at 
> kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopicsWithManyPartitions(RaftClusterTest.scala:181)}}{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12786) Getting SslTransportLayerTest error

2021-06-22 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367263#comment-17367263
 ] 

Bruno Cadonna commented on KAFKA-12786:
---

Failed again under JDK 11:

https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10912/2/testReport/

> Getting SslTransportLayerTest error 
> 
>
> Key: KAFKA-12786
> URL: https://issues.apache.org/jira/browse/KAFKA-12786
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
> Environment: Ububtu 20.04
>Reporter: Sibelle
>Priority: Major
>  Labels: beginner
> Attachments: Error.png
>
>
> SaslAuthenticatorTest > testRepeatedValidSaslPlainOverSsl() PASSED
> org.apache.kafka.common.network.SslTransportLayerTest.testUnsupportedTLSVersion(Args)[1]
>  failed, log available in 
> /kafka/clients/build/reports/testOutput/org.apache.kafka.common.network.SslTransportLayerTest.testUnsupportedTLSVersion(Args)[1].test.stdout
> SslTransportLayerTest > [1] tlsProtocol=TLSv1.2, useInlinePem=false FAILED
> org.opentest4j.AssertionFailedError: Condition not met within timeout 
> 15000. Metric not updated failed-authentication-total expected:<1.0> but 
> was:<0.0> ==> expected:  but was: 
> at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
> at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40)
> at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:193)
> at 
> org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:320)
> at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:368)
> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:317)
> at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:301)
> at 
> org.apache.kafka.common.network.NioEchoServer.waitForMetrics(NioEchoServer.java:196)
> at 
> org.apache.kafka.common.network.NioEchoServer.verifyAuthenticationMetrics(NioEchoServer.java:155)
> at 
> org.apache.kafka.common.network.SslTransportLayerTest.testUnsupportedTLSVersion(SslTransportLayerTest.java:644)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12629) Failing Test: RaftClusterTest

2021-06-22 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367264#comment-17367264
 ] 

Bruno Cadonna commented on KAFKA-12629:
---

Failed again:

https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10912/2/testReport/

> Failing Test: RaftClusterTest
> -
>
> Key: KAFKA-12629
> URL: https://issues.apache.org/jira/browse/KAFKA-12629
> Project: Kafka
>  Issue Type: Test
>  Components: core, unit tests
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> {quote} {{java.util.concurrent.ExecutionException: 
> java.lang.ClassNotFoundException: 
> org.apache.kafka.controller.NoOpSnapshotWriterBuilder
>   at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> kafka.testkit.KafkaClusterTestKit.startup(KafkaClusterTestKit.java:364)
>   at 
> kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopicsWithManyPartitions(RaftClusterTest.scala:181)}}{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12993) Formatting of Streams 'Memory Management' docs is messed up

2021-06-25 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369454#comment-17369454
 ] 

Bruno Cadonna commented on KAFKA-12993:
---

I thinkthis has already been done by 
https://github.com/apache/kafka/pull/10651. With the next release it should be 
also merged to kafka-site. 
I am not sure about the process here. Should every PR against docs in kafka 
also be made against kafka-site? \cc [~ableegoldman] [~mjsax] 

> Formatting of Streams 'Memory Management' docs is messed up 
> 
>
> Key: KAFKA-12993
> URL: https://issues.apache.org/jira/browse/KAFKA-12993
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0, 2.8.1
>
>
> The formatting of this page is all messed up, starting in the RocksDB 
> section. It looks like there's a missing closing tag after the example 
> BoundedMemoryRocksDBConfig class



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-12993) Formatting of Streams 'Memory Management' docs is messed up

2021-06-25 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369454#comment-17369454
 ] 

Bruno Cadonna edited comment on KAFKA-12993 at 6/25/21, 1:00 PM:
-

I think this has already been done by 
https://github.com/apache/kafka/pull/10651. With the next release it should be 
also merged to kafka-site. 
I am not sure about the process here. Should every PR against docs in kafka 
also be made against kafka-site? \cc [~ableegoldman] [~mjsax] 


was (Author: cadonna):
I thinkthis has already been done by 
https://github.com/apache/kafka/pull/10651. With the next release it should be 
also merged to kafka-site. 
I am not sure about the process here. Should every PR against docs in kafka 
also be made against kafka-site? \cc [~ableegoldman] [~mjsax] 

> Formatting of Streams 'Memory Management' docs is messed up 
> 
>
> Key: KAFKA-12993
> URL: https://issues.apache.org/jira/browse/KAFKA-12993
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0, 2.8.1
>
>
> The formatting of this page is all messed up, starting in the RocksDB 
> section. It looks like there's a missing closing tag after the example 
> BoundedMemoryRocksDBConfig class



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-12993) Formatting of Streams 'Memory Management' docs is messed up

2021-06-25 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369454#comment-17369454
 ] 

Bruno Cadonna edited comment on KAFKA-12993 at 6/25/21, 1:01 PM:
-

I think this has already been done by 
https://github.com/apache/kafka/pull/10651. With the next release it should be 
also merged to kafka-site. 
I am not sure about the process here. Should every PR against docs in kafka 
also be made against kafka-site? \cc [~ableegoldman] [~mjsax] [~vvcephei]


was (Author: cadonna):
I think this has already been done by 
https://github.com/apache/kafka/pull/10651. With the next release it should be 
also merged to kafka-site. 
I am not sure about the process here. Should every PR against docs in kafka 
also be made against kafka-site? \cc [~ableegoldman] [~mjsax] 

> Formatting of Streams 'Memory Management' docs is messed up 
> 
>
> Key: KAFKA-12993
> URL: https://issues.apache.org/jira/browse/KAFKA-12993
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.0.0, 2.8.1
>
>
> The formatting of this page is all messed up, starting in the RocksDB 
> section. It looks like there's a missing closing tag after the example 
> BoundedMemoryRocksDBConfig class



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12511) Flaky test DynamicConnectionQuotaTest.testDynamicListenerConnectionCreationRateQuota()

2021-06-28 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370502#comment-17370502
 ] 

Bruno Cadonna commented on KAFKA-12511:
---

Failed again 
https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10920/1/testReport/kafka.network/DynamicConnectionQuotaTest/Build___JDK_16_and_Scala_2_13___testDynamicListenerConnectionCreationRateQuota__/

{code:java}
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: 
Listener EXTERNAL connection rate 4.958803783948733 must be below 4.8 ==> 
expected:  but was: 
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
at 
kafka.network.DynamicConnectionQuotaTest.testDynamicListenerConnectionCreationRateQuota(DynamicConnectionQuotaTest.scala:232)
{code}

> Flaky test 
> DynamicConnectionQuotaTest.testDynamicListenerConnectionCreationRateQuota()
> --
>
> Key: KAFKA-12511
> URL: https://issues.apache.org/jira/browse/KAFKA-12511
> Project: Kafka
>  Issue Type: Bug
>Reporter: dengziming
>Priority: Minor
>
> First time:
> Listener PLAINTEXT connection rate 14.419389476913636 must be below 
> 14.399 ==> expected:  but was: 
> Second time:
> Listener EXTERNAL connection rate 10.998243336133811 must be below 
> 10.799 ==> expected:  but was: 
> details: 
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10289/4/testReport/junit/kafka.network/DynamicConnectionQuotaTest/Build___JDK_11___testDynamicListenerConnectionCreationRateQuota__/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13010) Flaky test org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()

2021-06-29 Thread Bruno Cadonna (Jira)
Bruno Cadonna created KAFKA-13010:
-

 Summary: Flaky test 
org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()
 Key: KAFKA-13010
 URL: https://issues.apache.org/jira/browse/KAFKA-13010
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Bruno Cadonna


Integration test {{test 
org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()}}
 sometimes fails with

{code:java}
java.lang.AssertionError: only one task
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:26)
at 
org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.getTaskMetadata(TaskMetadataIntegrationTest.java:162)
at 
org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation(TaskMetadataIntegrationTest.java:117)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13010) Flaky test org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()

2021-06-29 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13010:
--
Labels: flaky-test  (was: )

> Flaky test 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()
> ---
>
> Key: KAFKA-13010
> URL: https://issues.apache.org/jira/browse/KAFKA-13010
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Major
>  Labels: flaky-test
>
> Integration test {{test 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()}}
>  sometimes fails with
> {code:java}
> java.lang.AssertionError: only one task
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:26)
>   at 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.getTaskMetadata(TaskMetadataIntegrationTest.java:162)
>   at 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation(TaskMetadataIntegrationTest.java:117)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13009) Metrics recorder is re-initialised with different task

2021-07-01 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372587#comment-17372587
 ] 

Bruno Cadonna commented on KAFKA-13009:
---

Hi [~VictorvandenHoven]! Thank you for filing this ticket!
Could you please provide a minimal example to reproduce this issue?

> Metrics recorder is re-initialised with different task
> --
>
> Key: KAFKA-13009
> URL: https://issues.apache.org/jira/browse/KAFKA-13009
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.7.0
> Environment: Docker container
>Reporter: Victor van den Hoven
>Priority: Major
>
> When starting my Kafka Stream application, I get in the Logs:
>  
> [SmartMeterActionService-e0d0f403-87c7-4502-b1be-875d544899e2-StreamThread-1] 
> State transition from STARTING to 
> PARTITIONS_ASSIGNED[SmartMeterActionService-e0d0f403-87c7-4502-b1be-875d544899e2-StreamThread-1]
>  State transition from STARTING to PARTITIONS_ASSIGNED2021-06-29 07:35:58.258 
> ERROR 1 — [-StreamThread-1] o.a.k.s.p.internals.StreamThread         : 
> stream-thread 
> [SmartMeterActionService-e0d0f403-87c7-4502-b1be-875d544899e2-StreamThread-1] 
> Encountered the following exception during processing and the thread is going 
> to shut down: 
>  java.lang.IllegalStateException: Metrics recorder is re-initialised with 
> different task: previous task is -1_-1 whereas current task is 0_1. 
> *{color:#ff}This is a bug in Kafka Streams. Please open a bug report 
> under [https://issues.apache.org/jira/projects/KAFKA/issues] at{color}* 
> org.apache.kafka.streams.state.internals.metrics.RocksDBMetricsRecorder.init(RocksDBMetricsRecorder.java:137)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:252)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:55)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:74)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:55)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$1(MeteredKeyValueStore.java:120)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:883)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:120)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:201)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:103)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.StandbyTask.initializeIfNeeded(StandbyTask.java:93)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:473)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:728)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:625)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  ~[kafka-streams-2.7.0.jar!/:na] at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:512)
>  ~[kafka-streams-2.7.0.jar!/:na]
>  2021-06-29 07:35:58.259  INFO 1 — [-StreamThread-1] 
> o.a.k.s.p.internals.StreamThread         : stream-thread 
> [SmartMeterActionService-e0d0f403-87c7-4502-b1be-875d544899e2-StreamThread-1] 
> State transition from PARTITIONS_ASSIGNED to PENDING_SHUTDOWN2021-06-29 
> 07:35:58.259  INFO 1 — [-StreamThread-1] o.a.k.s.p.internals.StreamThread     
>     : stream-thread 
> [SmartMeterActionService-e0d0f403-87c7-4502-b1be-875d544899e2-StreamThread-1] 
> Shutting down
>  
> After this the application shuts down!
>  
>  
> After removing the internal  change-log-topic the application could start 
> again without the issue.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13002) listOffsets must downgrade immediately for non MAX_TIMESTAMP specs

2021-07-01 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372615#comment-17372615
 ] 

Bruno Cadonna commented on KAFKA-13002:
---

[~dajac] I deployed the soak cluster that showed the issue from trunk. Until 
now it looks good. We also run into the same issue with our benchmarks. I also 
started them to see if everything is OK. We also saw the issue in the Streams 
system tests {{kafkatest.tests.streams.streams_broker_compatibility_test}}.  

> listOffsets must downgrade immediately for non MAX_TIMESTAMP specs
> --
>
> Key: KAFKA-13002
> URL: https://issues.apache.org/jira/browse/KAFKA-13002
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: John Roesler
>Assignee: Tom Scott
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: soaks.png
>
>
> Note: this is not a report against a released version of AK. It seems to be a 
> problem on the trunk development branch only.
> After deploying our soak test against `trunk/HEAD` on Friday, I noticed that 
> Streams is no longer processing:
> !soaks.png!
> I found this stacktrace in the logs during startup:
> {code:java}
> 5075 [2021-06-25T16:50:44-05:00] 
> (streams-soak-trunk-ccloud-alos_soak_i-0691913411e8c77c3_streamslog) 
> [2021-06-25 21:50:44,499] WARN [i-0691913411e8c77c3-StreamThread-1] The 
> listOffsets request failed. 
> (org.apache.kafka.streams.processor.internals.ClientUtils)
>  5076 [2021-06-25T16:50:44-05:00] 
> (streams-soak-trunk-ccloud-alos_soak_i-0691913411e8c77c3_streamslog) 
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.UnsupportedVersionException: The broker does 
> not support LIST_OFFSETS with version in range [7,7].   The supported 
> range is [0,6].
>  5077 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>  5078 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>  5079 at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>  5080 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>  5081 at 
> org.apache.kafka.streams.processor.internals.ClientUtils.getEndOffsets(ClientUtils.java:147)
>  5082 at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.populateClientStatesMap(StreamsPartitionAssignor.java:643)
>  5083 at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assignTasksToClients(StreamsPartitionAssignor.java:579)
>  5084 at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:387)
>  5085 at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:589)
>  5086 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:689)
>  5087 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1000(AbstractCoordinator.java:111)
>  5088 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:593)
>  5089 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:556)
>  5090 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1178)
>  5091 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1153)
>  5092 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
>  5093 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
>  5094 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
>  5095 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
>  5096 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
>  5097 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
>  5098 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>  5099 at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1

[jira] [Commented] (KAFKA-13002) listOffsets must downgrade immediately for non MAX_TIMESTAMP specs

2021-07-01 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372639#comment-17372639
 ] 

Bruno Cadonna commented on KAFKA-13002:
---

The soak cluster still looks good and the benchmarks completed successfully 
their first run. I guess the fix is fine.

> listOffsets must downgrade immediately for non MAX_TIMESTAMP specs
> --
>
> Key: KAFKA-13002
> URL: https://issues.apache.org/jira/browse/KAFKA-13002
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: John Roesler
>Assignee: Tom Scott
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: soaks.png
>
>
> Note: this is not a report against a released version of AK. It seems to be a 
> problem on the trunk development branch only.
> After deploying our soak test against `trunk/HEAD` on Friday, I noticed that 
> Streams is no longer processing:
> !soaks.png!
> I found this stacktrace in the logs during startup:
> {code:java}
> 5075 [2021-06-25T16:50:44-05:00] 
> (streams-soak-trunk-ccloud-alos_soak_i-0691913411e8c77c3_streamslog) 
> [2021-06-25 21:50:44,499] WARN [i-0691913411e8c77c3-StreamThread-1] The 
> listOffsets request failed. 
> (org.apache.kafka.streams.processor.internals.ClientUtils)
>  5076 [2021-06-25T16:50:44-05:00] 
> (streams-soak-trunk-ccloud-alos_soak_i-0691913411e8c77c3_streamslog) 
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.UnsupportedVersionException: The broker does 
> not support LIST_OFFSETS with version in range [7,7].   The supported 
> range is [0,6].
>  5077 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>  5078 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>  5079 at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>  5080 at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>  5081 at 
> org.apache.kafka.streams.processor.internals.ClientUtils.getEndOffsets(ClientUtils.java:147)
>  5082 at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.populateClientStatesMap(StreamsPartitionAssignor.java:643)
>  5083 at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assignTasksToClients(StreamsPartitionAssignor.java:579)
>  5084 at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:387)
>  5085 at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:589)
>  5086 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:689)
>  5087 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1000(AbstractCoordinator.java:111)
>  5088 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:593)
>  5089 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:556)
>  5090 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1178)
>  5091 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1153)
>  5092 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
>  5093 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
>  5094 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
>  5095 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
>  5096 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
>  5097 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
>  5098 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>  5099 at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1297)
>  5100 at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1238)
>  5101 at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.ja

[jira] [Commented] (KAFKA-13010) Flaky test org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()

2021-07-01 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372977#comment-17372977
 ] 

Bruno Cadonna commented on KAFKA-13010:
---

Some logs that might be interesting:

{code}
[2021-06-29 12:19:40,200] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2-consumer]
 Finished unstable assignment of tasks, a followup rebalance will be scheduled. 
(org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor:818)
[2021-06-29 12:19:40,200] WARN [Consumer 
clientId=TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2-consumer,
 
groupId=TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation]
 The following subscribed topics are not assigned to any members: 
[inputTaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation] 
 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:611)
[2021-06-29 12:19:40,200] INFO [GroupCoordinator 0]: Assignment received from 
leader 
TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2-consumer-0a548162-9e3f-4003-98c5-54ece6f5e1b8
 for group 
TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation
 for generation 2. The group has 2 members, 0 of which are static. 
(kafka.coordinator.group.GroupCoordinator:66)
[2021-06-29 12:19:40,201] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-1-consumer]
 Requested to schedule immediate rebalance for new tasks to be safely revoked 
from current owner. 
(org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor:1300)
[2021-06-29 12:19:40,201] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 State transition from RUNNING to PARTITIONS_REVOKED 
(org.apache.kafka.streams.processor.internals.StreamThread:229)
[2021-06-29 12:19:40,201] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-1]
 Handle new assignment with:
New active tasks: []
New standby tasks: []
Existing active tasks: []
Existing standby tasks: [] 
(org.apache.kafka.streams.processor.internals.TaskManager:263)
[2021-06-29 12:19:40,201] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-1]
 State transition from STARTING to PARTITIONS_ASSIGNED 
(org.apache.kafka.streams.processor.internals.StreamThread:229)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 task [0_0] Suspended RUNNING 
(org.apache.kafka.streams.processor.internals.StreamTask:1187)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 task [0_0] Suspended running 
(org.apache.kafka.streams.processor.internals.StreamTask:300)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 partition revocation took 1 ms. 
(org.apache.kafka.streams.processor.internals.StreamThread:97)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2-consumer]
 No followup rebalance was requested, resetting the rebalance schedule. 
(org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor:1306)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 Handle new assignment with:
New active tasks: []
New standby tasks: []
Existing active tasks: [0_0]
Existing standby tasks: [] 
(org.apache.kafka.streams.processor.internals.TaskManager:263)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 task [0_0] Closing record collector clean 
(org.apache.kafka.streams.processor.internals.RecordCollectorImpl:268)

[jira] [Comment Edited] (KAFKA-13010) Flaky test org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()

2021-07-01 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372977#comment-17372977
 ] 

Bruno Cadonna edited comment on KAFKA-13010 at 7/1/21, 6:01 PM:


Some logs that might be interesting:

{code}
[2021-06-29 12:19:40,200] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2-consumer]
 Finished unstable assignment of tasks, a followup rebalance will be scheduled. 
(org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor:818)
[2021-06-29 12:19:40,200] WARN [Consumer 
clientId=TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2-consumer,
 
groupId=TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation]
 The following subscribed topics are not assigned to any members: 
[inputTaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation] 
 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:611)
[2021-06-29 12:19:40,200] INFO [GroupCoordinator 0]: Assignment received from 
leader 
TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2-consumer-0a548162-9e3f-4003-98c5-54ece6f5e1b8
 for group 
TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation
 for generation 2. The group has 2 members, 0 of which are static. 
(kafka.coordinator.group.GroupCoordinator:66)
[2021-06-29 12:19:40,201] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-1-consumer]
 Requested to schedule immediate rebalance for new tasks to be safely revoked 
from current owner. 
(org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor:1300)
[2021-06-29 12:19:40,201] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 State transition from RUNNING to PARTITIONS_REVOKED 
(org.apache.kafka.streams.processor.internals.StreamThread:229)
[2021-06-29 12:19:40,201] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-1]
 Handle new assignment with:
New active tasks: []
New standby tasks: []
Existing active tasks: []
Existing standby tasks: [] 
(org.apache.kafka.streams.processor.internals.TaskManager:263)
[2021-06-29 12:19:40,201] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-1]
 State transition from STARTING to PARTITIONS_ASSIGNED 
(org.apache.kafka.streams.processor.internals.StreamThread:229)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 task [0_0] Suspended RUNNING 
(org.apache.kafka.streams.processor.internals.StreamTask:1187)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 task [0_0] Suspended running 
(org.apache.kafka.streams.processor.internals.StreamTask:300)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 partition revocation took 1 ms. 
(org.apache.kafka.streams.processor.internals.StreamThread:97)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2-consumer]
 No followup rebalance was requested, resetting the rebalance schedule. 
(org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor:1306)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 Handle new assignment with:
New active tasks: []
New standby tasks: []
Existing active tasks: [0_0]
Existing standby tasks: [] 
(org.apache.kafka.streams.processor.internals.TaskManager:263)
[2021-06-29 12:19:40,202] INFO stream-thread 
[TaskMetadataTest_TaskMetadataIntegrationTestshouldReportCorrectCommittedOffsetInformation-6889e6c9-c4fa-427c-83bf-469b33a34bb5-StreamThread-2]
 task [0_0] Closing record collector clean 
(org.apache.kafka.str

[jira] [Commented] (KAFKA-8529) Flakey test ConsumerBounceTest#testCloseDuringRebalance

2021-07-12 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379260#comment-17379260
 ] 

Bruno Cadonna commented on KAFKA-8529:
--

Failed again: 

https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10943/5/testReport/junit/kafka.api/ConsumerBounceTest/Build___JDK_16_and_Scala_2_13___testCloseDuringRebalance___2/

{code:java}
org.opentest4j.AssertionFailedError: Rebalance did not complete in time ==> 
expected:  but was: 
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40)
at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:193)
at 
kafka.api.ConsumerBounceTest.waitForRebalance$1(ConsumerBounceTest.scala:400)
at 
kafka.api.ConsumerBounceTest.checkCloseDuringRebalance(ConsumerBounceTest.scala:414)
at 
kafka.api.ConsumerBounceTest.testCloseDuringRebalance(ConsumerBounceTest.scala:381)
{code}

> Flakey test ConsumerBounceTest#testCloseDuringRebalance
> ---
>
> Key: KAFKA-8529
> URL: https://issues.apache.org/jira/browse/KAFKA-8529
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Boyang Chen
>Priority: Major
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5450/consoleFull]
>  
> *16:16:10* kafka.api.ConsumerBounceTest > testCloseDuringRebalance 
> STARTED*16:16:22* kafka.api.ConsumerBounceTest.testCloseDuringRebalance 
> failed, log available in 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.ConsumerBounceTest.testCloseDuringRebalance.test.stdout*16:16:22*
>  *16:16:22* kafka.api.ConsumerBounceTest > testCloseDuringRebalance 
> FAILED*16:16:22* java.lang.AssertionError: Rebalance did not complete in 
> time*16:16:22* at org.junit.Assert.fail(Assert.java:89)*16:16:22* 
> at org.junit.Assert.assertTrue(Assert.java:42)*16:16:22* at 
> kafka.api.ConsumerBounceTest.waitForRebalance$1(ConsumerBounceTest.scala:402)*16:16:22*
>  at 
> kafka.api.ConsumerBounceTest.checkCloseDuringRebalance(ConsumerBounceTest.scala:416)*16:16:22*
>  at 
> kafka.api.ConsumerBounceTest.testCloseDuringRebalance(ConsumerBounceTest.scala:379)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10251) Flaky Test kafka.api.TransactionsBounceTest.testWithGroupMetadata

2021-07-12 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379262#comment-17379262
 ] 

Bruno Cadonna commented on KAFKA-10251:
---

Failed again:

https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10943/5/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_8_and_Scala_2_12___testWithGroupId___2/

{code:java}
org.opentest4j.AssertionFailedError: Consumed 0 records before timeout instead 
of the expected 200 records
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:39)
at org.junit.jupiter.api.Assertions.fail(Assertions.java:117)
at 
kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:833)
at 
kafka.api.TransactionsBounceTest.testWithGroupId(TransactionsBounceTest.scala:112)
{code}

> Flaky Test kafka.api.TransactionsBounceTest.testWithGroupMetadata
> -
>
> Key: KAFKA-10251
> URL: https://issues.apache.org/jira/browse/KAFKA-10251
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: A. Sophie Blee-Goldman
>Assignee: Luke Chen
>Priority: Major
>
> h3. Stacktrace
> org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
> timeout instead of the expected 200 records at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) 
> at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) 
> at org.scalatest.Assertions.fail(Assertions.scala:1091) at 
> org.scalatest.Assertions.fail$(Assertions.scala:1087) at 
> org.scalatest.Assertions$.fail(Assertions.scala:1389) at 
> kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:842) at 
> kafka.api.TransactionsBounceTest.testWithGroupMetadata(TransactionsBounceTest.scala:109)
>  
>  
> The logs are pretty much just this on repeat:
> {code:java}
> [2020-07-08 23:41:04,034] ERROR Error when sending message to topic 
> output-topic with key: 9955, value: 9955 with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback:52) 
> org.apache.kafka.common.KafkaException: Failing batch since transaction was 
> aborted at 
> org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:423)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:313) 
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240) at 
> java.lang.Thread.run(Thread.java:748) [2020-07-08 23:41:04,034] ERROR Error 
> when sending message to topic output-topic with key: 9959, value: 9959 with 
> error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback:52) 
> org.apache.kafka.common.KafkaException: Failing batch since transaction was 
> aborted at 
> org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:423)
>  at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:313) 
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240) at 
> java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12629) Failing Test: RaftClusterTest

2021-07-12 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379264#comment-17379264
 ] 

Bruno Cadonna commented on KAFKA-12629:
---

Failed again:

https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10943/5/testReport/

> Failing Test: RaftClusterTest
> -
>
> Key: KAFKA-12629
> URL: https://issues.apache.org/jira/browse/KAFKA-12629
> Project: Kafka
>  Issue Type: Test
>  Components: core, unit tests
>Reporter: Matthias J. Sax
>Assignee: Luke Chen
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.0.0
>
>
> {quote} {{java.util.concurrent.ExecutionException: 
> java.lang.ClassNotFoundException: 
> org.apache.kafka.controller.NoOpSnapshotWriterBuilder
>   at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> kafka.testkit.KafkaClusterTestKit.startup(KafkaClusterTestKit.java:364)
>   at 
> kafka.server.RaftClusterTest.testCreateClusterAndCreateAndManyTopicsWithManyPartitions(RaftClusterTest.scala:181)}}{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13010) Flaky test org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()

2021-07-14 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380444#comment-17380444
 ] 

Bruno Cadonna commented on KAFKA-13010:
---

For the future assignee of this ticket: I was able to reproduce this failure 
multiple times by running the test in IntelliJ in the until failure mode. It 
failed quite quickly after approx. 25 runs.

> Flaky test 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()
> ---
>
> Key: KAFKA-13010
> URL: https://issues.apache.org/jira/browse/KAFKA-13010
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Major
>  Labels: flaky-test
>
> Integration test {{test 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()}}
>  sometimes fails with
> {code:java}
> java.lang.AssertionError: only one task
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:26)
>   at 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.getTaskMetadata(TaskMetadataIntegrationTest.java:162)
>   at 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation(TaskMetadataIntegrationTest.java:117)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13088) Replace EasyMock with Mockito for ForwardingDisabledProcessorContextTest

2021-07-14 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13088:
--
Component/s: streams

> Replace EasyMock with Mockito for ForwardingDisabledProcessorContextTest
> 
>
> Key: KAFKA-13088
> URL: https://issues.apache.org/jira/browse/KAFKA-13088
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Chun-Hao Tang
>Assignee: Chun-Hao Tang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13094) Session windows do not consider user-specified grace when computing retention time for changelog

2021-07-15 Thread Bruno Cadonna (Jira)
Bruno Cadonna created KAFKA-13094:
-

 Summary: Session windows do not consider user-specified grace when 
computing retention time for changelog
 Key: KAFKA-13094
 URL: https://issues.apache.org/jira/browse/KAFKA-13094
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.8.0
Reporter: Bruno Cadonna


Session windows use internally method {{maintainMs()}} to compute the retention 
time for their changelog topic if users do not provide a retention time 
explicitly with {{Materilaized}}. However, {{maintainMs()}} does not consider 
user-specified grace period when computing the retention time.

The bug can be verified with the following test method:
{code:java}
@Test
public void 
shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
 {
final Duration windowsSize = ofDays(1).minus(ofMillis(1));
final Duration gracePeriod = ofMillis(2);
assertEquals(windowsSize.toMillis() + gracePeriod.toMillis(), 
SessionWindows.with(windowsSize).grace(gracePeriod).maintainMs());
}
{code}   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13094) Session windows do not consider user-specified grace when computing retention time for changelog

2021-07-15 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13094:
--
Description: 
Session windows use internally method {{maintainMs()}} to compute the retention 
time for their changelog topic if users do not provide a retention time 
explicitly with {{Materilaized}}. However, {{maintainMs()}} does not consider 
user-specified grace period when computing the retention time.

The bug can be verified with the following test method:
{code:java}
@Test
public void 
shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
 {
final Duration windowsSize = ofDays(1).minus(ofMillis(1));
final Duration gracePeriod = ofMillis(2);
assertEquals(windowsSize.toMillis() + gracePeriod.toMillis(), 
SessionWindows.with(windowsSize).grace(gracePeriod).maintainMs());
}
{code}   

The 

  was:
Session windows use internally method {{maintainMs()}} to compute the retention 
time for their changelog topic if users do not provide a retention time 
explicitly with {{Materilaized}}. However, {{maintainMs()}} does not consider 
user-specified grace period when computing the retention time.

The bug can be verified with the following test method:
{code:java}
@Test
public void 
shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
 {
final Duration windowsSize = ofDays(1).minus(ofMillis(1));
final Duration gracePeriod = ofMillis(2);
assertEquals(windowsSize.toMillis() + gracePeriod.toMillis(), 
SessionWindows.with(windowsSize).grace(gracePeriod).maintainMs());
}
{code}   


> Session windows do not consider user-specified grace when computing retention 
> time for changelog
> 
>
> Key: KAFKA-13094
> URL: https://issues.apache.org/jira/browse/KAFKA-13094
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.8.0
>Reporter: Bruno Cadonna
>Priority: Major
>
> Session windows use internally method {{maintainMs()}} to compute the 
> retention time for their changelog topic if users do not provide a retention 
> time explicitly with {{Materilaized}}. However, {{maintainMs()}} does not 
> consider user-specified grace period when computing the retention time.
> The bug can be verified with the following test method:
> {code:java}
> @Test
> public void 
> shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
>  {
> final Duration windowsSize = ofDays(1).minus(ofMillis(1));
> final Duration gracePeriod = ofMillis(2);
> assertEquals(windowsSize.toMillis() + gracePeriod.toMillis(), 
> SessionWindows.with(windowsSize).grace(gracePeriod).maintainMs());
> }
> {code}   
> The 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13094) Session windows do not consider user-specified grace when computing retention time for changelog

2021-07-15 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13094:
--
Description: 
Session windows use internally method {{maintainMs()}} to compute the retention 
time for their changelog topic if users do not provide a retention time 
explicitly with {{Materilaized}}. However, {{maintainMs()}} does not consider 
user-specified grace period when computing the retention time.

The bug can be verified with the following test method:
{code:java}
@Test
public void 
shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
 {
final Duration gapSize = ofDays(1);
final Duration gracePeriod = ofMillis(2);
assertEquals(
gapSize.toMillis() + gracePeriod.toMillis(), 
SessionWindows.with(gapSize).grace(gracePeriod).maintainMs()
);
}
{code}   

The test should pass since the retention time of the changelog topic should be 
gap + grace. However, the test fails. 

  was:
Session windows use internally method {{maintainMs()}} to compute the retention 
time for their changelog topic if users do not provide a retention time 
explicitly with {{Materilaized}}. However, {{maintainMs()}} does not consider 
user-specified grace period when computing the retention time.

The bug can be verified with the following test method:
{code:java}
@Test
public void 
shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
 {
final Duration gapSize = ofDays(1);
final Duration gracePeriod = ofMillis(2);
assertEquals(gapSize.toMillis() + gracePeriod.toMillis(), 
SessionWindows.with(gapSize).grace(gracePeriod).maintainMs());
}
{code}   

The test should pass since the retention time of the changelog topic should be 
gap + grace. However, the test fails. 


> Session windows do not consider user-specified grace when computing retention 
> time for changelog
> 
>
> Key: KAFKA-13094
> URL: https://issues.apache.org/jira/browse/KAFKA-13094
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.8.0
>Reporter: Bruno Cadonna
>Priority: Major
>
> Session windows use internally method {{maintainMs()}} to compute the 
> retention time for their changelog topic if users do not provide a retention 
> time explicitly with {{Materilaized}}. However, {{maintainMs()}} does not 
> consider user-specified grace period when computing the retention time.
> The bug can be verified with the following test method:
> {code:java}
> @Test
> public void 
> shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
>  {
> final Duration gapSize = ofDays(1);
> final Duration gracePeriod = ofMillis(2);
> assertEquals(
> gapSize.toMillis() + gracePeriod.toMillis(), 
> SessionWindows.with(gapSize).grace(gracePeriod).maintainMs()
> );
> }
> {code}   
> The test should pass since the retention time of the changelog topic should 
> be gap + grace. However, the test fails. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-13094) Session windows do not consider user-specified grace when computing retention time for changelog

2021-07-15 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13094:
--
Description: 
Session windows use internally method {{maintainMs()}} to compute the retention 
time for their changelog topic if users do not provide a retention time 
explicitly with {{Materilaized}}. However, {{maintainMs()}} does not consider 
user-specified grace period when computing the retention time.

The bug can be verified with the following test method:
{code:java}
@Test
public void 
shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
 {
final Duration gapSize = ofDays(1);
final Duration gracePeriod = ofMillis(2);
assertEquals(gapSize.toMillis() + gracePeriod.toMillis(), 
SessionWindows.with(gapSize).grace(gracePeriod).maintainMs());
}
{code}   

The test should pass since the retention time of the changelog topic should be 
gap + grace. However, the test fails. 

  was:
Session windows use internally method {{maintainMs()}} to compute the retention 
time for their changelog topic if users do not provide a retention time 
explicitly with {{Materilaized}}. However, {{maintainMs()}} does not consider 
user-specified grace period when computing the retention time.

The bug can be verified with the following test method:
{code:java}
@Test
public void 
shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
 {
final Duration windowsSize = ofDays(1).minus(ofMillis(1));
final Duration gracePeriod = ofMillis(2);
assertEquals(windowsSize.toMillis() + gracePeriod.toMillis(), 
SessionWindows.with(windowsSize).grace(gracePeriod).maintainMs());
}
{code}   

The 


> Session windows do not consider user-specified grace when computing retention 
> time for changelog
> 
>
> Key: KAFKA-13094
> URL: https://issues.apache.org/jira/browse/KAFKA-13094
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.8.0
>Reporter: Bruno Cadonna
>Priority: Major
>
> Session windows use internally method {{maintainMs()}} to compute the 
> retention time for their changelog topic if users do not provide a retention 
> time explicitly with {{Materilaized}}. However, {{maintainMs()}} does not 
> consider user-specified grace period when computing the retention time.
> The bug can be verified with the following test method:
> {code:java}
> @Test
> public void 
> shouldUseGapAndGraceAsRetentionTimeIfBothCombinedAreLargerThanDefaultRetentionTime()
>  {
> final Duration gapSize = ofDays(1);
> final Duration gracePeriod = ofMillis(2);
> assertEquals(gapSize.toMillis() + gracePeriod.toMillis(), 
> SessionWindows.with(gapSize).grace(gracePeriod).maintainMs());
> }
> {code}   
> The test should pass since the retention time of the changelog topic should 
> be gap + grace. However, the test fails. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13600) Rebalances while streams is in degraded state can cause stores to be reassigned and restore from scratch

2022-02-04 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487148#comment-17487148
 ] 

Bruno Cadonna commented on KAFKA-13600:
---

I am fine with discussing the improvement on the PR and not in a KIP. I 
actually realized that the other improvements to the assignment algorithm 
included changes to the public API and therefore a KIP was needed. For me it is 
just important that we look really careful at the improvements because the 
assignment algorithm is a quite critical part of the system. 

Additionally, I did not want to discuss about a totally new assignment 
algorithm. I just linked the information for general interest.

Looking forward to the PR.

> Rebalances while streams is in degraded state can cause stores to be 
> reassigned and restore from scratch
> 
>
> Key: KAFKA-13600
> URL: https://issues.apache.org/jira/browse/KAFKA-13600
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.0, 2.8.1, 3.0.0
>Reporter: Tim Patterson
>Priority: Major
>
> Consider this scenario:
>  # A node is lost from the cluster.
>  # A rebalance is kicked off with a new "target assignment"'s(ie the 
> rebalance is attempting to move a lot of tasks - see 
> https://issues.apache.org/jira/browse/KAFKA-10121).
>  # The kafka cluster is now a bit more sluggish from the increased load.
>  # A Rolling Deploy happens triggering rebalances, during the rebalance 
> processing continues but offsets can't be committed(Or nodes are restarted 
> but fail to commit offsets)
>  # The most caught up nodes now aren't within `acceptableRecoveryLag` and so 
> the task is started in it's "target assignment" location, restoring all state 
> from scratch and delaying further processing instead of using the "almost 
> caught up" node.
> We've hit this a few times and having lots of state (~25TB worth) and being 
> heavy users of IQ this is not ideal for us.
> While we can increase `acceptableRecoveryLag` to larger values to try get 
> around this that causes other issues (ie a warmup becoming active when its 
> still quite far behind)
> The solution seems to be to balance "balanced assignment" with "most caught 
> up nodes".
> We've got a fork where we do just this and it's made a huge difference to the 
> reliability of our cluster.
> Our change is to simply use the most caught up node if the "target node" is 
> more than `acceptableRecoveryLag` behind.
> This gives up some of the load balancing type behaviour of the existing code 
> but in practise doesn't seem to matter too much.
> I guess maybe an algorithm that identified candidate nodes as those being 
> within `acceptableRecoveryLag` of the most caught up node might allow the 
> best of both worlds.
>  
> Our fork is
> [https://github.com/apache/kafka/compare/trunk...tim-patterson:fix_balance_uncaughtup?expand=1]
> (We also moved the capacity constraint code to happen after all the stateful 
> assignment to prioritise standby tasks over warmup tasks)
> Ideally we don't want to maintain a fork of kafka streams going forward so 
> are hoping to get a bit of discussion / agreement on the best way to handle 
> this.
> More than happy to contribute code/test different algo's in production system 
> or anything else to help with this issue



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13638) Slow KTable update when forwarding multiple values from transformer

2022-02-04 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487184#comment-17487184
 ] 

Bruno Cadonna commented on KAFKA-13638:
---

[~Lejon] How did you measure time with your test? I cannot really reproduce 
your numbers. 

> Slow KTable update when forwarding multiple values from transformer
> ---
>
> Key: KAFKA-13638
> URL: https://issues.apache.org/jira/browse/KAFKA-13638
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.0, 3.0.0
>Reporter: Ulrik
>Priority: Major
> Attachments: KafkaTest.java
>
>
> I have a topology where I stream messages from an input topic, transform the 
> message to multiple messages (via context.forward), and then store those 
> messages in a KTable.
> Since upgrading from kafka-streams 2.8.1 to 3.1.0 I have noticed that my 
> tests take significantly longer time to run. 
>  
> I have attached a test class to demonstrate my scenario. When running this 
> test with kafka-streams versions 2.8.1 and 3.1.0 I came up with the following 
> numbers:
>  
> *Version 2.8.1*
>  * one input message and one output message: 541 ms
>  * 8 input message and 30 output message per input message (240 output 
> messages in total): 919 ms
>  
> *Version 3.1.0*
>  * one input message and one output message: 908 ms
>  * 8 input message and 30 output message per input message (240 output 
> messages in total): 6 sec 94 ms
>  
> Even when the transformer just transforms and forwards one input message to 
> one output message, the test takes approx. 400 ms longer to run.
> When transforming 8 input messages to 240 output messages it takes approx 5 
> seconds longer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13599) Upgrade RocksDB to 6.27.3

2022-02-07 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13599:
--
Labels: streams  (was: )

> Upgrade RocksDB to 6.27.3
> -
>
> Key: KAFKA-13599
> URL: https://issues.apache.org/jira/browse/KAFKA-13599
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: Jonathan Albrecht
>Assignee: Jonathan Albrecht
>Priority: Major
>  Labels: streams
> Attachments: compat_report.html
>
>
> RocksDB v6.27.3 has been released and it is the first release to support 
> s390x. RocksDB is currently the only dependency in gradle/dependencies.gradle 
> without s390x support.
> RocksDB v6.27.3 has added some new options that require an update to 
> streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter.java
>  but no other changes are needed to upgrade.
> A compatibility report is attached for the current version 6.22.1.1 -> 6.27.3



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13599) Upgrade RocksDB to 6.27.3

2022-02-07 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13599:
--
Fix Version/s: 3.2.0

> Upgrade RocksDB to 6.27.3
> -
>
> Key: KAFKA-13599
> URL: https://issues.apache.org/jira/browse/KAFKA-13599
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: Jonathan Albrecht
>Assignee: Jonathan Albrecht
>Priority: Major
>  Labels: streams
> Fix For: 3.2.0
>
> Attachments: compat_report.html
>
>
> RocksDB v6.27.3 has been released and it is the first release to support 
> s390x. RocksDB is currently the only dependency in gradle/dependencies.gradle 
> without s390x support.
> RocksDB v6.27.3 has added some new options that require an update to 
> streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter.java
>  but no other changes are needed to upgrade.
> A compatibility report is attached for the current version 6.22.1.1 -> 6.27.3



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13638) Slow KTable update when forwarding multiple values from transformer

2022-02-07 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488071#comment-17488071
 ] 

Bruno Cadonna commented on KAFKA-13638:
---

[~Lejon] I put the following code around {{validateTopologyCanProcessData()}}:

{code}
final Time time = Time.SYSTEM;
final long startTime = time.milliseconds();
validateTopologyCanProcessData(builder);
final long endTime = time.milliseconds();
System.out.println("runtime: " + (endTime - startTime) + " ms");
{code}

and I measure for 2.8 an average (3 runs) of 1259 ms, and for trunk 1522 ms.

If I take the numbers of IntelliJ, I also do not get such a big difference as 
you. 

Could it be a matter of local setup?



> Slow KTable update when forwarding multiple values from transformer
> ---
>
> Key: KAFKA-13638
> URL: https://issues.apache.org/jira/browse/KAFKA-13638
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.0, 3.0.0
>Reporter: Ulrik
>Priority: Major
> Attachments: KafkaTest.java
>
>
> I have a topology where I stream messages from an input topic, transform the 
> message to multiple messages (via context.forward), and then store those 
> messages in a KTable.
> Since upgrading from kafka-streams 2.8.1 to 3.1.0 I have noticed that my 
> tests take significantly longer time to run. 
>  
> I have attached a test class to demonstrate my scenario. When running this 
> test with kafka-streams versions 2.8.1 and 3.1.0 I came up with the following 
> numbers:
>  
> *Version 2.8.1*
>  * one input message and one output message: 541 ms
>  * 8 input message and 30 output message per input message (240 output 
> messages in total): 919 ms
>  
> *Version 3.1.0*
>  * one input message and one output message: 908 ms
>  * 8 input message and 30 output message per input message (240 output 
> messages in total): 6 sec 94 ms
>  
> Even when the transformer just transforms and forwards one input message to 
> one output message, the test takes approx. 400 ms longer to run.
> When transforming 8 input messages to 240 output messages it takes approx 5 
> seconds longer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13647) RocksDb metrics 'number-open-files' is not correct

2022-02-07 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488076#comment-17488076
 ] 

Bruno Cadonna commented on KAFKA-13647:
---

[~LGouellec] Thank you for the report!

For each measurement of the number of open files, we record the number of files 
opened and the number of files closed since the last measurement. Then we 
subtract the number of files closed from the number of files opened. That is 
net number of files opened since last measurement (might also be negative). We 
then add the net number of files opened since last measurement to the number of 
open files which is the sum of all net numbers of files opened between 
measurements.

https://github.com/apache/kafka/blob/ca5d6f9229c170beb23809159113037f05a1120f/streams/src/main/java/org/apache/kafka/streams/state/internals/metrics/RocksDBMetricsRecorder.java#L471

So, I think using the sum metric is correct.
 
Does this make sense?

> RocksDb metrics 'number-open-files' is not correct
> --
>
> Key: KAFKA-13647
> URL: https://issues.apache.org/jira/browse/KAFKA-13647
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.0.0
>Reporter: Sylvain Le Gouellec
>Priority: Major
>
> We were looking at RocksDB metrics and noticed that the {{number-open-files}} 
> metric behaves like a counter, rather than a gauge. 
> Looking at the code, we think there is a small error in the type of metric 
> for that specific mbean (should be a value metric rather than a sum metric).
> See [ 
> https://github.com/apache/kafka/blob/ca5d6f9229c170beb23809159113037f05a1120f/streams/src/main/java/org/apache/kafka/streams/state/internals/metrics/RocksDBMetrics.java#L482|https://github.com/apache/kafka/blob/99b9b3e84f4e98c3f07714e1de6a139a004cbc5b/streams/src/main/java/org/apache/kafka/streams/state/internals/metrics/RocksDBMetrics.java#L482]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13599) Upgrade RocksDB to 6.27.3

2022-02-07 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13599:
--
Labels:   (was: streams)

> Upgrade RocksDB to 6.27.3
> -
>
> Key: KAFKA-13599
> URL: https://issues.apache.org/jira/browse/KAFKA-13599
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: Jonathan Albrecht
>Assignee: Jonathan Albrecht
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: compat_report.html
>
>
> RocksDB v6.27.3 has been released and it is the first release to support 
> s390x. RocksDB is currently the only dependency in gradle/dependencies.gradle 
> without s390x support.
> RocksDB v6.27.3 has added some new options that require an update to 
> streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter.java
>  but no other changes are needed to upgrade.
> A compatibility report is attached for the current version 6.22.1.1 -> 6.27.3



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13647) RocksDb metrics 'number-open-files' is not correct

2022-02-07 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488400#comment-17488400
 ] 

Bruno Cadonna commented on KAFKA-13647:
---

[~guozhang] Nice catch!

> RocksDb metrics 'number-open-files' is not correct
> --
>
> Key: KAFKA-13647
> URL: https://issues.apache.org/jira/browse/KAFKA-13647
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.0.0
>Reporter: Sylvain Le Gouellec
>Priority: Major
> Attachments: image-2022-02-07-16-06-25-304.png, 
> image-2022-02-07-16-06-39-821.png, image-2022-02-07-16-06-53-164.png
>
>
> We were looking at RocksDB metrics and noticed that the {{number-open-files}} 
> metric behaves like a counter, rather than a gauge. 
> Looking at the code, we think there is a small error in the type of metric 
> for that specific mbean (should be a value metric rather than a sum metric).
> See [ 
> https://github.com/apache/kafka/blob/ca5d6f9229c170beb23809159113037f05a1120f/streams/src/main/java/org/apache/kafka/streams/state/internals/metrics/RocksDBMetrics.java#L482|https://github.com/apache/kafka/blob/99b9b3e84f4e98c3f07714e1de6a139a004cbc5b/streams/src/main/java/org/apache/kafka/streams/state/internals/metrics/RocksDBMetrics.java#L482]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13638) Slow KTable update when forwarding multiple values from transformer

2022-02-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489741#comment-17489741
 ] 

Bruno Cadonna commented on KAFKA-13638:
---

[~Lejon] Could you try to use another state store directory ({{state.dir}} 
config). By default that config points to {{/tmp/kafka-streams}}. Maybe the OS 
makes something weird with the temporary directory. Just an idea! 

> Slow KTable update when forwarding multiple values from transformer
> ---
>
> Key: KAFKA-13638
> URL: https://issues.apache.org/jira/browse/KAFKA-13638
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.0, 3.0.0
>Reporter: Ulrik
>Priority: Major
> Attachments: KafkaTest.java
>
>
> I have a topology where I stream messages from an input topic, transform the 
> message to multiple messages (via context.forward), and then store those 
> messages in a KTable.
> Since upgrading from kafka-streams 2.8.1 to 3.1.0 I have noticed that my 
> tests take significantly longer time to run. 
>  
> I have attached a test class to demonstrate my scenario. When running this 
> test with kafka-streams versions 2.8.1 and 3.1.0 I came up with the following 
> numbers:
>  
> *Version 2.8.1*
>  * one input message and one output message: 541 ms
>  * 8 input message and 30 output message per input message (240 output 
> messages in total): 919 ms
>  
> *Version 3.1.0*
>  * one input message and one output message: 908 ms
>  * 8 input message and 30 output message per input message (240 output 
> messages in total): 6 sec 94 ms
>  
> Even when the transformer just transforms and forwards one input message to 
> one output message, the test takes approx. 400 ms longer to run.
> When transforming 8 input messages to 240 output messages it takes approx 5 
> seconds longer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-8153) Streaming application with state stores takes up to 1 hour to restart

2022-02-10 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-8153.
--
Resolution: Not A Problem

> Streaming application with state stores takes up to 1 hour to restart
> -
>
> Key: KAFKA-8153
> URL: https://issues.apache.org/jira/browse/KAFKA-8153
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0
>Reporter: Michael Melsen
>Priority: Major
>
> We are using spring cloud stream with Kafka streams 2.0.1 and utilizing the 
> InteractiveQueryService to fetch data from the stores. There are 4 stores 
> that persist data on disk after aggregating data. The code for the topology 
> looks like this:
> {code:java}
> @Slf4j
> @EnableBinding(SensorMeasurementBinding.class)
> public class Consumer {
>   public static final String RETENTION_MS = "retention.ms";
>   public static final String CLEANUP_POLICY = "cleanup.policy";
>   @Value("${windowstore.retention.ms}")
>   private String retention;
> /**
>  * Process the data flowing in from a Kafka topic. Aggregate the data to:
>  * - 2 minute
>  * - 15 minutes
>  * - one hour
>  * - 12 hours
>  *
>  * @param stream
>  */
> @StreamListener(SensorMeasurementBinding.ERROR_SCORE_IN)
> public void process(KStream stream) {
> Map topicConfig = new HashMap<>();
> topicConfig.put(RETENTION_MS, retention);
> topicConfig.put(CLEANUP_POLICY, "delete");
> log.info("Changelog and local window store retention.ms: {} and 
> cleanup.policy: {}",
> topicConfig.get(RETENTION_MS),
> topicConfig.get(CLEANUP_POLICY));
> createWindowStore(LocalStore.TWO_MINUTES_STORE, topicConfig, stream);
> createWindowStore(LocalStore.FIFTEEN_MINUTES_STORE, topicConfig, stream);
> createWindowStore(LocalStore.ONE_HOUR_STORE, topicConfig, stream);
> createWindowStore(LocalStore.TWELVE_HOURS_STORE, topicConfig, stream);
>   }
>   private void createWindowStore(
> LocalStore localStore,
> Map topicConfig,
> KStream stream) {
> // Configure how the statestore should be materialized using the provide 
> storeName
> Materialized> materialized 
> = Materialized
> .as(localStore.getStoreName());
> // Set retention of changelog topic
> materialized.withLoggingEnabled(topicConfig);
> // Configure how windows looks like and how long data will be retained in 
> local stores
> TimeWindows configuredTimeWindows = getConfiguredTimeWindows(
> localStore.getTimeUnit(), 
> Long.parseLong(topicConfig.get(RETENTION_MS)));
> // Processing description:
> // The input data are 'samples' with key 
> :::
> // 1. With the map we add the Tag to the key and we extract the error 
> score from the data
> // 2. With the groupByKey we group  the data on the new key
> // 3. With windowedBy we split up the data in time intervals depending on 
> the provided LocalStore enum
> // 4. With reduce we determine the maximum value in the time window
> // 5. Materialized will make it stored in a table
> stream
> .map(getInstallationAssetModelAlgorithmTagKeyMapper())
> .groupByKey()
> .windowedBy(configuredTimeWindows)
> .reduce((aggValue, newValue) -> getMaxErrorScore(aggValue, 
> newValue), materialized);
>   }
>   private TimeWindows getConfiguredTimeWindows(long windowSizeMs, long 
> retentionMs) {
> TimeWindows timeWindows = TimeWindows.of(windowSizeMs);
> timeWindows.until(retentionMs);
> return timeWindows;
>   }
>   /**
>* Determine the max error score to keep by looking at the aggregated error 
> signal and
>* freshly consumed error signal
>*
>* @param aggValue
>* @param newValue
>* @return
>*/
>   private ErrorScore getMaxErrorScore(ErrorScore aggValue, ErrorScore 
> newValue) {
> if(aggValue.getErrorSignal() > newValue.getErrorSignal()) {
> return aggValue;
> }
> return newValue;
>   }
>   private KeyValueMapper KeyValue> 
> getInstallationAssetModelAlgorithmTagKeyMapper() {
> return (s, sensorMeasurement) -> new KeyValue<>(s + "::" + 
> sensorMeasurement.getT(),
> new ErrorScore(sensorMeasurement.getTs(), 
> sensorMeasurement.getE(), sensorMeasurement.getO()));
>   }
> }
> {code}
> So we are materializing aggregated data to four different stores after 
> determining the max value within a specific window for a specific key. Please 
> note that retention which is set to two months of data and the clean up 
> policy delete. We don't compact data.
> The size of the individual state stores on disk is between 14 to 20 gb of 
> data.
> We are making use of Interactive Queries: 
> [https://docs.confluent.io/current/streams/developer-guide/in

[jira] [Updated] (KAFKA-13435) Group won't consume partitions added after static member restart

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13435:
--
Fix Version/s: 3.2.0

> Group won't consume partitions added after static member restart
> 
>
> Key: KAFKA-13435
> URL: https://issues.apache.org/jira/browse/KAFKA-13435
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 3.0.0
>Reporter: Ryan Leslie
>Assignee: David Jacot
>Priority: Critical
>  Labels: new-rebalance-should-fix
> Fix For: 3.2.0
>
>
> When using consumer groups with static membership, if the consumer marked as 
> leader has restarted, then metadata changes such as partition increase are 
> not triggering expected rebalances.
> To reproduce this issue, simply:
>  # Create a static consumer subscribed to a single topic
>  # Close the consumer and create a new one with the same group instance id
>  # Increase partitions for the topic
>  # Observe that no rebalance occurs and the new partitions are not assigned
> I have only tested this in 2.7, but it may apply to newer versions as well.
> h3. Analysis
> In {_}ConsumerCoordinator{_}, one responsibility of the leader consumer is to 
> track metadata and trigger a rebalance if there are changes such as new 
> partitions added:
> [https://github.com/apache/kafka/blob/43bcc5682da82a602a4c0a000dc7433d0507b450/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L793]
> {code:java}
> if (assignmentSnapshot != null && 
> !assignmentSnapshot.matches(metadataSnapshot)) {
> ...
> requestRejoinIfNecessary(reason);
> return true;
> }
> {code}
> Note that _assignmentSnapshot_ is currently only set if the consumer is the 
> leader:
> [https://github.com/apache/kafka/blob/43bcc5682da82a602a4c0a000dc7433d0507b450/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L353]
> {code:java}
> // Only the leader is responsible for monitoring for metadata changes (i.e. 
> partition changes)
> if (!isLeader)
> assignmentSnapshot = null;
> {code}
> And _isLeader_ is only true after an assignment is performed during a 
> rebalance:
> [https://github.com/apache/kafka/blob/43bcc5682da82a602a4c0a000dc7433d0507b450/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L634]
> That is, when a consumer group forms, exactly one consumer in the group 
> should have _isLeader == True_ and be responsible for triggering rebalances 
> on metadata changes.
> However, in the case of static membership, if the leader has been restarted 
> and rejoined the group, the group essentially no longer has a current leader. 
> Even though the metadata changes are fetched, no rebalance will be triggered. 
> That is, _isLeader_ will be false for all members.
> This issue does not resolve until after an actual group change that causes a 
> proper rebalance. In order to safely make a partition increase when using 
> static membership, consumers must be stopped and have timed out, or forcibly 
> removed with {_}AdminClient.removeMembersFromConsumerGroup(){_}.
> Correcting this in the client probably also requires help from the broker. 
> Currently, when a static consumer that is leader is restarted, the 
> coordinator does recognize the change:
> e.g. leader _bbfcb930-61a3-4d21-945c-85f4576490ff_ was restarted
> {noformat}
> [2021-11-04 13:53:13,487] INFO [GroupCoordinator 4]: Static member 
> Some(1GK7DRJPHZ0LRV91Y4D3SYHS5928XHXJQ6263GT26V5P70QX0) of group ryan_test 
> with unknown member id rejoins, assigning new member id 
> 1GK7DRJPHZ0LRV91Y4D3SYHS5928XHXJQ6263GT26V5P70QX0-af88ecf2-
> 6ebf-47da-95ef-c54fef17ab74, while old member id 
> 1GK7DRJPHZ0LRV91Y4D3SYHS5928XHXJQ6263GT26V5P70QX0-bbfcb930-61a3-4d21-945c-85f4576490ff
>  will be removed. (
> kafka.coordinator.group.GroupCoordinator){noformat}
> However, it does not attempt to update the leader id since this isn't a new 
> rebalance, and JOIN_GROUP will continue returning the now stale member id as 
> leader:
> {noformat}
> 2021-11-04 13:53:13,490 DEBUG o.a.k.c.c.i.AbstractCoordinator [Consumer 
> instanceId=1GK7DRJPHZ0LRV91Y4D3SYHS5928XHXJQ6263GT26V5P70QX0, 
> clientId=1GK7DRJPHZ0LRV91Y4D3SYHS5928XHXJQ6263GT26V5P70QX0, 
> groupId=ryan_test] Received successful JoinGroup response: 
> JoinGroupResponseData(throttleTimeMs=0, errorCode=0, generationId=40, 
> protocolType='consumer', protocolName='range', 
> leader='1GK7DRJPHZ0LRV91Y4D3SYHS5928XHXJQ6263GT26V5P70QX0-bbfcb930-61a3-4d21-945c-85f4576490ff',
>  
> memberId='1GK7DRJPHZ0LRV91Y4D3SYHS5928XHXJQ6263GT26V5P70QX0-af88ecf2-6ebf-47da-95ef-c54fef17ab74',
>  members=[]){noformat}
> This means that it's not easy for any particular restarted member to identify 

[jira] [Updated] (KAFKA-13549) Add "delete interval" config

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13549:
--
Fix Version/s: 3.2.0

> Add "delete interval" config
> 
>
> Key: KAFKA-13549
> URL: https://issues.apache.org/jira/browse/KAFKA-13549
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Nicholas Telford
>Priority: Major
>  Labels: kip
> Fix For: 3.2.0
>
>
> KIP-811: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-811%3A+Add+config+min.repartition.purge.interval.ms+to+Kafka+Streams]
> Kafka Streams uses "delete record" requests to aggressively purge data from 
> repartition topics. Those request are sent each time we commit.
> For at-least-once with a default commit interval of 30 seconds, this works 
> fine. However, for exactly-once with a default commit interval of 100ms, it's 
> very aggressive. The main issue is broker side, because the broker logs every 
> "delete record" request, and thus broker logs are spammed if EOS is enabled.
> We should consider to add a new config (eg `delete.record.interval.ms` or 
> similar) to have a dedicated config for "delete record" requests, to decouple 
> it from the commit interval config and allow to purge data less aggressively, 
> even if the commit interval is small to avoid the broker side log spamming.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13494) Session and Window Queries for IQv2

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13494:
--
Fix Version/s: 3.2.0

> Session and Window Queries for IQv2
> ---
>
> Key: KAFKA-13494
> URL: https://issues.apache.org/jira/browse/KAFKA-13494
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Patrick Stuedi
>Assignee: Patrick Stuedi
>Priority: Minor
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13479) Interactive Query v2

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13479:
--
Fix Version/s: 3.2.0

> Interactive Query v2
> 
>
> Key: KAFKA-13479
> URL: https://issues.apache.org/jira/browse/KAFKA-13479
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
> Fix For: 3.2.0
>
>
> Kafka Streams supports an interesting and innovative API for "peeking" into 
> the internal state of running stateful stream processors from outside of the 
> application, called Interactive Query (IQ). This functionality has proven 
> invaluable to users over the years for everything from debugging running 
> applications to serving low latency queries straight from the Streams runtime.
> However, the actual interfaces for IQ were designed in the very early days of 
> Kafka Streams, before the project had gained significant adoption, and in the 
> absence of much precedent for this kind of API in peer projects. With the 
> benefit of hindsight, we can observe several problems with the original 
> design that we hope to address in a revised framework that will serve Streams 
> users well for many years to come.
>  
> This ticket tracks the implementation of KIP-796: 
> https://cwiki.apache.org/confluence/x/34xnCw



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13410) KRaft Upgrades

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13410:
--
Fix Version/s: 3.2.0

> KRaft Upgrades
> --
>
> Key: KAFKA-13410
> URL: https://issues.apache.org/jira/browse/KAFKA-13410
> Project: Kafka
>  Issue Type: New Feature
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
> Fix For: 3.2.0
>
>
> This is the placeholder JIRA for KIP-778



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-12473) Make the "cooperative-sticky, range" as the default assignor

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-12473:
--
Fix Version/s: 3.2.0

> Make the "cooperative-sticky, range" as the default assignor
> 
>
> Key: KAFKA-12473
> URL: https://issues.apache.org/jira/browse/KAFKA-12473
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: A. Sophie Blee-Goldman
>Assignee: Luke Chen
>Priority: Critical
>  Labels: kip
> Fix For: 3.2.0
>
>
> Now that 3.0 is coming up, we can change the default 
> ConsumerPartitionAssignor to something better than the RangeAssignor. The 
> original plan was to switch over to the StickyAssignor, but now that we have 
> incremental cooperative rebalancing we should  consider using the new 
> CooperativeStickyAssignor instead: this will enable the consumer group to 
> follow the COOPERATIVE protocol, improving the rebalancing experience OOTB.
> KIP: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177048248



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-12399) Deprecate Log4J Appender

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-12399:
--
Fix Version/s: 3.2.0

> Deprecate Log4J Appender
> 
>
> Key: KAFKA-12399
> URL: https://issues.apache.org/jira/browse/KAFKA-12399
> Project: Kafka
>  Issue Type: Improvement
>  Components: logging
>Reporter: Dongjin Lee
>Assignee: Dongjin Lee
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.2.0
>
>
> As a following job of KAFKA-9366, we have to entirely remove the log4j 1.2.7 
> dependency from the classpath by removing dependencies on log4j-appender.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-10733) Enforce exception thrown for KafkaProducer txn APIs

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10733:
--
Fix Version/s: (was: 3.2.0)

> Enforce exception thrown for KafkaProducer txn APIs
> ---
>
> Key: KAFKA-10733
> URL: https://issues.apache.org/jira/browse/KAFKA-10733
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer , streams
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: need-kip
>
> In general, KafkaProducer could throw both fatal and non-fatal errors as 
> KafkaException, which makes the exception catching hard. Furthermore, not 
> every single fatal exception (checked) is marked on the function signature 
> yet as of 2.7.
> We should have a general supporting strategy in long term for this matter, as 
> whether to declare all non-fatal exceptions as wrapped KafkaException while 
> extracting all fatal ones, or just add a flag to KafkaException indicating 
> fatal or not, to maintain binary compatibility.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-8872) Improvements to controller "deleting" state / topic Identifiers

2022-02-11 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490884#comment-17490884
 ] 

Bruno Cadonna commented on KAFKA-8872:
--

Hi [~jolshan], [~lucasbradstreet], and [~dengziming],

Is this work planned to be released with 3.2.0?

> Improvements to controller "deleting" state /  topic Identifiers
> 
>
> Key: KAFKA-8872
> URL: https://issues.apache.org/jira/browse/KAFKA-8872
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Lucas Bradstreet
>Assignee: Justine Olshan
>Priority: Major
>
> Kafka currently uniquely identifies a topic by its name. This is generally 
> sufficient, but there are flaws in this scheme if a topic is deleted and 
> recreated with the same name. As a result, Kafka attempts to prevent these 
> classes of issues by ensuring a topic is deleted from all replicas before 
> completing a deletion. This solution is not perfect, as it is possible for 
> partitions to be reassigned from brokers while they are down, and there are 
> no guarantees that this state will ever be cleaned up and will not cause 
> issues in the future.
> As the controller must wait for all replicas to delete their local 
> partitions, deletes can also become blocked, preventing topics from being 
> created with the same name until the deletion is complete on all replicas. 
> This can mean that downtime for a single broker can effectively cause a 
> complete outage for everyone producing/consuming to that topic name, as the 
> topic cannot be recreated without manual intervention.
> Unique topic IDs could help address this issue by associating a unique ID 
> with each topic, ensuring a newly created topic with a previously used name 
> cannot be confused with a previous topic with that name.
>  
> KIP-516: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-10357) Handle accidental deletion of repartition-topics as exceptional failure

2022-02-11 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-10357:
--
Fix Version/s: (was: 3.2.0)

> Handle accidental deletion of repartition-topics as exceptional failure
> ---
>
> Key: KAFKA-10357
> URL: https://issues.apache.org/jira/browse/KAFKA-10357
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bruno Cadonna
>Priority: Major
>  Labels: needs-kip
>
> Repartition topics are both written by Stream's producer and read by Stream's 
> consumer, so when they are accidentally deleted both clients may be notified. 
> But in practice the consumer would react to it much quicker than producer 
> since the latter has a delivery timeout expiration period (see 
> https://issues.apache.org/jira/browse/KAFKA-10356). When consumer reacts to 
> it, it will re-join the group since metadata changed and during the triggered 
> rebalance it would auto-recreate the topic silently and continue, causing 
> data lost silently. 
> One idea, is to only create all repartition topics *once* in the first 
> rebalance and not auto-create them any more in future rebalances, instead it 
> would be treated similar as INCOMPLETE_SOURCE_TOPIC_METADATA error code 
> (https://issues.apache.org/jira/browse/KAFKA-10355).
> The challenge part would be, how to determine if it is the first-ever 
> rebalance, and there are several wild ideas I'd like to throw out here:
> 1) change the thread state transition diagram so that STARTING state would 
> not transit to PARTITION_REVOKED but only to PARTITION_ASSIGNED, then in the 
> assign function we can check if the state is still in CREATED and not RUNNING.
> 2) augment the subscriptionInfo to encode whether or not this is the first 
> time ever rebalance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (KAFKA-13666) Tests should not ignore exceptions for supported OS

2022-02-14 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-13666:
--
Component/s: streams

> Tests should not ignore exceptions for supported OS
> ---
>
> Key: KAFKA-13666
> URL: https://issues.apache.org/jira/browse/KAFKA-13666
> Project: Kafka
>  Issue Type: Test
>  Components: streams
>Affects Versions: 3.0.0
>Reporter: Rob Leland
>Priority: Minor
>
> A few of the tests are swallowing exceptions for all operations systems 
> because they might fail on windows. This could mask regressions in supported 
> OS. When using the testDrivers change this so exceptions are only ignored for 
> Windows OS.
> Please see: https://github.com/apache/kafka/pull/11752



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13666) Tests should not ignore exceptions for supported OS

2022-02-14 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491938#comment-17491938
 ] 

Bruno Cadonna commented on KAFKA-13666:
---

[~rleland] Thank you for the ticket and the PR!

>From the description it is not clear to me, if the swallowing of the 
>exceptions is still needed. The commit that swallows dates before the fix for 
>[KAFKA-6647|https://issues.apache.org/jira/browse/KAFKA-6647]. So, it could be 
>that the swallowing is not needed anymore and we can remove it. Unfortunately, 
>I do not have a window machine to test it. Do you by any chance have one? 

> Tests should not ignore exceptions for supported OS
> ---
>
> Key: KAFKA-13666
> URL: https://issues.apache.org/jira/browse/KAFKA-13666
> Project: Kafka
>  Issue Type: Test
>  Components: streams
>Affects Versions: 3.0.0
>Reporter: Rob Leland
>Priority: Minor
>
> A few of the tests are swallowing exceptions for all operations systems 
> because they might fail on windows. This could mask regressions in supported 
> OS. When using the testDrivers change this so exceptions are only ignored for 
> Windows OS.
> Please see: https://github.com/apache/kafka/pull/11752



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-6823) Transient failure in DynamicBrokerReconfigurationTest.testThreadPoolResize

2022-02-14 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491997#comment-17491997
 ] 

Bruno Cadonna commented on KAFKA-6823:
--

I am going to close this ticket since it seems a duplicate of KAFKA-7988 which 
is resolved.

> Transient failure in DynamicBrokerReconfigurationTest.testThreadPoolResize
> --
>
> Key: KAFKA-6823
> URL: https://issues.apache.org/jira/browse/KAFKA-6823
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Anna Povzner
>Priority: Major
>
> Saw in my PR build (*DK 10 and Scala 2.12 ):*
> *15:58:46* kafka.server.DynamicBrokerReconfigurationTest > 
> testThreadPoolResize FAILED
> *15:58:46*     java.lang.AssertionError: Invalid threads: expected 6, got 7: 
> List(ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-1, 
> ReplicaFetcherThread-0-2, ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-2, 
> ReplicaFetcherThread-1-0, ReplicaFetcherThread-0-1)
> *15:58:46*         at org.junit.Assert.fail(Assert.java:88)
> *15:58:46*         at org.junit.Assert.assertTrue(Assert.java:41)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.verifyThreads(DynamicBrokerReconfigurationTest.scala:1147)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.maybeVerifyThreadPoolSize$1(DynamicBrokerReconfigurationTest.scala:412)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.resizeThreadPool$1(DynamicBrokerReconfigurationTest.scala:431)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.reducePoolSize$1(DynamicBrokerReconfigurationTest.scala:417)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testThreadPoolResize$3(DynamicBrokerReconfigurationTest.scala:440)
> *15:58:46*         at 
> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:156)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.verifyThreadPoolResize$1(DynamicBrokerReconfigurationTest.scala:439)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:453)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


<    1   2   3   4   5   6   7   8   9   10   >