[ 
https://issues.apache.org/jira/browse/KAFKA-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161519#comment-17161519
 ] 

John Roesler edited comment on KAFKA-10274 at 7/20/20, 8:36 PM:
----------------------------------------------------------------

Hi [~hachikuji] ,

I'm trying to get a green system test build for the 2.5.1 release, and this 
test seems to have been failing quite often over the last few days.

I see that you already fixed the test back in May in 
https://issues.apache.org/jira/browse/KAFKA-9802 for 2.5.1, and that you 
theorized that https://issues.apache.org/jira/browse/KAFKA-10235 may have 
re-introduced the test failure.

It doesn't look like KAFKA-10235 was backported to 2.5. Maybe it should have 
been, but then again, your last comment makes me think that we still need your 
current fix on top of it.

What do you think we should do here? Backport KAFKA-10235 and then the PR for 
this ticket once it's merged?

Thanks,

-John

PS, the results I looked at:

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-18--001.1595065230--confluentinc--2.5--21e17cd14/report.html

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-19--001.1595151548--confluentinc--2.5--21e17cd14/report.html

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-20--001.1595238538--confluentinc--2.5--21e17cd14/report.html

from https://jenkins.confluent.io/job/system-test-kafka/job/2.5/


was (Author: vvcephei):
Hi [~hachikuji] ,

I'm trying to get a green system test build for the 2.5.1 release, and this 
test seems to be failing quite a bit in the last few days.

I see that you already fixed the test back in May in 
https://issues.apache.org/jira/browse/KAFKA-9802 for 2.5.1, and that you 
theorized that https://issues.apache.org/jira/browse/KAFKA-10235 may have 
re-introduced the test failure.

It doesn't look like KAFKA-10235 was backported to 2.5. Maybe it should have 
been, but then again, your last comment makes me think that we still need 
another fix on top of it.

What do you think we should do here?

Thanks,

-John

PS, the results I looked at:

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-18--001.1595065230--confluentinc--2.5--21e17cd14/report.html

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-19--001.1595151548--confluentinc--2.5--21e17cd14/report.html

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-20--001.1595238538--confluentinc--2.5--21e17cd14/report.html

from https://jenkins.confluent.io/job/system-test-kafka/job/2.5/

> Transaction system test uses inconsistent timeouts
> --------------------------------------------------
>
>                 Key: KAFKA-10274
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10274
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> We've seen some failures in the transaction system test with errors like the 
> following:
> {code}
> copier-1 : Message copier didn't make enough progress in 30s. Current 
> progress: 0
> {code}
> Looking at the consumer logs, we see the following messages repeating over 
> and over:
> {code}
> [2020-07-14 06:50:21,466] DEBUG [Consumer 
> clientId=consumer-transactions-test-consumer-group-1, 
> groupId=transactions-test-consumer-group] Fetching committed offsets for 
> partitions: [input-topic-1] 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-07-14 06:50:21,468] DEBUG [Consumer 
> clientId=consumer-transactions-test-consumer-group-1, 
> groupId=transactions-test-consumer-group] Failed to fetch offset for 
> partition input-topic-1: There are unstable offsets that need to be cleared. 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> {code}
> I think the problem is that the test implicitly depends on the transaction 
> timeout, which is configured to 40s, even though the test expects progress 
> within 30s.
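
To make the mismatch in the description above concrete, here is a minimal sketch of the arithmetic. The helper names and the 10s margin are hypothetical and are not taken from the Kafka system tests; only the 30s and 40s values come from the ticket. The assumption is that a hanging transaction is only aborted once the transaction timeout expires, so a progress check that gives up sooner can fail spuriously.

```python
# Hypothetical helpers for illustration only; not part of the Kafka system tests.

def is_consistent(progress_timeout_sec, transaction_timeout_ms):
    """Return True if the progress check waits longer than the transaction timeout."""
    return progress_timeout_sec * 1000 > transaction_timeout_ms


def min_progress_timeout_sec(transaction_timeout_ms, margin_sec=10):
    """Smallest progress timeout (seconds) that outlasts the transaction timeout.

    The transaction timeout is rounded up to whole seconds, then an assumed
    safety margin is added.
    """
    whole_seconds = -(-transaction_timeout_ms // 1000)  # ceiling division
    return whole_seconds + margin_sec


if __name__ == "__main__":
    # Values from the ticket: progress expected within 30s, transaction timeout 40s.
    print(is_consistent(30, 40000))
    print(min_progress_timeout_sec(40000))
```

With the values from the ticket, the check reports the configuration as inconsistent. Either raising the progress timeout or lowering the transaction timeout would satisfy it; which one the actual fix chose is not stated here.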



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
