[jira] [Updated] (KAFKA-14000) Kafka-connect standby server shows empty tasks list
[ https://issues.apache.org/jira/browse/KAFKA-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Zou updated KAFKA-14000:
--
Description:
I'm using Kafka Connect in distributed mode. There are two servers, one active and one standby. The standby server sometimes shows an empty tasks list in the status REST API response.

curl host:8443/connectors/name1/status
{code:java}
{
"connector": {
"state": "RUNNING",
"worker_id": "1.2.3.4:10443"
},
"name": "name1",
"tasks": [],
"type": "source"
} {code}
I enabled TRACE logging and checked. As required, the connect-status topic is set to cleanup.policy=compact, but messages in the topic are not compacted immediately; they are compacted at an interval, so there is usually more than one message with the same key. E.g., when Kafka Connect is launched no connector is running; we then start a new connector, and there will be two messages in the connect-status topic:

status-task-name1 : state=RUNNING, workerId='10.251.170.166:10443', generation=100
status-task-name1 : __

Please check the log file [^kafka-connect-trace.log]. We can see that the task status was removed in the end, even though the empty status was not the newest message in the connect-status topic.

When reading status from the connect-status topic, the worker does not sort messages by generation. [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java] So I think this could be improved: we can either sort the messages after poll, or compare the generation value before choosing the correct status message.

was: (same description without the paragraph referencing [^kafka-connect-trace.log])

> Kafka-connect standby server shows empty tasks list
> ---
>
> Key: KAFKA-14000
> URL: https://issues.apache.org/jira/browse/KAFKA-14000
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 2.6.0
> Reporter: Xinyu Zou
> Priority: Major
> Attachments: kafka-connect-trace.log

-- This message was sent by Atlassian Jira (v8.20.7#820007)
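For illustration, a minimal sketch of the second remedy the report suggests (compare the generation value before applying a consumed status record). The class and method names here are hypothetical stand-ins, not Connect's real status-store types:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a task status read from the connect-status
// topic (NOT Connect's real classes).
final class TaskStatusEntry {
    final String state;
    final String workerId;
    final int generation;

    TaskStatusEntry(String state, String workerId, int generation) {
        this.state = state;
        this.workerId = workerId;
        this.generation = generation;
    }
}

final class StatusCache {
    private final Map<String, TaskStatusEntry> statusByKey = new ConcurrentHashMap<>();

    // Apply a consumed record only if it is at least as new as what we
    // already hold, so a stale (not-yet-compacted) message cannot
    // overwrite newer state.
    void maybeApply(String key, TaskStatusEntry incoming) {
        statusByKey.merge(key, incoming, (current, candidate) ->
            candidate.generation >= current.generation ? candidate : current);
    }
}
{code}

With this shape, the read order of pre-compaction duplicates no longer matters: the entry with the highest generation always wins.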
[jira] [Updated] (KAFKA-14000) Kafka-connect standby server shows empty tasks list
[ https://issues.apache.org/jira/browse/KAFKA-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyu Zou updated KAFKA-14000:
--
Attachment: kafka-connect-trace.log

> Kafka-connect standby server shows empty tasks list
> ---
>
> Key: KAFKA-14000
> URL: https://issues.apache.org/jira/browse/KAFKA-14000
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 2.6.0
> Reporter: Xinyu Zou
> Priority: Major
> Attachments: kafka-connect-trace.log

-- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (KAFKA-14000) Kafka-connect standby server shows empty tasks list
Xinyu Zou created KAFKA-14000:
--
Summary: Kafka-connect standby server shows empty tasks list
Key: KAFKA-14000
URL: https://issues.apache.org/jira/browse/KAFKA-14000
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Affects Versions: 2.6.0
Reporter: Xinyu Zou

I'm using Kafka Connect in distributed mode. There are two servers, one active and one standby. The standby server sometimes shows an empty tasks list in the status REST API response.

curl host:8443/connectors/name1/status
{code:java}
{
"connector": {
"state": "RUNNING",
"worker_id": "1.2.3.4:10443"
},
"name": "name1",
"tasks": [],
"type": "source"
} {code}
I enabled TRACE logging and checked. As required, the connect-status topic is set to cleanup.policy=compact, but messages in the topic are not compacted immediately; they are compacted at an interval, so there is usually more than one message with the same key. E.g., when Kafka Connect is launched no connector is running; we then start a new connector, and there will be two messages in the connect-status topic:

status-task-name1 : state=RUNNING, workerId='10.251.170.166:10443', generation=100
status-task-name1 : __

When reading status from the connect-status topic, the worker does not sort messages by generation. [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java] So I think this could be improved: we can either sort the messages after poll, or compare the generation value before choosing the correct status message.

-- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (KAFKA-12478) Consumer group may lose data for newly expanded partitions when add partitions for topic if the group is set to consume from the latest
[ https://issues.apache.org/jira/browse/KAFKA-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554884#comment-17554884 ] hudeqi commented on KAFKA-12478:
Hello, Guozhang. I have started a vote on KIP-842 for this issue. Does the status of this issue also need to be updated accordingly? In addition, please review and vote on it, thank you. cc [~showuon]

> Consumer group may lose data for newly expanded partitions when add
> partitions for topic if the group is set to consume from the latest
> ---
>
> Key: KAFKA-12478
> URL: https://issues.apache.org/jira/browse/KAFKA-12478
> Project: Kafka
> Issue Type: Improvement
> Components: clients
> Affects Versions: 3.1.1
> Reporter: hudeqi
> Priority: Blocker
> Labels: patch
> Attachments: safe-console-consumer.png, safe-consume.png,
> safe-produce.png, trunk-console-consumer.png, trunk-consume.png,
> trunk-produce.png
>
> Original Estimate: 1,158h
> Remaining Estimate: 1,158h
>
> This problem was exposed in our production environment: a topic is used to
> produce monitoring data. *After we expanded its partitions, the consuming
> business reported that data was lost.*
> After preliminary investigation, the lost data was all concentrated in the
> newly expanded partitions. The reason: when the topic expands, the producer
> perceives the expansion first, and some data is written to the newly
> expanded partitions. The consumer group perceives the expansion later; after
> the rebalance completes, the newly expanded partitions are consumed from the
> latest offset if the group is set to consume from the latest. For a period
> of time, the data in the newly expanded partitions is therefore skipped and
> lost by the consumer.
> Simply setting such a group to consume from the earliest is not an option
> for a high-throughput topic, because on startup the group would consume a
> large amount of historical data from the broker, which would affect broker
> performance to a certain extent. Therefore, *it is necessary to consume
> these partitions from the earliest separately.*
>
> I ran a test; the results are in the attached screenshots. First, the
> producer's and consumer's "metadata.max.age.ms" were set to 500ms and 3ms
> respectively.
> _trunk-console-consumer.png_ shows the consumer started with the community
> version and set to "latest".
> _trunk-produce.png_ shows the data produced: "partition_count" is the number
> of partitions of the topic at that moment, "message" is the numeric content
> of the corresponding message, and "send_to_partition_index" is the index of
> the partition the message was sent to. At 11:32:10 the producer perceives
> the expansion of the total partitions from 2 to 3 and writes the numbers 38,
> 41, and 44 into the newly expanded partition 2.
> _trunk-consume.png_ shows everything consumed by the community version. 38
> and 41, sent to partition 2, were not consumed at first; even after
> partition 2 was perceived, they were still not consumed. Instead,
> consumption started from the latest message, 44, so 38 and 41 were
> discarded.
>
> _safe-console-consumer.png_ shows the consumer started with the fixed
> version and set to "safe_latest".
> _safe-produce.png_ shows the data produced. At 12:12:09 the producer
> perceives the expansion of the total partitions from 4 to 5 and writes the
> numbers 109 and 114 into the newly expanded partition 4.
> _safe-consume.png_ shows everything consumed by the fixed version. 109, sent
> to partition 4, was not consumed at first; after partition 4 was perceived,
> 109 was consumed as the first record of partition 4. So the fixed version
> does not lose data under this condition.

-- This message was sent by Atlassian Jira (v8.20.7#820007)
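For illustration only: the "safe_latest" semantics described above could be roughly approximated today on the client side with a rebalance listener that seeks partitions first seen after startup to the earliest offset. This is an assumption-laden sketch (per-consumer tracking is only approximate, since partitions move between group members; the actual KIP-842 proposal addresses this inside the consumer), not the KIP's implementation:

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SafeLatestListener implements ConsumerRebalanceListener {
    private final KafkaConsumer<String, String> consumer;
    private final Set<TopicPartition> knownPartitions = new HashSet<>();
    private boolean firstAssignment = true;

    public SafeLatestListener(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {}

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        if (firstAssignment) {
            // Initial assignment keeps the configured auto.offset.reset=latest.
            firstAssignment = false;
        } else {
            List<TopicPartition> newlySeen = new ArrayList<>();
            for (TopicPartition tp : partitions) {
                if (!knownPartitions.contains(tp)) {
                    newlySeen.add(tp); // appeared after startup: likely an expansion
                }
            }
            if (!newlySeen.isEmpty()) {
                // Read newly expanded partitions from the earliest offset so
                // records produced before the rebalance are not skipped.
                consumer.seekToBeginning(newlySeen);
            }
        }
        knownPartitions.addAll(partitions);
    }
}
{code}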
[GitHub] [kafka] C0urante commented on pull request #11781: KAFKA-10000: Per-connector offsets topics (KIP-618)
C0urante commented on PR #11781: URL: https://github.com/apache/kafka/pull/11781#issuecomment-1157162231 Hey guys--thanks for the reviews, really appreciate the rapid responses here. I found a bug that's been a bit trickier to solve than expected and have had little time to work on it this week. I plan to push the next draft by Friday at the very latest. If it matters, the bug is that the offset stores for regular (non-exactly-once) source tasks, and source connectors, are never started. I'm planning on fixing that first, then adding an integration test case to https://github.com/apache/kafka/pull/11782 to simulate a soft downgrade where someone disables exactly-once support on their worker after creating a connector and letting it run for a bit, and finally, manually auditing the changes for KIP-618 to catch any other potential bugs related to improper initialization or cleanup of resources. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] bozhao12 commented on pull request #12286: KAFKA-13984: Fix TopicBasedRemoteLogMetadataManager#initializeResources should exit immediately when partition size of __remote_log_metadata
bozhao12 commented on PR #12286: URL: https://github.com/apache/kafka/pull/12286#issuecomment-1157064520 @divijvaidya Thanks for your review. I added a unit test based on your suggestion. Since a restart operation is involved, I put this unit test in `TopicBasedRemoteLogMetadataManagerRestartTest`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-13999) Add ProducerCount metrics (KIP-847)
Artem Livshits created KAFKA-13999: -- Summary: Add ProducerCount metrics (KIP-847) Key: KAFKA-13999 URL: https://issues.apache.org/jira/browse/KAFKA-13999 Project: Kafka Issue Type: Improvement Reporter: Artem Livshits See https://cwiki.apache.org/confluence/display/KAFKA/KIP-847%3A+Add+ProducerCount+metrics -- This message was sent by Atlassian Jira (v8.20.7#820007)
[GitHub] [kafka] mdedetrich commented on pull request #12284: KAFKA-13980: Upgrade from Scala 2.12.15 to 2.12.16
mdedetrich commented on PR #12284: URL: https://github.com/apache/kafka/pull/12284#issuecomment-1156984238 > Is there something you're looking for in 2.12.16? I already had a detailed look into this; the bug only occurs at compile time (see https://github.com/scala/bug/issues/12605#issuecomment-1151427077), which means that unless you hit that specifically mentioned compile error there is no adverse effect. You can see the precise details in the linked ticket at https://github.com/scala/bug/issues/12605, but in summary Kafka is completely unaffected by this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12284: KAFKA-13980: Upgrade from Scala 2.12.15 to 2.12.16
ijuma commented on PR #12284: URL: https://github.com/apache/kafka/pull/12284#issuecomment-1156981130 I notice there's a regression: > Scala 2.12.16 contains a regression (https://github.com/scala/bug/issues/12605) that was discovered after the artifacts were published. Only mixed compilation of Scala and Java source files together is affected, and only when the Scala code contains references to certain nested classes in the Java sources. The problem manifests as a compile-time type error. Follow the link for details and workarounds. We'll fix the problem in Scala 2.12.17, which we expect to release in a few months. Is there something you're looking for in 2.12.16? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on a diff in pull request #12241: MINOR: Fix docs in upgrade.html
ijuma commented on code in PR #12241: URL: https://github.com/apache/kafka/pull/12241#discussion_r898459208 ## docs/upgrade.html: ## @@ -1265,7 +1265,7 @@ Notable changes in 1 on live log directories even if there are offline log directories. A log directory may become offline due to IOException caused by hardware failure. Users need to monitor the per-broker metric offlineLogDirectoryCount to check whether there is offline log directory. -Added KafkaStorageException which is a retriable exception. KafkaStorageException will be converted to NotLeaderForPartitionException in the response +Added KafkaStorageException which is a retriable exception. KafkaStorageException will be converted to NotLeaderOrFollowerException in the response Review Comment: Thanks, closing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma closed pull request #12241: MINOR: Fix docs in upgrade.html
ijuma closed pull request #12241: MINOR: Fix docs in upgrade.html URL: https://github.com/apache/kafka/pull/12241 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] jnh5y opened a new pull request, #12299: MINOR: Guard against decrementing `totalCommittedSinceLastSummary` du…
jnh5y opened a new pull request, #12299: URL: https://github.com/apache/kafka/pull/12299 …ring rebalancing. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] jnh5y commented on pull request #12298: KAFKA-13998 JoinGroupRequestData 'reason' can be too large
jnh5y commented on PR #12298: URL: https://github.com/apache/kafka/pull/12298#issuecomment-1156967918 I tried to find a way to create a unit test for this change, but I wasn't able to do so quickly. If someone has a suggestion for how to do that, I'm happy to follow through with it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] jnh5y opened a new pull request, #12298: KAFKA-13998 JoinGroupRequestData 'reason' can be too large
jnh5y opened a new pull request, #12298: URL: https://github.com/apache/kafka/pull/12298 This fix follows the pattern which is established in `AbstractCoordinator.java` of setting the request reason with the method `requestRejoin`. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] bozhao12 commented on a diff in pull request #12241: MINOR: Fix docs in upgrade.html
bozhao12 commented on code in PR #12241: URL: https://github.com/apache/kafka/pull/12241#discussion_r898448690 ## docs/upgrade.html: ## @@ -1265,7 +1265,7 @@ Notable changes in 1 on live log directories even if there are offline log directories. A log directory may become offline due to IOException caused by hardware failure. Users need to monitor the per-broker metric offlineLogDirectoryCount to check whether there is offline log directory. -Added KafkaStorageException which is a retriable exception. KafkaStorageException will be converted to NotLeaderForPartitionException in the response +Added KafkaStorageException which is a retriable exception. KafkaStorageException will be converted to NotLeaderOrFollowerException in the response Review Comment: Indeed, this is reasonable, thanks for your review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-13998) JoinGroupRequestData 'reason' can be too large
Jim Hughes created KAFKA-13998:
--
Summary: JoinGroupRequestData 'reason' can be too large
Key: KAFKA-13998
URL: https://issues.apache.org/jira/browse/KAFKA-13998
Project: Kafka
Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Jim Hughes
Assignee: Jim Hughes

We saw an exception like this:
```
org.apache.kafka.streams.errors.StreamsException: java.lang.RuntimeException: 'reason' field is too long to be serialized
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:627)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:551)
Caused by: java.lang.RuntimeException: 'reason' field is too long to be serialized
    at org.apache.kafka.common.message.JoinGroupRequestData.addSize(JoinGroupRequestData.java:465)
    at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
    at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
    at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:524)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:500)
    at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:460)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:499)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:437)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:371)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:542)
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1271)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1235)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1215)
    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:969)
    at org.apache.kafka.streams.processor.internals.StreamThread.pollPhase(StreamThread.java:917)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:736)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:589)
    ... 1 more
```
This appears to be caused by the code passing an entire stack trace in the `rejoinReason`. See https://github.com/apache/kafka/blob/3.2.0/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L481

-- This message was sent by Atlassian Jira (v8.20.7#820007)
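For illustration, a minimal sketch of one way to guard the request field: cap the reason string before it is set. The 255-character limit below is an assumed value for the sketch, not necessarily what the actual fix uses:

{code:java}
// Sketch of guarding the JoinGroup "reason" length before it is sent.
// The 255-character cap is an assumed value for illustration.
final class RejoinReasons {
    private static final int MAX_REASON_LENGTH = 255; // assumed cap

    static String truncateIfNeeded(String reason) {
        if (reason == null || reason.length() <= MAX_REASON_LENGTH) {
            return reason;
        }
        // Keep only the head: a full stack trace adds little value to the
        // coordinator and can blow past the string-field size limit.
        return reason.substring(0, MAX_REASON_LENGTH);
    }
}
{code}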
[GitHub] [kafka] guozhangwang commented on pull request #12121: KAFKA-13846: Adding overloaded addMetricIfAbsent method
guozhangwang commented on PR #12121: URL: https://github.com/apache/kafka/pull/12121#issuecomment-1156954295 > @guozhangwang , which file does this correspond to? that update the web docs on 3.3 release new API changes Here: https://github.com/apache/kafka/blob/trunk/docs/upgrade.html. You can find earlier PRs how they update the upgrade guide / API changes in upcoming releases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] mjsax commented on a diff in pull request #12204: [9/N WIP][Emit final] Emit final for session window aggregations
mjsax commented on code in PR #12204: URL: https://github.com/apache/kafka/pull/12204#discussion_r898411852 ## streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBTimeOrderedSessionStore.java: ## @@ -61,6 +65,18 @@ public QueryResult query(final Query query, ); } +@Override +public KeyValueIterator, byte[]> findSessions(final Instant earliestSessionEndTime, + final Instant latestSessionEndTime) { +final long earliestEndTime = ApiUtils.validateMillisecondInstant(earliestSessionEndTime, +prepareMillisCheckFailMsgPrefix(earliestSessionEndTime, "earliestSessionEndTime")); +final long latestEndTime = ApiUtils.validateMillisecondInstant(latestSessionEndTime, +prepareMillisCheckFailMsgPrefix(latestSessionEndTime, "latestSessionEndTime")); + +final KeyValueIterator bytesIterator = wrapped().fetchAll(earliestEndTime, latestEndTime); Review Comment: If I read the code correctly, what `fetchAll()` does is correct: from my understanding, `fetchAll()` is implemented to find "overlapping sessions" given a lower and upper bound -- the lower bound must be smaller than the session end and the upper bound must be larger than the session start for an overlap. Because the upper bound compares to session start, and we use the "base", we need to search the full "data/base part" of the store. I guess the issue is that you actually cannot use `fetchAll()` at all for our purpose here? Passing in `lastEndTime` does not work (does it?), as it would be used to compare to session start times, but we want to compare to session end times. -- Thus, I think the right solution is to actually also add the new `findSessions()` to the internal `SegmentedStore` and implement a proper iterator there? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] mjsax commented on a diff in pull request #12204: [9/N WIP][Emit final] Emit final for session window aggregations
mjsax commented on code in PR #12204: URL: https://github.com/apache/kafka/pull/12204#discussion_r898406030 ## streams/src/main/java/org/apache/kafka/streams/state/SessionStore.java: ## @@ -39,6 +39,13 @@ */ public interface SessionStore extends StateStore, ReadOnlySessionStore { +// TODO: javadoc; both ends are inclusive +default KeyValueIterator, AGG> findSessions(final Instant earliestSessionEndTime, Review Comment: I think there is no way around it? In the end, we allow users to plug in a custom session store -- thus, if they use the new emit-final feature, they will need to implement this new method. Existing code with custom session stores should not break, because existing code neither implements nor uses this new method. If we don't make it public API, we would prevent users from passing in custom session stores in combination with the new emit-final feature, which seems too restrictive? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] guozhangwang commented on a diff in pull request #12204: [9/N WIP][Emit final] Emit final for session window aggregations
guozhangwang commented on code in PR #12204: URL: https://github.com/apache/kafka/pull/12204#discussion_r880784817 ## streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBTimeOrderedSessionStore.java: ## @@ -61,6 +65,18 @@ public QueryResult query(final Query query, ); } +@Override +public KeyValueIterator, byte[]> findSessions(final Instant earliestSessionEndTime, + final Instant latestSessionEndTime) { +final long earliestEndTime = ApiUtils.validateMillisecondInstant(earliestSessionEndTime, +prepareMillisCheckFailMsgPrefix(earliestSessionEndTime, "earliestSessionEndTime")); +final long latestEndTime = ApiUtils.validateMillisecondInstant(latestSessionEndTime, +prepareMillisCheckFailMsgPrefix(latestSessionEndTime, "latestSessionEndTime")); + +final KeyValueIterator bytesIterator = wrapped().fetchAll(earliestEndTime, latestEndTime); Review Comment: This is the second open question: with the current prefixed (base, i.e. time-first) session key schema, this fetchAll would effectively be searching for `[earliestEnd, INF]` because of this logic: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/PrefixedSessionKeySchemas.java#L46 This is because we translate the range query without a key inside `AbstractRocksDBTimeOrderedSegmentedBytesStore` by using the `lower/upperRange` (instead of `lower/upperRangeFixedSize`): https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/AbstractRocksDBTimeOrderedSegmentedBytesStore.java#L241-L242 I cannot remember why we need to do this. @lihaosky @mjsax do you remember why? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] mjsax commented on a diff in pull request #12204: [9/N WIP][Emit final] Emit final for session window aggregations
mjsax commented on code in PR #12204: URL: https://github.com/apache/kafka/pull/12204#discussion_r898340071 ## streams/src/main/java/org/apache/kafka/streams/state/internals/InMemorySessionStore.java: ## @@ -202,25 +205,43 @@ public void remove(final Windowed sessionKey) { @Override public byte[] fetchSession(final Bytes key, - final long earliestSessionEndTime, - final long latestSessionStartTime) { + final long sessionStartTime, + final long sessionEndTime) { removeExpiredSegments(); Objects.requireNonNull(key, "key cannot be null"); // Only need to search if the record hasn't expired yet -if (latestSessionStartTime > observedStreamTime - retentionPeriod) { -final ConcurrentNavigableMap> keyMap = endTimeMap.get(latestSessionStartTime); +if (sessionEndTime > observedStreamTime - retentionPeriod) { +final ConcurrentNavigableMap> keyMap = endTimeMap.get(sessionEndTime); if (keyMap != null) { final ConcurrentNavigableMap startTimeMap = keyMap.get(key); if (startTimeMap != null) { -return startTimeMap.get(earliestSessionEndTime); +return startTimeMap.get(sessionStartTime); } } } return null; } +@Override +public KeyValueIterator, byte[]> findSessions(final Instant earliestSessionEndTime, + final Instant latestSessionEndTime) { +removeExpiredSegments(); + +final long earliestEndTime = ApiUtils.validateMillisecondInstant(earliestSessionEndTime, +prepareMillisCheckFailMsgPrefix(earliestSessionEndTime, "earliestSessionEndTime")); +final long latestEndTime = ApiUtils.validateMillisecondInstant(latestSessionEndTime, +prepareMillisCheckFailMsgPrefix(latestSessionEndTime, "latestSessionEndTime")); + +// since subMap is exclusive on toKey, we need to plus one +return registerNewIterator(null, + null, +Long.MAX_VALUE, +endTimeMap.subMap(earliestEndTime, latestEndTime + 1).entrySet().iterator(), +true); Review Comment: Ok. I read the code of `InMemorySessionStore` in detail and now understand what's going on. This LGTM. 
## streams/src/main/java/org/apache/kafka/streams/state/internals/InMemorySessionStore.java: ## @@ -202,25 +205,43 @@ public void remove(final Windowed sessionKey) { @Override public byte[] fetchSession(final Bytes key, - final long earliestSessionEndTime, - final long latestSessionStartTime) { + final long sessionStartTime, + final long sessionEndTime) { removeExpiredSegments(); Objects.requireNonNull(key, "key cannot be null"); // Only need to search if the record hasn't expired yet -if (latestSessionStartTime > observedStreamTime - retentionPeriod) { -final ConcurrentNavigableMap> keyMap = endTimeMap.get(latestSessionStartTime); +if (sessionEndTime > observedStreamTime - retentionPeriod) { +final ConcurrentNavigableMap> keyMap = endTimeMap.get(sessionEndTime); if (keyMap != null) { final ConcurrentNavigableMap startTimeMap = keyMap.get(key); if (startTimeMap != null) { -return startTimeMap.get(earliestSessionEndTime); +return startTimeMap.get(sessionStartTime); } } } return null; } +@Override +public KeyValueIterator, byte[]> findSessions(final Instant earliestSessionEndTime, + final Instant latestSessionEndTime) { +removeExpiredSegments(); + +final long earliestEndTime = ApiUtils.validateMillisecondInstant(earliestSessionEndTime, +prepareMillisCheckFailMsgPrefix(earliestSessionEndTime, "earliestSessionEndTime")); +final long latestEndTime = ApiUtils.validateMillisecondInstant(latestSessionEndTime, +prepareMillisCheckFailMsgPrefix(latestSessionEndTime, "latestSessionEndTime")); + +// since subMap is exclusive on toKey, we need to plus one +return registerNewIterator(null, + null, +Long.MAX_VALUE, Review Comment: nit: indentation -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] mjsax commented on a diff in pull request #12204: [9/N WIP][Emit final] Emit final for session window aggregations
mjsax commented on code in PR #12204: URL: https://github.com/apache/kafka/pull/12204#discussion_r898339592 ## streams/src/main/java/org/apache/kafka/streams/state/internals/InMemorySessionStore.java: ## @@ -202,25 +205,43 @@ public void remove(final Windowed sessionKey) { @Override public byte[] fetchSession(final Bytes key, - final long earliestSessionEndTime, - final long latestSessionStartTime) { + final long sessionStartTime, + final long sessionEndTime) { removeExpiredSegments(); Objects.requireNonNull(key, "key cannot be null"); // Only need to search if the record hasn't expired yet -if (latestSessionStartTime > observedStreamTime - retentionPeriod) { -final ConcurrentNavigableMap> keyMap = endTimeMap.get(latestSessionStartTime); +if (sessionEndTime > observedStreamTime - retentionPeriod) { +final ConcurrentNavigableMap> keyMap = endTimeMap.get(sessionEndTime); if (keyMap != null) { final ConcurrentNavigableMap startTimeMap = keyMap.get(key); if (startTimeMap != null) { -return startTimeMap.get(earliestSessionEndTime); +return startTimeMap.get(sessionStartTime); } } } return null; } +@Override +public KeyValueIterator, byte[]> findSessions(final Instant earliestSessionEndTime, + final Instant latestSessionEndTime) { +removeExpiredSegments(); + +final long earliestEndTime = ApiUtils.validateMillisecondInstant(earliestSessionEndTime, +prepareMillisCheckFailMsgPrefix(earliestSessionEndTime, "earliestSessionEndTime")); +final long latestEndTime = ApiUtils.validateMillisecondInstant(latestSessionEndTime, +prepareMillisCheckFailMsgPrefix(latestSessionEndTime, "latestSessionEndTime")); + +// since subMap is exclusive on toKey, we need to plus one +return registerNewIterator(null, + null, +Long.MAX_VALUE, +endTimeMap.subMap(earliestEndTime, latestEndTime + 1).entrySet().iterator(), Review Comment: Nit: can we call `subMap(earliestEndTime, true, latestEndTime, true)` which is the same thing but more "intuitive" as we always search for _inclusive_ bounds throughout the code (otherwise, this is the only place which has an exclusive upper bound). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
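For reference, the two `subMap` forms in the comment above are equivalent for an inclusive `[earliestEndTime, latestEndTime]` range; a small standalone illustration using only the standard `java.util.concurrent` API (nothing Kafka-specific):

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Both forms select the same inclusive [earliestEndTime, latestEndTime]
// range; the four-argument overload just states the bounds explicitly.
public class SubMapBounds {
    public static void main(String[] args) {
        ConcurrentNavigableMap<Long, String> endTimeMap = new ConcurrentSkipListMap<>();
        endTimeMap.put(10L, "a");
        endTimeMap.put(20L, "b");
        endTimeMap.put(30L, "c");

        long earliestEndTime = 10L;
        long latestEndTime = 20L;

        // Exclusive toKey, so the upper bound needs a "+ 1".
        System.out.println(endTimeMap.subMap(earliestEndTime, latestEndTime + 1).keySet()); // [10, 20]
        // Inclusive on both ends, matching the convention used elsewhere.
        System.out.println(endTimeMap.subMap(earliestEndTime, true, latestEndTime, true).keySet()); // [10, 20]
    }
}
```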
[jira] [Commented] (KAFKA-13939) Memory Leak When Logging Is Disabled In InMemoryTimeOrderedKeyValueBuffer
[ https://issues.apache.org/jira/browse/KAFKA-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554742#comment-17554742 ] Matthias J. Sax commented on KAFKA-13939: - Thanks for the PR. I added you to the list of contributors and assigned the ticket to you. You can now also self-assign tickets. > Memory Leak When Logging Is Disabled In InMemoryTimeOrderedKeyValueBuffer > - > > Key: KAFKA-13939 > URL: https://issues.apache.org/jira/browse/KAFKA-13939 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Jackson Newhouse >Assignee: Jackson Newhouse >Priority: Blocker > > If `loggingEnabled` is false, the `dirtyKeys` Set is not cleared within > `flush()`, see > [https://github.com/apache/kafka/blob/3.2/streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryTimeOrderedKeyValueBuffer.java#L262.] > However, dirtyKeys is still written to in the loop within `evictWhile`. This > causes dirtyKeys to continuously grow for the life of the buffer. -- This message was sent by Atlassian Jira (v8.20.7#820007)
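For illustration, a simplified stand-in (NOT the real InMemoryTimeOrderedKeyValueBuffer) showing the leak described above and the obvious shape of a fix — clear dirtyKeys in flush() regardless of whether logging is enabled:

{code:java}
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for the buffer, with method names echoing the
// JIRA text; not the real Streams class.
class BufferSketch {
    private final Set<String> dirtyKeys = new HashSet<>();
    private final boolean loggingEnabled;

    BufferSketch(boolean loggingEnabled) {
        this.loggingEnabled = loggingEnabled;
    }

    void evictWhile(String key) {
        dirtyKeys.add(key); // written regardless of the logging setting
    }

    void flush() {
        if (loggingEnabled) {
            // ... write dirty entries to the changelog topic ...
        }
        // Clearing unconditionally prevents the set from growing for the
        // life of the buffer when logging is disabled.
        dirtyKeys.clear();
    }
}
{code}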
[jira] [Assigned] (KAFKA-13939) Memory Leak When Logging Is Disabled In InMemoryTimeOrderedKeyValueBuffer
[ https://issues.apache.org/jira/browse/KAFKA-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax reassigned KAFKA-13939: --- Assignee: Jackson Newhouse > Memory Leak When Logging Is Disabled In InMemoryTimeOrderedKeyValueBuffer > - > > Key: KAFKA-13939 > URL: https://issues.apache.org/jira/browse/KAFKA-13939 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Jackson Newhouse >Assignee: Jackson Newhouse >Priority: Blocker > > If `loggingEnabled` is false, the `dirtyKeys` Set is not cleared within > `flush()`, see > [https://github.com/apache/kafka/blob/3.2/streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryTimeOrderedKeyValueBuffer.java#L262.] > However, dirtyKeys is still written to in the loop within `evictWhile`. This > causes dirtyKeys to continuously grow for the life of the buffer. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (KAFKA-13888) KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag
[ https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554738#comment-17554738 ] Niket Goel commented on KAFKA-13888: The PR that adds the API handler to the admin client (https://github.com/apache/kafka/pull/12206) has now been merged. The API does not return any value for the newly added fields at this time. [~Jack-Lee] has a draft PR to add the implementation for the fields. I am following up with him to iterate on his PR. > KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag > -- > > Key: KAFKA-13888 > URL: https://issues.apache.org/jira/browse/KAFKA-13888 > Project: Kafka > Issue Type: Improvement > Components: kraft >Reporter: Niket Goel >Assignee: lqjacklee >Priority: Major > Fix For: 3.3.0 > > > Tracking issue for the implementation of KIP:836 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (KAFKA-13888) KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag
[ https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554738#comment-17554738 ] Niket Goel edited comment on KAFKA-13888 at 6/15/22 5:59 PM: - The PR that adds the API handler to the admin client (https://github.com/apache/kafka/pull/12206) has now been merged. The API does not return any value for the newly added fields at this time. [~Jack-Lee] has a draft PR (https://github.com/apache/kafka/pull/12212) to add the implementation for the fields. I am following up with him to iterate on his PR. was (Author: niket goel): The PR that adds the API handler to the admin client (https://github.com/apache/kafka/pull/12206) has now been merged. The API does not return any value for the newly added fields at this time. [~Jack-Lee] has a draft PR to add the implementation for the fields. I am following up with him to iterate on his PR. > KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag > -- > > Key: KAFKA-13888 > URL: https://issues.apache.org/jira/browse/KAFKA-13888 > Project: Kafka > Issue Type: Improvement > Components: kraft >Reporter: Niket Goel >Assignee: lqjacklee >Priority: Major > Fix For: 3.3.0 > > > Tracking issue for the implementation of KIP:836 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[GitHub] [kafka] mjsax commented on a diff in pull request #12293: KAFKA-13963: Clarified java doc for processors api
mjsax commented on code in PR #12293: URL: https://github.com/apache/kafka/pull/12293#discussion_r898266334 ## streams/src/main/java/org/apache/kafka/streams/TopologyDescription.java: ## @@ -30,6 +30,7 @@ * In contrast, two sub-topologies are not connected but can be linked to each other via topics, i.e., if one * sub-topology {@link Topology#addSink(String, String, String...) writes} into a topic and another sub-topology * {@link Topology#addSource(String, String...) reads} from the same topic. + * Processors and Transformers created with the Processor API are treated as black boxes and are not represented in the topology graph. Review Comment: Can you update the PR to re-phrase it so it's easier to understand? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] niket-goel commented on pull request #12212: Kafka 13888 new fields
niket-goel commented on PR #12212: URL: https://github.com/apache/kafka/pull/12212#issuecomment-1156768917 Hey @lqjack . Thanks for raising this PR. The PR https://github.com/apache/kafka/pull/12206 has now been merged to trunk. Can you please update your PR with the latest code so we can iterate on it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] vamossagar12 commented on pull request #12121: KAFKA-13846: Adding overloaded addMetricIfAbsent method
vamossagar12 commented on PR #12121: URL: https://github.com/apache/kafka/pull/12121#issuecomment-1156764298 @ijuma, I updated the PR name. Also, I created a follow-up PR to address some of the comments: https://github.com/apache/kafka/pull/12297 @guozhangwang, which file does this correspond to? ` that update the web docs on 3.3 release new API changes` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] philipnee commented on pull request #12149: KAFKA-13668: Retry upon missing initProducerId due to authorization error
philipnee commented on PR #12149: URL: https://github.com/apache/kafka/pull/12149#issuecomment-1156747595 @ijuma - I think @hachikuji is reviewing it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] clolov commented on pull request #12285: KAFKA-7342 Part 1: Straightforward JUnit4 to JUnit5 migrations
clolov commented on PR #12285: URL: https://github.com/apache/kafka/pull/12285#issuecomment-1156744613 Hello! Thank you to everyone who has left a comment and suggestions for improvement. In the next few days I will aim to rework this pull request. In summary: * I will revert the import reordering * I will not prefix assertions with Assertions * I will mention that these changes are for the streams module * I will split the PR into multiple ones so to stick to the <= 500 lines rule -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12233: MINOR: Clean up tmp files created by tests
divijvaidya commented on code in PR #12233: URL: https://github.com/apache/kafka/pull/12233#discussion_r898241099 ## metadata/src/test/java/org/apache/kafka/controller/BootstrapMetadataTest.java: ## @@ -31,9 +35,21 @@ import static org.junit.jupiter.api.Assertions.assertTrue; public class BootstrapMetadataTest { +private Path tmpDir; Review Comment: I think we should continue to use Path because: 1. `BootstrapMetadata.write()`, used in every test in this file, accepts a `Path`. 2. Prior to this change, we were using Path in this test file, so the current PR doesn't change anything on that front. 3. The new Java NIO.2 API introduced many new (aka better) helper methods via `Files.*` which use `Path` instead of `File`. I have already started replacing the older `java.io` methods with the new NIO.2 methods in the code base [1]. Due to the benefits of the new helper methods, I think we should continue to use `Path` instead of `File`. However, if you feel strongly about this comment, please let me know and I will make the change accordingly. [1] https://issues.apache.org/jira/browse/KAFKA-13928 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
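For context, a small example of the NIO.2 helpers the comment refers to — `Files.*` methods operate directly on `Path`, which is part of the argument for keeping `Path` here. The directory prefix and file name below are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

public class TmpDirExample {
    public static void main(String[] args) throws IOException {
        // Create and clean up a temp dir purely through Path-based helpers.
        Path tmpDir = Files.createTempDirectory("bootstrap-metadata-test-");
        try {
            Path file = tmpDir.resolve("example.txt"); // hypothetical file name
            Files.writeString(file, "placeholder");
            System.out.println(Files.readString(file));
        } finally {
            // Delete children before parents by walking in reverse order.
            try (var paths = Files.walk(tmpDir)) {
                paths.sorted(Comparator.reverseOrder())
                     .forEach(p -> p.toFile().delete());
            }
        }
    }
}
```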
[GitHub] [kafka] mdedetrich commented on a diff in pull request #12284: KAFKA-13980: Upgrade from Scala 2.12.15 to 2.12.16
mdedetrich commented on code in PR #12284: URL: https://github.com/apache/kafka/pull/12284#discussion_r898217940 ## docs/upgrade.html: ## @@ -59,6 +59,9 @@ Upgrading to 3.2.0 from any vers (or to take advantage of exactly once semantics), the newer Java clients must be used. +Upgrade from Scala 2.12.15 to 2.12.16. See https://github.com/scala/scala/releases/tag/v2.12.16 for release Review Comment: I removed the release notes and force-pushed the branch. The PR description contains the release notes instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12149: KAFKA-13668: Retry upon missing initProducerId due to authorization error
ijuma commented on PR #12149: URL: https://github.com/apache/kafka/pull/12149#issuecomment-1156718718 @philipnee where are we with this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12148: MINOR: Remove unnecessary log4j-appender dependency and tweak explicit log4j dependency
ijuma commented on PR #12148: URL: https://github.com/apache/kafka/pull/12148#issuecomment-1156718290 @omkreddy maybe you can help review this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma closed pull request #12232: MINOR:rm deprecated method
ijuma closed pull request #12232: MINOR:rm deprecated method URL: https://github.com/apache/kafka/pull/12232 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12232: MINOR:rm deprecated method
ijuma commented on PR #12232: URL: https://github.com/apache/kafka/pull/12232#issuecomment-1156711344 Yes, we only remove deprecated methods during major releases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12233: MINOR: Clean up tmp files created by tests
ijuma commented on PR #12233: URL: https://github.com/apache/kafka/pull/12233#issuecomment-1156710957 Thanks for the PR. It looks reasonable, just one nit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on a diff in pull request #12233: MINOR: Clean up tmp files created by tests
ijuma commented on code in PR #12233: URL: https://github.com/apache/kafka/pull/12233#discussion_r898209292 ## metadata/src/test/java/org/apache/kafka/controller/BootstrapMetadataTest.java: ## @@ -31,9 +35,21 @@ import static org.junit.jupiter.api.Assertions.assertTrue; public class BootstrapMetadataTest { +private Path tmpDir; Review Comment: Not sure we gain much by using `Path` since all our utility methods work with `File`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on a diff in pull request #12241: MINOR: Fix docs in upgrade.html
ijuma commented on code in PR #12241: URL: https://github.com/apache/kafka/pull/12241#discussion_r898207431 ## docs/upgrade.html: ## @@ -1265,7 +1265,7 @@ Notable changes in 1 on live log directories even if there are offline log directories. A log directory may become offline due to IOException caused by hardware failure. Users need to monitor the per-broker metric offlineLogDirectoryCount to check whether there is offline log directory. -Added KafkaStorageException which is a retriable exception. KafkaStorageException will be converted to NotLeaderForPartitionException in the response +Added KafkaStorageException which is a retriable exception. KafkaStorageException will be converted to NotLeaderOrFollowerException in the response Review Comment: This is for 1.0.0, at the time it was called `NotLeaderForPartitionException`. I don't think we want to update old release notes in this way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12278: MINOR: add AuthorizerNotReadyException
ijuma commented on PR #12278: URL: https://github.com/apache/kafka/pull/12278#issuecomment-1156706455 Do we have a KIP for this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12290: MINOR: Stop leaking threads in BlockingConnectorTest
ijuma commented on PR #12290: URL: https://github.com/apache/kafka/pull/12290#issuecomment-1156705915 @kkonstantine can you please review this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on a diff in pull request #12284: KAFKA-13980: Upgrade from Scala 2.12.15 to 2.12.16
ijuma commented on code in PR #12284: URL: https://github.com/apache/kafka/pull/12284#discussion_r898205328 ## docs/upgrade.html: ## @@ -59,6 +59,9 @@ Upgrading to 3.2.0 from any vers (or to take advantage of exactly once semantics), the newer Java clients must be used. +Upgrade from Scala 2.12.15 to 2.12.16. See https://github.com/scala/scala/releases/tag/v2.12.16 for release Review Comment: We don't usually add patch upgrades to the release notes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [kafka] ijuma commented on pull request #12285: KAFKA-7342 Part 1: Straightforward JUnit4 to JUnit5 migrations
ijuma commented on PR #12285: URL: https://github.com/apache/kafka/pull/12285#issuecomment-1156705224 > Worth putting in a separate PR, but have you tried enabling the Jupiter parallel test runner? When I ran it on a work project, it improved build times by an order of magnitude We use multiple forks at the gradle level, so we should not enable this. A couple more comments: 1. Let's revert the formatting changes. 2. Do not prefix with `Assertions`. 3. Mention in the PR title that these changes are for the streams module(s).
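For illustration, a minimal JUnit 5 sketch of what comment 2 asks for (a hypothetical test class, not code from the PR):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Prefer statically imported assertions over the Assertions.* prefix.
class ExampleMigrationTest {
    @Test
    void shouldAddNumbers() {
        assertEquals(4, 2 + 2); // preferred
        // discouraged: org.junit.jupiter.api.Assertions.assertEquals(4, 2 + 2);
    }
}
```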
[jira] [Reopened] (KAFKA-13888) KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag
[ https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson reopened KAFKA-13888: - > KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag > -- > > Key: KAFKA-13888 > URL: https://issues.apache.org/jira/browse/KAFKA-13888 > Project: Kafka > Issue Type: Improvement > Components: kraft >Reporter: Niket Goel >Assignee: lqjacklee >Priority: Major > Fix For: 3.3.0 > > > Tracking issue for the implementation of KIP:836
[GitHub] [kafka] hachikuji commented on pull request #12212: Kafka 13888 new fields
hachikuji commented on PR #12212: URL: https://github.com/apache/kafka/pull/12212#issuecomment-1156686396 @lqjack Can you merge with trunk please?
[jira] [Resolved] (KAFKA-13888) KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag
[ https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-13888. - Fix Version/s: 3.3.0 Resolution: Fixed > KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag > -- > > Key: KAFKA-13888 > URL: https://issues.apache.org/jira/browse/KAFKA-13888 > Project: Kafka > Issue Type: Improvement > Components: kraft >Reporter: Niket Goel >Assignee: lqjacklee >Priority: Major > Fix For: 3.3.0 > > > Tracking issue for the implementation of KIP:836
[GitHub] [kafka] hachikuji merged pull request #12206: KAFKA-13888: Addition of Information in DescribeQuorumResponse about Voter Lag
hachikuji merged PR #12206: URL: https://github.com/apache/kafka/pull/12206
[GitHub] [kafka] dajac closed pull request #12292: MINOR: KRaft nodes not shutdown correctly when using one controller in colocated mode
dajac closed pull request #12292: MINOR: KRaft nodes not shutdown correctly when using one controller in colocated mode URL: https://github.com/apache/kafka/pull/12292
[GitHub] [kafka] dajac commented on pull request #12292: MINOR: KRaft nodes not shutdown correctly when using one controller in colocated mode
dajac commented on PR #12292: URL: https://github.com/apache/kafka/pull/12292#issuecomment-1156567953 Already fixed by https://github.com/apache/kafka/pull/11238.
[GitHub] [kafka] dajac merged pull request #11238: MINOR: Fix force kill of KRaft colocated controllers in system tests
dajac merged PR #11238: URL: https://github.com/apache/kafka/pull/11238
[GitHub] [kafka] tyamashi-oss opened a new pull request, #12296: KAFKA-13996: log.cleaner.io.max.bytes.per.second can be changed dynamically
tyamashi-oss opened a new pull request, #12296: URL: https://github.com/apache/kafka/pull/12296
- Implementation:
  - Add updateDesiredRatePerSec() on Throttler
  - Call updateDesiredRatePerSec() of Throttler with the new log.cleaner.io.max.bytes.per.second value in reconfigure() of LogCleaner
  - I implemented the feature to be similar to [reconfigure() of SocketServer](https://github.com/apache/kafka/blob/fa59be4e770627cd34cef85986b58ad7f606928d/core/src/main/scala/kafka/network/SocketServer.scala#L336-L357)
- Alternative implementation considered:
  - Re-instantiate Throttler with the new log.cleaner.io.max.bytes.per.second value in reconfigure() of LogCleaner
  - However, since many parameters are required to instantiate Throttler, I chose to follow SocketServer and update only log.cleaner.io.max.bytes.per.second
- Test:
  - Added a unit test for updating the desired rate of Throttler
  - I confirmed by manual testing that log.cleaner.io.max.bytes.per.second can be changed using bin/kafka-configs.sh: > [2022-06-15 22:44:03,089] INFO [kafka-log-cleaner-thread-0]: Log cleaner thread 0 cleaned log my-topic-0 (dirty section = [57585, 86901]) 2,799.3 MB of log processed in 596.0 seconds (4.7 MB/sec). Indexed 2,799.2 MB in 298.1 seconds (9.4 Mb/sec, 50.0% of total time) Buffer utilization: 0.0% Cleaned 2,799.3 MB in 298.0 seconds (9.4 Mb/sec, 50.0% of total time) Start size: 2,799.3 MB (29,317 messages) End size: 0.1 MB (1 messages) 100.0% size reduction (100.0% fewer messages) (kafka.log.LogCleaner)

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
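As a rough illustration of the mechanism described above, here is a hedged Java sketch of a rate limiter whose target rate can be updated at runtime. Kafka's actual `Throttler` is written in Scala and differs in detail; the class and method names below are illustrative only.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of a throttler with an updatable desired rate, mirroring the
// idea of updateDesiredRatePerSec() in the PR description. Assumes rate > 0.
class SimpleThrottler {
    private final AtomicLong desiredRatePerSec;

    SimpleThrottler(long initialBytesPerSec) {
        this.desiredRatePerSec = new AtomicLong(initialBytesPerSec);
    }

    // Would be called from the cleaner's reconfigure() when
    // log.cleaner.io.max.bytes.per.second changes dynamically.
    void updateDesiredRatePerSec(long newBytesPerSec) {
        desiredRatePerSec.set(newBytesPerSec);
    }

    // Sleeps just long enough that `amountBytes` over `elapsedMs` stays at or
    // below the current target rate.
    void maybeThrottle(long amountBytes, long elapsedMs) throws InterruptedException {
        long minElapsedMs = (amountBytes * 1000) / desiredRatePerSec.get();
        if (elapsedMs < minElapsedMs) {
            Thread.sleep(minElapsedMs - elapsedMs);
        }
    }
}
```

Updating an atomic field in place avoids re-instantiating the throttler, which matches the rationale given above for following the SocketServer pattern.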
[jira] [Created] (KAFKA-13996) log.cleaner.io.max.bytes.per.second cannot be changed dynamically
Tomonari Yamashita created KAFKA-13996: -- Summary: log.cleaner.io.max.bytes.per.second cannot be changed dynamically Key: KAFKA-13996 URL: https://issues.apache.org/jira/browse/KAFKA-13996 Project: Kafka Issue Type: Bug Components: config, core, log cleaner Affects Versions: 3.2.0 Reporter: Tomonari Yamashita Assignee: Tomonari Yamashita
- log.cleaner.io.max.bytes.per.second cannot be changed dynamically using bin/kafka-configs.sh
- Reproduction procedure:
 -# Create a topic with cleanup.policy=compact {code:java} bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --replication-factor 1 --partitions 1 --topic my-topic --config cleanup.policy=compact --config segment.bytes=104857600 --config compression.type=producer {code}
 -# Change log.cleaner.io.max.bytes.per.second=10485760 using bin/kafka-configs.sh {code:java} bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config log.cleaner.io.max.bytes.per.second=10485760 {code}
 -# Send enough messages (> segment.bytes=104857600) to activate the Log Cleaner
 -# In logs/log-cleaner.log, the configured log.cleaner.io.max.bytes.per.second=10485760 is not reflected and the Log Cleaner does not slow down (>= log.cleaner.io.max.bytes.per.second=10485760). {code:java} [2022-06-15 14:52:14,988] INFO [kafka-log-cleaner-thread-0]: Log cleaner thread 0 cleaned log my-topic-0 (dirty section = [39786, 81666]) 3,999.0 MB of log processed in 2.7 seconds (1,494.4 MB/sec). Indexed 3,998.9 MB in 0.9 seconds (4,218.2 Mb/sec, 35.4% of total time) Buffer utilization: 0.0% Cleaned 3,999.0 MB in 1.7 seconds (2,314.2 Mb/sec, 64.6% of total time) Start size: 3,999.0 MB (41,881 messages) End size: 0.1 MB (1 messages) 100.0% size reduction (100.0% fewer messages) (kafka.log.LogCleaner) {code}
- Problem cause:
 -- log.cleaner.io.max.bytes.per.second is used by the Throttler in LogCleaner; however, it is only passed to the Throttler at initialization time.
 --- https://github.com/apache/kafka/blob/4380eae7ceb840dd93fee8ec90cd89a72bad7a3f/core/src/main/scala/kafka/log/LogCleaner.scala#L107-L112
 -- The Throttler's configuration value needs to be changed in reconfigure() of LogCleaner.
 --- https://github.com/apache/kafka/blob/4380eae7ceb840dd93fee8ec90cd89a72bad7a3f/core/src/main/scala/kafka/log/LogCleaner.scala#L192-L196
- A workaround is to restart every broker after adding log.cleaner.io.max.bytes.per.second to config/server.properties
[jira] [Updated] (KAFKA-13997) one partition logs are not getting purged
[ https://issues.apache.org/jira/browse/KAFKA-13997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen P updated KAFKA-13997: - Description: We have an issue with one of our topics in the Kafka cluster, which is taking huge space. While checking, we found that one of the partitions doesn't have old logs getting purged; due to this, space is getting full. Since we have replication factor 3 for this topic, the same behaviour is observed on 3 Kafka broker nodes. We need a workaround to clean up old log messages and also to find the root cause of this issue. was: We have an issue with one of our topics in the Kafka cluster, which is taking huge space. While checking, we found that one of the partitions doesn't have old logs getting purged; due to this, space is getting full. Since we have replication factor 3 for this topic, the same behaviour is observed on 2 Kafka broker nodes. We need a workaround to clean up old log messages and also to find the root cause of this issue. > one partition logs are not getting purged > -- > > Key: KAFKA-13997 > URL: https://issues.apache.org/jira/browse/KAFKA-13997 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.0.0 >Reporter: Naveen P >Priority: Major > > We have an issue with one of our topics in the Kafka cluster, which is taking huge space. While checking, we found that one of the partitions doesn't have old logs getting purged; due to this, space is getting full. > > Since we have replication factor 3 for this topic, the same behaviour is observed on 3 Kafka broker nodes. We need a workaround to clean up old log messages and also to find the root cause of this issue.
[jira] [Created] (KAFKA-13997) one partition logs are not getting purged
Naveen P created KAFKA-13997: Summary: one partition logs are not getting purged Key: KAFKA-13997 URL: https://issues.apache.org/jira/browse/KAFKA-13997 Project: Kafka Issue Type: Bug Components: log cleaner Affects Versions: 2.0.0 Reporter: Naveen P We have an issue with one of our topics in the Kafka cluster, which is taking huge space. While checking, we found that one of the partitions doesn't have old logs getting purged; due to this, space is getting full. Since we have replication factor 3 for this topic, the same behaviour is observed on 2 Kafka broker nodes. We need a workaround to clean up old log messages and also to find the root cause of this issue.
[jira] [Commented] (KAFKA-13995) Does Kafka support Network File System (NFS)? Is it recommended in Production?
[ https://issues.apache.org/jira/browse/KAFKA-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554577#comment-17554577 ] Devarshi Shah commented on KAFKA-13995: --- A kind request to answer as soon as possible, as it's blocking our deliveries in customers' production environments. > Does Kafka support Network File System (NFS)? Is it recommended in Production? > -- > > Key: KAFKA-13995 > URL: https://issues.apache.org/jira/browse/KAFKA-13995 > Project: Kafka > Issue Type: Test >Affects Versions: 3.0.0 > Environment: Kubernetes Cluster >Reporter: Devarshi Shah >Priority: Blocker > > I've gone through the Apache Kafka documentation. It does not contain information about the supported underlying storage types, i.e. whether Kafka supports block storage, Network File System (NFS) and/or others. On the internet, I could find that it supports NFS; however, most sources advise not to use NFS in production. May we get proper information on whether Kafka recommends NFS in production, or whether it doesn't support NFS to begin with?
[jira] [Updated] (KAFKA-13995) Does Kafka support Network File System (NFS)? Is it recommended in Production?
[ https://issues.apache.org/jira/browse/KAFKA-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devarshi Shah updated KAFKA-13995: -- Description: I've gone through the Apache Kafka documentation. It does not contain information about the supported underlying storage types, i.e. whether Kafka supports block storage, Network File System (NFS) and/or others. On the internet, I could find that it supports NFS; however, most sources advise not to use NFS in production. May we get proper information on whether Kafka recommends NFS in production, or whether it doesn't support NFS to begin with? (was: I've gone through the Apache Kafka documentation. It does not contain information about the supported underlying storage types, i.e. whether Kafka supports block storage or Network File System (NFS). On the internet, I could find that it supports NFS; however, most sources advise not to use NFS in production. May we get proper information on whether Kafka recommends NFS, or whether it doesn't support NFS to begin with?) > Does Kafka support Network File System (NFS)? Is it recommended in Production? > -- > > Key: KAFKA-13995 > URL: https://issues.apache.org/jira/browse/KAFKA-13995 > Project: Kafka > Issue Type: Test >Affects Versions: 3.0.0 > Environment: Kubernetes Cluster >Reporter: Devarshi Shah >Priority: Blocker > > I've gone through the Apache Kafka documentation. It does not contain information about the supported underlying storage types, i.e. whether Kafka supports block storage, Network File System (NFS) and/or others. On the internet, I could find that it supports NFS; however, most sources advise not to use NFS in production. May we get proper information on whether Kafka recommends NFS in production, or whether it doesn't support NFS to begin with?
[jira] [Created] (KAFKA-13995) Does Kafka support Network File System (NFS)? Is it recommended in Production?
Devarshi Shah created KAFKA-13995: - Summary: Does Kafka support Network File System (NFS)? Is it recommended in Production? Key: KAFKA-13995 URL: https://issues.apache.org/jira/browse/KAFKA-13995 Project: Kafka Issue Type: Test Affects Versions: 3.0.0 Environment: Kubernetes Cluster Reporter: Devarshi Shah I've gone through the Apache Kafka documentation. It does not contain information about the supported underlying storage types, i.e. whether Kafka supports block storage or Network File System (NFS). On the internet, I could find that it supports NFS; however, most sources advise not to use NFS in production. May we get proper information on whether Kafka recommends NFS, or whether it doesn't support NFS to begin with?
[GitHub] [kafka] divijvaidya commented on pull request #12224: KAFKA-13943: Make LocalLogManager implementation consistent with the RaftClient interface contract
divijvaidya commented on PR #12224: URL: https://github.com/apache/kafka/pull/12224#issuecomment-1156428420 @jsancio please review when you get a chance. Currently, multiple tests in `QuorumControllerTest` are flaky because we allow creating a snapshot with the LONG_MAX value. This is making it difficult to review PRs due to the flakiness. This code change fixes that.
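For context, the contract violation can be sketched as a guard on the snapshot offset. This is a hypothetical simplification (the method below is not real Kafka code); it only illustrates rejecting the `Long.MAX_VALUE` sentinel that the flaky tests were snapshotting at.

```java
// Sketch: refuse to create a snapshot at the sentinel offset that stands for
// "offset not yet known" in the RaftClient interface contract.
final class SnapshotGuard {
    private SnapshotGuard() { }

    static long validatedSnapshotOffset(long offset) {
        if (offset == Long.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Refusing to create a snapshot at the sentinel offset Long.MAX_VALUE");
        }
        return offset;
    }
}
```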
[jira] [Assigned] (KAFKA-13943) Fix flaky test QuorumControllerTest.testMissingInMemorySnapshot()
[ https://issues.apache.org/jira/browse/KAFKA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya reassigned KAFKA-13943: Assignee: Divij Vaidya > Fix flaky test QuorumControllerTest.testMissingInMemorySnapshot() > - > > Key: KAFKA-13943 > URL: https://issues.apache.org/jira/browse/KAFKA-13943 > Project: Kafka > Issue Type: Test > Components: unit tests >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Labels: flaky-test > > Test failed at > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-12197/3/tests] > > {noformat} > [2022-05-27 09:34:42,382] INFO [Controller 0] Creating new QuorumController > with clusterId wj9LhgPJTV-KYEItgqvtQA, authorizer Optional.empty. > (org.apache.kafka.controller.QuorumController:1484) > [2022-05-27 09:34:42,393] DEBUG [LocalLogManager 0] Node 0: running log > check. (org.apache.kafka.metalog.LocalLogManager:479) > [2022-05-27 09:34:42,394] DEBUG [LocalLogManager 0] initialized local log > manager for node 0 (org.apache.kafka.metalog.LocalLogManager:622) > [2022-05-27 09:34:42,396] INFO [LocalLogManager 0] Node 0: registered > MetaLogListener 1774961169 (org.apache.kafka.metalog.LocalLogManager:640) > [2022-05-27 09:34:42,397] DEBUG [LocalLogManager 0] Node 0: running log > check. (org.apache.kafka.metalog.LocalLogManager:479) > [2022-05-27 09:34:42,397] DEBUG [LocalLogManager 0] Node 0: Executing > handleLeaderChange LeaderAndEpoch(leaderId=OptionalInt[0], epoch=1) > (org.apache.kafka.metalog.LocalLogManager:520) > [2022-05-27 09:34:42,398] DEBUG [Controller 0] Executing > handleLeaderChange[1]. (org.apache.kafka.controller.QuorumController:438) > [2022-05-27 09:34:42,398] INFO [Controller 0] Becoming the active controller > at epoch 1, committed offset -1, committed epoch -1, and metadata.version 5 > (org.apache.kafka.controller.QuorumController:950) > [2022-05-27 09:34:42,398] DEBUG [Controller 0] Creating snapshot -1 > (org.apache.kafka.timeline.SnapshotRegistry:197) > [2022-05-27 09:34:42,399] DEBUG [Controller 0] Processed > handleLeaderChange[1] in 951 us > (org.apache.kafka.controller.QuorumController:385) > [2022-05-27 09:34:42,399] INFO [Controller 0] Initializing metadata.version > to 5 (org.apache.kafka.controller.QuorumController:926) > [2022-05-27 09:34:42,399] INFO [Controller 0] Setting metadata.version to 5 > (org.apache.kafka.controller.FeatureControlManager:273) > [2022-05-27 09:34:42,400] DEBUG [Controller 0] Creating snapshot > 9223372036854775807 (org.apache.kafka.timeline.SnapshotRegistry:197) > [2022-05-27 09:34:42,400] DEBUG [Controller 0] Read-write operation > bootstrapMetadata(1863535402) will be completed when the log reaches offset > 9223372036854775807. 
(org.apache.kafka.controller.QuorumController:725) > [2022-05-27 09:34:42,402] DEBUG append(batch=LocalRecordBatch(leaderEpoch=1, > appendTimestamp=10, > records=[ApiMessageAndVersion(RegisterBrokerRecord(brokerId=0, > incarnationId=kxAT73dKQsitIedpiPtwBw, brokerEpoch=-9223372036854775808, > endPoints=[BrokerEndpoint(name='PLAINTEXT', host='localhost', port=9092, > securityProtocol=0)], features=[], rack=null, fenced=true) at version 0)]), > prevOffset=1) (org.apache.kafka.metalog.LocalLogManager$SharedLogData:247) > [2022-05-27 09:34:42,402] INFO [Controller 0] Registered new broker: > RegisterBrokerRecord(brokerId=0, incarnationId=kxAT73dKQsitIedpiPtwBw, > brokerEpoch=-9223372036854775808, endPoints=[BrokerEndpoint(name='PLAINTEXT', > host='localhost', port=9092, securityProtocol=0)], features=[], rack=null, > fenced=true) (org.apache.kafka.controller.ClusterControlManager:368) > [2022-05-27 09:34:42,403] WARN [Controller 0] registerBroker: failed with > unknown server exception RuntimeException at epoch 1 in 2449 us. Reverting > to last committed offset -1. > (org.apache.kafka.controller.QuorumController:410)java.lang.RuntimeException: > Can't create a new snapshot at epoch 1 because there is already a snapshot > with epoch 9223372036854775807at > org.apache.kafka.timeline.SnapshotRegistry.getOrCreateSnapshot(SnapshotRegistry.java:190) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:723) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173) > at java.base/java.lang.Thread.run(Thread.java:833){noformat} > {noformat} > Full stack trace > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.UnknownServerException: >
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12284: KAFKA-13980: Upgrade from Scala 2.12.15 to 2.12.16
divijvaidya commented on code in PR #12284: URL: https://github.com/apache/kafka/pull/12284#discussion_r897932645 ## docs/upgrade.html: ## @@ -59,6 +59,9 @@ Upgrading to 3.2.0 from any vers (or to take advantage of exactly once semantics), the newer Java clients must be used. +Upgrade from Scala 2.12.15 to 2.12.16. See https://github.com/scala/scala/releases/tag/v2.12.16 for release Review Comment: Is this comment in the right place? Currently it seems to be under the section "For a rolling upgrade", which contains instructions for Kafka cluster administrators handling version upgrades; that should have nothing to do with the Scala version, since Scala is bundled as a dependency, independent of any Scala version on the OS. @ijuma will know more about this.
[GitHub] [kafka] divijvaidya commented on pull request #12284: KAFKA-13980: Upgrade from Scala 2.12.15 to 2.12.16
divijvaidya commented on PR #12284: URL: https://github.com/apache/kafka/pull/12284#issuecomment-1156421242 cc: @ijuma (since you seem to have performed Scala upgrades in the past) The test failures do not seem related to this code change to me.
[GitHub] [kafka] divijvaidya commented on pull request #12286: KAFKA-13984: Fix TopicBasedRemoteLogMetadataManager#initializeResources should exit immediately when partition size of __remote_log_metad
divijvaidya commented on PR #12286: URL: https://github.com/apache/kafka/pull/12286#issuecomment-1156415549 @bozhao12 can you please add a unit test in `TopicBasedRemoteLogMetadataManagerTest` that fails before this change and succeeds after this change.
[GitHub] [kafka] cadonna commented on pull request #10881: KAFKA-12947 Replace EasyMock and PowerMock with Mockito for Streams…
cadonna commented on PR #10881: URL: https://github.com/apache/kafka/pull/10881#issuecomment-1156400720 @wycc Do you plan to still work on this PR?
[GitHub] [kafka] cadonna commented on pull request #10881: KAFKA-12947 Replace EasyMock and PowerMock with Mockito for Streams…
cadonna commented on PR #10881: URL: https://github.com/apache/kafka/pull/10881#issuecomment-1156398967 @clolov Thank you for your interest and help! Since this PR has not been touched for more than half a year, I would be fine with closing it, and you can open a new one.
[GitHub] [kafka] cadonna commented on pull request #12285: KAFKA-7342 Part 1: Straightforward JUnit4 to JUnit5 migrations
cadonna commented on PR #12285: URL: https://github.com/apache/kafka/pull/12285#issuecomment-1156391414 @clolov Thank you for the PR! I agree with @divijvaidya about doing the reformatting in a separate PR. Could you also try to subdivide the PR into smaller PRs? Reviewing a 6500-line PR is never fun. Sizes around 500 are acceptable.
[GitHub] [kafka] jnh5y commented on pull request #12161: KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies
jnh5y commented on PR #12161: URL: https://github.com/apache/kafka/pull/12161#issuecomment-1156385644 > > @jnh5y Thank you for the updates! > > LGTM! > > Had just one nit. > > Thank you for your patience! @cadonna Thank you for pushing me and helping me learn more about streams!
[GitHub] [kafka] jnh5y commented on a diff in pull request #12161: KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies
jnh5y commented on code in PR #12161: URL: https://github.com/apache/kafka/pull/12161#discussion_r897893883 ## streams/src/test/java/org/apache/kafka/streams/integration/PauseResumeIntegrationTest.java: ## @@ -0,0 +1,422 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.kafka.streams.integration; + +import org.apache.kafka.clients.consumer.ConsumerConfig; +import org.apache.kafka.common.serialization.LongDeserializer; +import org.apache.kafka.common.serialization.LongSerializer; +import org.apache.kafka.common.serialization.Serdes; +import org.apache.kafka.common.serialization.StringDeserializer; +import org.apache.kafka.common.serialization.StringSerializer; +import org.apache.kafka.common.utils.Bytes; +import org.apache.kafka.streams.KafkaStreams; +import org.apache.kafka.streams.KafkaStreams.State; +import org.apache.kafka.streams.KeyValue; +import org.apache.kafka.streams.StreamsBuilder; +import org.apache.kafka.streams.StreamsConfig; +import org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster; +import org.apache.kafka.streams.integration.utils.IntegrationTestUtils; +import org.apache.kafka.streams.kstream.Materialized; +import org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper; +import org.apache.kafka.streams.processor.internals.namedtopology.NamedTopologyBuilder; +import org.apache.kafka.streams.state.KeyValueStore; +import org.apache.kafka.streams.state.Stores; +import org.apache.kafka.test.IntegrationTest; +import org.apache.kafka.test.TestUtils; +import org.hamcrest.CoreMatchers; +import org.junit.After; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.rules.TestName; + +import java.time.Duration; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.List; +import java.util.Properties; + +import static java.util.Arrays.asList; +import static java.util.Collections.singletonList; +import static org.apache.kafka.streams.KeyValue.pair; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.cleanStateBeforeTest; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.getTopicSize; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.safeUniqueTestName; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitForApplicationState; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilStreamsHasPolled; +import static 
org.apache.kafka.test.TestUtils.waitForCondition; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; + +@Category({IntegrationTest.class}) +public class PauseResumeIntegrationTest { +private static final Duration STARTUP_TIMEOUT = Duration.ofSeconds(45); +public static final EmbeddedKafkaCluster CLUSTER = new EmbeddedKafkaCluster(1); +private static Properties producerConfig; +private static Properties consumerConfig; + +private static final Materialized> IN_MEMORY_STORE = +Materialized.as(Stores.inMemoryKeyValueStore("store")); + +private static final String INPUT_STREAM_1 = "input-stream-1"; +private static final String INPUT_STREAM_2 = "input-stream-2"; +private static final String OUTPUT_STREAM_1 = "output-stream-1"; +private static final String OUTPUT_STREAM_2 = "output-stream-2"; +private static final String TOPOLOGY1 = "topology1"; +private static final String TOPOLOGY2 = "topology2"; + +private static final List> STANDARD_INPUT_DATA = +asList(pair("A", 100L), pair("B", 200L), pair("A", 300L), pair("C", 400L), pair("C", -50L)); +private static final List> COUNT_OUTPUT_DATA = +asList(pair("A", 1L), pair("B", 1L), pair("A", 2L), pair("C", 1L), pair("C", 2L)); +private
[GitHub] [kafka] cadonna commented on a diff in pull request #12161: KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies
cadonna commented on code in PR #12161: URL: https://github.com/apache/kafka/pull/12161#discussion_r897887371 ## streams/src/test/java/org/apache/kafka/streams/integration/PauseResumeIntegrationTest.java: ## @@ -0,0 +1,422 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.kafka.streams.integration; + +import org.apache.kafka.clients.consumer.ConsumerConfig; +import org.apache.kafka.common.serialization.LongDeserializer; +import org.apache.kafka.common.serialization.LongSerializer; +import org.apache.kafka.common.serialization.Serdes; +import org.apache.kafka.common.serialization.StringDeserializer; +import org.apache.kafka.common.serialization.StringSerializer; +import org.apache.kafka.common.utils.Bytes; +import org.apache.kafka.streams.KafkaStreams; +import org.apache.kafka.streams.KafkaStreams.State; +import org.apache.kafka.streams.KeyValue; +import org.apache.kafka.streams.StreamsBuilder; +import org.apache.kafka.streams.StreamsConfig; +import org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster; +import org.apache.kafka.streams.integration.utils.IntegrationTestUtils; +import org.apache.kafka.streams.kstream.Materialized; +import org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper; +import org.apache.kafka.streams.processor.internals.namedtopology.NamedTopologyBuilder; +import org.apache.kafka.streams.state.KeyValueStore; +import org.apache.kafka.streams.state.Stores; +import org.apache.kafka.test.IntegrationTest; +import org.apache.kafka.test.TestUtils; +import org.hamcrest.CoreMatchers; +import org.junit.After; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.rules.TestName; + +import java.time.Duration; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.List; +import java.util.Properties; + +import static java.util.Arrays.asList; +import static java.util.Collections.singletonList; +import static org.apache.kafka.streams.KeyValue.pair; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.cleanStateBeforeTest; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.getTopicSize; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.safeUniqueTestName; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitForApplicationState; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived; +import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilStreamsHasPolled; +import static 
org.apache.kafka.test.TestUtils.waitForCondition; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; + +@Category({IntegrationTest.class}) +public class PauseResumeIntegrationTest { +private static final Duration STARTUP_TIMEOUT = Duration.ofSeconds(45); +public static final EmbeddedKafkaCluster CLUSTER = new EmbeddedKafkaCluster(1); +private static Properties producerConfig; +private static Properties consumerConfig; + +private static final Materialized> IN_MEMORY_STORE = +Materialized.as(Stores.inMemoryKeyValueStore("store")); + +private static final String INPUT_STREAM_1 = "input-stream-1"; +private static final String INPUT_STREAM_2 = "input-stream-2"; +private static final String OUTPUT_STREAM_1 = "output-stream-1"; +private static final String OUTPUT_STREAM_2 = "output-stream-2"; +private static final String TOPOLOGY1 = "topology1"; +private static final String TOPOLOGY2 = "topology2"; + +private static final List> STANDARD_INPUT_DATA = +asList(pair("A", 100L), pair("B", 200L), pair("A", 300L), pair("C", 400L), pair("C", -50L)); +private static final List> COUNT_OUTPUT_DATA = +asList(pair("A", 1L), pair("B", 1L), pair("A", 2L), pair("C", 1L), pair("C", 2L)); +private
[GitHub] [kafka] jnh5y commented on a diff in pull request #12161: KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies
jnh5y commented on code in PR #12161: URL: https://github.com/apache/kafka/pull/12161#discussion_r897869162 ## streams/src/test/java/org/apache/kafka/streams/integration/PauseResumeIntegrationTest.java: ## @@ -335,7 +333,39 @@ public void pauseResumehouldWorkAcrossInstances() throws Exception { kafkaStreams.resume(); waitForApplicationState(singletonList(kafkaStreams), State.RUNNING, STARTUP_TIMEOUT); -awaitOutput(OUTPUT_STREAM_1, 3, COUNT_OUTPUT_DATA); +awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA); +} + +@Test +public void pausedTopologyShouldNotRestoreStateStores() throws Exception { +produceToInputTopics(INPUT_STREAM_1, STANDARD_INPUT_DATA); + +kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1); Review Comment: My mistake; I've updated the test.
[GitHub] [kafka] jnh5y commented on a diff in pull request #12161: KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies
jnh5y commented on code in PR #12161: URL: https://github.com/apache/kafka/pull/12161#discussion_r897868472 ## streams/src/test/java/org/apache/kafka/streams/integration/PauseResumeIntegrationTest.java: ## @@ -335,7 +333,39 @@ public void pauseResumehouldWorkAcrossInstances() throws Exception { kafkaStreams.resume(); waitForApplicationState(singletonList(kafkaStreams), State.RUNNING, STARTUP_TIMEOUT); -awaitOutput(OUTPUT_STREAM_1, 3, COUNT_OUTPUT_DATA); +awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA); +} + +@Test +public void pausedTopologyShouldNotRestoreStateStores() throws Exception { +produceToInputTopics(INPUT_STREAM_1, STANDARD_INPUT_DATA); + +kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams.start(); +kafkaStreams2.start(); + +waitForApplicationState(Arrays.asList(kafkaStreams, kafkaStreams2), State.RUNNING, STARTUP_TIMEOUT); + +awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA); + +kafkaStreams.close(); +kafkaStreams2.close(); + +kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams.cleanUp(); +kafkaStreams2.cleanUp(); + +kafkaStreams.pause(); +kafkaStreams2.pause(); +kafkaStreams.start(); +kafkaStreams2.start(); + +waitForApplicationState(Arrays.asList(kafkaStreams, kafkaStreams2), State.REBALANCING, STARTUP_TIMEOUT); + +assertTrue(kafkaStreams.allLocalStorePartitionLags().isEmpty()); +assertTrue(kafkaStreams2.allLocalStorePartitionLags().isEmpty()); Review Comment: Thank you! I've added these changes.
[GitHub] [kafka] divijvaidya commented on pull request #12230: MINOR: Catch InvocationTargetException explicitly and propagate underlying cause
divijvaidya commented on PR #12230: URL: https://github.com/apache/kafka/pull/12230#issuecomment-1156355538 @dengziming @showuon please review this small change.
[GitHub] [kafka] divijvaidya commented on pull request #12229: MINOR: Include the inner exception stack trace when re-throwing an exception
divijvaidya commented on PR #12229: URL: https://github.com/apache/kafka/pull/12229#issuecomment-1156352282 @mimaison perhaps you may want to look into this? This already has 2 approvals from non-committers.
[jira] [Commented] (KAFKA-13386) Foreign Key Join filtering out valid records after a code change / schema evolved
[ https://issues.apache.org/jira/browse/KAFKA-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554526#comment-17554526 ] Kin Siu commented on KAFKA-13386: - [~guozhang], I am not sure if what I faced is the same as [~sduran], but I hit an issue while load testing my application, which contains an FK join: there can be no output until the very last left-hand-side record is processed. Simplified test data in my load testing:
|| - || Key Fields || Value Fields ||
| Left hand side table | K1 | FK1, DF1, UpdateTs |
| Right hand side table | FK1 | V1, V2 |
It is a simple FK join of the left-hand-side table and the right-hand-side table on field "FK1"; for each left-hand-side update, I changed DF1 and UpdateTs. When we increase the left-hand-side table's publishing rate, at some point when the rate is high, the application stops generating any output until the last left-hand-side record is processed. Say I ran the test for 10 minutes: we can end up receiving output only in the last few seconds. And instead of having the same number of join outputs as left-hand-side updates + right-hand-side updates, we received a lot less. I believe it is the same issue as described above: the "hash" compared when processing the right-hand-side response is the latest left-hand-side hash, and while in my test data the FK relation remained the same, the "hash" changed because the values of "DF1" and "UpdateTs" changed. > Foreign Key Join filtering out valid records after a code change / schema > evolved > - > > Key: KAFKA-13386 > URL: https://issues.apache.org/jira/browse/KAFKA-13386 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.6.2 >Reporter: Sergio Duran Vegas >Priority: Major > > The join optimization assumes the serializer is deterministic and invariant > across upgrades, so in case of changes this optimization will drop > valid/intermediate records. In other situations we have relied on the same > property, for example when computing whether an update is a duplicate result > or not. > > The problem is that some serializers are sadly not deterministic. > > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionResolverJoinProcessorSupplier.java] > > {code:java} > // If this value doesn't match the current value from the original table, it > is stale and should be discarded. > if (java.util.Arrays.equals(messageHash, currentHash)) {{code} > > A solution to this problem would be for the comparison to use the foreign-key > reference itself instead of the whole message hash. > > The bug fix proposal is to allow the user to choose between one method of > comparison or the other (whole hash or FK reference). This would fix the > problem of dropping valid records in certain cases and allow the user to also > choose the current optimized way of checking valid records and dropping > intermediate results.
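To make the failure mode concrete, the following is a hedged Java sketch (hypothetical names and serialization format, not the actual Streams code) of why a hash over the whole serialized value is fragile when non-key fields such as UpdateTs change:

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Sketch: the subscription response carries a hash of the LHS value taken at
// subscription time; if any field changed since then (even a non-FK field),
// the hashes differ and the join result is dropped as "stale".
public class FkJoinHashExample {
    static byte[] hash(String serializedValue) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256")
                .digest(serializedValue.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] hashAtSubscription = hash("FK1|DF1|ts=1000");
        byte[] hashAtResolution = hash("FK1|DF2|ts=2000"); // same FK, other fields changed

        // Same comparison style as the quoted code: the record is discarded
        // even though the foreign key itself never changed.
        System.out.println(Arrays.equals(hashAtSubscription, hashAtResolution)); // false
    }
}
{code}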
[GitHub] [kafka] divijvaidya commented on pull request #12229: MINOR: Include the inner exception stack trace when re-throwing an exception
divijvaidya commented on PR #12229: URL: https://github.com/apache/kafka/pull/12229#issuecomment-1156349743 > Can this kind of problem be caught by spotbugs? manual checking is error prone. Agreed @dengziming, but unfortunately spotbugs isn't catching such errors.
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12045: KAFKA-12319: Change calculation of window size used to calculate `Rate`
divijvaidya commented on code in PR #12045: URL: https://github.com/apache/kafka/pull/12045#discussion_r897838652 ## clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java: ## @@ -34,35 +35,56 @@ */ public abstract class SampledStat implements MeasurableStat { -private double initialValue; +private final double initialValue; +/** + * Index of the latest stored sample. + */ private int current = 0; +/** + * Stores the recorded samples in a ring buffer. + */ protected List samples; public SampledStat(double initialValue) { this.initialValue = initialValue; this.samples = new ArrayList<>(2); } +/** + * {@inheritDoc} + * + * On every record, do the following: + * 1. Check if the current window has expired + * 2. If yes, then advance the current pointer to new window. The start time of the new window is set to nearest + *possible starting point for the new window. The nearest starting point occurs at config.timeWindowMs intervals + *from the end time of last known window. + * 3. Update the recorded value for the current window + * 4. Increase the number of event count + */ @Override -public void record(MetricConfig config, double value, long timeMs) { -Sample sample = current(timeMs); -if (sample.isComplete(timeMs, config)) -sample = advance(config, timeMs); -update(sample, config, value, timeMs); -sample.eventCount += 1; +public void record(MetricConfig config, double value, long recordingTimeMs) { +Sample sample = current(recordingTimeMs); +if (sample.isComplete(recordingTimeMs, config)) { +final long previousWindowStartTime = sample.lastWindowMs; +final long previousWindowEndtime = previousWindowStartTime + config.timeWindowMs(); +final long startTimeOfNewWindow = recordingTimeMs - ((recordingTimeMs - previousWindowEndtime) % config.timeWindowMs()); Review Comment: That is a great observation, Tom! Ideally the code should ensure that recording a metric does not block, because the operation is wall-clock-time sensitive. But as you observed, we have `synchronized` in multiple places, which may lead to a sample being recorded in a window which has already completed in the past. For cases where the `sensor` is used for calculating the ConnectionQuota [1], this problem wouldn't occur, because the call to `Time.milliseconds` is done inside a `synchronized` block, which ensures that only one thread with the latest timestamp will be accessing sensor.record at a time. But code paths other than ConnectionQuota also use the sensor, and for those your observation is valid. Since this problem is independent of this code change, and breaks existing logic if/when recordingTimeMs < endTimeOfPreviousWindow, I have created a JIRA to address it in a separate PR: https://issues.apache.org/jira/browse/KAFKA-13994 [1] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L1541-L1542
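For reference, a small worked example of the window-alignment arithmetic in the quoted diff (the constants are illustrative; only the formula comes from the diff):

```java
// Sketch: the new window starts at the nearest timeWindowMs boundary measured
// from the end of the previous window, at or before the recording time.
public class WindowAlignmentExample {
    public static void main(String[] args) {
        long timeWindowMs = 1_000L;
        long previousWindowStartTime = 10_000L;
        long previousWindowEndTime = previousWindowStartTime + timeWindowMs; // 11_000

        long recordingTimeMs = 13_700L; // arrives 2.7 windows after the previous window ended
        long startTimeOfNewWindow =
            recordingTimeMs - ((recordingTimeMs - previousWindowEndTime) % timeWindowMs);

        System.out.println(startTimeOfNewWindow); // 13_000, i.e. 11_000 + 2 * 1_000
    }
}
```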
[jira] [Created] (KAFKA-13994) Incorrect quota calculation due to a bug
Divij Vaidya created KAFKA-13994: Summary: Incorrect quota calculation due to a bug Key: KAFKA-13994 URL: https://issues.apache.org/jira/browse/KAFKA-13994 Project: Kafka Issue Type: Bug Components: core Reporter: Divij Vaidya *Problem* This was noted by [~tombentley] at [https://github.com/apache/kafka/pull/12045#discussion_r895592286] The completion of a sample window in `SampledStat.java` is based on a comparison of `recordingTimeMs` with startTimeOfPreviousWindow [1]. `recordingTimeMs` is calculated via System.currentTimeMillis(), which: 1. is not guaranteed to be monotonically increasing due to clock drift; 2. is not necessarily the current time when it arrives at [1], because the thread may be blocked at `synchronized` in {{Sensor.recordInternal}} [2], since synchronized provides no guarantee about fairness for blocked threads. Hence, it is possible that when the isComplete comparison is made at [1], recordingTimeMs < endTimeOfCurrentWindow whereas wallClockTimeAtTheMoment > startTimeOfCurrentWindow + window length. The implications would be: 1. The current sample window will not be considered complete even if it has completed as per wall clock time. 2. The value will be recorded in a sample window which has elapsed, instead of the new window where it belongs. Due to the above two implications, the metrics captured by the sensor may not be correct, which could lead to incorrect quota calculations. [1] [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java#L138] [2] https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L232
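A hedged sketch of the race described above (hypothetical classes; the real Sensor/SampledStat code paths differ):

{code:java}
// Sketch: a timestamp captured before blocking on a lock can be stale by the
// time it is used, so a sample may land in a window that has already elapsed.
public class StaleTimestampExample {
    private final Object lock = new Object();

    void recordOutsideLock(double value) {
        long now = System.currentTimeMillis(); // may grow stale while waiting below
        synchronized (lock) {
            record(value, now);
        }
    }

    // Reading the clock inside the lock (as the ConnectionQuota path does)
    // means each record() sees a timestamp taken while holding the lock, so
    // timestamps are non-decreasing across threads, modulo clock drift.
    void recordInsideLock(double value) {
        synchronized (lock) {
            record(value, System.currentTimeMillis());
        }
    }

    private void record(double value, long timeMs) {
        // placeholder for SampledStat-style bookkeeping
    }
}
{code}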
[GitHub] [kafka] divijvaidya commented on pull request #12184: KAFKA-13911: Fix the rate window size calculation for edge cases
divijvaidya commented on PR #12184: URL: https://github.com/apache/kafka/pull/12184#issuecomment-1156329800 @dajac @guozhangwang please review when you get a chance.
[GitHub] [kafka] cadonna commented on a diff in pull request #12161: KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies
cadonna commented on code in PR #12161: URL: https://github.com/apache/kafka/pull/12161#discussion_r897835619 ## streams/src/test/java/org/apache/kafka/streams/integration/PauseResumeIntegrationTest.java: ## @@ -335,7 +333,39 @@ public void pauseResumehouldWorkAcrossInstances() throws Exception { kafkaStreams.resume(); waitForApplicationState(singletonList(kafkaStreams), State.RUNNING, STARTUP_TIMEOUT); -awaitOutput(OUTPUT_STREAM_1, 3, COUNT_OUTPUT_DATA); +awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA); +} + +@Test +public void pausedTopologyShouldNotRestoreStateStores() throws Exception { +produceToInputTopics(INPUT_STREAM_1, STANDARD_INPUT_DATA); + +kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams.start(); +kafkaStreams2.start(); + +waitForApplicationState(Arrays.asList(kafkaStreams, kafkaStreams2), State.RUNNING, STARTUP_TIMEOUT); + +awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA); + +kafkaStreams.close(); +kafkaStreams2.close(); + +kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1); +kafkaStreams.cleanUp(); +kafkaStreams2.cleanUp(); + +kafkaStreams.pause(); +kafkaStreams2.pause(); +kafkaStreams.start(); +kafkaStreams2.start(); + +waitForApplicationState(Arrays.asList(kafkaStreams, kafkaStreams2), State.REBALANCING, STARTUP_TIMEOUT); + +assertTrue(kafkaStreams.allLocalStorePartitionLags().isEmpty()); +assertTrue(kafkaStreams2.allLocalStorePartitionLags().isEmpty()); Review Comment: You could do something like: ``` waitForApplicationState(Arrays.asList(kafkaStreams), State.REBALANCING, STARTUP_TIMEOUT); waitForCondition( () -> !kafkaStreams.allLocalStorePartitionLags().isEmpty(), "Lags for local store partitions were not found within the timeout!"); waitUntilStreamsHasPolled(kafkaStreams, 2); final long stateStoreLag1 = kafkaStreams.allLocalStorePartitionLags().get("test-store").get(0).offsetLag(); waitUntilStreamsHasPolled(kafkaStreams, 2); final long stateStoreLag2 = kafkaStreams.allLocalStorePartitionLags().get("test-store").get(0).offsetLag(); assertTrue(stateStoreLag1 > 0); assertEquals(stateStoreLag1, stateStoreLag2); ``` This code just considers one Streams client. You need to add `Materialized.as("test-store")` to the call to `count()` in your topology. As soon as you have activated the standbys, you need to do the same for the second Streams client.
[GitHub] [kafka] tombentley commented on a diff in pull request #11781: KAFKA-10000: Per-connector offsets topics (KIP-618)
tombentley commented on code in PR #11781: URL: https://github.com/apache/kafka/pull/11781#discussion_r897830233 ## connect/runtime/src/main/java/org/apache/kafka/connect/util/TopicAdmin.java: ## @@ -274,18 +276,23 @@ public static NewTopicBuilder defineTopic(String topicName) { * @param adminConfig the configuration for the {@link Admin} */ public TopicAdmin(Map adminConfig) { -this(adminConfig, Admin.create(adminConfig)); +this(adminConfig.get(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG), Admin.create(adminConfig)); } -// visible for testing -TopicAdmin(Map adminConfig, Admin adminClient) { -this(adminConfig, adminClient, true); +/** + * Create a new topic admin using the provided {@link Admin} + * + * @param bootstrapServers the Kafka cluster targeted by the admin + * @param adminClient the {@link Admin} to use under the hood + */ +public TopicAdmin(Object bootstrapServers, Admin adminClient) { Review Comment: This constructor kinda confuses the ownership of the `Admin` client. I think things are cleaner when the TopicAdmin instantiates (and thus owns) the `admin`. Note, it looks like there are no callers for `TopicAdmin.admin`. It seems that the call sites in `doBuild` could simply pass the map of configs (and the `bootstrapServers` looked up from that), rather than instantiating the `admin` and then passing it to the TopicAdmin. Obviously the test code has slightly different requirements, meaning we still need this constructor. I did also wonder whether we could also get rid of `bootstrapServers` by defining `toString` on `KafkaAdminClient` and using that for the logging and exceptions here in `TopicAdmin`. Perhaps that's worth a follow-up PR at some point (though perhaps there are benefits to hiding bootstrap servers from receivers of clients).
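As a sketch of the ownership arrangement described as cleaner above (a hypothetical simplification, not the PR's actual code):

```java
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

// Sketch: the wrapper instantiates the Admin client itself and therefore
// unambiguously owns its lifecycle.
class OwningTopicAdmin implements AutoCloseable {
    private final Object bootstrapServers;
    private final Admin admin;

    OwningTopicAdmin(Map<String, Object> adminConfig) {
        this.bootstrapServers = adminConfig.get(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG);
        this.admin = Admin.create(adminConfig); // created here, closed here
    }

    @Override
    public String toString() {
        return "OwningTopicAdmin(" + bootstrapServers + ")";
    }

    @Override
    public void close() {
        admin.close();
    }
}
```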
[GitHub] [kafka] cadonna commented on a diff in pull request #12161: KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies
cadonna commented on code in PR #12161: URL: https://github.com/apache/kafka/pull/12161#discussion_r897818350

## streams/src/test/java/org/apache/kafka/streams/integration/PauseResumeIntegrationTest.java:

@@ -335,7 +333,39 @@ public void pauseResumehouldWorkAcrossInstances() throws Exception {
         kafkaStreams.resume();
         waitForApplicationState(singletonList(kafkaStreams), State.RUNNING, STARTUP_TIMEOUT);
-        awaitOutput(OUTPUT_STREAM_1, 3, COUNT_OUTPUT_DATA);
+        awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA);
+    }
+
+    @Test
+    public void pausedTopologyShouldNotRestoreStateStores() throws Exception {
+        produceToInputTopics(INPUT_STREAM_1, STANDARD_INPUT_DATA);
+
+        kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams.start();
+        kafkaStreams2.start();
+
+        waitForApplicationState(Arrays.asList(kafkaStreams, kafkaStreams2), State.RUNNING, STARTUP_TIMEOUT);
+
+        awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA);
+
+        kafkaStreams.close();
+        kafkaStreams2.close();
+
+        kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams.cleanUp();
+        kafkaStreams2.cleanUp();
+
+        kafkaStreams.pause();
+        kafkaStreams2.pause();
+        kafkaStreams.start();
+        kafkaStreams2.start();
+
+        waitForApplicationState(Arrays.asList(kafkaStreams, kafkaStreams2), State.REBALANCING, STARTUP_TIMEOUT);
+
+        assertTrue(kafkaStreams.allLocalStorePartitionLags().isEmpty());
+        assertTrue(kafkaStreams2.allLocalStorePartitionLags().isEmpty());

Review Comment: I played around with the test a bit, and indeed, if you add a `Thread.sleep(2000)` before these asserts, the test fails because the returned map is not empty. That means the assignment was not finished before the asserts were called.
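A hedged sketch of how the test could wait out this race instead of asserting immediately after the state transition; it uses the existing `TestUtils.waitForCondition` helper, and the timeout message is illustrative:

```java
import org.apache.kafka.test.TestUtils;

// Poll until the assignment has settled instead of asserting right after the
// state change; this removes the window that the Thread.sleep(2000) exposes.
TestUtils.waitForCondition(
    () -> !kafkaStreams.allLocalStorePartitionLags().isEmpty(),
    "Local store partition lags were not reported within the timeout");
```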
[jira] [Updated] (KAFKA-13993) Large log.cleaner.buffer.size config breaks Kafka Broker
[ https://issues.apache.org/jira/browse/KAFKA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomohiro Hashidate updated KAFKA-13993:
---------------------------------------
Description:
LogCleaner builds a Cleaner instance in the following way:

```
val cleaner = new Cleaner(id = threadId,
  offsetMap = new SkimpyOffsetMap(memory = math.min(config.dedupeBufferSize / config.numThreads, Int.MaxValue).toInt,
    hashAlgorithm = config.hashAlgorithm),
  ioBufferSize = config.ioBufferSize / config.numThreads / 2,
  maxIoBufferSize = config.maxMessageSize,
  dupBufferLoadFactor = config.dedupeBufferLoadFactor,
  throttler = throttler,
  time = time,
  checkDone = checkDone)
```

If `log.cleaner.buffer.size` / `log.cleaner.threads` is larger than Int.MaxValue, SkimpyOffsetMap uses Int.MaxValue and tries to allocate a ByteBuffer with Int.MaxValue capacity. But in the HotSpot VM implementation the maximum array size is Int.MaxValue - 5, and according to ArraysSupport in OpenJDK, SOFT_MAX_ARRAY_LENGTH is Int.MaxValue - 8 (the safer bound): https://github.com/openjdk/jdk17u/blob/master/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L589

If the ByteBuffer capacity exceeds the maximum array length, the Kafka broker fails to start:

```
[2022-06-14 18:08:09,609] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at kafka.log.SkimpyOffsetMap.<init>(OffsetMap.scala:45)
    at kafka.log.LogCleaner$CleanerThread.<init>(LogCleaner.scala:300)
    at kafka.log.LogCleaner.$anonfun$startup$2(LogCleaner.scala:155)
    at kafka.log.LogCleaner.startup(LogCleaner.scala:154)
    at kafka.log.LogManager.startup(LogManager.scala:435)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:291)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
    at kafka.Kafka$.main(Kafka.scala:82)
    at kafka.Kafka.main(Kafka.scala)
```

I suggest using `Int.MaxValue - 8` instead of `Int.MaxValue`.

was:
LogCleaner builds a Cleaner instance in the following way:

```
val cleaner = new Cleaner(id = threadId,
  offsetMap = new SkimpyOffsetMap(memory = math.min(config.dedupeBufferSize / config.numThreads, Int.MaxValue).toInt,
    hashAlgorithm = config.hashAlgorithm),
  ioBufferSize = config.ioBufferSize / config.numThreads / 2,
  maxIoBufferSize = config.maxMessageSize,
  dupBufferLoadFactor = config.dedupeBufferLoadFactor,
  throttler = throttler,
  time = time,
  checkDone = checkDone)
```

If `log.cleaner.buffer.size` / `log.cleaner.threads` is larger than Int.MaxValue, SkimpyOffsetMap uses Int.MaxValue and tries to allocate a ByteBuffer with Int.MaxValue capacity. But in the HotSpot VM implementation the maximum array size is Int.MaxValue - 5, and according to ArraysSupport in OpenJDK, SOFT_MAX_ARRAY_LENGTH is Int.MaxValue - 8 (the safer bound).

If the ByteBuffer capacity exceeds the maximum array length, the Kafka broker fails to start:

```
[2022-06-14 18:08:09,609] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
```
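The fix the ticket proposes can be illustrated with a short sketch. This is a Java illustration of the idea only (the actual LogCleaner code is Scala), and the helper method name is hypothetical:

```java
// Mirrors jdk.internal.util.ArraysSupport.SOFT_MAX_ARRAY_LENGTH: the largest
// array size that HotSpot can reliably allocate.
static final int SOFT_MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

// Hypothetical helper: clamp the per-thread dedupe buffer to the soft maximum
// instead of Integer.MAX_VALUE, so ByteBuffer.allocate() cannot exceed the VM limit.
static int dedupeBufferCapacity(long dedupeBufferSize, int numThreads) {
    return (int) Math.min(dedupeBufferSize / numThreads, SOFT_MAX_ARRAY_LENGTH);
}
```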
[GitHub] [kafka] tombentley commented on a diff in pull request #11781: KAFKA-10000: Per-connector offsets topics (KIP-618)
tombentley commented on code in PR #11781: URL: https://github.com/apache/kafka/pull/11781#discussion_r897812219

## connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java:

@@ -1327,30 +1334,39 @@ public WorkerTask doBuild(Task task,
                 connectorClientConfigOverridePolicy, kafkaClusterId);
         KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps);

-        TopicAdmin topicAdmin;
+        // Prepare to create a topic admin if the task requires one, but do not actually create an instance
+        // until/unless one is needed
+        final AtomicReference<TopicAdmin> topicAdmin = new AtomicReference<>();
+        final Supplier<TopicAdmin> topicAdminCreator = () -> topicAdmin.updateAndGet(existingAdmin -> {
+            if (existingAdmin != null) {
+                return existingAdmin;
+            }
+            Map<String, Object> adminOverrides = adminConfigs(id.connector(), "connector-adminclient-" + id, config,
+                    sourceConfig, connectorClass, connectorClientConfigOverridePolicy, kafkaClusterId, ConnectorType.SOURCE);
+            Admin adminClient = Admin.create(adminOverrides);
+            return new TopicAdmin(adminOverrides.get(BOOTSTRAP_SERVERS_CONFIG), adminClient);
+        });
+
         Map<String, TopicCreationGroup> topicCreationGroups;
         if (config.topicCreationEnable() && sourceConfig.usesTopicCreation()) {
             topicCreationGroups = TopicCreationGroup.configuredGroups(sourceConfig);
             // Create a topic admin that the task can use for topic creation
-            Map<String, Object> adminOverrides = adminConfigs(id.connector(), "connector-adminclient-" + id, config,
-                    sourceConfig, connectorClass, connectorClientConfigOverridePolicy, kafkaClusterId, ConnectorType.SOURCE);
-            topicAdmin = new TopicAdmin(adminOverrides);
+            topicAdminCreator.get();

Review Comment: Thanks! When you have time, it would be great if you could rebase that PR.
[jira] [Updated] (KAFKA-13993) Large log.cleaner.buffer.size config breaks Kafka Broker
[ https://issues.apache.org/jira/browse/KAFKA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomohiro Hashidate updated KAFKA-13993:
---------------------------------------
Description:
LogCleaner builds a Cleaner instance in the following way:

```
val cleaner = new Cleaner(id = threadId,
  offsetMap = new SkimpyOffsetMap(memory = math.min(config.dedupeBufferSize / config.numThreads, Int.MaxValue).toInt,
    hashAlgorithm = config.hashAlgorithm),
  ioBufferSize = config.ioBufferSize / config.numThreads / 2,
  maxIoBufferSize = config.maxMessageSize,
  dupBufferLoadFactor = config.dedupeBufferLoadFactor,
  throttler = throttler,
  time = time,
  checkDone = checkDone)
```

If `log.cleaner.buffer.size` / `log.cleaner.threads` is larger than Int.MaxValue, SkimpyOffsetMap uses Int.MaxValue and tries to allocate a ByteBuffer with Int.MaxValue capacity. But in the HotSpot VM implementation the maximum array size is Int.MaxValue - 5, and according to ArraysSupport in OpenJDK, SOFT_MAX_ARRAY_LENGTH is Int.MaxValue - 8 (the safer bound).

If the ByteBuffer capacity exceeds the maximum array length, the Kafka broker fails to start:

```
[2022-06-14 18:08:09,609] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at kafka.log.SkimpyOffsetMap.<init>(OffsetMap.scala:45)
    at kafka.log.LogCleaner$CleanerThread.<init>(LogCleaner.scala:300)
    at kafka.log.LogCleaner.$anonfun$startup$2(LogCleaner.scala:155)
    at kafka.log.LogCleaner.startup(LogCleaner.scala:154)
    at kafka.log.LogManager.startup(LogManager.scala:435)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:291)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
    at kafka.Kafka$.main(Kafka.scala:82)
    at kafka.Kafka.main(Kafka.scala)
```

I suggest using `Int.MaxValue - 8` instead of `Int.MaxValue`.

was:
LogCleaner builds a Cleaner instance in the following way:

```
val cleaner = new Cleaner(id = threadId,
  offsetMap = new SkimpyOffsetMap(memory = math.min(config.dedupeBufferSize / config.numThreads, Int.MaxValue).toInt,
    hashAlgorithm = config.hashAlgorithm),
  ioBufferSize = config.ioBufferSize / config.numThreads / 2,
  maxIoBufferSize = config.maxMessageSize,
  dupBufferLoadFactor = config.dedupeBufferLoadFactor,
  throttler = throttler,
  time = time,
  checkDone = checkDone)
```

If `log.cleaner.buffer.size` / `log.cleaner.threads` is larger than Int.MaxValue, SkimpyOffsetMap uses Int.MaxValue and tries to allocate a ByteBuffer with Int.MaxValue capacity. But in the HotSpot VM implementation the maximum array size is Int.MaxValue - 5, and according to ArraysSupport in OpenJDK, SOFT_MAX_ARRAY_LENGTH is Int.MaxValue - 8 (the safer bound).

If the ByteBuffer capacity exceeds the maximum array length, the Kafka broker fails to start:

```
[2022-06-14 18:08:09,609] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at kafka.log.SkimpyOffsetMap.<init>(OffsetMap.scala:45)
    at kafka.log.LogCleaner$CleanerThread.<init>(LogCleaner.scala:300)
```
[jira] [Created] (KAFKA-13993) Large log.cleaner.buffer.size config breaks Kafka Broker
Tomohiro Hashidate created KAFKA-13993:
---------------------------------------

             Summary: Large log.cleaner.buffer.size config breaks Kafka Broker
                 Key: KAFKA-13993
                 URL: https://issues.apache.org/jira/browse/KAFKA-13993
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 3.1.1, 3.2.0, 3.0.1, 2.8.1, 2.7.2
            Reporter: Tomohiro Hashidate

LogCleaner builds a Cleaner instance in the following way:

```
val cleaner = new Cleaner(id = threadId,
  offsetMap = new SkimpyOffsetMap(memory = math.min(config.dedupeBufferSize / config.numThreads, Int.MaxValue).toInt,
    hashAlgorithm = config.hashAlgorithm),
  ioBufferSize = config.ioBufferSize / config.numThreads / 2,
  maxIoBufferSize = config.maxMessageSize,
  dupBufferLoadFactor = config.dedupeBufferLoadFactor,
  throttler = throttler,
  time = time,
  checkDone = checkDone)
```

If `log.cleaner.buffer.size` / `log.cleaner.threads` is larger than Int.MaxValue, SkimpyOffsetMap uses Int.MaxValue and tries to allocate a ByteBuffer with Int.MaxValue capacity. But in the HotSpot VM implementation the maximum array size is Int.MaxValue - 5, and according to ArraysSupport in OpenJDK, SOFT_MAX_ARRAY_LENGTH is Int.MaxValue - 8 (the safer bound).

If the ByteBuffer capacity exceeds the maximum array length, the Kafka broker fails to start:

```
[2022-06-14 18:08:09,609] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at kafka.log.SkimpyOffsetMap.<init>(OffsetMap.scala:45)
    at kafka.log.LogCleaner$CleanerThread.<init>(LogCleaner.scala:300)
    at kafka.log.LogCleaner.$anonfun$startup$2(LogCleaner.scala:155)
    at kafka.log.LogCleaner.startup(LogCleaner.scala:154)
    at kafka.log.LogManager.startup(LogManager.scala:435)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:291)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
    at kafka.Kafka$.main(Kafka.scala:82)
    at kafka.Kafka.main(Kafka.scala)
```

I suggest using `Int.MaxValue - 8` instead of `Int.MaxValue`.
[GitHub] [kafka] cadonna commented on a diff in pull request #12161: KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies
cadonna commented on code in PR #12161: URL: https://github.com/apache/kafka/pull/12161#discussion_r897789473

## streams/src/test/java/org/apache/kafka/streams/integration/PauseResumeIntegrationTest.java:

@@ -335,7 +333,39 @@ public void pauseResumehouldWorkAcrossInstances() throws Exception {
         kafkaStreams.resume();
         waitForApplicationState(singletonList(kafkaStreams), State.RUNNING, STARTUP_TIMEOUT);
-        awaitOutput(OUTPUT_STREAM_1, 3, COUNT_OUTPUT_DATA);
+        awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA);
+    }
+
+    @Test
+    public void pausedTopologyShouldNotRestoreStateStores() throws Exception {
+        produceToInputTopics(INPUT_STREAM_1, STANDARD_INPUT_DATA);
+
+        kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams.start();
+        kafkaStreams2.start();
+
+        waitForApplicationState(Arrays.asList(kafkaStreams, kafkaStreams2), State.RUNNING, STARTUP_TIMEOUT);
+
+        awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA);
+
+        kafkaStreams.close();
+        kafkaStreams2.close();
+
+        kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams.cleanUp();
+        kafkaStreams2.cleanUp();
+
+        kafkaStreams.pause();
+        kafkaStreams2.pause();
+        kafkaStreams.start();
+        kafkaStreams2.start();
+
+        waitForApplicationState(Arrays.asList(kafkaStreams, kafkaStreams2), State.REBALANCING, STARTUP_TIMEOUT);
+
+        assertTrue(kafkaStreams.allLocalStorePartitionLags().isEmpty());
+        assertTrue(kafkaStreams2.allLocalStorePartitionLags().isEmpty());

Review Comment: Why are you verifying for emptiness? I would expect that there are entries for the state stores with a lag greater than 0.

## streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java:

@@ -479,6 +485,47 @@ public void restore(final Map<TaskId, Task> tasks) {
         }
     }

+    private void updateStandbyPartitions(final Map<TaskId, Task> tasks,

Review Comment: Do not forget to rename this method to something more meaningful. Proposal: `pauseResumePartitions()`

## streams/src/test/java/org/apache/kafka/streams/integration/PauseResumeIntegrationTest.java:

@@ -335,7 +333,39 @@ public void pauseResumehouldWorkAcrossInstances() throws Exception {
         kafkaStreams.resume();
         waitForApplicationState(singletonList(kafkaStreams), State.RUNNING, STARTUP_TIMEOUT);
-        awaitOutput(OUTPUT_STREAM_1, 3, COUNT_OUTPUT_DATA);
+        awaitOutput(OUTPUT_STREAM_1, 5, COUNT_OUTPUT_DATA);
+    }
+
+    @Test
+    public void pausedTopologyShouldNotRestoreStateStores() throws Exception {
+        produceToInputTopics(INPUT_STREAM_1, STANDARD_INPUT_DATA);
+
+        kafkaStreams = buildKafkaStreams(OUTPUT_STREAM_1);
+        kafkaStreams2 = buildKafkaStreams(OUTPUT_STREAM_1);

Review Comment: If you do not use standby tasks, there is no reason to use two Kafka Streams clients. I would propose to use one standby only for this test. For that you need to set `num.standby.replicas` to 1. That has the effect that one client gets the active store assigned and the other gets the standby store assigned.
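A sketch of the configuration change proposed in the last comment; where exactly the Properties are built is an assumption about the test's setup code:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

final Properties props = new Properties();
// With one standby replica, one client is assigned the active store and the
// other its standby, so both clients report meaningful store lags.
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
```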
[GitHub] [kafka] mdedetrich commented on pull request #12284: KAFKA-13980: Upgrade from Scala 2.12.15 to 2.12.16
mdedetrich commented on PR #12284: URL: https://github.com/apache/kafka/pull/12284#issuecomment-1156251953

@divijvaidya Done
[GitHub] [kafka] divijvaidya commented on pull request #12284: KAFKA-13980: Upgrade from Scala 2.12.15 to 2.12.16
divijvaidya commented on PR #12284: URL: https://github.com/apache/kafka/pull/12284#issuecomment-1156237309

@mdedetrich could you please re-run the tests (by pushing another commit, or by rebasing from trunk and force-pushing)? It would be ideal to have a clean test run (with only known flaky failures) before we approve this.
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12045: KAFKA-12319: Change calculation of window size used to calculate `Rate`
divijvaidya commented on code in PR #12045: URL: https://github.com/apache/kafka/pull/12045#discussion_r897722547

## clients/src/test/java/org/apache/kafka/common/metrics/MetricsTest.java:

@@ -608,14 +609,14 @@ public void testRateWindowing() throws Exception {
         time.sleep(cfg.timeWindowMs() / 2); // prior to any time passing

-        double elapsedSecs = (cfg.timeWindowMs() * (cfg.samples() - 1) + cfg.timeWindowMs() / 2) / 1000.0;
+        double elapsedSecs = (cfg.timeWindowMs() * (cfg.samples() - 1) + (((double) cfg.timeWindowMs()) / 2.0d)) / 1000.0d;

Review Comment: Thanks for catching this. I have fixed this in the latest revision.
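The diff above fixes an integer-division truncation; a tiny self-contained illustration with assumed values:

```java
// Assumed window length; any odd value exposes the truncation.
long timeWindowMs = 30_001L;

double truncated = (timeWindowMs / 2) / 1000.0;        // long division first: 15_000 / 1000.0 = 15.0
double exact = ((double) timeWindowMs) / 2.0 / 1000.0; // 15.0005

// The explicit cast in the fixed line keeps the half-window fraction
// instead of silently discarding it.
```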
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12045: KAFKA-12319: Change calculation of window size used to calculate `Rate`
divijvaidya commented on code in PR #12045: URL: https://github.com/apache/kafka/pull/12045#discussion_r897733191

## clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java:

@@ -34,35 +35,56 @@
  */
 public abstract class SampledStat implements MeasurableStat {

-    private double initialValue;
+    private final double initialValue;
+    /**
+     * Index of the latest stored sample.
+     */
     private int current = 0;
+    /**
+     * Stores the recorded samples in a ring buffer.

Review Comment: That sounds fair. I have fixed the Javadoc in the latest revision as per your suggestion.
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12045: KAFKA-12319: Change calculation of window size used to calculate `Rate`
divijvaidya commented on code in PR #12045: URL: https://github.com/apache/kafka/pull/12045#discussion_r897733191

## clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java:

@@ -34,35 +35,56 @@
  */
 public abstract class SampledStat implements MeasurableStat {

-    private double initialValue;
+    private final double initialValue;
+    /**
+     * Index of the latest stored sample.
+     */
     private int current = 0;
+    /**
+     * Stores the recorded samples in a ring buffer.

Review Comment: That sounds fair. I have fixed the Javadoc in the latest revision and replaced it with the following:
```
/**
 * Stores the recorded samples.
 * Note that the previously recorded samples may be overwritten/reset if they are considered obsolete by the
 * {@link Sample#purgeObsoleteSamples} function.
 */
```
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12045: KAFKA-12319: Change calculation of window size used to calculate `Rate`
divijvaidya commented on code in PR #12045: URL: https://github.com/apache/kafka/pull/12045#discussion_r897724648

## clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java:

@@ -84,13 +106,7 @@ public Sample current(long timeMs) {
     public Sample oldest(long now) {
         if (samples.size() == 0)
             this.samples.add(newSample(now));
-        Sample oldest = this.samples.get(0);
-        for (int i = 1; i < this.samples.size(); i++) {
-            Sample curr = this.samples.get(i);
-            if (curr.lastWindowMs < oldest.lastWindowMs)
-                oldest = curr;
-        }
-        return oldest;
+        return samples.stream().min(Comparator.comparingLong(s -> s.lastWindowMs)).orElse(samples.get(0));

Review Comment: I find the new code more readable, since we can immediately eyeball that a min is being calculated, whereas in the previous version we have to follow the assignments and the logic of the for loop to work out what is going on. Nevertheless, I don't have a strong opinion on this one. If you still think we need to revert it, I will do it. Let me know.
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12045: KAFKA-12319: Change calculation of window size used to calculate `Rate`
divijvaidya commented on code in PR #12045: URL: https://github.com/apache/kafka/pull/12045#discussion_r897724922

## clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java:

@@ -68,24 +68,55 @@ public double measure(MetricConfig config, long now) {
     }

     public long windowSize(MetricConfig config, long now) {
-        // purge old samples before we compute the window size
+        // Purge obsolete samples. Obsolete samples are the ones which are not relevant to the current calculation
+        // because their creation time is outside (before) the duration of time window used to calculate rate.
         stat.purgeObsoleteSamples(config, now);

         /*
          * Here we check the total amount of time elapsed since the oldest non-obsolete window.
-         * This give the total windowSize of the batch which is the time used for Rate computation.
-         * However, there is an issue if we do not have sufficient data for e.g. if only 1 second has elapsed in a 30 second
-         * window, the measured rate will be very high.
-         * Hence we assume that the elapsed time is always N-1 complete windows plus whatever fraction of the final window is complete.
+         * This gives the duration of computation time window which used to calculate Rate.

Review Comment: Thanks for catching this. I have fixed this in the latest revision.
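For readers without the full file, the floor described in the removed comment block can be sketched as follows. This is a paraphrase of the pre-change logic, not the PR's final code, and the helper names follow the quoted diff:

```java
public long windowSize(MetricConfig config, long now) {
    stat.purgeObsoleteSamples(config, now);

    // Time elapsed since the oldest non-obsolete sample window began.
    long totalElapsedTimeMs = now - stat.oldest(now).lastWindowMs;

    // Floor the elapsed time to (samples - 1) complete windows plus the fraction
    // of the current window, so a nearly empty first window cannot inflate the rate.
    int numFullWindows = (int) (totalElapsedTimeMs / config.timeWindowMs());
    int minFullWindows = config.samples() - 1;
    if (numFullWindows < minFullWindows)
        totalElapsedTimeMs += (minFullWindows - numFullWindows) * config.timeWindowMs();

    return totalElapsedTimeMs;
}
```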
[GitHub] [kafka] divijvaidya commented on a diff in pull request #12045: KAFKA-12319: Change calculation of window size used to calculate `Rate`
divijvaidya commented on code in PR #12045: URL: https://github.com/apache/kafka/pull/12045#discussion_r897721779

## clients/src/test/java/org/apache/kafka/common/metrics/MetricsTest.java:

@@ -149,13 +149,14 @@ private void verifyStats(Function<KafkaMetric, Double> metricValueFunc) {
         assertEquals(5.0, metricValueFunc.apply(metrics.metric(metrics.metricName("s2.total", "grp1"))), EPS,
             "s2 reflects the constant value");
-        assertEquals(4.5, metricValueFunc.apply(metrics.metric(metrics.metricName("test.avg", "grp1"))), EPS,
+        assertEquals(sum / (double) count, metricValueFunc.apply(metrics.metric(metrics.metricName("test.avg", "grp1"))), EPS,
             "Avg(0...9) = 4.5");
         assertEquals(count - 1, metricValueFunc.apply(metrics.metric(metrics.metricName("test.max", "grp1"))), EPS,
             "Max(0...9) = 9");
         assertEquals(0.0, metricValueFunc.apply(metrics.metric(metrics.metricName("test.min", "grp1"))), EPS,
             "Min(0...9) = 0");
-        assertEquals(sum / elapsedSecs, metricValueFunc.apply(metrics.metric(metrics.metricName("test.rate", "grp1"))), EPS,
+        // rate is calculated over the first ever window. Hence, we assume presence of prior windows with 0 recorded events.
+        assertEquals((double) sum / elapsedSecs, metricValueFunc.apply(metrics.metric(metrics.metricName("test.rate", "grp1"))), EPS,

Review Comment: Thanks for catching this. I have fixed this in the latest revision.
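The assumption in the added comment is easiest to see with numbers; a worked example under an assumed configuration of two 30-second samples:

```java
double windowSec = 30.0;   // assumed sample window length
int samples = 2;           // assumed sample count
double elapsedSec = 15.0;  // only half of the very first window has passed
double sum = 45.0;         // total recorded value so far

double naiveRate = sum / elapsedSec;                             // 3.0/sec: spikes artificially high
double assumedElapsedSec = (samples - 1) * windowSec + elapsedSec; // 45.0 s, with one assumed empty prior window
double dampedRate = sum / assumedElapsedSec;                     // 1.0/sec
```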
[GitHub] [kafka] fred-ro commented on pull request #12201: MINOR: Replace left single quote with single quote in Connect worker's log message
fred-ro commented on PR #12201: URL: https://github.com/apache/kafka/pull/12201#issuecomment-1156048902

Is it possible to backport it to branch 2.8?
[GitHub] [kafka] tkaszuba commented on a diff in pull request #12293: KAFKA-13963: Clarified java doc for processors api
tkaszuba commented on code in PR #12293: URL: https://github.com/apache/kafka/pull/12293#discussion_r897565836

## streams/src/main/java/org/apache/kafka/streams/TopologyDescription.java:

@@ -30,6 +30,7 @@
  * In contrast, two sub-topologies are not connected but can be linked to each other via topics, i.e., if one
  * sub-topology {@link Topology#addSink(String, String, String...) writes} into a topic and another sub-topology
  * {@link Topology#addSource(String, String...) reads} from the same topic.
+ * Processors and Transformers created with the Processor API are treated as black boxes and are not represented in the topology graph.

Review Comment: That is correct; the issue is with `context.forward`.
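To make the `context.forward` point concrete, here is an illustrative processor (not from the PR); the class and child node names are assumptions:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

// Illustrative processor: which child receives a record is decided at runtime
// inside process(), so TopologyDescription can only show the node itself as a
// black box, not the routing that happens within it.
public class RoutingProcessor implements Processor<String, String, String, String> {
    private ProcessorContext<String, String> context;

    @Override
    public void init(final ProcessorContext<String, String> context) {
        this.context = context;
    }

    @Override
    public void process(final Record<String, String> record) {
        // Hypothetical child node names, wired up elsewhere via Topology#addProcessor.
        final String child = record.value().length() > 10 ? "long-values" : "short-values";
        context.forward(record, child); // this edge choice is invisible to the topology description
    }
}
```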