[jira] [Updated] (KAFKA-4616) Message loss is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle
[ https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandeep kumar singh updated KAFKA-4616: --- Summary: Message loss is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle (was: Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between ) > Message loss is seen when kafka-producer-perf-test.sh is running and any > broker restarted in middle > --- > > Key: KAFKA-4616 > URL: https://issues.apache.org/jira/browse/KAFKA-4616 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 0.10.0.0 > Environment: Apache mesos > Reporter: sandeep kumar singh > > if any broker is restarted while the kafka-producer-perf-test.sh command is > running, we see message loss. > commands i run: > **perf command: > $ bin/kafka-producer-perf-test.sh --num-records 100000 --record-size 4096 > --throughput 1000 --topic test3R3P3 --producer-props > bootstrap.servers=x.x.x.x:,x.x.x.x:,x.x.x.x: > I am sending 100,000 messages of 4096 bytes each > error thrown by the perf command: > 4944 records sent, 988.6 records/sec (3.86 MB/sec), 31.5 ms avg latency, > 433.0 max latency. > 5061 records sent, 1012.0 records/sec (3.95 MB/sec), 67.7 ms avg latency, > 798.0 max latency. > 5001 records sent, 1000.0 records/sec (3.91 MB/sec), 49.0 ms avg latency, > 503.0 max latency. > 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 37.3 ms avg latency, > 594.0 max latency. > 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 32.6 ms avg latency, > 501.0 max latency. > 5000 records sent, 999.8 records/sec (3.91 MB/sec), 49.4 ms avg latency, > 516.0 max latency. > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. 
> org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > truncated > 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 33.9 ms avg latency, > 497.0 max latency. > 4928 records sent, 985.6 records/sec (3.85 MB/sec), 42.1 ms avg latency, > 521.0 max latency. > 5073 records sent, 1014.4 records/sec (3.96 MB/sec), 39.4 ms avg latency, > 418.0 max latency. > 100000 records sent, 999.950002 records/sec (3.91 MB/sec), 37.65 ms avg > latency, 798.00 ms max latency, 1 ms 50th, 260 ms 95th, 411 ms 99th, 571 ms > 99.9th. > **consumer command: > $ bin/kafka-console-consumer.sh --zookeeper > x.x.x.x:2181/dcos-service-kafka-framework --topic test3R3P3 > 1>~/kafka_output.log > messages stored: > $ wc -l ~/kafka_output.log > 99932 /home/montana/kafka_output.log > I found only 99932 messages are stored and 68 messages are lost. > **topic describe command: > $ bin/kafka-topics.sh --zookeeper x.x.x.x:2181/dcos-service-kafka-framework > --describe |grep test3R3 > Topic:test3R3P3 PartitionCount:3 ReplicationFactor:3 Configs: > Topic: test3R3P3 Partition: 0 Leader: 2 Replicas: 1,2,0 Isr: 2,0,1 > Topic: test3R3P3 Partition: 1 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1 > Topic: test3R3P3 Partition: 2 Leader: 0 Replicas: 0,1,2 Isr: 2,0,1 > **consumer group command: > $ bin/kafka-consumer-groups.sh --zookeeper > x.x.x.x:2181/dcos-service-kafka-framework --describe --group > console-consumer-9926 > GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER > console-consumer-9926 test3R3P3 0 33265 33265 0 > console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0 > console-consumer-9926 test3R3P3 1 4 4 0 > console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0 > console-consumer-9926 test3R3P3 2 3 3 0 > console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0 > could you please help me understand what this error means: "err - > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received."? > Could you please provide a suggestion to fix this issue? > we are seeing this behavior every time we perform the above test scenario. > my understanding is that there should not be any data loss as long as n-1 brokers are alive. > is message loss an expected behavior in the above case? > thanks > Sandeep -- This message was sent by Atlassian JIRA (v6.3.4#6332)
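To make the loss mode concrete: unless acks is overridden in --producer-props, the 0.10 producer defaults to acks=1, so the leader acknowledges a record before its followers have copied it; if the leader is restarted at that moment, the acknowledged-but-unreplicated tail is gone after failover. A toy sketch of that window (plain Python, not Kafka code; every name and number is illustrative, and the lag of 68 is chosen only to echo the count in the report above):

```python
def lost_records(acks, total=10_000, crash_at=5_000, replication_lag=68):
    """Toy model of a leader failover. The follower trails the leader by
    `replication_lag` records; under acks=all the ack waits for replication."""
    acked, replicated = [], set()
    for i in range(total):
        if i == crash_at:
            # Leader dies; the follower is elected with only what it replicated.
            # Records acked but not yet replicated are gone.
            return [r for r in acked if r not in replicated]
        acked.append(i)                              # leader appends and acks
        if acks == "all":
            replicated.add(i)                        # ack waited for the ISR
        elif i >= replication_lag:
            replicated.add(i - replication_lag)      # follower trails behind
    return []

print(len(lost_records("1")))    # 68 acked records lost on failover
print(len(lost_records("all")))  # 0
```

In the real system the lag depends on timing and load, but the shape is the same: acks=1 loses exactly the acknowledged-but-unreplicated tail, acks=all (with retries) does not.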
[jira] [Commented] (KAFKA-4616) Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between
[ https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821003#comment-15821003 ] sandeep kumar singh commented on KAFKA-4616: thanks for the reply. i applied the acks=-1 option, but still see message loss. command i ran: $ bin/kafka-producer-perf-test.sh --num-records 100000 --record-size 4096 --throughput 5000 --topic test2R3P3 --producer-props bootstrap.servers=localhost:9092,localhost:9093,localhost:9094 acks=-1 8890 records sent, 1777.3 records/sec (6.94 MB/sec), 2039.2 ms avg latency, 3282.0 max latency. 12342 records sent, 2468.4 records/sec (9.64 MB/sec), 2648.8 ms avg latency, 3448.0 max latency. org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. truncated org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. ...truncated 100000 records sent, 3716.504999 records/sec (14.52 MB/sec), 1565.19 ms avg latency, 3634.00 ms max latency, 1470 ms 50th, 3205 ms 95th, 3357 ms 99th, 3502 ms 99.9th. $ bin/kafka-consumer-groups.sh --zookeeper 127.0.0.1:2181 --describe --group console-consumer-96681 GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER console-consumer-96681 test2R3P3 0 3 3 0 console-consumer-96681_localhost.localdomain-1482869188877-44ac0d84-0 console-consumer-96681 test2R3P3 1 33271 33271 0 console-consumer-96681_localhost.localdomain-1482869188877-44ac0d84-0 console-consumer-96681 test2R3P3 2 3 3 0 console-consumer-96681_localhost.localdomain-1482869188877-44ac0d84-0 i sent 100,000 messages but could see only 99937 messages stored. 
[jira] [Commented] (KAFKA-4610) getting error:Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test2R2P2-1
[ https://issues.apache.org/jira/browse/KAFKA-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820941#comment-15820941 ] sandeep kumar singh commented on KAFKA-4610: thanks for the update. i am trying to check the broker logs, but none of the brokers throws any exceptions when this error occurs. i am seeing this error "ERROR Error when sending message to topic test3R3P3 with key: null, value: 4096 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test3R3P3-0" every time i run the producer. producer command - cat kafka_output.log | bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --batch-size 1000 --message-send-max-retries 10 --request-required-acks -1 --topic test3R3P3 kafka_output.log has 10 records of length 4096 each. i see this error even when all brokers are healthy and all partitions have valid leaders. are you saying the brokers restart internally, maybe due to load? but when i check the UNIX process ID of the brokers before and after running the producer, i see the PID is the same, which means the brokers were not restarted. > getting error:Batch containing 3 record(s) expired due to timeout while > requesting metadata from brokers for test2R2P2-1 > > > Key: KAFKA-4610 > URL: https://issues.apache.org/jira/browse/KAFKA-4610 > Project: Kafka > Issue Type: Bug > Environment: Dev > Reporter: sandeep kumar singh > > i am getting the below error when running the producer client, which takes messages > from an input file kafka_message.log. 
this log file is filled with 10 > records per second, each message of length 4096 > error - > [2017-01-09 14:45:24,813] ERROR Error when sending message to topic test2R2P2 > with key: null, value: 4096 bytes with error: > (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) > org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) > expired due to timeout while requesting metadata from brokers for test2R2P2-0 > [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 > with key: null, value: 4096 bytes with error: > (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) > org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) > expired due to timeout while requesting metadata from brokers for test2R2P2-0 > [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 > with key: null, value: 4096 bytes with error: > (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) > org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) > expired due to timeout while requesting metadata from brokers for test2R2P2-0 > [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 > with key: null, value: 4096 bytes with error: > (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) > org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) > expired due to timeout while requesting metadata from brokers for test2R2P2-0 > [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 > with key: null, value: 4096 bytes with error: > (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) > org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) > expired due to timeout while requesting metadata from brokers for test2R2P2-0 > command i run : > $ bin/kafka-console-producer.sh --broker-list x.x.x.x:,x.x.x.x: > --batch-size 1000 --message-send-max-retries 10 --request-required-acks 1 > --topic test2R2P2 <~/kafka_message.log > there are 2 brokers running and the topic has partitions = 2 and replication > factor 2. > Could you please help me understand what that error means? > also, i see message loss when i manually restart one of the brokers while the > kafka-producer-perf-test command is running. is this an expected behavior? > thanks > Sandeep -- This message was sent by Atlassian JIRA (v6.3.4#6332)
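For context on what the TimeoutException means: a produce batch waits in the client's accumulator until its partition has a known, reachable leader, and if metadata cannot be obtained before the timeout expires, the whole batch is failed back to the error callback instead of blocking forever. A rough sketch of that bookkeeping (illustrative Python, not the real client; the field names are assumptions):

```python
def expire_stale_batches(batches, now_ms, request_timeout_ms=30_000):
    """Split pending batches into (expired, still_pending), the way the
    producer's accumulator does when metadata for a partition is unavailable:
    anything waiting longer than the timeout fails with a TimeoutException."""
    expired = [b for b in batches if now_ms - b["created_ms"] > request_timeout_ms]
    pending = [b for b in batches if now_ms - b["created_ms"] <= request_timeout_ms]
    return expired, pending

batches = [
    {"topic_partition": "test2R2P2-0", "records": 3, "created_ms": 0},
    {"topic_partition": "test2R2P2-1", "records": 5, "created_ms": 25_000},
]
expired, pending = expire_stale_batches(batches, now_ms=31_000)
print(len(expired), len(pending))  # 1 1
```

So the error itself says nothing about a broker crash; it says the client could not learn a usable leader for that partition before the batch aged out, which can also happen under pure load or a slow metadata response.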
[jira] [Commented] (KAFKA-1843) Metadata fetch/refresh in new producer should handle all node connection states gracefully
[ https://issues.apache.org/jira/browse/KAFKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820892#comment-15820892 ] sandeep kumar singh commented on KAFKA-1843: in step 4, when you bring K1 up, should it not update/refresh the metadata to (K1,K2) immediately, without waiting for metadata.max.age.ms? > Metadata fetch/refresh in new producer should handle all node connection > states gracefully > -- > > Key: KAFKA-1843 > URL: https://issues.apache.org/jira/browse/KAFKA-1843 > Project: Kafka > Issue Type: Bug > Components: clients, producer > Affects Versions: 0.8.2.0 > Reporter: Ewen Cheslack-Postava > Labels: patch > Fix For: 0.8.2.1 > > > KAFKA-1642 resolved some issues with the handling of broker connection states > to avoid high CPU usage, but made the minimal fix rather than the ideal one. > The code for handling the metadata fetch is difficult to get right because it > has to handle a lot of possible connectivity states and failure modes across > all the known nodes. It also needs to correctly integrate with the > surrounding event loop, providing correct poll() timeouts to both avoid busy > looping and make sure it wakes up and tries new nodes in the face of both > connection and request failures. > A patch here should address a few issues: > 1. Make sure connection timeouts, as implemented in KAFKA-1842, are cleanly > integrated. This mostly means that when a connecting node is selected to > fetch metadata from, that the code notices that and sets the next timeout > based on the connection timeout rather than some other backoff. > 2. Rethink the logic and naming of NetworkClient.leastLoadedNode. That method > actually takes into account a) the current connectivity of each node, b) > whether the node had a recent connection failure, c) the "load" in terms of > in flight requests. 
It also needs to ensure that different clients don't use > the same ordering across multiple calls (which is already addressed in the > current code by nodeIndexOffset) and that we always eventually try all nodes > in the face of connection failures (which isn't currently handled by > leastLoadedNode and probably cannot be without tracking additional state). > This method also has to work for new consumer use cases even though it is > currently only used by the new producer's metadata fetch. Finally it has to > properly handle when other code calls initiateConnect() since the normal path > for sending messages also initiates connections. > We can already say that there is an order of preference given a single call > (as follows), but making this work across multiple calls when some initial > choices fail to connect or return metadata *and* connection states may be > changing is much more difficult. > * Connected, zero in flight requests - the request can be sent immediately > * Connecting node - it will hopefully be connected very soon and by > definition has no in flight requests > * Disconnected - same reasoning as for a connecting node > * Connected, > 0 in flight requests - we consider any # of in flight > requests as a big enough backlog to delay the request a lot. > We could use an approach that better accounts for # of in flight requests > rather than just turning it into a boolean variable, but that probably > introduces much more complexity than it is worth. > 3. The most difficult case to handle so far has been when leastLoadedNode > returns a disconnected node to maybeUpdateMetadata as its best option. > Properly handling the two resulting cases (initiateConnect fails immediately > vs. taking some time to possibly establish the connection) is tricky. > 4. Consider optimizing for the failure cases. 
The most common cases are when > you already have an active connection and can immediately get the metadata or > you need to establish a connection, but the connection and metadata > request/response happen very quickly. These common cases are infrequent > enough (default every 5 min) that establishing an extra connection isn't a > big deal as long as it's eventually cleaned up. The edge cases, like network > partitions where some subset of nodes become unreachable for a long period, > are harder to reason about but we should be sure we will always be able to > gracefully recover from them. > KAFKA-1642 enumerated the possible outcomes of a single call to > maybeUpdateMetadata. A good fix for this would consider all of those outcomes > for repeated calls to -- This message was sent by Atlassian JIRA (v6.3.4#6332)
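The single-call preference order described above can be expressed as a small ranking function, which is essentially what leastLoadedNode has to compute before the multi-call complications come in (illustrative Python, not the actual NetworkClient code; the state names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    state: str        # "connected", "connecting", or "disconnected"
    in_flight: int = 0

def least_loaded_node(nodes):
    """Pick a metadata-fetch candidate per the order above: connected and
    idle first, then connecting, then disconnected, and only then a
    connected node with a backlog (in-flight count treated as a boolean)."""
    def rank(n):
        if n.state == "connected":
            return 0 if n.in_flight == 0 else 3
        return 1 if n.state == "connecting" else 2
    return min(nodes, key=rank)

nodes = [Node(0, "connected", in_flight=4), Node(1, "disconnected"), Node(2, "connecting")]
print(least_loaded_node(nodes).node_id)  # 2
```

The hard part the ticket describes is everything this sketch omits: varying the choice across calls (nodeIndexOffset), eventually trying every node after failures, and reacting when the chosen node's connection attempt fails.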
[jira] [Commented] (KAFKA-4616) Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between
[ https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820237#comment-15820237 ] sandeep kumar singh commented on KAFKA-4616: is there any way to add this to the perf-test command? i can add this option in the producer command but am not sure how to add this option with the perf-test command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
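On the question above: kafka-producer-perf-test.sh forwards arbitrary key=value producer configs through --producer-props, so acks can be appended there with no dedicated flag; a later comment in this thread does exactly this with acks=-1. A sketch with placeholder hosts:

```shell
# acks=-1 (i.e. acks=all) makes the leader wait for the full ISR before
# acknowledging; it is passed straight through --producer-props.
bin/kafka-producer-perf-test.sh \
  --num-records 100000 --record-size 4096 --throughput 1000 \
  --topic test3R3P3 \
  --producer-props bootstrap.servers=broker1:9092,broker2:9092 acks=-1 retries=10
```

This is a CLI fragment rather than a runnable script; broker1/broker2 are placeholders for real bootstrap addresses.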
[jira] [Created] (KAFKA-4616) Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between
sandeep kumar singh created KAFKA-4616: -- Summary: Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between Key: KAFKA-4616 URL: https://issues.apache.org/jira/browse/KAFKA-4616 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.10.0.0 Environment: Apache mesos Reporter: sandeep kumar singh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4610) getting error:Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test2R2P2-1
[ https://issues.apache.org/jira/browse/KAFKA-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814494#comment-15814494 ] sandeep kumar singh commented on KAFKA-4610: thanks for the reply. yes, most of the messages get stored in kafka; say if i send 100,000 messages then i see almost 99000 messages get stored. update: i only see this error when i restart the leader broker for topic test2R2P2-0. this topic has replication 2 and partition 2, and i have a 2-broker cluster in my setup. could you please suggest how we can avoid such failures? on a separate note: i see data loss when running kafka-producer-perf-test.sh and killing one of the brokers (in the 2-node cluster) while the test is running. is this an expected behavior? i see the same results for multiple tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
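For the console producer, the durability knobs already used elsewhere in this thread are the relevant ones: --request-required-acks -1 so the leader waits for the full ISR, plus retries so a send that hits a restarting leader is retried against the new one (this narrows, but does not by itself guarantee, zero loss). A sketch with placeholder hosts:

```shell
# acks=-1 waits for the full ISR; retries let the client ride out the
# NotLeaderForPartition window during a broker restart.
bin/kafka-console-producer.sh \
  --broker-list broker1:9092,broker2:9092 \
  --request-required-acks -1 --message-send-max-retries 10 \
  --topic test2R2P2 < ~/kafka_message.log
```

This is a CLI fragment, not a runnable script; broker1/broker2 stand in for the real broker addresses and ports elided in the report.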
[jira] [Created] (KAFKA-4610) getting error:Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test2R2P2-1
sandeep kumar singh created KAFKA-4610: -- Summary: getting error:Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test2R2P2-1 Key: KAFKA-4610 URL: https://issues.apache.org/jira/browse/KAFKA-4610 Project: Kafka Issue Type: Bug Environment: Dev Reporter: sandeep kumar singh Fix For: 0.10.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)