[jira] [Updated] (KAFKA-4616) Message loss is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle

2017-01-12 Thread sandeep kumar singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandeep kumar singh updated KAFKA-4616:
---
Summary: Message loss is seen when kafka-producer-perf-test.sh is running 
and any broker restarted in middle  (was: Message log is seen when 
kafka-producer-perf-test.sh is running and any broker restarted in middle 
in-between )

> Message loss is seen when kafka-producer-perf-test.sh is running and any 
> broker restarted in middle
> ---
>
> Key: KAFKA-4616
> URL: https://issues.apache.org/jira/browse/KAFKA-4616
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: Apache mesos
>Reporter: sandeep kumar singh
>
> If any broker is restarted while the kafka-producer-perf-test.sh command is 
> running, we see message loss.
> Commands I run:
> **perf command:
> $ bin/kafka-producer-perf-test.sh --num-records 10 --record-size 4096  
> --throughput 1000 --topic test3R3P3 --producer-props 
> bootstrap.servers=x.x.x.x:,x.x.x.x:,x.x.x.x:
> I am sending 10 messages, each of size 4096 bytes.
> error thrown by perf command:
> 4944 records sent, 988.6 records/sec (3.86 MB/sec), 31.5 ms avg latency, 
> 433.0 max latency.
> 5061 records sent, 1012.0 records/sec (3.95 MB/sec), 67.7 ms avg latency, 
> 798.0 max latency.
> 5001 records sent, 1000.0 records/sec (3.91 MB/sec), 49.0 ms avg latency, 
> 503.0 max latency.
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 37.3 ms avg latency, 
> 594.0 max latency.
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 32.6 ms avg latency, 
> 501.0 max latency.
> 5000 records sent, 999.8 records/sec (3.91 MB/sec), 49.4 ms avg latency, 
> 516.0 max latency.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> truncated
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 33.9 ms avg latency, 
> 497.0 max latency.
> 4928 records sent, 985.6 records/sec (3.85 MB/sec), 42.1 ms avg latency, 
> 521.0 max latency.
> 5073 records sent, 1014.4 records/sec (3.96 MB/sec), 39.4 ms avg latency, 
> 418.0 max latency.
> 10 records sent, 999.950002 records/sec (3.91 MB/sec), 37.65 ms avg 
> latency, 798.00 ms max latency, 1 ms 50th, 260 ms 95th, 411 ms 99th, 571 ms 
> 99.9th.
> **consumer command:
> $ bin/kafka-console-consumer.sh --zookeeper 
> x.x.x.x:2181/dcos-service-kafka-framework --topic  test3R3P3  
> 1>~/kafka_output.log
> message stored:
> $ wc -l ~/kafka_output.log
> 99932 /home/montana/kafka_output.log
> I found that only 99932 messages are stored and 68 messages are lost.
> **topic describe command:
>  $ bin/kafka-topics.sh  --zookeeper x.x.x.x:2181/dcos-service-kafka-framework 
> --describe |grep test3R3
> Topic:test3R3P3  PartitionCount:3  ReplicationFactor:3  Configs:
> Topic: test3R3P3  Partition: 0  Leader: 2  Replicas: 1,2,0  Isr: 2,0,1
> Topic: test3R3P3  Partition: 1  Leader: 2  Replicas: 2,0,1  Isr: 2,0,1
> Topic: test3R3P3  Partition: 2  Leader: 0  Replicas: 0,1,2  Isr: 2,0,1
> **consumer group command:
> $  bin/kafka-consumer-groups.sh --zookeeper 
> x.x.x.x:2181/dcos-service-kafka-framework --describe --group 
> console-consumer-9926
> GROUP                  TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
> console-consumer-9926  test3R3P3  0          33265           33265           0    console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> console-consumer-9926  test3R3P3  1          4               4               0    console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> console-consumer-9926  test3R3P3  2          3               3               0    console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> Could you please help me understand what this error means: 
> "org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received."?
> Could you please suggest a fix for this issue?
> We see this behavior every time we perform the above test scenario.
> My understanding is that there should not be any data loss as long as n-1 
> brokers are alive. Is message loss expected behavior in this case?
> thanks
> Sandeep
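The failure mode described above can be illustrated with a small simulation. This is a toy model, not Kafka code: the replication lag, crash point, and record counts are invented. With acks=1 the leader acknowledges before the follower has copied the record, so records acked just before the crash vanish; with acks=all plus retries the producer resends anything it never got an ack for:

```python
def simulate(acks: str) -> int:
    """Return how many *acknowledged* records are missing after a leader crash.

    Toy model: a leader log, one follower that trails by a few records,
    a crash mid-stream, and a producer that retries anything unacked
    (as the real producer does when retries > 0).
    """
    LAG, CRASH_AT, TOTAL = 3, 500, 1000
    leader, follower = [], []
    acked, pending = [], []

    for record in range(TOTAL):
        if len(leader) < CRASH_AT:                 # old leader still alive
            leader.append(record)
            follower[:] = leader[:-LAG]            # replication lags behind
            if acks == "1":
                acked.append(record)               # acked before replication
            else:                                  # acks=all (-1)
                pending.append(record)
                acked.extend(r for r in pending if r in follower)
                pending = [r for r in pending if r not in follower]
        else:                                      # follower took over as leader
            follower.append(record)
            acked.append(record)

    # acks=all producer retries records it never got an ack for
    for record in pending:
        follower.append(record)
        acked.append(record)

    survivors = set(follower)
    return sum(1 for r in acked if r not in survivors)
```

Under this model the acks=1 run loses the acked-but-unreplicated records at the crash point, while the acks=all run with retries loses none, at the cost of possible reordering and duplicates.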



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4616) Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between

2017-01-12 Thread sandeep kumar singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821003#comment-15821003
 ] 

sandeep kumar singh commented on KAFKA-4616:


Thanks for the reply. I applied the acks=-1 option, but I still see message loss.

command i ran:
$ bin/kafka-producer-perf-test.sh --num-records 10 --record-size 4096 
--throughput 5000 --topic test2R3P3 --producer-props 
bootstrap.servers=localhost:9092,localhost:9093,localhost:9094 acks=-1
8890 records sent, 1777.3 records/sec (6.94 MB/sec), 2039.2 ms avg latency, 
3282.0 max latency.
12342 records sent, 2468.4 records/sec (9.64 MB/sec), 2648.8 ms avg latency, 
3448.0 max latency.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.
truncated
org.apache.kafka.common.errors.NetworkException: The server disconnected before 
a response was received.
...truncated
10 records sent, 3716.504999 records/sec (14.52 MB/sec), 1565.19 ms avg 
latency, 3634.00 ms max latency, 1470 ms 50th, 3205 ms 95th, 3357 ms 99th, 3502 
ms 99.9th.

$ bin/kafka-consumer-groups.sh --zookeeper 127.0.0.1:2181 --describe --group console-consumer-96681
GROUP                   TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
console-consumer-96681  test2R3P3  0          3               3               0    console-consumer-96681_localhost.localdomain-1482869188877-44ac0d84-0
console-consumer-96681  test2R3P3  1          33271           33271           0    console-consumer-96681_localhost.localdomain-1482869188877-44ac0d84-0
console-consumer-96681  test2R3P3  2          3               3               0    console-consumer-96681_localhost.localdomain-1482869188877-44ac0d84-0

I sent 10 messages but could see only 99937 messages stored.
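One thing acks=-1 does not cover by itself: it makes the leader wait for the in-sync replicas before acknowledging, but it does not resend a request whose connection died (the NetworkException and NotLeaderForPartitionException lines above). That requires retries > 0 in the producer props. A toy sketch of the difference, with invented failure rates and record counts, not Kafka code:

```python
import random

class NetworkException(Exception):
    """Stand-in for org.apache.kafka.common.errors.NetworkException."""

_rng = random.Random(42)   # fixed seed so the simulation is repeatable

def flaky_send(record, log, fail_rate=0.3):
    # Toy broker: sometimes the connection drops before the ack arrives.
    # The write may or may not have happened -- the exact ambiguity behind
    # "The server disconnected before a response was received".
    if _rng.random() < fail_rate:
        if _rng.random() < 0.5:
            log.append(record)          # written, but the ack never came back
        raise NetworkException(record)
    log.append(record)

def send_with_retries(record, log, retries):
    for _ in range(retries + 1):
        try:
            flaky_send(record, log)
            return True                 # acked
        except NetworkException:
            continue                    # resend the same record
    return False                        # producer gives up: record may be lost

log_a, log_b = [], []
lost_without_retries = sum(not send_with_retries(r, log_a, retries=0) for r in range(1000))
lost_with_retries    = sum(not send_with_retries(r, log_b, retries=5) for r in range(1000))
```

Retrying turns the ambiguous disconnect into at-least-once delivery: a record whose ack was lost may be written twice, which is why a retried run tends to show duplicates rather than loss.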


[jira] [Commented] (KAFKA-4610) getting error:Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test2R2P2-1

2017-01-12 Thread sandeep kumar singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820941#comment-15820941
 ] 

sandeep kumar singh commented on KAFKA-4610:


Thanks for the update. I am checking the broker logs, but none of the brokers 
throw any exceptions when this error occurs. Every time I run the producer I 
see this error:
"ERROR Error when sending message to topic test3R3P3 with key: null, 
value: 4096 bytes with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
expired due to timeout while requesting metadata from brokers for test3R3P3-0"

producer command - cat kafka_output.log | bin/kafka-console-producer.sh 
--broker-list localhost:9092,localhost:9093,localhost:9094 --batch-size 1000 
--message-send-max-retries 10 --request-required-acks -1 --topic test3R3P3

kafka_output.log has 10 records, each 4096 bytes long.

I see this error even when all brokers are healthy and all partitions have 
valid leaders.

Are you saying the brokers may be restarting internally, perhaps due to load? 
When I check the Unix process IDs of the brokers before and after running the 
test, the PIDs are the same, which means the brokers were not restarted.
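Worth noting: the TimeoutException above is raised by the client, not a broker, which would explain the clean broker logs. Batches expire inside the producer's send queue while it cannot fetch metadata or reach the partition leader. A minimal sketch of that expiry check, with illustrative names and timeout, not the actual accumulator code:

```python
def expire_batches(batches, now_ms, request_timeout_ms=30000):
    """Toy version of the producer-accumulator check behind the error above.

    While metadata for a partition cannot be fetched, batches wait in the
    queue; any batch older than the timeout is failed with a
    TimeoutException-style error instead of waiting forever.
    """
    expired = [b for b in batches if now_ms - b["created_ms"] > request_timeout_ms]
    # keep only the batches that are still within the timeout window
    batches[:] = [b for b in batches if now_ms - b["created_ms"] <= request_timeout_ms]
    return expired

queue = [
    {"records": 3, "created_ms": 0},        # oldest batch, still unsent
    {"records": 3, "created_ms": 25000},
]
# metadata is still unavailable at t = 31 s, so the oldest batch expires
expired = expire_batches(queue, now_ms=31000)
```

In this sketch only the batch created at t=0 expires; the newer one stays queued, which matches how the real error appears batch by batch rather than all at once.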

> getting error:Batch containing 3 record(s) expired due to timeout while 
> requesting metadata from brokers for test2R2P2-1
> 
>
> Key: KAFKA-4610
> URL: https://issues.apache.org/jira/browse/KAFKA-4610
> Project: Kafka
>  Issue Type: Bug
> Environment: Dev
>Reporter: sandeep kumar singh
>
> I am getting the below error when running the producer client, which takes 
> messages from an input file, kafka_message.log. This log file is filled with 
> 10 records per second, each message of length 4096.
> error - 
> [2017-01-09 14:45:24,813] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> command i run :
> $ bin/kafka-console-producer.sh --broker-list x.x.x.x:,x.x.x.x: 
> --batch-size 1000 --message-send-max-retries 10 --request-required-acks 1 
> --topic test2R2P2 <~/kafka_message.log
> There are 2 brokers running, and the topic has 2 partitions and replication 
> factor 2.
> Could you please help me understand what that error means?
> I also see message loss when I manually restart one of the brokers while the 
> kafka-producer-perf-test command is running. Is this expected behavior?
> thanks
> Sandeep





[jira] [Commented] (KAFKA-1843) Metadata fetch/refresh in new producer should handle all node connection states gracefully

2017-01-12 Thread sandeep kumar singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820892#comment-15820892
 ] 

sandeep kumar singh commented on KAFKA-1843:


In step 4, when you bring K1 up, should it not update/refresh the metadata to 
(K1, K2) immediately, without waiting for metadata.max.age.ms?

> Metadata fetch/refresh in new producer should handle all node connection 
> states gracefully
> --
>
> Key: KAFKA-1843
> URL: https://issues.apache.org/jira/browse/KAFKA-1843
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, producer 
>Affects Versions: 0.8.2.0
>Reporter: Ewen Cheslack-Postava
>  Labels: patch
> Fix For: 0.8.2.1
>
>
> KAFKA-1642 resolved some issues with the handling of broker connection states 
> to avoid high CPU usage, but made the minimal fix rather than the ideal one. 
> The code for handling the metadata fetch is difficult to get right because it 
> has to handle a lot of possible connectivity states and failure modes across 
> all the known nodes. It also needs to correctly integrate with the 
> surrounding event loop, providing correct poll() timeouts to both avoid busy 
> looping and make sure it wakes up and tries new nodes in the face of both 
> connection and request failures.
> A patch here should address a few issues:
> 1. Make sure connection timeouts, as implemented in KAFKA-1842, are cleanly 
> integrated. This mostly means that when a connecting node is selected to 
> fetch metadata from, that the code notices that and sets the next timeout 
> based on the connection timeout rather than some other backoff.
> 2. Rethink the logic and naming of NetworkClient.leastLoadedNode. That method 
> actually takes into account a) the current connectivity of each node, b) 
> whether the node had a recent connection failure, c) the "load" in terms of 
> in flight requests. It also needs to ensure that different clients don't use 
> the same ordering across multiple calls (which is already addressed in the 
> current code by nodeIndexOffset) and that we always eventually try all nodes 
> in the face of connection failures (which isn't currently handled by 
> leastLoadedNode and probably cannot be without tracking additional state). 
> This method also has to work for new consumer use cases even though it is 
> currently only used by the new producer's metadata fetch. Finally it has to 
> properly handle when other code calls initiateConnect() since the normal path 
> for sending messages also initiates connections.
> We can already say that there is an order of preference given a single call 
> (as follows), but making this work across multiple calls when some initial 
> choices fail to connect or return metadata *and* connection states may be 
> changing is much more difficult.
>  * Connected, zero in flight requests - the request can be sent immediately
>  * Connecting node - it will hopefully be connected very soon and by 
> definition has no in flight requests
>  * Disconnected - same reasoning as for a connecting node
>  * Connected, > 0 in flight requests - we consider any # of in flight 
> requests as a big enough backlog to delay the request a lot.
> We could use an approach that better accounts for # of in flight requests 
> rather than just turning it into a boolean variable, but that probably 
> introduces much more complexity than it is worth.
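The preference order in the list above can be sketched as a ranking function. This is a toy model of the idea only, not the actual NetworkClient.leastLoadedNode implementation, which also has to handle reconnect backoff, per-client node ordering (nodeIndexOffset), and retry state:

```python
def preference(node):
    """Rank a candidate node per the four-way order described above.

    `node` is a dict with 'state' in {'connected', 'connecting',
    'disconnected'} and 'in_flight' (count of in-flight requests).
    Lower rank = better candidate for the metadata fetch.
    """
    if node["state"] == "connected" and node["in_flight"] == 0:
        return 0    # request can be sent immediately
    if node["state"] == "connecting":
        return 1    # hopefully connected soon, by definition no backlog
    if node["state"] == "disconnected":
        return 2    # same reasoning as a connecting node
    return 3        # connected with a backlog: any in-flight count delays us

def least_loaded(nodes):
    return min(nodes, key=preference)

nodes = [
    {"id": 0, "state": "connected",    "in_flight": 4},
    {"id": 1, "state": "disconnected", "in_flight": 0},
    {"id": 2, "state": "connecting",   "in_flight": 0},
]
best = least_loaded(nodes)   # node 2: connecting beats disconnected
```

Collapsing the in-flight count to a boolean, as the last bullet suggests, is exactly what `preference` does: one queued request is treated the same as ten.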
> 3. The most difficult case to handle so far has been when leastLoadedNode 
> returns a disconnected node to maybeUpdateMetadata as its best option. 
> Properly handling the two resulting cases (initiateConnect fails immediately 
> vs. taking some time to possibly establish the connection) is tricky.
> 4. Consider optimizing for the failure cases. The most common cases are when 
> you already have an active connection and can immediately get the metadata or 
> you need to establish a connection, but the connection and metadata 
> request/response happen very quickly. These common cases are infrequent 
> enough (default every 5 min) that establishing an extra connection isn't a 
> big deal as long as it's eventually cleaned up. The edge cases, like network 
> partitions where some subset of nodes become unreachable for a long period, 
> are harder to reason about but we should be sure we will always be able to 
> gracefully recover from them.
> KAFKA-1642 enumerated the possible outcomes of a single call to 
> maybeUpdateMetadata. A good fix for this would consider all of those outcomes 
> for repeated calls to 





[jira] [Commented] (KAFKA-4616) Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between

2017-01-11 Thread sandeep kumar singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820237#comment-15820237
 ] 

sandeep kumar singh commented on KAFKA-4616:


Is there any way to add this to the perf-test command? I can add this option 
to the producer command, but I am not sure how to add it with the perf-test 
command.
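For what it's worth, kafka-producer-perf-test.sh forwards any key=value pairs given after --producer-props straight to the underlying producer, so acks can be set there. A sketch with placeholder values (broker addresses, topic name, and counts are examples, not taken from this thread):

```shell
# Any producer config can be passed as key=value pairs after --producer-props.
bin/kafka-producer-perf-test.sh \
  --topic test3R3P3 \
  --num-records 100000 \
  --record-size 4096 \
  --throughput 1000 \
  --producer-props bootstrap.servers=broker1:9092,broker2:9092,broker3:9092 \
      acks=all \
      retries=10
```

acks=all is equivalent to acks=-1; retries must also be set, since a batch that fails with a retriable error is otherwise dropped after the first attempt.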







[jira] [Created] (KAFKA-4616) Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between

2017-01-11 Thread sandeep kumar singh (JIRA)
sandeep kumar singh created KAFKA-4616:
--

 Summary: Message log is seen when kafka-producer-perf-test.sh is 
running and any broker restarted in middle in-between 
 Key: KAFKA-4616
 URL: https://issues.apache.org/jira/browse/KAFKA-4616
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.0.0
 Environment: Apache mesos
Reporter: sandeep kumar singh







[jira] [Commented] (KAFKA-4610) getting error:Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test2R2P2-1

2017-01-10 Thread sandeep kumar singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814494#comment-15814494
 ] 

sandeep kumar singh commented on KAFKA-4610:


Thanks for the reply. Yes, most of the messages get stored in Kafka; if I 
send 10 messages, I see almost 99000 messages stored.

Update: I only see this error when I restart the leader broker for partition 
test2R2P2-0. This topic has 2 partitions with replication factor 2, and I 
have a 2-broker cluster in my setup. Could you please suggest how we can 
avoid such failures?

On a separate note: I see data loss when running kafka-producer-perf-test.sh 
and killing one of the brokers (in the 2-node cluster) while the test is 
running. Is this expected behavior? I see the same results across multiple 
tests.








[jira] [Created] (KAFKA-4610) getting error:Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test2R2P2-1

2017-01-09 Thread sandeep kumar singh (JIRA)
sandeep kumar singh created KAFKA-4610:
--

 Summary: getting error:Batch containing 3 record(s) expired due to 
timeout while requesting metadata from brokers for test2R2P2-1
 Key: KAFKA-4610
 URL: https://issues.apache.org/jira/browse/KAFKA-4610
 Project: Kafka
  Issue Type: Bug
 Environment: Dev
Reporter: sandeep kumar singh
 Fix For: 0.10.2.0




