[jira] [Created] (KAFKA-1628) [New Java Producer] Topic which contains "." does not get the correct corresponding metric name

2014-09-10 Thread Bhavesh Mistry (JIRA)
Bhavesh Mistry created KAFKA-1628:
-

 Summary: [New Java Producer] Topic which contains "." does not get the correct corresponding metric name
 Key: KAFKA-1628
 URL: https://issues.apache.org/jira/browse/KAFKA-1628
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.8.2
 Environment: ALL
Reporter: Bhavesh Mistry
Priority: Minor



Hmm, it seems that we do allow "." in the topic name. The topic name can't
be just "." or ".." though. So, if there is a topic "test.1", we will have
the following JMX metric name:

kafka.producer.console-producer.topic.test:type=1

It should be changed to
kafka.producer.console-producer.topic:type=test.1
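
For illustration only, here is a minimal, self-contained sketch (hypothetical code, not the actual producer metrics logic) of how joining name parts with "." and then splitting on the last "." mis-parses a topic such as "test.1":

{code}
public class MetricNameDotExample {
    public static void main(String[] args) {
        String clientId = "console-producer";
        String topic = "test.1";

        // Metric name built by naive "." concatenation:
        String flat = "kafka.producer." + clientId + ".topic." + topic;

        // Treating the last "." as the type separator mis-parses the topic:
        int lastDot = flat.lastIndexOf('.');
        System.out.println(flat.substring(0, lastDot) + ":type=" + flat.substring(lastDot + 1));
        // -> kafka.producer.console-producer.topic.test:type=1   (wrong)

        // Keeping the whole topic as the type value avoids the ambiguity:
        System.out.println("kafka.producer." + clientId + ".topic:type=" + topic);
        // -> kafka.producer.console-producer.topic:type=test.1   (expected)
    }
}
{code}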

Could you file a jira to follow up on this?

Thanks,

Jun




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-09-18 Thread Bhavesh Mistry (JIRA)
Bhavesh Mistry created KAFKA-1642:
-

 Summary: [Java New Producer Kafka Trunk] CPU Usage Spike to 100% 
when network connection is lost
 Key: KAFKA-1642
 URL: https://issues.apache.org/jira/browse/KAFKA-1642
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Bhavesh Mistry
Assignee: Jun Rao


I see my CPU spike to 100% when the network connection is lost for a while.  It 
seems the network IO thread is very busy logging the following error message.  Is this 
expected behavior?
2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
producer I/O thread: 

java.lang.IllegalStateException: No entry found for node -2

at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)

at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)

at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)

at 
org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)

at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)

at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)

at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)

at java.lang.Thread.run(Thread.java:744)

Thanks,

Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-09-25 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148121#comment-14148121
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 9/25/14 6:42 PM:


Hi [~jkreps],

I will work on the sample program. We are not setting the reconnect.backoff.ms and 
retry.backoff.ms configurations, so they are at their defaults.  The only 
thing I can tell you is that I have 4 Producer instances per JVM, so this 
might amplify the issue. 
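
For reference, a minimal sketch (values are illustrative, not tuned recommendations) of how the two backoffs could be raised above their defaults so the IO thread retries less aggressively while the brokers are unreachable; it assumes a kafkaproducer.properties on the classpath like the one posted later in this thread:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

public class BackoffConfigSketch {
    public static void main(String[] args) throws IOException {
        // Load the kafkaproducer.properties used by the test program,
        // then override the two backoffs with larger (illustrative) values.
        Properties prop = new Properties();
        InputStream propFile = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream("kafkaproducer.properties");
        prop.load(propFile);

        prop.put("reconnect.backoff.ms", "1000");
        prop.put("retry.backoff.ms", "500");

        Producer producer = new KafkaProducer(prop);
        producer.close();
    }
}
{code}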

Thanks,

Bhavesh 


was (Author: bmis13):
HI [~jkreps],

I will work on the sample program. We are not setting reconnect.backoff.ms and 
retry.backoff.ms configuration so it would be default configuration.  Only 
thing I can tell you is that I have 4 Producer instance per JVM.  So this might 
amplify issue. 

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-09-25 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148121#comment-14148121
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

Hi [~jkreps],

I will work on the sample program. We are not setting the reconnect.backoff.ms and 
retry.backoff.ms configurations, so they are at their defaults.  The only 
thing I can tell you is that I have 4 Producer instances per JVM, so this might 
amplify the issue. 

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1692) [Java New Producer] IO Thread Name Must include Client ID

2014-10-08 Thread Bhavesh Mistry (JIRA)
Bhavesh Mistry created KAFKA-1692:
-

 Summary: [Java New Producer]  IO Thread Name Must include  Client 
ID
 Key: KAFKA-1692
 URL: https://issues.apache.org/jira/browse/KAFKA-1692
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Affects Versions: 0.8.2
Reporter: Bhavesh Mistry
Assignee: Jun Rao
Priority: Trivial


Please add the client id so people who are looking at JConsole or a profiling tool can 
see the thread by client id, since a single JVM can have multiple producer instances.  

org.apache.kafka.clients.producer.KafkaProducer
{code}
String ioThreadName = "kafka-producer-network-thread";
if (clientId != null) {
    ioThreadName = ioThreadName + " | " + clientId;
}
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
{code}
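
As a usage sketch (not part of the proposed change itself), one way to check from inside the JVM that the client id ends up in the IO thread name; the "kafka-producer-network-thread | <client id>" format assumed here is the one suggested above:

{code}
public class ProducerIoThreadCheck {
    // Sketch only: call this from a JVM that has live producer instances.
    // With the suggested change, a producer created with client.id=my-producer
    // would be listed as "kafka-producer-network-thread | my-producer".
    public static void printProducerIoThreads() {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith("kafka-producer-network-thread")) {
                System.out.println(t.getName());
            }
        }
    }
}
{code}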



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1692) [Java New Producer] IO Thread Name Must include Client ID

2014-10-08 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164275#comment-14164275
 ] 

Bhavesh Mistry commented on KAFKA-1692:
---

The description is just a suggestion.  Sorry, I could not submit a patch since I have a 
release next week.

Thanks,

Bhavesh

> [Java New Producer]  IO Thread Name Must include  Client ID
> ---
>
> Key: KAFKA-1692
> URL: https://issues.apache.org/jira/browse/KAFKA-1692
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Trivial
>
> Please add client id so people who are looking at Jconsole or Profile tool 
> can see Thread by client id since single JVM can have multiple producer 
> instance.  
> org.apache.kafka.clients.producer.KafkaProducer
> {code}
> String ioThreadName = "kafka-producer-network-thread";
>  if(clientId != null){
>   ioThreadName = ioThreadName  + " | "+clientId; 
> }
> this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-13 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169415#comment-14169415
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

{code}

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

public static void main(String[] args) throws IOException {

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 4;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}

Callback callback = new Callback() {
public void onCompletion(RecordMetadata metadata, 
Exception exception) {
if(exception != null){
System.err.println("Msg dropped..!");
exception.printStackTrace();
}

}
};

ProducerRecord record = new ProducerRecord(topic, 
builder.toString().getBytes());
while (true) {
try {
for (int i = 0; i < producer.length; i++) {
producer[i].send(record, callback);
}
Thread.sleep(10);
} catch (Throwable th) {
System.err.println("FATAL ");
th.printStackTrace();
}

}

}

}

{code}

{code:title=kafkaproducer.properties}
# THIS IS FOR THE NEW PRODUCER API (TRUNK). Please see the configuration at
# https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker List
bootstrap.servers= BROKERS HERE...
#Data Acks
acks=1
# 128MB of buffer for log lines (including all messages).
buffer.memory=134217728
compression.type=snappy
retries=3
# batch.size = (buffer.memory) / (number of partitions), so an in-progress
# batch can be created for each partition.
batch.size=1048576
# 1MiB
max.request.size=1048576
# 2MiB
send.buffer.bytes=2097152
# We do not want to block when the buffer is full, so the application thread
# is never blocked but log lines may be dropped...
block.on.buffer.full=false

#wait...
linger.ms=5000
{code}

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.j

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-13 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169415#comment-14169415
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 10/13/14 4:08 PM:
-

{code:title=TestNetworkDownProducer.java}


import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 200;
static CountDownLatch latch = new CountDownLatch(200);
public static void main(String[] args) throws IOException, 
InterruptedException {

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "logmon.test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 4;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));

for(int i = 0 ; i < numberTh;i++){
service.execute(new 
MyProducer(producer,10,builder.toString(), topic));
}   
latch.await();

System.out.println("All Producers done...!");
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All done...!");

}



static class MyProducer implements Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
ProducerRecord record = new ProducerRecord(topic, 
msg.toString().getBytes());
Callback  callBack = new  MyCallback();
try{
for(long j=0 ; j < maxloops ; j++){
try {
for (int i = 0; i < 
producer.length; i++) {

producer[i].send(record, callBack);
}
Thread.sleep(10);
} catch (Throwable th) {
System.err.println("FATAL ");
th.printStackTrace();
}
}

}finally {
latch.countDown();
}   
}
}   

static class MyCallback implements Callback {
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
System.err.println("Msg dropped..!");
exception.printStackTrace();
}

}
}

}
{code}

{code:title=kafkaproducer.properties}
# THIS IS FOR NEW PRODUCERS API TRUNK Please see the configuration at 
https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker List
bootstrap.servers= BROKERS HERE...
#Data Acks
acks=1

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-13 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169415#comment-14169415
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 10/13/14 4:09 PM:
-

{code}


import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 200;
static CountDownLatch latch = new CountDownLatch(200);
public static void main(String[] args) throws IOException, 
InterruptedException {

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 4;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));

for(int i = 0 ; i < numberTh;i++){
service.execute(new 
MyProducer(producer,10,builder.toString(), topic));
}   
latch.await();

System.out.println("All Producers done...!");
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All done...!");

}



static class MyProducer implements Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
ProducerRecord record = new ProducerRecord(topic, 
msg.toString().getBytes());
Callback  callBack = new  MyCallback();
try{
for(long j=0 ; j < maxloops ; j++){
try {
for (int i = 0; i < 
producer.length; i++) {

producer[i].send(record, callBack);
}
Thread.sleep(10);
} catch (Throwable th) {
System.err.println("FATAL ");
th.printStackTrace();
}
}

}finally {
latch.countDown();
}   
}
}   

static class MyCallback implements Callback {
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
System.err.println("Msg dropped..!");
exception.printStackTrace();
}

}
}

}
{code}

Property File
{code}
# THIS IS FOR NEW PRODUCERS API TRUNK Please see the configuration at 
https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker List
bootstrap.servers= BROKERS HERE...
#Data Acks
acks=1
# 64MB of Buffer for log lines (including al

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-13 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169415#comment-14169415
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 10/13/14 4:09 PM:
-

{code}


import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 200;
static CountDownLatch latch = new CountDownLatch(200);
public static void main(String[] args) throws IOException, 
InterruptedException {

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 4;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));

for(int i = 0 ; i < numberTh;i++){
service.execute(new 
MyProducer(producer,10,builder.toString(), topic));
}   
latch.await();

System.out.println("All Producers done...!");
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All done...!");

}



static class MyProducer implements Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
ProducerRecord record = new ProducerRecord(topic, 
msg.toString().getBytes());
Callback  callBack = new  MyCallback();
try{
for(long j=0 ; j < maxloops ; j++){
try {
for (int i = 0; i < 
producer.length; i++) {

producer[i].send(record, callBack);
}
Thread.sleep(10);
} catch (Throwable th) {
System.err.println("FATAL ");
th.printStackTrace();
}
}

}finally {
latch.countDown();
}   
}
}   

static class MyCallback implements Callback {
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
System.err.println("Msg dropped..!");
exception.printStackTrace();
}

}
}

}
{code}


{code}
# THIS IS FOR NEW PRODUCERS API TRUNK Please see the configuration at 
https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker List
bootstrap.servers= BROKERS HERE...
#Data Acks
acks=1
# 64MB of Buffer for log lines (including all messages).
buff

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-13 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169415#comment-14169415
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 10/13/14 4:10 PM:
-

{code}


import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 200;
static CountDownLatch latch = new CountDownLatch(200);
public static void main(String[] args) throws IOException, 
InterruptedException {

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 4;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));

for(int i = 0 ; i < numberTh;i++){
service.execute(new 
MyProducer(producer,10,builder.toString(), topic));
}   
latch.await();

System.out.println("All Producers done...!");
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All done...!");

}



static class MyProducer implements Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
ProducerRecord record = new ProducerRecord(topic, 
msg.toString().getBytes());
Callback  callBack = new  MyCallback();
try{
for(long j=0 ; j < maxloops ; j++){
try {
for (int i = 0; i < 
producer.length; i++) {

producer[i].send(record, callBack);
}
Thread.sleep(10);
} catch (Throwable th) {
System.err.println("FATAL ");
th.printStackTrace();
}
}

}finally {
latch.countDown();
}   
}
}   

static class MyCallback implements Callback {
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
System.err.println("Msg dropped..!");
exception.printStackTrace();
}

}
}

}
{code}

This is property file used:
{code}
# THIS IS FOR NEW PRODUCERS API TRUNK Please see the configuration at 
https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker List
bootstrap.servers= BROKERS HERE...
#Data Acks
acks=1
# 64MB of Buffer for log lines (in

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-13 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169415#comment-14169415
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 10/13/14 4:11 PM:
-

{code}


import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 200;
static CountDownLatch latch = new CountDownLatch(200);
public static void main(String[] args) throws IOException, 
InterruptedException {

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 4;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));

for(int i = 0 ; i < numberTh;i++){
service.execute(new 
MyProducer(producer,10,builder.toString(), topic));
}   
latch.await();

System.out.println("All Producers done...!");
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All done...!");

}



static class MyProducer implements Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
ProducerRecord record = new ProducerRecord(topic, 
msg.toString().getBytes());
Callback  callBack = new  MyCallback();
try{
for(long j=0 ; j < maxloops ; j++){
try {
for (int i = 0; i < 
producer.length; i++) {

producer[i].send(record, callBack);
}
Thread.sleep(10);
} catch (Throwable th) {
System.err.println("FATAL ");
th.printStackTrace();
}
}

}finally {
latch.countDown();
}   
}
}   

static class MyCallback implements Callback {
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
System.err.println("Msg dropped..!");
exception.printStackTrace();
}

}
}

}
{code}

This is property file used:
{code}
# THIS IS FOR NEW PRODUCERS API TRUNK Please see the configuration at 
https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker List
bootstrap.servers= BROKERS HERE...
#Data Acks
acks=1
# 64MB of Buffer for log lines (inc

[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-13 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169453#comment-14169453
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~jkreps], let me know if you need any other help!

Thanks,
Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-13 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169415#comment-14169415
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 10/13/14 5:05 PM:
-

{code}


import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 200;
static CountDownLatch latch = new CountDownLatch(200);
public static void main(String[] args) throws IOException, 
InterruptedException {

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 4;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));

for(int i = 0 ; i < numberTh;i++){
service.execute(new 
MyProducer(producer,10,builder.toString(), topic));
}   
latch.await();

System.out.println("All Producers done...!");
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All done...!");

}



static class MyProducer implements Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
ProducerRecord record = new ProducerRecord(topic, 
msg.toString().getBytes());
Callback  callBack = new  MyCallback();
try{
for(long j=0 ; j < maxloops ; j++){
try {
for (int i = 0; i < 
producer.length; i++) {

producer[i].send(record, callBack);
}
Thread.sleep(10);
} catch (Throwable th) {
System.err.println("FATAL ");
th.printStackTrace();
}
}

}finally {
latch.countDown();
}   
}
}   

static class MyCallback implements Callback {
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
System.err.println("Msg dropped..!");
exception.printStackTrace();
}

}
}

}
{code}

This is property file used:
{code}
# THIS IS FOR NEW PRODUCERS API TRUNK Please see the configuration at 
https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker List
bootstrap.servers= BROKERS HERE...
#Data Acks
acks=1
# 64MB of Buffer for log lines 

[jira] [Created] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)
Bhavesh Mistry created KAFKA-1710:
-

 Summary: [New Java Producer Potential Deadlock] Producer Deadlock 
when all messages is being sent to single partition
 Key: KAFKA-1710
 URL: https://issues.apache.org/jira/browse/KAFKA-1710
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Assignee: Jun Rao
Priority: Critical


Hi Kafka Dev Team,

When I run the test sending messages to a single partition for 3 minutes or so, 
I encounter a deadlock (please see the attached screenshots) and thread 
contention in YourKit profiling.  

Use Case:

1)  Aggregating messages into the same partition for metric counting. 
2)  Replicating the old producer's behavior of sticking to one partition for 3 minutes (see the sketch below).
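
For context, a minimal sketch of what "sticking" every record to one partition looks like with the new producer API; the topic, partition index, payload, broker list, and 3-minute window below are placeholders, and this is not the TestNetworkDownProducer program attached to this ticket:

{code}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StickyPartitionSketch {
    public static void main(String[] args) throws InterruptedException {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "broker1:9092"); // placeholder broker list

        Producer producer = new KafkaProducer(prop);
        int pinnedPartition = 0;                // placeholder: every record goes here
        long end = System.currentTimeMillis() + 3 * 60 * 1000L; // ~3 minute window

        while (System.currentTimeMillis() < end) {
            // Passing an explicit partition pins the record to that partition,
            // so all sending threads append to the same partition's batch.
            ProducerRecord record = new ProducerRecord("test", pinnedPartition,
                    null, "metric-line".getBytes());
            producer.send(record);
            Thread.sleep(10);
        }
        producer.close();
    }
}
{code}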


Here is output:

Frozen threads found (potential deadlock)
 
It seems that the following threads have not changed their stack for more than 
10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.
 
pool-1-thread-128 <--- Frozen for at least 2m
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:237
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:84
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-159 <--- Frozen for at least 2m 1 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:237
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:84
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-55 <--- Frozen for at least 2m
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:237
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:84
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744

Thanks,

Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1709) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)
Bhavesh Mistry created KAFKA-1709:
-

 Summary: [New Java Producer Potential Deadlock] Producer Deadlock 
when all messages is being sent to single partition
 Key: KAFKA-1709
 URL: https://issues.apache.org/jira/browse/KAFKA-1709
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Assignee: Jun Rao
Priority: Critical


Hi Kafka Dev Team,

When I run the test sending messages to a single partition for 3 minutes or so, 
I encounter a deadlock (please see the attached screenshots) and thread 
contention in YourKit profiling.  

Use Case:

1)  Aggregating messages into the same partition for metric counting. 
2)  Replicating the old producer's behavior of sticking to one partition for 3 minutes.


Here is output:

Frozen threads found (potential deadlock)
 
It seems that the following threads have not changed their stack for more than 
10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.
 
pool-1-thread-128 <--- Frozen for at least 2m
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:237
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:84
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-159 <--- Frozen for at least 2m 1 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:237
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:84
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-55 <--- Frozen for at least 2m
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:237
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:84
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744

Thanks,

Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavesh Mistry updated KAFKA-1710:
--
Attachment: TestNetworkDownProducer.java

Java Test Program to Reproduce this issue.

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: TestNetworkDownProducer.java
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavesh Mistry updated KAFKA-1710:
--
Attachment: Screen Shot 2014-10-15 at 9.09.06 PM.png
Screen Shot 2014-10-13 at 10.19.04 AM.png

YourKit thread view showing thread contention...

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, TestNetworkDownProducer.java
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173330#comment-14173330
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

Here is the output of YourKit:

{code}
Frozen threads found (potential deadlock)
 
It seems that the following threads have not changed their stack for more than 
10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.
 
kafka-producer-network-thread <--- Frozen for at least 14 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.ready(Cluster, 
long) RecordAccumulator.java:214
org.apache.kafka.clients.producer.internals.Sender.run(long) Sender.java:147
org.apache.kafka.clients.producer.internals.Sender.run() Sender.java:115
java.lang.Thread.run() Thread.java:744



pool-1-thread-106 <--- Frozen for at least 20 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-15 <--- Frozen for at least 13 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-161 <--- Frozen for at least 13 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-165 <--- Frozen for at least 17 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-172 <--- Frozen for at least 20 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-184 <--- Frozen for at least 11 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-26 <--- Frozen for at least 11 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 

[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173332#comment-14173332
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

More output:

{code}
Frozen threads found (potential deadlock)
 
It seems that the following threads have not changed their stack for more than 
10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.
 
pool-1-thread-108 <--- Frozen for at least 12 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-113 <--- Frozen for at least 13 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-118 <--- Frozen for at least 16 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-138 <--- Frozen for at least 12 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-151 <--- Frozen for at least 22 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-155 <--- Frozen for at least 13 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-160 <--- Frozen for at least 13 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) 
KafkaProducer.java:238
org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
TestNetworkDownProducer.java:85
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1145
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
java.lang.Thread.run() Thread.java:744



pool-1-thread-163 <--- Frozen for at least 12 sec
org.apache.kafka.clients.producer.internals.RecordAccumulator.app

[jira] [Updated] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavesh Mistry updated KAFKA-1710:
--
Attachment: Screen Shot 2014-10-15 at 9.14.15 PM.png

YourKit monitor screenshot:

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavesh Mistry updated KAFKA-1710:
--
Attachment: th15.dump
th14.dump
th13.dump
th12.dump
th11.dump
th10.dump
th9.dump
th8.dump
th7.dump
th6.dump
th5.dump
th4.dump
th3.dump
th2.dump
th1.dump

JStack Thread dumps.

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173344#comment-14173344
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

I am not able to attach the YourKit profiler snapshot.  I get the following error:

TestNetworkDownProducer-2014-10-15-2.snapshot is too large to attach. 
Attachment is 28.19 MB but the largest allowed attachment is 10.00 MB.

Thanks,
Bhavesh 

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173344#comment-14173344
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 4:36 AM:
-

I am not able to attach the YourKit profiler snapshot, so I have uploaded it to GitHub:
https://github.com/bmistry13/kafka-trunk-producer/blob/master/TestNetworkDownProducer-2014-10-15-3.snapshot

Thanks,
Bhavesh 


was (Author: bmis13):
I am not able to attached yourkit profiler snapshot.  I get following error:

TestNetworkDownProducer-2014-10-15-2.snapshot is too large to attach. 
Attachment is 28.19 MB but the largest allowed attachment is 10.00 MB.

Thanks,
Bhavesh 

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173344#comment-14173344
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 4:40 AM:
-

[~jkreps] and [~junrao],  

I am not able to attach the YourKit profiler snapshot, so I have uploaded it to GitHub:
https://github.com/bmistry13/kafka-trunk-producer/blob/master/TestNetworkDownProducer-2014-10-15-3.snapshot

Let me know if you need more details. 

Thanks,
Bhavesh 


was (Author: bmis13):
I am not able to attached yourkit profiler snapshot.  So I have uploaded to git 
hub 
https://github.com/bmistry13/kafka-trunk-producer/blob/master/TestNetworkDownProducer-2014-10-15-3.snapshot

Thanks,
Bhavesh 

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173368#comment-14173368
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 4:54 AM:
-

Here is the properties file used for testing:

{code}
# THIS IS FOR THE NEW PRODUCER API (TRUNK). Please see the configuration at https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker list
bootstrap.servers=[list here]
# Data acks
acks=0
# 128 MiB of buffer for log lines (including all messages).
buffer.memory=134217728
compression.type=snappy
retries=3
# batch.size = buffer.memory / (number of partitions), so an in-progress batch of this size can exist for each partition.
batch.size=1048576
# 1 MiB
max.request.size=1048576
# 2 MiB
send.buffer.bytes=2097152
# We do not want to block when the buffer is full, so application threads are never blocked; log lines are dropped instead.
block.on.buffer.full=false

# Wait up to 360 ms before sending a batch.
linger.ms=360
{code}
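For what it is worth, with these numbers buffer.memory / batch.size = 134217728 / 1048576 = 128, i.e. the sizing allows roughly 128 partitions to each hold one in-progress batch; that partition count is only an inference from the values above, not something stated elsewhere in this issue.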


was (Author: bmis13):
Here is property file used for testing:

{code}
# THIS IS FOR NEW PRODUCERS API TRUNK Please see the configuration at 
https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker List
bootstrap.servers=dare-msgq00.sv.walmartlabs.com:9092,dare-msgq01.sv.walmartlabs.com:9092,dare-msgq02.sv.walmartlabs.com:9092
#Data Acks
acks=0
# 64MB of Buffer for log lines (including all messages).
buffer.memory=134217728
compression.type=snappy
retries=3
# DEFAULT FROM THE KAFKA...
# batch size =  ((buffer.memory) / (number of partitions)) (so we can have in 
progress batch size created for each partition.).
batch.size=1048576
#2MiB
max.request.size=1048576
send.buffer.bytes=2097152
# We do not want to block the buffer Full so application thread will not be 
blocked but logs lines will be dropped...
block.on.buffer.full=false
#2MiB
send.buffer.bytes=2097152

#wait...
linger.ms=360
{code}

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAcc

[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-15 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173368#comment-14173368
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

Here is the properties file used for testing:

{code}
# THIS IS FOR THE NEW PRODUCER API (TRUNK). Please see the configuration at https://kafka.apache.org/documentation.html#newproducerconfigs
# Broker list
bootstrap.servers=dare-msgq00.sv.walmartlabs.com:9092,dare-msgq01.sv.walmartlabs.com:9092,dare-msgq02.sv.walmartlabs.com:9092
# Data acks
acks=0
# 128 MiB of buffer for log lines (including all messages).
buffer.memory=134217728
compression.type=snappy
retries=3
# batch.size = buffer.memory / (number of partitions), so an in-progress batch of this size can exist for each partition.
batch.size=1048576
# 1 MiB
max.request.size=1048576
# 2 MiB
send.buffer.bytes=2097152
# We do not want to block when the buffer is full, so application threads are never blocked; log lines are dropped instead.
block.on.buffer.full=false

# Wait up to 360 ms before sending a batch.
linger.ms=360
{code}
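For context, a minimal sketch of how a properties file like the one above can be fed to the new producer; the file name, topic, and payload below are illustrative placeholders rather than anything from this issue, and it assumes the raw (pre-generics) trunk producer API used elsewhere in this thread, where keys and values are byte[]:

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerFromProperties {
    public static void main(String[] args) throws IOException {
        // Load the configuration pasted above from disk.
        Properties props = new Properties();
        FileInputStream in = new FileInputStream("producer.properties");
        try {
            props.load(in);
        } finally {
            in.close();
        }

        // The trunk producer takes the Properties object directly.
        KafkaProducer producer = new KafkaProducer(props);
        try {
            ProducerRecord record =
                    new ProducerRecord("test-topic", "key".getBytes(), "log line".getBytes());
            // acks=0 and block.on.buffer.full=false above make this effectively fire-and-forget.
            producer.send(record);
        } finally {
            producer.close();
        }
    }
}
{code}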

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~ewencp],

Thanks for looking into this.  If you look at the thread dumps, you will see the blocked threads as well; this particular test code exposes the thread contention in the Kafka producer.  We hit this issue in our aggregation use case.  It would be great if you could look into an alternative to the synchronization block:

{code}
 synchronized (dq) {
..
}
{code}
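As a stand-alone illustration (a hedged sketch, not Kafka code), the same shape can be reproduced outside the producer: many threads funnelling appends through a single monitor, which is what the dumps above show once every record targets one partition. Taking a jstack or profiler snapshot while this runs shows the same pile-up of threads blocked on one lock.

{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SinglePartitionContentionDemo {
    public static void main(String[] args) throws InterruptedException {
        // One deque stands in for the single partition's batch queue inside the accumulator.
        final Deque<byte[]> dq = new ArrayDeque<byte[]>();
        final byte[] payload = new byte[512];
        ExecutorService pool = Executors.newFixedThreadPool(200);
        for (int i = 0; i < 200; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    for (int n = 0; n < 100000; n++) {
                        // Every thread serializes on this one monitor, the same shape as
                        // RecordAccumulator.append() in the dumps above when all records
                        // go to a single partition.
                        synchronized (dq) {
                            dq.addLast(payload);
                            if (dq.size() > 1000) {
                                dq.clear(); // stand-in for the sender thread draining the deque
                            }
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        System.out.println("done");
    }
}
{code}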

Thanks,

Bhavesh 
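One way to decouple application threads from that per-partition lock, which is also the direction proposed further down in this thread, is to put a bounded queue in front of the producer and drain it with a small, fixed number of threads. A compact sketch of that idea follows; the class name, capacity, and thread count are illustrative, and it assumes the raw (pre-generics) trunk producer API used elsewhere in this thread:

{code}
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QueuedProducer {

    private final BlockingQueue<ProducerRecord> queue;
    private final KafkaProducer producer;
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Thread[] drainers;

    public QueuedProducer(Properties config, int capacity, int drainThreads) {
        this.queue = new LinkedBlockingQueue<ProducerRecord>(capacity);
        this.producer = new KafkaProducer(config);
        this.drainers = new Thread[drainThreads];
        for (int i = 0; i < drainThreads; i++) {
            drainers[i] = new Thread(new Runnable() {
                public void run() {
                    // Only these few threads ever touch the producer, so only they
                    // contend on the per-partition deque lock inside the accumulator.
                    while (!closed.get() || !queue.isEmpty()) {
                        try {
                            ProducerRecord record = queue.poll(100, TimeUnit.MILLISECONDS);
                            if (record != null) {
                                producer.send(record);
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
            }, "kafka-drain-" + i);
            drainers[i].setDaemon(true);
            drainers[i].start();
        }
    }

    /** Application threads only block on the local queue, never on the accumulator lock. */
    public boolean offer(ProducerRecord record) {
        if (closed.get()) {
            throw new IllegalStateException("producer is closed");
        }
        return queue.offer(record); // drop (return false) instead of blocking when full
    }

    public void close() throws InterruptedException {
        closed.set(true);
        for (Thread t : drainers) {
            t.join();
        }
        producer.close();
    }
}
{code}

The trade-off is the same one the pasted configuration already accepts: when the local queue is full, records are dropped rather than blocking the caller.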

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:32 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dumps, you will see the blocked threads as well; this particular test code exposes the thread contention in the Kafka producer.  We hit this issue whenever we aggregate events to send to the same partition, regardless of the number of producers.  It would be great if you could look into an alternative implementation for the synchronization block; that is the root of the problem.

{code title=RecordAccumulator.java|borderStyle=solid}
synchronized (dq) {
    ...
}
{code}

Do you think it would be better to do it the following way?
{code title=KafkaAsyncProducer.java|borderStyle=solid }
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
   

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:33 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dumps, you will see the blocked threads as well; this particular test code exposes the thread contention in the Kafka producer.  We hit this issue whenever we aggregate events to send to the same partition, regardless of the number of producers.  It would be great if you could look into an alternative implementation for the synchronization block; that is the root of the problem.

{code title=RecordAccumulator.java|borderStyle=solid}
synchronized (dq) {
    ...
}
{code}

Do you think it would be better to do it the following way?
{code title=KafkaAsyncProducer.java|borderStyle=solid }
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
   

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:34 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dumps, you will see the blocked threads as well; this particular test code exposes the thread contention in the Kafka producer.  We hit this issue whenever we aggregate events to send to the same partition, regardless of the number of producers.  It would be great if you could look into an alternative implementation for the synchronization block; that is the root of the problem:

{code}
synchronized (dq) {
    ...
}
{code}

Do you think it would be better to do it the following way?
{code title=KafkaAsyncProducer.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
record = asyncQueue.take();
   

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:33 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dumps, you will see the blocked threads as well; this particular test code exposes the thread contention in the Kafka producer.  We hit this issue whenever we aggregate events to send to the same partition, regardless of the number of producers.  It would be great if you could look into an alternative implementation for the synchronization block; that is the root of the problem:

{code}
synchronized (dq) {
    ...
}
{code}

Do you think it would be better to do it the following way?
{code title=KafkaAsyncProducer.java|borderStyle=solid }
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
record = asyncQueue.take();
  

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:34 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dumps, you will see the blocked threads as well; this particular test code exposes the thread contention in the Kafka producer.  We hit this issue whenever we aggregate events to send to the same partition, regardless of the number of producers.  It would be great if you could look into an alternative implementation for the synchronization block; that is the root of the problem:

{code}
synchronized (dq) {
    ...
}
{code}

Do you think it would be better to do it the following way?
{code title=KafkaAsyncProducer.java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
record = asyncQueue.take();
 

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:34 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dumps, you will see the blocked threads as well; this particular test code exposes the thread contention in the Kafka producer.  We hit this issue whenever we aggregate events to send to the same partition, regardless of the number of producers.  It would be great if you could look into an alternative implementation for the synchronization block; that is the root of the problem:

{code}
synchronized (dq) {
    ...
}
{code}

Do you think it would be better to do it the following way?
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
record = asyncQueue.take();
if(record != nu

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:36 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dump, you will see the 
blocked threads as well; this particular test code exposes the thread 
contention in the Kafka Producer.  We hit this issue when we aggregate events 
to send to the same partition, regardless of the number of producers.  It would 
be great if you could look into an alternative implementation of the 
synchronization block.  The test code amplifies the root cause.

That is the root of the problem:
synchronized (dq) {
  
}

Do you think it would be better to do it the following way?
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
record = asyncQueue.take();

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:38 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dump, you will see the 
blocked threads as well; this particular test code exposes the thread 
contention in the Kafka Producer.  We hit this issue when we aggregate events 
to send to the same partition, regardless of the number of producers.  It would 
be great if you can look into an alternative implementation of the 
synchronization block.  The test code amplifies the root cause.

That is the root of the problem:
synchronized (dq) {
  
}

Do you think it would be better to do it the following way?
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
record = asyncQueue.take();
   

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174278#comment-14174278
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/16/14 9:56 PM:
-

[~ewencp],

Thanks for looking into this.  If you look at the thread dump, you will see the 
blocked threads as well; this particular test code exposes the thread 
contention in the Kafka Producer.  We hit this issue when we aggregate events 
to send to the same partition, regardless of the number of producers.  It would 
be great if you can look into an alternative implementation of the 
synchronization block.  The test code amplifies the root cause.

That is the root of the problem:
synchronized (dq) {
  
}

Do you think it would be better to do it the following way?
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.PartitionInfo;

public class KafkaAsyncProducer implements Producer {

// TODO configure this queue
private final LinkedBlockingQueue asyncQueue; 
private final KafkaProducer producer;
private final List threadList;
private final CountDownLatch latch;

private final AtomicBoolean close = new AtomicBoolean(false);

public KafkaAsyncProducer(int capacity, int numberOfDrainTreads,
Properties configFile ){
if(configFile == null){
throw new NullPointerException("Producer configuration 
cannot be null");
}
// set the capacity for the queue
asyncQueue = new LinkedBlockingQueue(capacity);
producer = new KafkaProducer(configFile);
threadList = new ArrayList(numberOfDrainTreads);
latch = new CountDownLatch(numberOfDrainTreads);
// start the drain threads...
for(int i =0 ; i < numberOfDrainTreads ; i ++){
Thread th = new Thread(new 
ConsumerThread(),"Kafka_Drain-" +i);
th.setDaemon(true);
threadList.add(th);
th.start();
}

}



public Future send(ProducerRecord record) {
try {
if(record == null){
throw new NullPointerException("Null record 
cannot be sent.");
}
if(close.get()){
throw new KafkaException("Producer aready 
closed or in processec of closing...");
}
asyncQueue.put(record);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}   
return null;
}

public Future send(ProducerRecord record, Callback 
callback) {
throw new UnsupportedOperationException("Send not supported");
}

public List partitionsFor(String topic) {
// TODO Auto-generated method stub
return null;
}

public Map metrics() {

return producer.metrics();
}

public void close() {
close.compareAndSet(false, true);
// wait for drain threads to finish
try {
latch.await();
// now drain the remaining messages
while(!asyncQueue.isEmpty()){
ProducerRecord record  = asyncQueue.poll();
producer.send(record);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
producer.close();
}

private class ConsumerThread implements Runnable{
public void run() {
try{
while(!close.get()){
ProducerRecord record;
try {
record = asyncQueue.take();
   

[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174403#comment-14174403
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~ewencp],

Thanks for looking into this issue.  We consume as fast as we can and 
re-publish the messages to another aggregated topic based on some keys in the 
message.  We see the thread contention in the profiling tool, and I separated 
out the code to amplify the problem.  We run with about 75 threads.  [~ewencp], 
can you please discuss this issue with the Kafka community as well?  The 
deadlock will sometimes occur, depending on thread scheduling and how long the 
threads are blocked.  All I am asking is whether there is a better way to 
enqueue incoming messages.  I just proposed the simple solution above, which 
does not impact application threads; only the drain threads will be blocked, 
and with a buffer, as you mentioned, we might get better throughput (of course 
at the expense of buffered memory (an unbounded concurrent queue) and thread 
context switching).  If you feel this is a known performance issue when sending 
to a single partition, then please close this, and you may start a discussion 
in the Kafka community for this issue.  Thanks for your help and suggestions!

According to the thread dumps, the blocking is happening in the synchronized block:
{code}
"pool-1-thread-200" prio=5 tid=0x7f92451c2000 nid=0x20103 waiting for 
monitor entry [0x00012d228000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:139)
- waiting to lock <0x000703ce39f0> (a java.util.ArrayDeque)
at 
org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:238)
at 
org.kafka.test.TestNetworkDownProducer$MyProducer.run(TestNetworkDownProducer.java:85)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

"pool-1-thread-199" prio=5 tid=0x7f92451c1800 nid=0x1ff03 waiting for 
monitor entry [0x00012d0e5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:139)
- waiting to lock <0x000703ce39f0> (a java.util.ArrayDeque)
at 
org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:238)
at 
org.kafka.test.TestNetworkDownProducer$MyProducer.run(TestNetworkDownProducer.java:85)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{code}
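
For illustration only (this is an editor's sketch, not part of the attached test 
code or of the Kafka producer internals): the staging idea described above can be 
shown with a lock-free ConcurrentLinkedQueue, where application threads only 
enqueue and a single drain thread is the only caller of producer.send(), so only 
the drain thread can ever block on the accumulator's synchronized block.  Class 
and method names below are made up for the example.

{code}
import java.util.Properties;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical wrapper: stage() never touches the accumulator lock; only the
// single drain thread calls producer.send(), so application threads never block
// on the per-partition ArrayDeque monitor shown in the thread dumps above.
public class StagedSender {

    private final ConcurrentLinkedQueue<ProducerRecord> staging =
            new ConcurrentLinkedQueue<ProducerRecord>();
    private final KafkaProducer producer;
    private volatile boolean running = true;

    public StagedSender(Properties props) {
        this.producer = new KafkaProducer(props);
        Thread drain = new Thread(new Runnable() {
            public void run() {
                while (running || !staging.isEmpty()) {
                    ProducerRecord record = staging.poll();
                    if (record == null) {
                        Thread.yield(); // nothing staged yet; a real impl would park/sleep
                    } else {
                        producer.send(record); // only this thread hits the dq lock
                    }
                }
                producer.close();
            }
        }, "staged-sender-drain");
        drain.setDaemon(true);
        drain.start();
    }

    // Called by application threads; lock-free, unbounded enqueue.
    public void stage(ProducerRecord record) {
        staging.add(record);
    }

    public void close() {
        running = false;
    }
}
{code}

The trade-off is the same one mentioned above: the unbounded queue buys back 
application-thread latency at the cost of buffered memory and one extra thread 
context switch per record.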

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurren

[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174462#comment-14174462
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~jkreps],

Did you get a chance to reproduce the problem?  Has anyone else reported this 
issue or a similar one?

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Jun Rao
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175548#comment-14175548
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~ewencp],

Thank you for entertaining this issue; you may close this.  I do agree with you 
that if I increase the number of producers, the thread contention on the 
critical block will be alleviated and throughput will improve, at the expense 
of TCP connections, memory, etc.

Do you think it would be good to open another jira issue or story for improving 
performance when sending to a single partition for some time, to avoid thread 
contention?  Please let me know if I should open a jira for the performance 
aspect of the New Producer.

Thanks,
Bhavesh

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175548#comment-14175548
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/17/14 9:19 PM:
-

[~ewencp],

Thank you for entertaining this issue; you may close this.  I do agree with you 
that if I increase the number of producers, the thread contention on the 
critical block will be alleviated and throughput will improve, at the expense 
of TCP connections, memory, etc.

Do you think it would be good to open another jira issue or story for improving 
performance when sending to a single partition for some time, to avoid thread 
contention?  Please let me know if I should open a jira for the performance 
aspect of the New Producer.  My only request is to make the New Producer truly 
async so it can enqueue the message regardless of the message key or partition 
number.

Thanks,
Bhavesh


was (Author: bmis13):
[~ewencp],

Thank you for entertaining this issue and you may close this.  I do agree with 
you if I increase number of producers then throughput will be alleviated  
(thread contention to critical block) at expense of TCP connections, memory 
etc.  

Do you think it would be good to open another jira issues or story for 
improving performance when sending to single partition for some time to avoid 
Thread contention?  Please let me know if I should open the performance aspect 
of New Producer.

Thanks,
Bhavesh

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() 

[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175574#comment-14175574
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~jkreps],

My only request is to make the New Producer truly async so it can enqueue a 
message regardless of the message key hashcode or partition number.  The new 
Producer is far better than the old Scala producer (I have worked with both the 
new and old producers/consumers and the entire LinkedIn pipeline), but the new 
producer inherits the same problem the old producer had: thread contention when 
queuing messages into the buffer.  I think the Kafka dev team can do better, 
because this use case of aggregating events into a single partition is widely 
used.
My plan is to replace our stream processing framework with Kafka if possible 
(for aggregation, counting metrics, etc.).  We currently use the following 
stream processor, but it has a lot of downsides and only distributes the load, 
which the Kafka brokers already provide.  Anyway, this is our use case.

https://github.com/walmartlabs/mupd8
http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf 

Thanks,
Bhavesh


> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175574#comment-14175574
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/17/14 9:32 PM:
-

[~jkreps],

My only request is to make the New Producer truly async so it can enqueue a 
message regardless of the message key hashcode or partition number.  The new 
Producer is far better than the old Scala producer (I have worked on both the 
new and old producers/consumers and the entire LinkedIn pipeline), but the new 
producer inherits the same problem the old producer had: thread contention when 
queuing messages into the buffer.  I think the Kafka dev team can do better, 
because this use case of aggregating events into a single partition is widely 
used.
My plan is to replace our stream processing framework with Kafka if possible 
(for aggregation, counting metrics, etc.).  We currently use the following 
stream processor, but it has a lot of downsides and only distributes the load, 
which the Kafka brokers already provide.  Anyway, this is our use case.

https://github.com/walmartlabs/mupd8
http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf 

Thanks,
Bhavesh



was (Author: bmis13):
[~jkreps],

Only request is to make New Producer truly Async to enqueue the message 
regardless of message key hashcode or partition number for the message.   The 
new Producer is far far better than old Scala producer. ( I have worked both 
with new and old producers/consumer and entire linked-in pipeline)  But new 
producer inherit the same problem that old producer had thread contention when 
queuing message into buffer.   I think Kafka Dev team can do better because 
this use case of aggregating events into single partition is widely used. 
What my plan is to replace the Steam processing framework with Kafka is 
possible (For Aggregation and counting metrics etc)   We currently use 
following steam processor, but it has lots of down fall and only distribute the 
load which Kafka Brokers provide.  Any way this is our use case.

https://github.com/walmartlabs/mupd8
http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf 

Thanks,
Bhavesh


> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concu

[jira] [Commented] (KAFKA-1721) Snappy compressor is not thread safe

2014-10-21 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178856#comment-14178856
 ] 

Bhavesh Mistry commented on KAFKA-1721:
---

I have filed https://github.com/xerial/snappy-java/issues/88 to track this for 
Snappy.

A patch is provided there, and thanks to [~ewencp] for testing it.  Please see 
the above link for more details.


Thanks,

Bhavesh 

> Snappy compressor is not thread safe
> 
>
> Key: KAFKA-1721
> URL: https://issues.apache.org/jira/browse/KAFKA-1721
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> From the mailing list, it can generate this exception:
> 2014-10-20 18:55:21.841 [kafka-producer-network-thread] ERROR
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in
> kafka producer I/O thread:
> *java.lang.NullPointerException*
> at
> org.xerial.snappy.BufferRecycler.releaseInputBuffer(BufferRecycler.java:153)
> at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:317)
> at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
> at org.apache.kafka.common.record.Compressor.close(Compressor.java:94)
> at
> org.apache.kafka.common.record.MemoryRecords.close(MemoryRecords.java:119)
> at
> org.apache.kafka.clients.producer.internals.RecordAccumulator.drain(RecordAccumulator.java:285)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> This appears to be an issue with the snappy-java library using ThreadLocal 
> for an internal buffer recycling object which results in that object being 
> shared unsafely across threads if one thread sends to multiple producers:
> {quote}
> I think the issue is that you're
> using all your producers across a thread pool and the snappy library
> uses ThreadLocal BufferRecyclers. When new Snappy streams are allocated,
> they may be allocated from the same thread (e.g. one of your MyProducer
> classes calls Producer.send() on multiple producers from the same
> thread) and therefore use the same BufferRecycler. Eventually you hit
> the code in the stacktrace, and if two producer send threads hit it
> concurrently they improperly share the unsynchronized BufferRecycler.
> This seems like a pain to fix -- it's really a deficiency of the snappy
> library and as far as I can see there's no external control over
> BufferRecycler in their API. One possibility is to record the thread ID
> when we generate a new stream in Compressor and use that to synchronize
> access to ensure no concurrent BufferRecycler access. That could be made
> specific to snappy so it wouldn't impact other codecs. Not exactly
> ideal, but it would work. Unfortunately I can't think of any way for you
> to protect against this in your own code since the problem arises in the
> producer send thread, which your code should never know about.
> Another option would be to setup your producers differently to avoid the
> possibility of unsynchronized access from multiple threads (i.e. don't
> use the same thread pool approach), but whether you can do that will
> depend on your use case.
> {quote}
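
For illustration only (an editor's sketch of the workaround idea quoted above, 
not actual Kafka or snappy-java code): record the ID of the thread that created 
the stream and route later access through a lock keyed by that ID, so two 
streams whose buffers came out of the same thread-local BufferRecycler are never 
closed concurrently.  The class and field names below are made up.

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper: one lock object per creating-thread ID, shared by every
// stream created on that thread, serializing the unsafe close/recycle path.
public class PerCreatorLockingStream {

    private static final ConcurrentHashMap<Long, Object> LOCKS =
            new ConcurrentHashMap<Long, Object>();

    private final OutputStream delegate; // e.g. the compressing stream to protect
    private final Object lock;           // shared by all streams created on the same thread

    public PerCreatorLockingStream(OutputStream delegate) {
        this.delegate = delegate;
        long creatorId = Thread.currentThread().getId(); // recorded at creation time
        Object candidate = new Object();
        Object existing = LOCKS.putIfAbsent(creatorId, candidate);
        this.lock = (existing != null) ? existing : candidate;
    }

    public void write(byte[] data) throws IOException {
        synchronized (lock) {
            delegate.write(data);
        }
    }

    public void close() throws IOException {
        synchronized (lock) { // no two streams from the same creator close concurrently
            delegate.close();
        }
    }
}
{code}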



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-21 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179014#comment-14179014
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~jkreps],

I am sorry I did not get back to you sooner.  The cost of enqueuing a message 
into a single partition only is ~54% (2666.5 / 5818.2 ≈ 0.46, so single-partition 
throughput is roughly 46% of the all-partition case).

Here is the test I have done:

To *single* partition:
Throughput per Thread=2666.5  byte(s)/microsecond
All done...!

To *all* partition:
Throughput per Thread=5818.181818181818  byte(s)/microsecond
All done...!


The rough cost of the sync block can be seen from the test program below:

{code}
package org.kafka.test;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 75;
static CountDownLatch latch = new CountDownLatch(numberTh);
public static void main(String[] args) throws IOException, 
InterruptedException {

//Thread.sleep(6);

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "logmon.test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
int numberOfLoop = 5000;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 1;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));
MyProducer [] producerThResult = new MyProducer [numberTh];
for(int i = 0 ; i < numberTh;i++){
producerThResult[i] = new 
MyProducer(producer,numberOfLoop,builder.toString(), topic);
service.execute(producerThResult[i]);
}   
latch.await();
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All Producers done...!");
// now interpret the result... of this...
long lowestTime = 0 ;
for(int i =0 ; i < producerThResult.length;i++){
if(i == 1){
lowestTime = 
producerThResult[i].totalTimeinNano;
}else if ( producerThResult[i].totalTimeinNano < 
lowestTime){
lowestTime = 
producerThResult[i].totalTimeinNano;
}
}
long bytesSend = msgLenth * numberOfLoop;
long durationInMs = TimeUnit.MILLISECONDS.convert(lowestTime, 
TimeUnit.NANOSECONDS);

double throughput = (bytesSend * 1.0) / (durationInMs);
System.out.println("Throughput per Thread=" + throughput + "  
byte(s)/microsecond");

System.out.println("All done...!");

}



static class MyProducer implements Callable , Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;
long totalTimeinNano = 0;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
// ALWAYS SEND DATA TO PARTITION 1 only...  
//ProducerRecord record = new ProducerRecord(topic, 
1,null,msg.toString().getBytes());
ProducerRecord recor

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-21 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179014#comment-14179014
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/21/14 8:26 PM:
-

[~jkreps],

I am sorry I did not get back to you sooner.  The cost of enqueuing a message 
into a single partition is ~54% as compared to round-robin (tested with 32 
partitions on a single topic and a 3-broker cluster).  The throughput measures 
the cost of putting data into the buffer.

Here is the test I have done:

To *single* partition:
Throughput per Thread=2666.5  byte(s)/microsecond
All done...!

To *all* partition:
Throughput per Thread=5818.181818181818  byte(s)/microsecond
All done...!

{code}
package org.kafka.test;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 75;
static CountDownLatch latch = new CountDownLatch(numberTh);
public static void main(String[] args) throws IOException, 
InterruptedException {

//Thread.sleep(6);

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "logmon.test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
int numberOfLoop = 5000;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 1;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));
MyProducer [] producerThResult = new MyProducer [numberTh];
for(int i = 0 ; i < numberTh;i++){
producerThResult[i] = new 
MyProducer(producer,numberOfLoop,builder.toString(), topic);
service.execute(producerThResult[i]);
}   
latch.await();
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All Producers done...!");
// now interpret the result... of this...
long lowestTime = 0 ;
for(int i =0 ; i < producerThResult.length;i++){
if(i == 1){
lowestTime = 
producerThResult[i].totalTimeinNano;
}else if ( producerThResult[i].totalTimeinNano < 
lowestTime){
lowestTime = 
producerThResult[i].totalTimeinNano;
}
}
long bytesSend = msgLenth * numberOfLoop;
long durationInMs = TimeUnit.MILLISECONDS.convert(lowestTime, 
TimeUnit.NANOSECONDS);

double throughput = (bytesSend * 1.0) / (durationInMs);
System.out.println("Throughput per Thread=" + throughput + "  
byte(s)/microsecond");

System.out.println("All done...!");

}



static class MyProducer implements Callable , Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;
long totalTimeinNano = 0;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
// ALWAYS SEND DATA TO PARTITION 1 only...  

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-21 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179014#comment-14179014
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/21/14 8:28 PM:
-

[~jkreps],

I am sorry I did not get back to you sooner.  The cost of enqueuing a message 
into a single partition is ~54% as compared to round-robin (tested with 32 
partitions on a single topic and a 3-broker cluster).  The throughput measures 
only the cost of putting data into the buffer, not the cost of sending data to 
the brokers.

Here is the test I have done:

To *single* partition:
Throughput per Thread=2666.5  byte(s)/microsecond
All done...!

To *all* partition:
Throughput per Thread=5818.181818181818  byte(s)/microsecond
All done...!

{code}
package org.kafka.test;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 75;
static CountDownLatch latch = new CountDownLatch(numberTh);
public static void main(String[] args) throws IOException, 
InterruptedException {

//Thread.sleep(6);

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "logmon.test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
int numberOfLoop = 5000;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 1;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));
MyProducer [] producerThResult = new MyProducer [numberTh];
for(int i = 0 ; i < numberTh;i++){
producerThResult[i] = new 
MyProducer(producer,numberOfLoop,builder.toString(), topic);
service.execute(producerThResult[i]);
}   
latch.await();
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All Producers done...!");
// now interpret the result... of this...
long lowestTime = 0 ;
for(int i =0 ; i < producerThResult.length;i++){
if(i == 1){
lowestTime = 
producerThResult[i].totalTimeinNano;
}else if ( producerThResult[i].totalTimeinNano < 
lowestTime){
lowestTime = 
producerThResult[i].totalTimeinNano;
}
}
long bytesSend = msgLenth * numberOfLoop;
long durationInMs = TimeUnit.MILLISECONDS.convert(lowestTime, 
TimeUnit.NANOSECONDS);

double throughput = (bytesSend * 1.0) / (durationInMs);
System.out.println("Throughput per Thread=" + throughput + "  
byte(s)/microsecond");

System.out.println("All done...!");

}



static class MyProducer implements Callable , Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;
long totalTimeinNano = 0;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
// ALWAYS SEN

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-21 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179014#comment-14179014
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/21/14 8:29 PM:
-

[~jkreps],

I am sorry I did not get back to you sooner.  The cost of enqueuing a message 
into a single partition is ~54% as compared to round-robin (tested with 32 
partitions on a single topic and a 3-broker cluster).  The throughput measures 
only the cost of putting data into the buffer, not the cost of sending data to 
the brokers.

Here is the test I have done:

To *single* partition:
Throughput per Thread=2666.5  byte(s)/microsecond
All done...!

To *all* partition:
Throughput per Thread=5818.181818181818  byte(s)/microsecond
All done...!

Here is the test program for this:
{code}
package org.kafka.test;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 75;
static CountDownLatch latch = new CountDownLatch(numberTh);
public static void main(String[] args) throws IOException, 
InterruptedException {

//Thread.sleep(6);

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "logmon.test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
int numberOfLoop = 5000;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 1;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));
MyProducer [] producerThResult = new MyProducer [numberTh];
for(int i = 0 ; i < numberTh;i++){
producerThResult[i] = new 
MyProducer(producer,numberOfLoop,builder.toString(), topic);
service.execute(producerThResult[i]);
}   
latch.await();
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All Producers done...!");
// now interpret the result: find the fastest thread (lowest total enqueue time)
long lowestTime = 0;
for (int i = 0; i < producerThResult.length; i++) {
    if (i == 0) {
        lowestTime = producerThResult[i].totalTimeinNano;
    } else if (producerThResult[i].totalTimeinNano < lowestTime) {
        lowestTime = producerThResult[i].totalTimeinNano;
    }
}
long bytesSend = msgLenth * numberOfLoop;
long durationInMs = TimeUnit.MILLISECONDS.convert(lowestTime, TimeUnit.NANOSECONDS);

double throughput = (bytesSend * 1.0) / (durationInMs);
System.out.println("Throughput per Thread=" + throughput + " byte(s)/millisecond");

System.out.println("All done...!");

}



static class MyProducer implements Callable , Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;
long totalTimeinNano = 0;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
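// [Editor's note] The body of run() is truncated in this archived message.
// The commented-out sketch below is a hypothetical reconstruction only, not the
// author's original code; it is consistent with the thread dumps quoted later in
// this thread (MyProducer.run() -> KafkaProducer.send()): time a tight send()
// loop against the first producer and release the shared latch when done.
//
//     long start = System.nanoTime();
//     try {
//         for (long i = 0; i < maxloops; i++) {
//             producer[0].send(new ProducerRecord(topic, msg.getBytes()), null);
//         }
//     } finally {
//         totalTimeinNano = System.nanoTime() - start;
//         latch.countDown();
//     }
{code}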
  

[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-10-21 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179092#comment-14179092
 ] 

Bhavesh Mistry commented on KAFKA-1481:
---

Hi [~junrao],

Can you please let me know whether this will also address the [New Java Producer] 
metrics() method when a client.id or topic contains special characters, so that we 
have consistent naming across the JMX bean names and the metrics() method?

Here is background on this:

{code}
Bhavesh,

Yes, allowing dot in clientId and topic makes it a bit harder to define the
JMX bean names. I see a couple of solutions here.

1. Disable dot in clientId and topic names. The issue is that dot may
already be used in existing deployment.

2. We can represent the JMX bean name differently in the new producer.
Instead of
  kafka.producer.myclientid:type=mytopic
we could change it to
  kafka.producer:clientId=myclientid,topic=mytopic

I felt that option 2 is probably better since it doesn't affect existing
users.

Otis,

We probably can also use option 2 to address KAFKA-1481. For topic/clientid
specific metrics, we could explicitly specify the metric name so that it
contains "topic=mytopic,clientid=myclientid". That seems to be a much
cleaner way than having all parts included in a single string separated by
'|'.

Thanks,

Jun
{code}
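
To make the difference between the two options concrete, here is a small 
illustration using the standard javax.management.ObjectName API (an editor's 
sketch, not code from the Kafka project): with option 2 the client id and topic 
become key properties that can be read back individually, whereas option 1 forces 
monitoring tools to parse a single string in which '.' is both data and separator.

{code}
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class MBeanNamingExample {
    public static void main(String[] args) throws MalformedObjectNameException {
        // Option 1: everything packed into the domain/type string.
        // A topic such as "test.1" is ambiguous because '.' is also the separator.
        ObjectName option1 = new ObjectName("kafka.producer.console-producer.topic:type=test.1");

        // Option 2: clientId and topic as separate key properties.
        ObjectName option2 = new ObjectName("kafka.producer:clientId=myclientid,topic=test.1");

        // With option 2 the individual parts can be recovered without string parsing.
        System.out.println(option2.getDomain());                // kafka.producer
        System.out.println(option2.getKeyProperty("clientId")); // myclientid
        System.out.println(option2.getKeyProperty("topic"));    // test.1

        // With option 1 the topic has to be parsed out of the name by hand.
        System.out.println(option1);
    }
}
{code}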

Thanks,

Bhavesh 

> Stop using dashes AND underscores as separators in MBean names
> --
>
> Key: KAFKA-1481
> URL: https://issues.apache.org/jira/browse/KAFKA-1481
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.1.1
>Reporter: Otis Gospodnetic
>Priority: Critical
>  Labels: patch
> Fix For: 0.8.3
>
> Attachments: KAFKA-1481_2014-06-06_13-06-35.patch, 
> KAFKA-1481_2014-10-13_18-23-35.patch, KAFKA-1481_2014-10-14_21-53-35.patch, 
> KAFKA-1481_2014-10-15_10-23-35.patch, KAFKA-1481_2014-10-20_23-14-35.patch, 
> KAFKA-1481_2014-10-21_09-14-35.patch, 
> KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch, 
> KAFKA-1481_IDEA_IDE_2014-10-15_10-23-35.patch, 
> KAFKA-1481_IDEA_IDE_2014-10-20_20-14-35.patch, 
> KAFKA-1481_IDEA_IDE_2014-10-20_23-14-35.patch
>
>
> MBeans should not use dashes or underscores as separators because these 
> characters are allowed in hostnames, topics, group and consumer IDs, etc., 
> and these are embedded in MBeans names making it impossible to parse out 
> individual bits from MBeans.
> Perhaps a pipe character should be used to avoid the conflict. 
> This looks like a major blocker because it means nobody can write Kafka 0.8.x 
> monitoring tools unless they are doing it for themselves AND do not use 
> dashes AND do not use underscores.
> See: http://search-hadoop.com/m/4TaT4lonIW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1721) Snappy compressor is not thread safe

2014-10-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181730#comment-14181730
 ] 

Bhavesh Mistry commented on KAFKA-1721:
---

[~ewencp],

Thanks for fixing this issue.  The snappy-java developers have released a new 
version with the fix: 
https://oss.sonatype.org/content/repositories/releases/org/xerial/snappy/snappy-java/1.1.1.4/

Thanks,
Bhavesh

> Snappy compressor is not thread safe
> 
>
> Key: KAFKA-1721
> URL: https://issues.apache.org/jira/browse/KAFKA-1721
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> From the mailing list, it can generate this exception:
> 2014-10-20 18:55:21.841 [kafka-producer-network-thread] ERROR
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in
> kafka producer I/O thread:
> *java.lang.NullPointerException*
> at
> org.xerial.snappy.BufferRecycler.releaseInputBuffer(BufferRecycler.java:153)
> at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:317)
> at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
> at org.apache.kafka.common.record.Compressor.close(Compressor.java:94)
> at
> org.apache.kafka.common.record.MemoryRecords.close(MemoryRecords.java:119)
> at
> org.apache.kafka.clients.producer.internals.RecordAccumulator.drain(RecordAccumulator.java:285)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> This appears to be an issue with the snappy-java library using ThreadLocal 
> for an internal buffer recycling object which results in that object being 
> shared unsafely across threads if one thread sends to multiple producers:
> {quote}
> I think the issue is that you're
> using all your producers across a thread pool and the snappy library
> uses ThreadLocal BufferRecyclers. When new Snappy streams are allocated,
> they may be allocated from the same thread (e.g. one of your MyProducer
> classes calls Producer.send() on multiple producers from the same
> thread) and therefore use the same BufferRecycler. Eventually you hit
> the code in the stacktrace, and if two producer send threads hit it
> concurrently they improperly share the unsynchronized BufferRecycler.
> This seems like a pain to fix -- it's really a deficiency of the snappy
> library and as far as I can see there's no external control over
> BufferRecycler in their API. One possibility is to record the thread ID
> when we generate a new stream in Compressor and use that to synchronize
> access to ensure no concurrent BufferRecycler access. That could be made
> specific to snappy so it wouldn't impact other codecs. Not exactly
> ideal, but it would work. Unfortunately I can't think of any way for you
> to protect against this in your own code since the problem arises in the
> producer send thread, which your code should never know about.
> Another option would be to setup your producers differently to avoid the
> possibility of unsynchronized access from multiple threads (i.e. don't
> use the same thread pool approach), but whether you can do that will
> depend on your use case.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182198#comment-14182198
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~jkreps],

Sorry to bug you again.  Did you get a chance to review the performance numbers 
above, and the per-thread synchronization cost when no partition is set versus 
when all messages go to a single partition?

Thanks,

Bhavesh 

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179014#comment-14179014
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/24/14 6:21 PM:
-

[~jkreps],

I am sorry I did not get back to you sooner.  Enqueuing messages into a single 
partition achieves only about 46% of the round-robin throughput (2666.5 vs. 
5818.2 bytes/millisecond, i.e. a ~54% drop; tested with 32 partitions on a 
single topic against a 3-broker cluster).  The throughput numbers measure only 
the cost of appending data into the producer buffer, not the cost of sending 
data to the brokers.

Here is the test I have done:

To a *single* partition:
Throughput per Thread=2666.5  byte(s)/millisecond
All done...!

To *all* partitions:
Throughput per Thread=5818.181818181818  byte(s)/millisecond
All done...!

Here is test program for this:
{code}
package org.kafka.test;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TestNetworkDownProducer {

static int numberTh = 75;
static CountDownLatch latch = new CountDownLatch(numberTh);
public static void main(String[] args) throws IOException, 
InterruptedException {

//Thread.sleep(6);

Properties prop = new Properties();
InputStream propFile = 
Thread.currentThread().getContextClassLoader()

.getResourceAsStream("kafkaproducer.properties");

String topic = "logmon.test";
prop.load(propFile);
System.out.println("Property: " + prop.toString());
StringBuilder builder = new StringBuilder(1024);
int msgLenth = 256;
int numberOfLoop = 5000;
for (int i = 0; i < msgLenth; i++)
builder.append("a");

int numberOfProducer = 1;
Producer[] producer = new Producer[numberOfProducer];

for (int i = 0; i < producer.length; i++) {
producer[i] = new KafkaProducer(prop);
}
ExecutorService service =   new ThreadPoolExecutor(numberTh, 
numberTh,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(numberTh *2));
MyProducer [] producerThResult = new MyProducer [numberTh];
for(int i = 0 ; i < numberTh;i++){
producerThResult[i] = new 
MyProducer(producer,numberOfLoop,builder.toString(), topic);
service.execute(producerThResult[i]);
}   
latch.await();
for (int i = 0; i < producer.length; i++) {
producer[i].close();
}   
service.shutdownNow();
System.out.println("All Producers done...!");
// now interpret the result: find the fastest thread (lowest total enqueue time)
long lowestTime = 0;
for (int i = 0; i < producerThResult.length; i++) {
    if (i == 0) {
        lowestTime = producerThResult[i].totalTimeinNano;
    } else if (producerThResult[i].totalTimeinNano < lowestTime) {
        lowestTime = producerThResult[i].totalTimeinNano;
    }
}
long bytesSend = msgLenth * numberOfLoop;
long durationInMs = TimeUnit.MILLISECONDS.convert(lowestTime, TimeUnit.NANOSECONDS);

double throughput = (bytesSend * 1.0) / (durationInMs);
System.out.println("Throughput per Thread=" + throughput + " byte(s)/millisecond");

System.out.println("All done...!");

}



static class MyProducer implements Callable , Runnable {

Producer[] producer;
long maxloops;
String msg ;
String topic;
long totalTimeinNano = 0;

MyProducer(Producer[] list, long maxloops,String msg,String 
topic){
this.producer = list;
this.maxloops = maxloops;
this.msg = msg;
this.topic = topic;
}
public void run() {
  

[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183648#comment-14183648
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~jkreps],

Yes, I ran this test with 75 threads on my MacBook Pro (8 cores) with Snappy 
compression on.  Do you have any idea how we can improve the enqueue cost for a 
single partition?  Maybe keep several active buffers per partition, e.g. one per 
CPU?

Here is info about the box:

{code}
machdep.cpu.max_basic: 13
machdep.cpu.max_ext: 2147483656
machdep.cpu.vendor: GenuineIntel
machdep.cpu.brand_string: Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
machdep.cpu.family: 6
machdep.cpu.model: 58
machdep.cpu.extmodel: 3
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 9
machdep.cpu.feature_bits: 3219913727 2142954495
machdep.cpu.leaf7_feature_bits: 641
machdep.cpu.extfeature_bits: 672139520 1
machdep.cpu.signature: 198313
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA 
CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ 
DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC 
POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ENFSTRG RDWRFSGS
machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP TSCI
machdep.cpu.logical_per_package: 16
machdep.cpu.cores_per_package: 8
{code}

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This mes

[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-27 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186350#comment-14186350
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~jkreps],

I understand that the current code base appends bytes to a shared buffer and does 
compression on the application thread.  The older producer seems to do all of this 
in a background thread.  What changed to move this work into the foreground?  Also, 
if you had to re-engineer this code, how would you remove the synchronization and 
move everything into the background, so that application threads spend more time 
runnable and the cost of enqueue becomes much lower?

I am really interested in solving this problem for my application, so I just 
wanted to hear your suggestions and ideas on how you would solve it.

Thanks for all your help so far !!  

Thanks,

Bhavesh 

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-27 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186350#comment-14186350
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/28/14 4:40 AM:
-

[~jkreps],

I understand that the current code base appends bytes to a shared buffer and does 
compression on the application thread.  The older producer seems to do all of this 
in a background thread.  What changed to move this work into the foreground?  Also, 
if you had to re-engineer this code, how would you remove the synchronization and 
move everything into the background, so that application threads spend more time 
runnable and the cost of enqueue becomes much lower (of course at the cost of 
memory)?

I am really interested in solving this problem for my application, so I just 
wanted to hear your suggestions and ideas on how you would solve it.

Thanks for all your help so far !!  

Thanks,

Bhavesh 


was (Author: bmis13):
[~jkreps],

I understand the current code base is adding bytes to shared memory and doing 
compression (on application thread).  The older consumer seems to do all this 
in back-ground thread.  So What changed to have this in fore-ground ?  Also, if 
you had to re-engineer this code, How would you  re-engineer to remove 
Synchronization and move everything in background so more runable state is give 
to Application Thread and cost of enqueue will very less.  

I am really interested in solving this problem for my application.  So I just 
wanted to know your suggestions/ideas, how would you solve this ?

Thanks for all your help so far !!  

Thanks,

Bhavesh 

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurre

[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-27 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186350#comment-14186350
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/28/14 4:58 AM:
-

[~jkreps],

I understand that the current code base appends bytes to a shared buffer and does 
compression on the application thread.  The older producer seems to do all of this 
in a background thread.  What changed to move this work into the foreground?  Also, 
if you had to re-engineer this code, how would you remove the synchronization and 
move everything into the background, so that application threads spend more time 
runnable and the cost of enqueue becomes much lower (of course at the cost of 
memory)?

I am really interested in solving this problem for my application, so I just 
wanted to hear your suggestions and ideas on how you would solve it.

Thanks for all your help so far!!  The only thing I can think of is an 
*AsynKafkaProducer*, as mentioned in previous comments, where [~ewencp] pointed out 
that the cost would shift to the threads that enqueue the messages: more memory, 
more thread context switching, etc.  (A minimal sketch of that idea follows this 
comment.)

Thanks,

Bhavesh 
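
As an editor's aside, here is a minimal sketch of the asynchronous-wrapper idea 
mentioned above.  The AsyncProducerWrapper class is hypothetical (it is not from 
the Kafka project or from any attached patch): application threads only enqueue 
records into a local bounded queue, and one background thread drains the queue and 
calls the real producer, so contention on the shared record accumulator is limited 
to a single caller at the cost of extra memory and one more thread.

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Hypothetical wrapper: callers enqueue; a single background thread sends. */
public class AsyncProducerWrapper {

    private final Producer producer;                    // the real (new) Java producer
    private final BlockingQueue<ProducerRecord> queue;  // bounded hand-off buffer
    private final Thread drainer;
    private volatile boolean running = true;

    public AsyncProducerWrapper(Producer kafkaProducer, int queueSize) {
        this.producer = kafkaProducer;
        this.queue = new ArrayBlockingQueue<ProducerRecord>(queueSize);
        this.drainer = new Thread(new Runnable() {
            public void run() {
                try {
                    // Only this thread calls send(), so application threads never
                    // contend on the accumulator's internal synchronization.
                    while (running || !queue.isEmpty()) {
                        ProducerRecord record = queue.poll(100, TimeUnit.MILLISECONDS);
                        if (record != null) {
                            producer.send(record, null);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "async-producer-drainer");
    }

    /** Start the background drainer thread. */
    public void start() {
        drainer.start();
    }

    /** Called from many application threads; blocks only when the local queue is full. */
    public void enqueue(ProducerRecord record) throws InterruptedException {
        queue.put(record);
    }

    /** Drain remaining records, stop the background thread, then close the producer. */
    public void close() throws InterruptedException {
        running = false;
        drainer.join();
        producer.close();
    }
}
{code}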


was (Author: bmis13):
[~jkreps],

I understand the current code base is adding bytes to shared memory and doing 
compression (on application thread).  The older consumer seems to do all this 
in back-ground thread.  So What changed to have this in fore-ground ?  Also, if 
you had to re-engineer this code, How would you  re-engineer to remove 
Synchronization and move everything in background so more runable state is give 
to Application Thread and cost of enqueue will very less.  (Of Course at cost 
of memory).  

I am really interested in solving this problem for my application.  So I just 
wanted to know your suggestions/ideas, how would you solve this ?

Thanks for all your help so far !!  

Thanks,

Bhavesh 

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> 
>
> Key: KAFKA-1710
> URL: https://issues.apache.org/jira/browse/KAFKA-1710
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Development
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Critical
>  Labels: performance
> Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test to send message to single partition for 3 minutes or so 
> on, I have encounter deadlock (please see the screen attached) and thread 
> contention from YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into same partition for metric counting. 
> 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionTyp

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222571#comment-14222571
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/24/14 1:31 AM:
-

The patch provided does not solve the problem.  When you have more than one 
producer instance, the effect is amplified.

org.apache.kafka.clients.producer.internals.Sender.run() takes 100% CPU because it 
spins in an effectively infinite loop when no brokers are reachable (there is no 
work to do, yet the loop never waits).


Thanks,
Bhavesh 


was (Author: bmis13):
The patch provided does not solve the problem.  When you have more than one 
producer instance,  the effect amplifies. 

org.apache.kafka.clients.producer.internals.Send.run() takes 100% CPU due to 
infinite  loop when there is no brokers.


Thanks,
Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-23 Thread Bhavesh Mistry (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavesh Mistry reopened KAFKA-1642:
---

The patch provided does not solve the problem.  When you have more than one 
producer instance, the effect is amplified.

org.apache.kafka.clients.producer.internals.Sender.run() takes 100% CPU because it 
spins in an effectively infinite loop when no brokers are reachable.


Thanks,
Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-23 Thread Bhavesh Mistry (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavesh Mistry updated KAFKA-1642:
--
Attachment: 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch

Please take a look at the experimental patch.  It addresses this problem by 
capturing the correct node connection state and, less elegantly, by adding an 
exponential backoff (sleep) to the run() method (many of the values are hard 
coded; it is just an experiment).

Also, there is another problem: the producer's close() method does not return, and 
the JVM does not shut down gracefully, because the I/O thread keeps spinning in its 
while loop during a network outage.  This is another edge case.

I hope this is helpful and solves the problem.

Thanks,

Bhavesh 
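
For illustration only, here is a minimal sketch of the exponential-backoff idea 
described above, kept independent of Kafka internals.  The BackoffTimer class, its 
parameter values, and the pollOnce() call in the comment are all hypothetical and 
are not taken from the attached patch: the I/O loop sleeps only when an iteration 
did no useful work, doubling the pause up to a cap, and resets as soon as work 
resumes.

{code}
/**
 * Hypothetical helper: call onWork() when a loop iteration made progress and
 * onIdle() when it did not. Idle iterations sleep with exponential backoff so a
 * tight loop (e.g. no reachable brokers) cannot pin a CPU core at 100%.
 */
public class BackoffTimer {

    private final long initialBackoffMs;
    private final long maxBackoffMs;
    private long currentBackoffMs;

    public BackoffTimer(long initialBackoffMs, long maxBackoffMs) {
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
        this.currentBackoffMs = initialBackoffMs;
    }

    /** The iteration made progress: reset the backoff. */
    public void onWork() {
        currentBackoffMs = initialBackoffMs;
    }

    /** The iteration did nothing: sleep, then double the next pause up to the cap. */
    public void onIdle() throws InterruptedException {
        Thread.sleep(currentBackoffMs);
        currentBackoffMs = Math.min(currentBackoffMs * 2, maxBackoffMs);
    }

    // Example shape of an I/O loop using the timer:
    //
    //   BackoffTimer backoff = new BackoffTimer(1, 512);
    //   while (running) {
    //       int handled = pollOnce();   // hypothetical: number of requests handled
    //       if (handled > 0) backoff.onWork(); else backoff.onIdle();
    //   }
}
{code}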

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223161#comment-14223161
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/24/14 5:26 PM:
-

[~ewencp],

The way to reproduce this is to simulate network instability by turning the 
network service on and off (or by unplugging and replugging the physical cable): 
let it connect, see whether it recovers, then disconnect and connect again, and so 
on.  You will see the behavior again and again.

The issue is also in the connection state management:

{code}
private void initiateConnect(Node node, long now) {
try {
log.debug("Initiating connection to node {} at {}:{}.", node.id(), 
node.host(), node.port());
// TODO FIX java.lang.IllegalStateException: No entry found for node -3
// (we need to put the entry before removing it)
this.connectionStates.connecting(node.id(), now);  // <-- problem line: it loses the
// previous attempt's state, the exception above is thrown, and the client keeps
// trying to connect to that node forever, hitting the exception every time
selector.connect(node.id(), new InetSocketAddress(node.host(), 
node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
} catch (IOException e) {
/* attempt failed, we'll try again after the backoff */
connectionStates.disconnectedWhenConnectting(node.id());
/* maybe the problem is our metadata, update it */
metadata.requestUpdate();
log.debug("Error connecting to node {} at {}:{}:", node.id(), 
node.host(), node.port(), e);
}
}
{code}

In my opinion, regardless of what the node status is, the run() method needs to be 
safeguarded from stealing CPU cycles when there is no state for the node.  (Hence I 
have added an exponential sleep as a temporary solution to stop it from stealing 
CPU cycles; I think we must protect it somehow and check the execution time.)

Please let me know if you need more info; I will be more than happy to reproduce 
the bug, and we can have a conference call so I can show you the problem.

Thanks,
Bhavesh 




was (Author: bmis13):
[~ewencp],

The way to reproduce this is to simulate network instability by turning on and 
off network service (or turn on/off physical cable).   The connect and see if 
recover and disconnect and connect again etc.. you will see the behavior again 
and again. 

The issue is also with connection state management :

{code}
private void initiateConnect(Node node, long now) {
try {
log.debug("Initiating connection to node {} at {}:{}.", node.id(), 
node.host(), node.port());
// TODO FIX java.lang.IllegalStateException: No entry found for 
node -3 (We need put before remove it..)..
this.connectionStates.connecting(node.id(), now);  (This line has 
problem because it will loose previous last attempt made get above exception 
and it will try to connect to that node for ever and ever with exception )
selector.connect(node.id(), new InetSocketAddress(node.host(), 
node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
} catch (IOException e) {
/* attempt failed, we'll try again after the backoff */
connectionStates.disconnectedWhenConnectting(node.id());
/* maybe the problem is our metadata, update it */
metadata.requestUpdate();
log.debug("Error connecting to node {} at {}:{}:", node.id(), 
node.host(), node.port(), e);
}
}
{code}

In my opinion, regardless of what node status is in run() method needs to be 
safe-guarded from still CPU Cycle when there is no state for Node.  (Hence I 
have added exponential sleep as temp solution to not to still CPU cycle , I 
think must protect it some how and check the execution time...)

Please let me know if you need more info  and i will be more than happy to 
reproduce bug and we can have conference call, and I can show you the problem.

Thanks,
Bhavesh 



> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-produc

[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223161#comment-14223161
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~ewencp],

The way to reproduce this is to simulate network instability by turning on and 
off network service (or turn on/off physical cable).   The connect and see if 
recover and disconnect and connect again etc.. you will see the behavior again 
and again. 

The issue is also with connection state management :

{code}
private void initiateConnect(Node node, long now) {
try {
log.debug("Initiating connection to node {} at {}:{}.", node.id(), 
node.host(), node.port());
// TODO FIX java.lang.IllegalStateException: No entry found for 
node -3 (We need put before remove it..)..
this.connectionStates.connecting(node.id(), now);  (This line has 
problem because it will loose previous last attempt made get above exception 
and it will try to connect to that node for ever and ever with exception )
selector.connect(node.id(), new InetSocketAddress(node.host(), 
node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
} catch (IOException e) {
/* attempt failed, we'll try again after the backoff */
connectionStates.disconnectedWhenConnectting(node.id());
/* maybe the problem is our metadata, update it */
metadata.requestUpdate();
log.debug("Error connecting to node {} at {}:{}:", node.id(), 
node.host(), node.port(), e);
}
}
{code}

In my opinion, regardless of what node status is in run() method needs to be 
safe-guarded from still CPU Cycle when there is no state for Node.  (Hence I 
have added exponential sleep as temp solution to not to still CPU cycle , I 
think must protect it some how and check the execution time...)

Please let me know if you need more info  and i will be more than happy to 
reproduce bug and we can have conference call, and I can show you the problem.

Thanks,
Bhavesh 



> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223161#comment-14223161
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/24/14 5:27 PM:
-

[~ewencp],

The way to reproduce this is to simulate network instability by turning on and 
off network service (or turn on/off physical cable).   The connect and see if 
recover and disconnect and connect again etc.. you will see the behavior again 
and again. 

The issue is also with connection state management :

{code}
private void initiateConnect(Node node, long now) {
try {
log.debug("Initiating connection to node {} at {}:{}.", node.id(), 
node.host(), node.port());
// TODO FIX java.lang.IllegalStateException: No entry found for 
node -3 (We need put before remove it..)..
this.connectionStates.connecting(node.id(), now);  (This line has 
problem because it will loose previous last attempt made get above exception 
and it will try to connect to that node for ever and ever with exception )
selector.connect(node.id(), new InetSocketAddress(node.host(), 
node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
} catch (IOException e) {
/* attempt failed, we'll try again after the backoff */
connectionStates.disconnectedWhenConnectting(node.id());
/* maybe the problem is our metadata, update it */
metadata.requestUpdate();
log.debug("Error connecting to node {} at {}:{}:", node.id(), 
node.host(), node.port(), e);
}
}
{code}

In my opinion, regardless of what node status is in run() method needs to be 
safe-guarded from still CPU Cycle when there is no state for Node.  (Hence I 
have added exponential sleep as temp solution to not to still CPU cycle , I 
think must protect it some how and check the execution time...)

Please let me know if you need more info  and i will be more than happy to 
reproduce bug and we can have conference call, and I can show you the problem.

Based on code diff I have done from 0.8.1.1 tag and this.  This issue also 
occur in  0.8.1.1 as well I think.

Thanks,
Bhavesh 




was (Author: bmis13):
[~ewencp],

The way to reproduce this is to simulate network instability by turning on and 
off network service (or turn on/off physical cable).   The connect and see if 
recover and disconnect and connect again etc.. you will see the behavior again 
and again. 

The issue is also with connection state management :

{code}
private void initiateConnect(Node node, long now) {
try {
log.debug("Initiating connection to node {} at {}:{}.", node.id(), 
node.host(), node.port());
// TODO FIX java.lang.IllegalStateException: No entry found for 
node -3 (We need put before remove it..)..
this.connectionStates.connecting(node.id(), now);  (This line has 
problem because it will loose previous last attempt made get above exception 
and it will try to connect to that node for ever and ever with exception )
selector.connect(node.id(), new InetSocketAddress(node.host(), 
node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
} catch (IOException e) {
/* attempt failed, we'll try again after the backoff */
connectionStates.disconnectedWhenConnectting(node.id());
/* maybe the problem is our metadata, update it */
metadata.requestUpdate();
log.debug("Error connecting to node {} at {}:{}:", node.id(), 
node.host(), node.port(), e);
}
}
{code}

In my opinion, regardless of what node status is in run() method needs to be 
safe-guarded from still CPU Cycle when there is no state for Node.  (Hence I 
have added exponential sleep as temp solution to not to still CPU cycle , I 
think must protect it some how and check the execution time...)

Please let me know if you need more info  and i will be more than happy to 
reproduce bug and we can have conference call, and I can show you the problem.

Thanks,
Bhavesh 



> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are ve

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223161#comment-14223161
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/24/14 6:57 PM:
-

[~ewencp],

The way to reproduce this is to simulate network instability by turning the 
network service on and off (or by unplugging and replugging the physical cable): 
let it connect, see whether it recovers, then disconnect and connect again, and so 
on.  You will see the behavior again and again.

The issue is also in the connection state management:

{code}
private void initiateConnect(Node node, long now) {
try {
log.debug("Initiating connection to node {} at {}:{}.", node.id(), 
node.host(), node.port());
// TODO FIX java.lang.IllegalStateException: No entry found for node -3
// (we need to put the entry before removing it)
this.connectionStates.connecting(node.id(), now);  // <-- problem line: it loses the
// previous attempt's state, the exception above is thrown, and the client keeps
// trying to connect to that node forever, hitting the exception every time
selector.connect(node.id(), new InetSocketAddress(node.host(), 
node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
} catch (IOException e) {
/* attempt failed, we'll try again after the backoff */
connectionStates.disconnectedWhenConnectting(node.id());
/* maybe the problem is our metadata, update it */
metadata.requestUpdate();
log.debug("Error connecting to node {} at {}:{}:", node.id(), 
node.host(), node.port(), e);
}
}
{code}

In my opinion, regardless of what the node status is, the run() method needs to be 
safeguarded from stealing CPU cycles when there is no state for the node.  (Hence I 
have added an exponential sleep as a temporary solution to stop it from stealing 
CPU cycles; I think we must protect it somehow and must check the execution time.)

Please let me know if you need more info; I will be more than happy to reproduce 
the bug, and we can have a conference call so I can show you the problem.

Based on a code diff I have done between the 0.8.1.1 tag and this code, I think 
this issue also occurs in 0.8.1.1.

Thanks,
Bhavesh 




was (Author: bmis13):
[~ewencp],

The way to reproduce this is to simulate network instability by turning on and 
off network service (or turn on/off physical cable).   The connect and see if 
recover and disconnect and connect again etc.. you will see the behavior again 
and again. 

The issue is also with connection state management :

{code}
private void initiateConnect(Node node, long now) {
try {
log.debug("Initiating connection to node {} at {}:{}.", node.id(), 
node.host(), node.port());
// TODO FIX java.lang.IllegalStateException: No entry found for 
node -3 (We need put before remove it..)..
this.connectionStates.connecting(node.id(), now);  (This line has 
problem because it will loose previous last attempt made get above exception 
and it will try to connect to that node for ever and ever with exception )
selector.connect(node.id(), new InetSocketAddress(node.host(), 
node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
} catch (IOException e) {
/* attempt failed, we'll try again after the backoff */
connectionStates.disconnectedWhenConnectting(node.id());
/* maybe the problem is our metadata, update it */
metadata.requestUpdate();
log.debug("Error connecting to node {} at {}:{}:", node.id(), 
node.host(), node.port(), e);
}
}
{code}

In my opinion, regardless of what node status is in run() method needs to be 
safe-guarded from still CPU Cycle when there is no state for Node.  (Hence I 
have added exponential sleep as temp solution to not to still CPU cycle , I 
think must protect it some how and check the execution time...)

Please let me know if you need more info  and i will be more than happy to 
reproduce bug and we can have conference call, and I can show you the problem.

Based on code diff I have done from 0.8.1.1 tag and this.  This issue also 
occur in  0.8.1.1 as well I think.

Thanks,
Bhavesh 



> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.

[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223532#comment-14223532
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

Also, regarding KafkaProducer.close(): the method hangs forever because of the 
following loop.

{code}
Sender.java

// okay we stopped accepting requests but there may still be
// requests in the accumulator or waiting for acknowledgment,
// wait until these are completed.
while (this.accumulator.hasUnsent() || this.client.inFlightRequestCount() > 0) {
    try {
        run(time.milliseconds());
    } catch (Exception e) {
        log.error("Uncaught error in kafka producer I/O thread: ", e);
    }
}

KafkaProducer.java

/**
 * Close this producer. This method blocks until all in-flight requests complete.
 */
@Override
public void close() {
    log.trace("Closing the Kafka producer.");
    this.sender.initiateClose();
    try {
        this.ioThread.join();  // THIS IS BLOCKED since ioThread does not give up.
    } catch (InterruptedException e) {
        throw new KafkaException(e);
    }
    this.metrics.close();
    log.debug("The Kafka producer has closed.");
}
{code}

The issue described in KAFKA-1788 is likely related, but if you look at the close call 
stack, the calling thread that initiated close() will hang until the io thread 
dies (and it never dies while there is data queued and the network is down).  
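
For illustration, a minimal sketch of the bounded close I have in mind; the timeout parameter and the forceClose() hook are hypothetical, not an existing KafkaProducer API (java.util.concurrent.TimeUnit is assumed imported):

{code}
/**
 * Hypothetical close with a timeout, so the caller is not blocked forever when the
 * io thread can never drain the accumulator (e.g. the network is down).
 */
@Override
public void close(long timeout, TimeUnit unit) {
    log.trace("Closing the Kafka producer.");
    this.sender.initiateClose();
    try {
        this.ioThread.join(unit.toMillis(timeout));       // bounded wait instead of join()
        if (this.ioThread.isAlive()) {
            log.warn("Producer io thread did not finish within {} {}, forcing shutdown.", timeout, unit);
            this.sender.forceClose();                     // hypothetical hard-stop hook on the Sender
            this.ioThread.interrupt();
        }
    } catch (InterruptedException e) {
        throw new KafkaException(e);
    }
    this.metrics.close();
    log.debug("The Kafka producer has closed.");
}
{code}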

Thanks,

Bhavesh


> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223532#comment-14223532
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/24/14 9:21 PM:
-

[~ewencp],

Also, regarding KafkaProducer.close(): the method hangs forever because of the 
following loop.

{code}
Sender.java

 // okay we stopped accepting requests but there may still be
// requests in the accumulator or waiting for acknowledgment,
// wait until these are completed.
while (this.accumulator.hasUnsent() || this.client.inFlightRequestCount() > 0) {
try {
run(time.milliseconds());
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}

KafkaProducer.java

 /**
* Close this producer. This method blocks until all in-flight requests complete.
*/
@Override
public void close() {
log.trace("Closing the Kafka producer.");
this.sender.initiateClose();
try {
this.ioThread.join();  // THIS IS BLOCKED since ioThread does not give up.
} catch (InterruptedException e) {
throw new KafkaException(e);
}
this.metrics.close();
log.debug("The Kafka producer has closed.");
}
{code}

The issue described in KAFKA-1788 is likely related, but if you look at the close call 
stack, the calling thread that initiated close() will hang until the io thread 
dies (and it never dies while there is data queued and the network is down).  

Thanks,

Bhavesh



was (Author: bmis13):
Also Regarding KafkaProder.close() method hangs for ever because of following 
loop, and 

{code}
Sender.java

 // okay we stopped accepting requests but there may still be
// requests in the accumulator or waiting for acknowledgment,
// wait until these are completed.
while (this.accumulator.hasUnsent() || this.client.inFlightRequestCount() > 0) {
try {
run(time.milliseconds());
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}

KafkaProducer.java

 /**
* Close this producer. This method blocks until all in-flight requests complete.
*/
@Override
public void close() {
log.trace("Closing the Kafka producer.");
this.sender.initiateClose();
try {
this.ioThread.join();  // THIS IS BLOCKED since ioThread does not give up.
} catch (InterruptedException e) {
throw new KafkaException(e);
}
this.metrics.close();
log.debug("The Kafka producer has closed.");
}
{code}

The issue describe in KAFKA-1788  is likelihood, but if you look the close call 
stack then calling thread that initiated the close() will hang till io thread 
dies (which it never dies when data is there and network is down).  

Thanks,

Bhavesh


> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223532#comment-14223532
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/24/14 9:22 PM:
-

[~ewencp],

Also, regarding KafkaProducer.close(): the method hangs forever because of the 
following loop.

{code}
Sender.java

 // okay we stopped accepting requests but there may still be
// requests in the accumulator or waiting for acknowledgment,
// wait until these are completed.
while (this.accumulator.hasUnsent() || this.client.inFlightRequestCount() > 0) {
try {
run(time.milliseconds());
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}

KafkaProducer.java

 /**
* Close this producer. This method blocks until all in-flight requests complete.
*/
@Override
public void close() {
log.trace("Closing the Kafka producer.");
this.sender.initiateClose();
try {
this.ioThread.join();  // THIS IS BLOCKED since ioThread does not give up so it 
is all related in my opinion.
} catch (InterruptedException e) {
throw new KafkaException(e);
}
this.metrics.close();
log.debug("The Kafka producer has closed.");
}
{code}

The issue described in KAFKA-1788 is likely related, but if you look at the close call 
stack, the calling thread that initiated close() will hang until the io thread 
dies (and it never dies while there is data queued and the network is down).  

Thanks,

Bhavesh



was (Author: bmis13):
[~ewencp],

Also Regarding KafkaProder.close() method hangs for ever because of following 
loop, and 

{code}
Sender.java

 // okay we stopped accepting requests but there may still be
// requests in the accumulator or waiting for acknowledgment,
// wait until these are completed.
while (this.accumulator.hasUnsent() || this.client.inFlightRequestCount() > 0) {
try {
run(time.milliseconds());
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}

KafkaProducer.java

 /**
* Close this producer. This method blocks until all in-flight requests complete.
*/
@Override
public void close() {
log.trace("Closing the Kafka producer.");
this.sender.initiateClose();
try {
this.ioThread.join();  // THIS IS BLOCKED since ioThread does not give up.
} catch (InterruptedException e) {
throw new KafkaException(e);
}
this.metrics.close();
log.debug("The Kafka producer has closed.");
}
{code}

The issue describe in KAFKA-1788  is likelihood, but if you look the close call 
stack then calling thread that initiated the close() will hang till io thread 
dies (which it never dies when data is there and network is down).  

Thanks,

Bhavesh


> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223571#comment-14223571
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

Here are the exact steps to reproduce the bug (a daemon program must be continuously 
producing; a rough sketch of such a driver follows these steps).

1) Start from a happy situation where all brokers are up and everything is running 
fine. Verify with top -pid JAVA_PID and YourKit that the Kafka network threads are 
taking less than 4% CPU.
2) Shut down the network (turn off the network service or pull the eth0 cable), wait a 
while, and you will see CPU spike to 325% under top (if you have 4 producers); YourKit 
shows 25% CPU consumption for each Kafka io thread.
3) Reconnect the network (the spike will still be there, but after a while CPU comes 
down to 100% or so) and remain connected for a while.  
4) Simulate the network failure again (to simulate network instability); repeat steps 
1 to 4, waiting 10 or so minutes in between, and you will see the trend of CPU spikes 
along with the above exception: 
java.lang.IllegalStateException: No entry found for node -2

Also, I see that Kafka logs excessively when the network is down (YourKit shows it 
taking more CPU cycles compared to normal).
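
For reference, a rough sketch of the kind of daemon program I mean; the topic name, broker list, payload size, and the 0.8.2-style serializer config keys are just placeholders, adjust them to your environment and client version:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Continuously pumps messages from a few producer instances so the io threads
// stay busy while the network is being broken and restored.
public class ProducerLoadDaemon {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "b1:9092,b2:9092,b3:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        final byte[] payload = new byte[1024];
        for (int i = 0; i < 4; i++) {                  // 4 producer instances, as in the report
            final KafkaProducer<byte[], byte[]> producer = new KafkaProducer<byte[], byte[]>(props);
            new Thread(new Runnable() {
                public void run() {
                    while (true)
                        producer.send(new ProducerRecord<byte[], byte[]>("test-topic", payload));
                }
            }, "load-producer-" + i).start();
        }
    }
}
{code}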

Thanks,
Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223626#comment-14223626
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

Also, there is an issue in my last patch: I did not update lastConnectAttemptMs in 
connecting().

{code}
/**
 * Enter the connecting state for the given node.
 * @param node The id of the node we are connecting to
 * @param now The current time.
 */
public void connecting(int node, long now) {
    NodeConnectionState nodeConn = nodeState.get(node);
    if (nodeConn == null) {
        nodeState.put(node, new NodeConnectionState(ConnectionState.CONNECTING, now));
    } else {
        nodeConn.state = ConnectionState.CONNECTING;
        nodeConn.lastConnectAttemptMs = now;  // capture and update the last connection attempt
    }
}
{code}

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223626#comment-14223626
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/24/14 10:16 PM:
--

Also, there is an issue in my experimental patch: I did not update lastConnectAttemptMs 
in the connecting() state method, which is needed to solve the issue with the 
IllegalStateException:
{code}
 /**
 * Enter the connecting state for the given node.
 * @param node The id of the node we are connecting to
 * @param now The current time.
 */
public void connecting(int node, long now) {
NodeConnectionState nodeConn = nodeState.get(node); 
if(nodeConn == null){
nodeState.put(node, new 
NodeConnectionState(ConnectionState.CONNECTING, now));
}else{
nodeConn.state = ConnectionState.CONNECTING;
nodeConn.lastConnectAttemptMs = now;  (This will capture and 
update last connection attempt) 

}
}
{code}


was (Author: bmis13):
Also, there is issue in my last patch.  I did not update the 
lastConnectAttemptMs...in connecting.

{code}
 /**
 * Enter the connecting state for the given node.
 * @param node The id of the node we are connecting to
 * @param now The current time.
 */
public void connecting(int node, long now) {
NodeConnectionState nodeConn = nodeState.get(node); 
if(nodeConn == null){
nodeState.put(node, new 
NodeConnectionState(ConnectionState.CONNECTING, now));
}else{
nodeConn.state = ConnectionState.CONNECTING;
nodeConn.lastConnectAttemptMs = now;  (This will capture and 
update last connection attempt) 

}
}
{code}

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavesh Mistry updated KAFKA-1642:
--
Affects Version/s: 0.8.1.1

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223779#comment-14223779
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~ewencp],

Thanks for looking into this; I really appreciate your response. 

Also, do you think the rapid connect and disconnect is also due to incorrect node 
state management, in the connecting() method and initiateConnect() as well?

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223779#comment-14223779
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/25/14 1:31 AM:
-

[~ewencp],

Thanks for looking into this; I really appreciate your response. 

Also, do you think the rapid connect and disconnect is also due to incorrect node 
state management, in the connecting() method and initiateConnect() as well?

Also, can we take a defensive-coding approach and add protection in this tight 
infinite loop to throttle CPU cycles when the start-to-end duration of an iteration 
falls below some xx ms? That would actually prevent this issue. We had this issue 
in production, so I just wanted to highlight the impact of 325% CPU and the excessive 
logging. 

Thanks,

Bhavesh 


was (Author: bmis13):
[~ewencp],

Thanks for looking into this really appreciate your response. 

Also, do you think rapid connect and disconnect is also due to incorrect Node 
state management ?  connecting method and initiateConnection also ?

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224039#comment-14224039
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

Here are some more cases that reproduce this by simulating a network connection issue 
with only one of the brokers; the problem still persists:

Case 1: connection to one broker is down (note that according to ZK, the partition 
leader is still b1) 
Have three brokers: b1, b2, b3
1) Start your daemon program, keep sending data to all the brokers, and continue 
sending data.
2) Verify that you have connections with netstat -a | egrep "b1|b2|b3" (keep pumping 
data for 5 minutes and observe normal behavior using top -pid or top -p java_pid).
3) Simulate a problem establishing new TCP connections, as follows, while the Java 
program continues to pump data aggressively (note the existing TCP connection to b1 is 
still active and connected):
a) sudo vi /etc/hosts and add the entry "b1 127.0.0.1" 
b) /etc/init.d/network restart; after a while (5 to 7 minutes) you will see the issue, 
but keep pumping data (also repeat this for b2 and the CPU consumption will be even 
higher).
 
4) While heavily pumping data, the producer will now try to establish a new TCP 
connection to b1 and get connection refused (note that CPU spikes up again and stays 
there) just because the connection could not be established.

Case 2) Simulate a firewall rule such that you are only allowed the existing 4 TCP 
connections to each broker:

Do steps 1, 2 and 3 above.
4) Use an iptables rule to reject new connections.
To start an "enforcing" firewall:
iptables -A OUTPUT -p tcp -m tcp -d b1 --dport 9092 -j REJECT
5) Keep pumping data while iptables rejects (you will see CPU spike to 200% or more, 
depending on the number of producers).
To "recover":
iptables -D OUTPUT -p tcp -m tcp -d b1 --dport 9092 -j REJECT


> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224041#comment-14224041
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~ewencp],

I hope the above gives you comprehensive steps to reproduce the problems with the 
run() method. It would be really great if we can make the client more resilient and 
robust so that network and broker instability does not cause CPU spikes and degrade 
application performance. Hence, I would strongly suggest at least detecting how long 
run(time) takes and, based on some configuration, doing CPU throttling just to be 
more defensive, or at least detecting that the io thread is consuming CPU cycles.
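
Purely as an illustration of the detection I mean (the threshold is made up, and reusing the Sender's existing running, time and run(long) members is my assumption); the exponential back-off sketched earlier could then kick in once a spin is detected:

{code}
// Hypothetical spin detection only (no sleeping): count iterations that finish
// "too fast" and emit a rate-limited warning so the condition is at least visible.
final long minLoopMs = 5;               // hypothetical threshold, could come from config
long fastLoops = 0;
long lastWarnMs = 0;

while (running) {
    long start = time.milliseconds();
    run(start);
    if (time.milliseconds() - start < minLoopMs) {
        fastLoops++;
        if (time.milliseconds() - lastWarnMs > 60 * 1000) {   // at most one warning per minute
            log.warn("io thread completed {} consecutive loops in under {} ms, possible busy spin",
                     fastLoops, minLoopMs);
            lastWarnMs = time.milliseconds();
        }
    } else {
        fastLoops = 0;
    }
}
{code}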

By the way, the experimental patch still works for the steps described above. 


Thanks,

Bhavesh  

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224041#comment-14224041
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/25/14 4:37 AM:
-

[~ewencp],

I hope above steps will give you comprehensive steps to reproduce problems with 
run() method.  It would be really great if we can make the client more 
resilient and  robust so network and brokers instability does not cause CPU 
spikes and degrade application performance. Hence, I would strongly at least 
detect the time run(time) is taking and do based on some configuration, we can 
do CPU Throttling just to be more defensive or at lest detect that io thread is 
taking CPU cycle.

By the way, the experimental patch still works for the steps described above as well, 
due to its hard-coded back-off. 

Any time you have a patch or anything else, please let me know and I will test it out.  
Thanks again for your detailed analysis.  

Please look into ClusterConnectionStates and how it manages the state of a node when 
it is disconnected immediately. 

Please also look into connecting(int node, long now) and this snippet (I feel 
connecting() needs to come before selector.connect(), not after):
selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), 
this.socketSendBuffer, this.socketReceiveBuffer);
this.connectionStates.connecting(node.id(), now);

Thanks,

Bhavesh  


was (Author: bmis13):
[~ewencp],

I hope above steps will give you comprehensive steps to reproduce problems with 
run() method.  It would be really great if we can make the client more 
resilient and  robust so network and brokers instability does not cause CPU 
spikes and degrade application performance. Hence, I would strongly at least 
detect the time run(time) is taking and do based on some configuration, we can 
do CPU Throttling just to be more defensive or at lest detect that io thread is 
taking CPU cycle.

By the way the experimental patch still works for steps describe above. 


Thanks,

Bhavesh  

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224041#comment-14224041
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/25/14 4:39 AM:
-

[~ewencp],

I hope above steps will give you comprehensive steps to reproduce problems with 
run() method.  It would be really great if we can make the client more 
resilient and  robust so network and brokers instability does not cause CPU 
spikes and degrade application performance. Hence, I would strongly at least 
detect the time run(time) is taking and do some stats based on some 
configuration, we can do CPU Throttling (if need) just to be more defensive or 
at lest detect that io thread is taking CPU cycle.

By the way the experimental patch still works for steps describe above as well 
due to hard coded back-off. 

Any time you have patch or any thing, please let me know I will test it out ( 
you have my email id) .  Once again thanks for your detail analysis and looking 
at this at short notice.  

Please look into to ClusterConnectionStates and how it manage the state of node 
when disconnecting immediately . 

please look into  connecting(int node, long now) and this (I feel connecting 
needs to come before not after).
selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), 
this.socketSendBuffer, this.socketReceiveBuffer);
this.connectionStates.connecting(node.id(), now);

Thanks,

Bhavesh  


was (Author: bmis13):
[~ewencp],

I hope above steps will give you comprehensive steps to reproduce problems with 
run() method.  It would be really great if we can make the client more 
resilient and  robust so network and brokers instability does not cause CPU 
spikes and degrade application performance. Hence, I would strongly at least 
detect the time run(time) is taking and do based on some configuration, we can 
do CPU Throttling just to be more defensive or at lest detect that io thread is 
taking CPU cycle.

By the way the experimental patch still works for steps describe above as well 
due to hard coded back-off. 

Any time you have patch or any thing, please let me know I will test it out.  
Once thanks for your detail analysis.  

Please look into to ClusterConnectionStates and how it manage the state of node 
when disconnecting immediately . 

please look into  connecting(int node, long now) and this (I feel connecting 
needs to come before not after).
selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), 
this.socketSendBuffer, this.socketReceiveBuffer);
this.connectionStates.connecting(node.id(), now);

Thanks,

Bhavesh  

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224041#comment-14224041
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/25/14 4:40 AM:
-

[~ewencp],

I hope above steps will give you comprehensive steps to reproduce problems with 
run() method.  It would be really great if we can make the client more 
resilient and  robust so network and brokers instability does not cause CPU 
spikes and degrade application performance. Hence, I would strongly at least 
detect the time run(time) is taking and do some stats based on some 
configuration, we can do CPU Throttling (if need) just to be more defensive or 
at lest detect that io thread is taking CPU cycle.

By the way the experimental patch still works for steps describe above as well 
due to hard coded back-off. 

Any time you have patch or any thing, please let me know I will test it out ( 
you have my email id) .  Once again thanks for your detail analysis and looking 
at this at short notice.  

Please look into to ClusterConnectionStates and how it manage the state of node 
when disconnecting immediately . 

please look into  connecting(int node, long now) and this (I feel connecting 
needs to come before not after).
selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), 
this.socketSendBuffer, this.socketReceiveBuffer);
this.connectionStates.connecting(node.id(), now);

Also, I still feel that producer.close() needs to be looked at as well (a join() with 
some configurable timeout).

Thanks,

Bhavesh  


was (Author: bmis13):
[~ewencp],

I hope above steps will give you comprehensive steps to reproduce problems with 
run() method.  It would be really great if we can make the client more 
resilient and  robust so network and brokers instability does not cause CPU 
spikes and degrade application performance. Hence, I would strongly at least 
detect the time run(time) is taking and do some stats based on some 
configuration, we can do CPU Throttling (if need) just to be more defensive or 
at lest detect that io thread is taking CPU cycle.

By the way the experimental patch still works for steps describe above as well 
due to hard coded back-off. 

Any time you have patch or any thing, please let me know I will test it out ( 
you have my email id) .  Once again thanks for your detail analysis and looking 
at this at short notice.  

Please look into to ClusterConnectionStates and how it manage the state of node 
when disconnecting immediately . 

please look into  connecting(int node, long now) and this (I feel connecting 
needs to come before not after).
selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), 
this.socketSendBuffer, this.socketReceiveBuffer);
this.connectionStates.connecting(node.id(), now);

Thanks,

Bhavesh  

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224046#comment-14224046
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

Also, are you going to port the patch back to the 0.8.1.1 version as well?  Please 
let me know.

Thanks,
Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224046#comment-14224046
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/25/14 4:43 AM:
-

Also, are you going to port the patch back to the 0.8.1.1 version as well?  Please 
let me know.

Thanks,
Bhavesh 


was (Author: bmis13):
Also, Are you going to port back the back to 0.8.1.1 version as well ?  Please 
let me know also.

Thanks,
Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-24 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224041#comment-14224041
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/25/14 5:37 AM:
-

[~ewencp],

I hope above steps will give you comprehensive steps to reproduce problems with 
run() method.  It would be really great if we can make the client more 
resilient and  robust so network and brokers instability does not cause CPU 
spikes and degrade application performance. Hence, I would strongly at least 
detect the time run(time) is taking and do some stats based on some 
configuration, we can do CPU Throttling (if need) just to be more defensive or 
at lest detect that io thread is taking CPU cycle.

By the way the experimental patch still works for steps describe above as well 
due to hard coded back-off. 

Any time you have patch or any thing, please let me know I will test it out ( 
you have my email id) .  Once again thanks for your detail analysis and looking 
at this at short notice.  

Please look into to ClusterConnectionStates and how it manage the state of node 
when disconnecting immediately . 

please look into  connecting(int node, long now) and this (I feel connecting 
needs to come before not after).
selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), 
this.socketSendBuffer, this.socketReceiveBuffer);
this.connectionStates.connecting(node.id(), now);

Also, I still feel that producer.close() needs to be looked at as well (a join() with 
some configurable timeout so the calling thread does not hang).

Thanks,

Bhavesh  


was (Author: bmis13):
[~ewencp],

I hope above steps will give you comprehensive steps to reproduce problems with 
run() method.  It would be really great if we can make the client more 
resilient and  robust so network and brokers instability does not cause CPU 
spikes and degrade application performance. Hence, I would strongly at least 
detect the time run(time) is taking and do some stats based on some 
configuration, we can do CPU Throttling (if need) just to be more defensive or 
at lest detect that io thread is taking CPU cycle.

By the way the experimental patch still works for steps describe above as well 
due to hard coded back-off. 

Any time you have patch or any thing, please let me know I will test it out ( 
you have my email id) .  Once again thanks for your detail analysis and looking 
at this at short notice.  

Please look into to ClusterConnectionStates and how it manage the state of node 
when disconnecting immediately . 

please look into  connecting(int node, long now) and this (I feel connecting 
needs to come before not after).
selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), 
this.socketSendBuffer, this.socketReceiveBuffer);
this.connectionStates.connecting(node.id(), now);

Also, I still feel that produce.close() is also needs to be looked at (join() 
method with come configuration time out)

Thanks,

Bhavesh  

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-26 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226751#comment-14226751
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 11/26/14 8:08 PM:
-

[~ewencp],

Even with the following parameters set to a long value, the state of the system 
still gets impacted no matter what reconnect.backoff.ms and retry.backoff.ms are 
set to.  Once the node state is removed, the timeout is effectively set to 0.  
Please see the following logs.  

#15 minutes
reconnect.backoff.ms=900000
retry.backoff.ms=900000

{code}
2014-11-26 11:01:27.898 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:02:27.903 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:03:27.903 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:04:27.903 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:05:27.904 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:06:27.905 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:07:27.906 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:08:27.908 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:09:27.908 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:10:27.909 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:11:27.909 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:12:27.910 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:13:27.911 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:14:27.912 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:15:27.914 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
 2014-11-26 11:00:27.613 [kafka-producer-network-thread | heartbeat] ERROR 
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
producer I/O thread: 
 2014-11-26 11:00:27.613 [kafka-producer-network-thread | rawlog] ERROR 
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
producer I/O thread: 
java.lang.IllegalStateException: No entry found for node -1
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:131)
at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:120)
at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:407)
at 
org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:393)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:187)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
at java.lang.Thread.run(Thread.java:744)
java.lang.IllegalStateException: No entry found for node -3
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:131)
at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:120)
at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:407)
at 
org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:393)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:187)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
at java.lang.Thread.run(Thread.java:744)
 2014-11-26 11:00:27.613 [kafka-producer-network-thread | heartbeat] ERROR 
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
producer I/O thread: 
 2014-11-26 11:00:27.613 [kafka-producer-network-thread | error] ERROR 
org.apache.k

[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-26 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226751#comment-14226751
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~ewencp],

Even with the following parameters set to a long value, the state of the system 
still gets impacted no matter what reconnect.backoff.ms and retry.backoff.ms are 
set to.  Once the node state is removed, the timeout is effectively set to 0.  
Please see the following logs.  

# 15 minutes
reconnect.backoff.ms=900000
retry.backoff.ms=900000

{code}
2014-11-26 11:01:27.898 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:02:27.903 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:03:27.903 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:04:27.903 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:05:27.904 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:06:27.905 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:07:27.906 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:08:27.908 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:09:27.908 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:10:27.909 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:11:27.909 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:12:27.910 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:13:27.911 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:14:27.912 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
2014-11-26 11:15:27.914 Kafka Drop message topic=.rawlog
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
 2014-11-26 11:00:27.613 [kafka-producer-network-thread | heartbeat] ERROR 
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
producer I/O thread: 
 2014-11-26 11:00:27.613 [kafka-producer-network-thread | rawlog] ERROR 
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
producer I/O thread: 
java.lang.IllegalStateException: No entry found for node -1
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:131)
at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:120)
at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:407)
at 
org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:393)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:187)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
at java.lang.Thread.run(Thread.java:744)
java.lang.IllegalStateException: No entry found for node -3
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:131)
at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:120)
at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:407)
at 
org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:393)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:187)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
at java.lang.Thread.run(Thread.java:744)
 2014-11-26 11:00:27.613 [kafka-producer-network-thread | heartbeat] ERROR 
org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
producer I/O thread: 
 2014-11-26 11:00:27.613 [kafka-producer-network-thread | error] ERROR 
org.apache.kafka.clients.producer.internals.Sender - Uncaught 

[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-27 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228040#comment-14228040
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~soumen.sarkar],

A timeout is one thing, but the IO thread also needs to be safeguarded: it should 
monitor how aggressively it spins based on the network state and the data to be 
sent, so it does not consume so many CPU cycles.

Thanks,
Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1, 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-30 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229233#comment-14229233
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

I just discovered yesterday that the 0.8.1.1 release does not include an officially 
released jar for the new producer code base (kafka-clients), although the code is 
there in the 0.8.1.1 branch.   That created confusion about porting to 0.8.1.1.
  
Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-11-30 Thread Bhavesh Mistry (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavesh Mistry updated KAFKA-1642:
--
Affects Version/s: (was: 0.8.1.1)

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-02 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232061#comment-14232061
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

Hi  [~ewencp],

I will not have time to validate this patch till next week.  

Here is my comments:

1) You still have not addressed the Producer.close() method issue: when the network 
connection is lost (or other such events happen), the IO thread will not be killed 
and the close method hangs. In the patch that I provided, I had a timeout for the 
join() method and interrupted the IO thread.  I think we need something similar 
here.
2) Also, can we please add JMX monitoring for the IO thread so we know how fast it 
is spinning?  It would be great to add this, with the run() method reporting its 
duration to a metric.
{code}
try {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    if (bean.isThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled()) {
        this.ioTheadCPUTime = metrics.sensor("iothread-cpu");
        this.ioTheadCPUTime.add("iothread-cpu-ms",
                "The Rate Of CPU Cycle used by iothead in NANOSECONDS",
                new Rate(TimeUnit.NANOSECONDS) {
                    public double measure(MetricConfig config, long now) {
                        return (now - metadata.lastUpdate()) / 1000.0;
                    }
                });
    }
} catch (Throwable th) {
    log.warn("Not able to set the CPU time... etc");
}
{code}
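
For comparison, here is a self-contained sketch (hypothetical names, not an 
existing producer metric) of how ThreadMXBean could report the IO thread's actual 
CPU time, which is what would reveal a spinning run() loop:

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class IoThreadCpuSampler {
    private final ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    private final long ioThreadId;
    private long lastCpuNanos;

    public IoThreadCpuSampler(long ioThreadId) {
        this.ioThreadId = ioThreadId;
        this.lastCpuNanos = bean.isThreadCpuTimeSupported()
                ? bean.getThreadCpuTime(ioThreadId) : -1;
    }

    /** CPU nanoseconds burned by the IO thread since the previous sample, or -1 if unsupported. */
    public synchronized long cpuNanosSinceLastSample() {
        if (lastCpuNanos < 0)
            return -1;
        long now = bean.getThreadCpuTime(ioThreadId);
        long delta = now - lastCpuNanos;
        lastCpuNanos = now;
        return delta;
    }
}
{code}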

3)  Please check the final value of *pollTimeout*: if it is constantly zero, then 
we need to slow the IO thread down.
4)  Add a defensive back-off check in the run() method for when the IO thread is 
being aggressive.  

5)  When all nodes are disconnected, do you still want to spin the IO thread ?

6)  Suppose you have a firewall rule that says "you can only have 2 concurrent TCP 
connections from client to brokers", and the client still has a live TCP connection 
to the same node (broker), but new TCP connections are rejected. Will the node 
state be marked as Disconnected in initiateConnect ?  Are you handling that 
gracefully ?

By the way, thank you very much for the quick reply and the new patch.  I 
appreciate your help.

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-02 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232061#comment-14232061
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 12/2/14 8:04 PM:


Hi  [~ewencp],

I will not have time to validate this patch till next week.  

Here is my comments:

1) The Producer.close() method issue is not addressed by this patch. When the 
network connection is lost (or other such events happen), the IO thread will not be 
killed and the close method hangs. In the patch that I provided, I had a timeout 
for the join() method and interrupted the IO thread.  I think we need a similar 
solution.
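
As a rough illustration of that idea, here is a minimal sketch (a hypothetical 
helper, not the shipped KafkaProducer.close() of 0.8.2) of bounding the wait for 
the IO thread and interrupting it if it never exits:

{code}
public final class BoundedProducerClose {
    // Sketch: ask the Sender loop to stop, wait at most timeoutMs, then interrupt.
    public static void close(org.apache.kafka.clients.producer.internals.Sender sender,
                             Thread ioThread, long timeoutMs) {
        sender.initiateClose();              // assumption: flags the run() loop to exit
        try {
            ioThread.join(timeoutMs);        // bounded wait instead of joining forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (ioThread.isAlive())
            ioThread.interrupt();            // last resort: wake up a hung IO thread
    }
}
{code}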

2) Also, can we please add JMX monitoring for the IO thread so we know how fast it 
is spinning?  It would be great to add this, with the run() method reporting its 
duration to a metric in nanoseconds.
{code}
try {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    if (bean.isThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled()) {
        this.ioTheadCPUTime = metrics.sensor("iothread-cpu");
        this.ioTheadCPUTime.add("iothread-cpu-ms",
                "The Rate Of CPU Cycle used by iothead in NANOSECONDS",
                new Rate(TimeUnit.NANOSECONDS) {
                    public double measure(MetricConfig config, long now) {
                        return (now - metadata.lastUpdate()) / 1000.0;
                    }
                });
    }
} catch (Throwable th) {
    log.warn("Not able to set the CPU time... etc");
}
{code}

3)  Please check the final value of *pollTimeout*: if it is constantly zero, then 
we need to slow the IO thread down.

4)  A defensive back-off check is needed in the run() method for when the IO thread 
is being aggressive.  

{code}

long countinuousRetry = 0;
long sleepInMs = 0;
while (running) {
    long start = time.milliseconds();
    try {
        run(time.milliseconds());
    } catch (Exception e) {
        log.error("Uncaught error in kafka producer I/O thread: ", e);
    } finally {
        long durationInMs = time.milliseconds() - start;
        // TODO: fix me here. Add an exponential back-off sleep etc. to avoid
        // stealing CPU cycles here. How much? ...for the edge case...
        if (durationInMs < 200) {
            if (client.isAllRegistredNodesAreDown()) {
                countinuousRetry++;
                // TODO: make this constant configurable. When do we reset this
                // interval so we can try aggressively again?
                sleepInMs = ((long) Math.pow(2, countinuousRetry) * 500);
            } else {
                sleepInMs = 500;
                countinuousRetry = 0;
            }

            // Wait until the desired next time arrives using nanosecond
            // accuracy timer (wait(time) isn't accurate enough on most platforms)
            try {
                // TODO: sleep is not a good solution...
                Thread.sleep(sleepInMs);
            } catch (InterruptedException e) {
                log.error("While sleeping, someone interrupted this thread (probably the close() method on producer close())");
            }
        }
    }
}
{code}

5)  When all nodes are disconnected, do you still want to spin the IO Thread ?

6)  Suppose you have a firewall rule that says "you can only have 2 concurrent TCP 
connections from client to brokers", and the client still has a live TCP connection 
to the same node (broker), but new TCP connections are rejected. Will the node 
state be marked as Disconnected in initiateConnect ?  Is this case handled 
gracefully ?

By the way, thank you very much for the quick reply and the new patch.  I 
appreciate your help.

Thanks,

Bhavesh 


was (Author: bmis13):
Hi  [~ewencp],

I will not have time to validate this patch till next week.  

Here is my comments:

1) You still have not address the Producer.close() method issue that in event 
of network connection lost or other events happens, IO thread will not be 
killed and close method hangs. In patch that I have provided, I had timeout for 
join method and interrupted IO thread.  I think we need similar for this.
2) Also, can we please add JMX monitoring for IO tread to know how quick it is 
running.  It will great to add this and run() method will report duration to 
metric.
{code}
try{
ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
if(bean.isThreadCpuTimeSupported() && 
bean.isThreadCpuTimeEnabled()){
this.ioTheadCPUTime = me

[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-04 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234297#comment-14234297
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~ewencp],

1) I will post toward KAFKA-1788 and perhaps link the issue.
2) True, some sort of measurement would be great: the 5th, 10th, 25th, 50th, 95th 
and 99th percentiles of execution time.  The point is just to measure the duration 
and report the rate of execution (see the sketch after this list). 
3) I agree with what you are saying, and I have observed the same behavior.  My 
only recommendation is to add some intelligence to the *timeouts*: if the timeout 
is zero consecutively over a long period, then there is a problem. (A little more 
defensive.) 
4) Again I agree with your point, but in your previous comments you mentioned that 
you may consider having back-off logic further up the chain. So I was just checking 
whether run() is the best place to do that check.  Again, maybe add intelligence 
here: if you get consecutive exceptions, then the likelihood of high CPU is high.  
 
5) Ok.  I agree with what you are saying: data needs to be de-queued so more data 
can be en-queued even in the event of network loss.  Is my understanding correct ?

6) All I am saying is that a network firewall rule (such as only 2 TCP connections 
per source host), or brokers running out of file descriptors, can mean a new 
connection to the broker is not established while the client still has a live and 
active TCP connection to the same broker.  Based on what I see, the method 
*initiateConnect* will then mark the entire broker/node status as disconnected.  Is 
this expected behavior?  So the question is: will the client continue to send data ?
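
Regarding point 2, a small sketch (plain Java, not the Kafka metrics API) of the 
kind of measurement being asked for: record each run(now) duration and read off a 
few percentiles.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LoopDurationStats {
    private final List<Long> durationsMs = new ArrayList<Long>();

    public synchronized void record(long durationMs) {
        durationsMs.add(durationMs);
    }

    /** Returns the given percentile (e.g. 0.95) of recorded durations, or -1 if none recorded. */
    public synchronized long percentile(double p) {
        if (durationsMs.isEmpty())
            return -1;
        List<Long> sorted = new ArrayList<Long>(durationsMs);
        Collections.sort(sorted);
        int idx = (int) Math.ceil(p * sorted.size()) - 1;
        return sorted.get(Math.max(0, idx));
    }
}
{code}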

Thank you very much for entertaining my questions so far and I will test out 
the patch next week.

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1788) producer record can stay in RecordAccumulator forever if leader is no available

2014-12-04 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234595#comment-14234595
 ] 

Bhavesh Mistry commented on KAFKA-1788:
---

We also need to fix Producer.close(), which hangs the JVM because the IO thread 
being join()ed does not exit.  Please refer to KAFKA-1642 for more details.  The 
Kafka core devs need to give guidance on how to solve this problem.

Please see below comments from that linked issue.


1) Producer.close() method issue is not address with patch. In event of network 
connection lost or other events happens, IO thread will not be killed and close 
method hangs. In patch that I have provided, I had timeout for join method and 
interrupted IO thread. I think we need similar solution.

[~ewencp],

1. I'm specifically trying to address the CPU usage here. I realize from your 
perspective they are closely related, since they both can be triggered by a 
loss of network connectivity, but internally they're really separate issues: 
the CPU usage has to do with incorrect timeouts, and the join() issue is due to 
the lack of timeouts on produce operations. That's why I pointed you toward 
KAFKA-1788. If a timeout is added for data in the producer, that would resolve 
the close issue as well, since any data waiting in the producer would eventually 
time out and the IO thread could exit. I think that's the cleanest solution 
since it solves both problems with a single setting (the amount of time you're 
willing to wait before discarding data). If you think a separate timeout 
specifically for Producer.close() is worthwhile, I'd suggest filing a separate 
JIRA for that.



> producer record can stay in RecordAccumulator forever if leader is no 
> available
> ---
>
> Key: KAFKA-1788
> URL: https://issues.apache.org/jira/browse/KAFKA-1788
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Jun Rao
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g., 
> all replicas are down), the records for that partition will stay in the 
> RecordAccumulator until the leader is available. This may cause the 
> bufferpool to be full and the callback for the produced message to block for 
> a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-08 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239063#comment-14239063
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~stevenz3wu],

0.8.2 is very well tested and has worked well under heavy load.  This bug is rare 
and only happens when a broker or the network has issues.  We have been producing 
about 7 to 10 TB per day using this new producer, so 0.8.2 is very safe to use in 
production.  It has survived the peak traffic of the year on a large e-commerce 
site.  So I am fairly confident that the new Java API indeed does true 
round-robin and is much faster than the Scala-based API.

[~ewencp],  I will verify the patch by the end of this Friday, but do let me know 
your understanding based on my last comment.

Thanks,

Bhavesh

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-08 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239063#comment-14239063
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 12/9/14 6:53 AM:


[~stevenz3wu],

0.8.2 is very well tested and has worked well under heavy load.  This bug is rare 
and only happens when a broker or the network has issues.  We have been producing 
about 7 to 10 TB per day using this new producer, so 0.8.2 is very safe to use in 
production.  It has survived the peak traffic of the year on a large e-commerce 
site.  So I am fairly confident that the new Java API indeed does true 
round-robin and is much faster than the Scala-based API.

[~ewencp],  I will verify the patch by the end of this Friday, but do let me know 
your understanding based on my last comment. The goal is to put this issue to rest 
and cover all the use cases.

Thanks,

Bhavesh


was (Author: bmis13):
[~stevenz3wu],

0.8.2 is very well tested and worked well under heavy load.  This bug is rare 
only happen when broker or network has issue.  We have been producing about 7 
to 10 TB per day using this new producer, so 0.8.2 is very safe to use in 
production.  It has survived  pick traffic of the year on large e-commerce 
site.  So I am fairly confident that  New Java API is indeed does true 
round-robin and much faster than Scala Based API.

[~ewencp],  I will verify the patch by end of this Friday, but do let me know 
your understanding based on my last comment.

Thanks,

Bhavesh

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.2
>Reporter: Bhavesh Mistry
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1788) producer record can stay in RecordAccumulator forever if leader is no available

2014-12-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249266#comment-14249266
 ] 

Bhavesh Mistry commented on KAFKA-1788:
---

[~jkreps],

Can we just take a quick look at the NodeConnectionState ?  If all registered 
nodes are down, then either exit quickly or attempt to connect ?  This would give 
an accurate status for all registered nodes... (maybe we can do a TCP ping for all 
nodes).  I am not sure, if the producer key is fixed to only one broker, whether it 
still has the status of all nodes ?

Here is the reference code:
https://github.com/apache/kafka/blob/0.8.2/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java
 
https://github.com/apache/kafka/blob/0.8.2/clients/src/main/java/org/apache/kafka/clients/NodeConnectionState.java

I did this in the experimental patch for KAFKA-1642   (but used a hard-coded 
timeout for the join() method).  
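
For what it is worth, a hedged sketch (a hypothetical helper, not part of the 
producer) of the kind of quick check suggested above: treat the cluster as 
unreachable only if a short TCP connect to every known broker fails.

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import org.apache.kafka.common.Node;

public final class ClusterReachability {
    /** Returns true only if none of the given brokers accepts a TCP connection within connectTimeoutMs. */
    public static boolean allNodesUnreachable(List<Node> nodes, int connectTimeoutMs) {
        for (Node node : nodes) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(node.host(), node.port()), connectTimeoutMs);
                return false;               // at least one broker is reachable
            } catch (IOException e) {
                // this broker looks down; keep checking the rest
            } finally {
                try { socket.close(); } catch (IOException ignored) { }
            }
        }
        return !nodes.isEmpty();
    }
}
{code}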

Thanks,

Bhavesh 

> producer record can stay in RecordAccumulator forever if leader is no 
> available
> ---
>
> Key: KAFKA-1788
> URL: https://issues.apache.org/jira/browse/KAFKA-1788
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Jun Rao
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g., 
> all replicas are down), the records for that partition will stay in the 
> RecordAccumulator until the leader is available. This may cause the 
> bufferpool to be full and the callback for the produced message to block for 
> a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1788) producer record can stay in RecordAccumulator forever if leader is no available

2014-12-16 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249266#comment-14249266
 ] 

Bhavesh Mistry edited comment on KAFKA-1788 at 12/17/14 1:26 AM:
-

[~jkreps],

Can we just take a quick look at the NodeConnectionState ?  If all registered 
nodes are down, then either exit quickly or attempt to connect ?  This would give 
an accurate status for all registered nodes... (maybe we can do a TCP ping for all 
nodes).  I am not sure, if the producer key is fixed to only one broker, whether it 
still has the status of all nodes ?

Here is the reference code:
https://github.com/apache/kafka/blob/0.8.2/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java
 
https://github.com/apache/kafka/blob/0.8.2/clients/src/main/java/org/apache/kafka/clients/NodeConnectionState.java

I did this in the experimental patch for KAFKA-1642   (but used a hard-coded 
timeout for the join() on the IO thread, and interrupted it if it did not get 
killed).  

Thanks,

Bhavesh 


was (Author: bmis13):
[~jkreps],

Can we just take quick look at the NodeConnectionState ?  If all registered 
Nodes are down, then  exit it quickly or attempt to connect ?  This will have 
accurate status of al Nodes registered... (may we can do TCP ping for all 
nodes).  I am not sure if producer key is fixed to only one brokers then does 
it still have all Node status ?

Here is reference code:
https://github.com/apache/kafka/blob/0.8.2/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java
 
https://github.com/apache/kafka/blob/0.8.2/clients/src/main/java/org/apache/kafka/clients/NodeConnectionState.java

I did this in experimental path for o KAFKA-1642   (but used hard coded timeout 
for join method).  

Thanks,

Bhavesh 

> producer record can stay in RecordAccumulator forever if leader is no 
> available
> ---
>
> Key: KAFKA-1788
> URL: https://issues.apache.org/jira/browse/KAFKA-1788
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Jun Rao
>  Labels: newbie++
> Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g., 
> all replicas are down), the records for that partition will stay in the 
> RecordAccumulator until the leader is available. This may cause the 
> bufferpool to be full and the callback for the produced message to block for 
> a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257686#comment-14257686
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---

[~ewencp],

The patch indeed solves the high CPU problem reported by this bug.  I have tested 
with all brokers down, one broker down, and two brokers down:

Here are some interesting observations from YourKit:

0)  Overall, the patch has also brought down overall CPU consumption in the normal, 
healthy (happy) case where everything is up and running.  With the old code 
(without the patch), I used to see about 10% of the process's overall CPU used by 
the io threads (4 in my case); it has been reduced to 5% or less now with the 
patch.   

1)  When two brokers are down, I occasionally see the IO thread blocked. ( 
I did not see this when one broker is down) 

{code}
kafka-producer-network-thread | rawlog [BLOCKED] [DAEMON]
org.apache.kafka.clients.producer.internals.Metadata.fetch() Metadata.java:70
java.lang.Thread.run() Thread.java:744
{code}

2)  The record-error-rate metric remains zero despite the following firewall rules.  
In my opinion, it should have called  
org.apache.kafka.clients.producer.Callback, but I did not see that happening 
with either one or two brokers down.  Should I file another issue for this 
? Please confirm.

{code}
00100 reject tcp from me to b1.ip dst-port 9092
00200 reject tcp from me to b2.ip dst-port 9092
{code}

{code}
class LoggingCallBaHandler implements Callback {

/**
 * A callback method the user can implement to provide 
asynchronous
 * handling of request completion. This method will be called 
when the
 * record sent to the server has been acknowledged. Exactly one 
of the
 * arguments will be non-null.
 * 
 * @param metadata
 *The metadata for the record that was sent (i.e. 
the
 *partition and offset). Null if an error occurred.
 * @param exception
 *The exception thrown during processing of this 
record.
 *Null if no error occurred.
 */
@Override
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
exception.printStackTrace();
}
}
}
{code}

I do not see any exception at all on the console... not sure why ?

3)  The application does NOT shut down gracefully when one or more brokers 
are down. (The io thread never exits; this is a known issue.) 

{code}
"SIGTERM handler" daemon prio=5 tid=0x7f8bd79e4000 nid=0x17907 waiting for 
monitor entry [0x00011e906000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bd5159000 nid=0x1cb0b waiting for 
monitor entry [0x00011e803000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdd147800 nid=0x15d0b waiting for 
monitor entry [0x00011e30a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdf82 nid=0x770b waiting for 
monitor entry [0x00011e207000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdc393800 nid=0x1c30f waiting for 
monitor entry [0x00011e104000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257686#comment-14257686
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 12/23/14 11:36 PM:
--

[~ewencp],

The patch indeed solves the high CPU problem reported by this bug.  I have tested 
with all brokers down, one broker down, and two brokers down (except for the last 
use case, where one of the brokers runs out of socket file descriptors, a rare 
case):

Here are some interesting observations from YourKit:

0)  Overall, the patch has also brought down overall CPU consumption in the normal, 
healthy (happy) case where everything is up and running.  With the old code 
(without the patch), I used to see about 10% of the process's overall CPU used by 
the io threads (4 in my case); it has been reduced to 5% or less now with the 
patch.   

1)  When two brokers are down, I occasionally see the IO thread blocked. ( 
I did not see this when one broker is down) 

{code}
kafka-producer-network-thread | rawlog [BLOCKED] [DAEMON]
org.apache.kafka.clients.producer.internals.Metadata.fetch() Metadata.java:70
java.lang.Thread.run() Thread.java:744
{code}

2)  The record-error-rate metric remains zero despite the following firewall rules.  
In my opinion, it should have called  
org.apache.kafka.clients.producer.Callback, but I did not see that happening 
with either one or two brokers down.  Should I file another issue for this 
? Please confirm.

{code}
00100 reject tcp from me to b1.ip dst-port 9092
00200 reject tcp from me to b2.ip dst-port 9092
{code}

{code}
class LoggingCallBaHandler implements Callback {

/**
 * A callback method the user can implement to provide 
asynchronous
 * handling of request completion. This method will be called 
when the
 * record sent to the server has been acknowledged. Exactly one 
of the
 * arguments will be non-null.
 * 
 * @param metadata
 *The metadata for the record that was sent (i.e. 
the
 *partition and offset). Null if an error occurred.
 * @param exception
 *The exception thrown during processing of this 
record.
 *Null if no error occurred.
 */
@Override
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
exception.printStackTrace();
}
}
}
{code}

I do not see any exception at all on the console... not sure why ?

3)  The application does NOT shut down gracefully when one or more brokers 
are down. (The io thread never exits; this is a known issue.) 

{code}
"SIGTERM handler" daemon prio=5 tid=0x7f8bd79e4000 nid=0x17907 waiting for 
monitor entry [0x00011e906000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bd5159000 nid=0x1cb0b waiting for 
monitor entry [0x00011e803000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdd147800 nid=0x15d0b waiting for 
monitor entry [0x00011e30a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdf82 nid=0x770b waiting for 
monitor entry [0x00011e207000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdc393800 nid=0x1c30f waiting for 
monitor entry [0x00011e104000]
   java.lang.Thread.State: BLOCKED (on object

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257686#comment-14257686
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 12/23/14 11:39 PM:
--

[~ewencp],

The patch indeed solves the high CPU problem reported by this bug.  I have tested 
with all brokers down, one broker down, and two brokers down (except for the last 
use case, where one of the brokers runs out of socket file descriptors, a rare 
case).  I am sorry for the late response; I got busy with other stuff, so testing 
got delayed.

Here are some interesting observations from YourKit:

0)  Overall, the patch has also brought down overall CPU consumption in the normal, 
healthy (happy) case where everything is up and running.  With the old code 
(without the patch), I used to see about 10% of the process's overall CPU used by 
the io threads (4 in my case); it has been reduced to 5% or less now with the 
patch.   

1)  When two brokers are down, I occasionally see the IO thread blocked. ( 
I did not see this when one broker is down) 

{code}
kafka-producer-network-thread | rawlog [BLOCKED] [DAEMON]
org.apache.kafka.clients.producer.internals.Metadata.fetch() Metadata.java:70
java.lang.Thread.run() Thread.java:744
{code}

2)  The record-error-rate metric remains zero despite the following firewall rules.  
In my opinion, it should have called  
org.apache.kafka.clients.producer.Callback, but I did not see that happening 
with either one or two brokers down.  Should I file another issue for this 
? Please confirm.

{code}
00100 reject tcp from me to b1.ip dst-port 9092
00200 reject tcp from me to b2.ip dst-port 9092
{code}

{code}
class LoggingCallBaHandler implements Callback {

/**
 * A callback method the user can implement to provide 
asynchronous
 * handling of request completion. This method will be called 
when the
 * record sent to the server has been acknowledged. Exactly one 
of the
 * arguments will be non-null.
 * 
 * @param metadata
 *The metadata for the record that was sent (i.e. 
the
 *partition and offset). Null if an error occurred.
 * @param exception
 *The exception thrown during processing of this 
record.
 *Null if no error occurred.
 */
@Override
public void onCompletion(RecordMetadata metadata, Exception 
exception) {
if(exception != null){
exception.printStackTrace();
}
}
}
{code}

I do not see any exception at all on the console... not sure why ?

3)  The application does NOT shut down gracefully when one or more brokers 
are down. (The io thread never exits; this is a known issue.) 

{code}
"SIGTERM handler" daemon prio=5 tid=0x7f8bd79e4000 nid=0x17907 waiting for 
monitor entry [0x00011e906000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bd5159000 nid=0x1cb0b waiting for 
monitor entry [0x00011e803000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdd147800 nid=0x15d0b waiting for 
monitor entry [0x00011e30a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdf82 nid=0x770b waiting for 
monitor entry [0x00011e207000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdc393800 nid=0x1c30f waiting 

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257686#comment-14257686
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 12/23/14 11:41 PM:
--

[~ewencp],

The patch indeed solves the high CPU problem reported by this bug.  I have tested
with all brokers down, one broker down, and two brokers down (except for the last
case, where one of the brokers runs out of socket file descriptors, a rare case).
I am sorry for the late response; I got busy with other work, so testing got
delayed.

Here are some interesting Observations from YourKit:

0)  Overall, the patch has also brought down overall CPU consumption in the normal,
healthy case where everything is up and running.  With the old code (without the
patch), I used to see the I/O threads (4 in my case) account for about 10% of the
process's overall CPU; with the patch that has dropped to 5% or less.

1)  When two brokers are down, I occasionally see the I/O thread blocked.
(I did not see this when only one broker is down.)

{code}
kafka-producer-network-thread | rawlog [BLOCKED] [DAEMON]
org.apache.kafka.clients.producer.internals.Metadata.fetch() Metadata.java:70
java.lang.Thread.run() Thread.java:744
{code}

2)  The record-error-rate metric remains zero despite the following firewall rules.
In my opinion, the producer should have invoked
org.apache.kafka.clients.producer.Callback, but I did not see that happening with
either one or two brokers down.  Should I file another issue for this?  Please confirm.

{code}
00100 reject tcp from me to b1.ip dst-port 9092
00200 reject tcp from me to b2.ip dst-port 9092
{code}

{code}
class LoggingCallBaHandler implements Callback {

    /**
     * A callback method the user can implement to provide asynchronous
     * handling of request completion. This method will be called when the
     * record sent to the server has been acknowledged. Exactly one of the
     * arguments will be non-null.
     *
     * @param metadata
     *            The metadata for the record that was sent (i.e. the
     *            partition and offset). Null if an error occurred.
     * @param exception
     *            The exception thrown during processing of this record.
     *            Null if no error occurred.
     */
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            exception.printStackTrace();
        }
    }
}
{code}

I do not see any exceptions at all on the console... not sure why?

3)  The application does NOT shut down gracefully when one or more brokers
are down.  (The I/O thread never exits; this is a known issue.)

{code}
"SIGTERM handler" daemon prio=5 tid=0x7f8bd79e4000 nid=0x17907 waiting for 
monitor entry [0x00011e906000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bd5159000 nid=0x1cb0b waiting for 
monitor entry [0x00011e803000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdd147800 nid=0x15d0b waiting for 
monitor entry [0x00011e30a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdf82 nid=0x770b waiting for 
monitor entry [0x00011e207000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdc393800 nid=0x1c30f waiting 

[jira] [Comment Edited] (KAFKA-1788) producer record can stay in RecordAccumulator forever if leader is no available

2014-12-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257691#comment-14257691
 ] 

Bhavesh Mistry edited comment on KAFKA-1788 at 12/23/14 11:44 PM:
--

Hi All,

I did NOT try this patch, but when one, two, or all brokers are down, I see that the
application will not shut down because of the close() method:


The application does not shut down gracefully when one or more brokers are
down.  (The I/O thread never exits; this is a known issue.)

{code}
"SIGTERM handler" daemon prio=5 tid=0x7f8bd79e4000 nid=0x17907 waiting for 
monitor entry [0x00011e906000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bd5159000 nid=0x1cb0b waiting for 
monitor entry [0x00011e803000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdd147800 nid=0x15d0b waiting for 
monitor entry [0x00011e30a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdf82 nid=0x770b waiting for 
monitor entry [0x00011e207000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdc393800 nid=0x1c30f waiting for 
monitor entry [0x00011e104000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"Thread-4" prio=5 tid=0x7f8bdb39f000 nid=0xa107 in Object.wait() 
[0x00011ea89000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.$$YJP$$wait(Native Method)
at java.lang.Object.wait(Object.java)
at java.lang.Thread.join(Thread.java:1280)
- locked <0x000705c2f650> (a 
org.apache.kafka.common.utils.KafkaThread)
at java.lang.Thread.join(Thread.java:1354)
at 
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:322)
at 

"kafka-producer-network-thread | error" daemon prio=5 tid=0x7f8bd814e000 
nid=0x7403 runnable [0x00011e6c]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.KQueueArrayWrapper.$$YJP$$kevent0(Native Method)
at sun.nio.ch.KQueueArrayWrapper.kevent0(KQueueArrayWrapper.java)
at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:200)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:103)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x000705c109f8> (a sun.nio.ch.Util$2)
- locked <0x000705c109e8> (a java.util.Collections$UnmodifiableSet)
- locked <0x000705c105c8> (a sun.nio.ch.KQueueSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.kafka.common.network.Selector.select(Selector.java:322)
at org.apache.kafka.common.network.Selector.poll(Selector.java:212)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
at java.lang.Thread.run(Thread.java:744)
{code}
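
As a stopgap until close() itself is fixed, a minimal workaround sketch (my
assumption, not part of any patch here) is to call close() on a separate daemon
thread and bound the wait, so a sender thread that never exits cannot block JVM
shutdown indefinitely.  The timeout value, class/method names, and the generic
String/String producer type are illustrative assumptions.

{code}
import org.apache.kafka.clients.producer.Producer;

public class BoundedCloseSketch {

    // Hypothetical workaround: close the producer on a daemon thread and wait at
    // most timeoutMs, so a hung I/O thread cannot block shutdown forever.
    static void closeWithTimeout(final Producer<String, String> producer, long timeoutMs)
            throws InterruptedException {
        Thread closer = new Thread(new Runnable() {
            public void run() {
                producer.close(); // may block while the sender thread never exits
            }
        }, "producer-closer");
        closer.setDaemon(true);  // do not keep the JVM alive if close() hangs
        closer.start();
        closer.join(timeoutMs);  // give up after timeoutMs milliseconds
        if (closer.isAlive()) {
            System.err.println("Producer close() did not finish within " + timeoutMs + " ms");
        }
    }
}
{code}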

Thanks,
Bhavesh 


was (Author: bmis13):
Hi All,

I did NOT try this patch, but when one, two, or all brokers are down, I see that the
application will not shut down because of the close() method:


Application does not gracefully shutdown whe

[jira] [Commented] (KAFKA-1788) producer record can stay in RecordAccumulator forever if leader is no available

2014-12-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257691#comment-14257691
 ] 

Bhavesh Mistry commented on KAFKA-1788:
---

Hi All,

I did NOT try this patch, but when one, two, or all brokers are down, I see that the
application will not shut down because of the close() method:


The application does not shut down gracefully when one or more brokers are
down.  (The I/O thread never exits; this is a known issue.)

{code}
"SIGTERM handler" daemon prio=5 tid=0x7f8bd79e4000 nid=0x17907 waiting for 
monitor entry [0x00011e906000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bd5159000 nid=0x1cb0b waiting for 
monitor entry [0x00011e803000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdd147800 nid=0x15d0b waiting for 
monitor entry [0x00011e30a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdf82 nid=0x770b waiting for 
monitor entry [0x00011e207000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdc393800 nid=0x1c30f waiting for 
monitor entry [0x00011e104000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"Thread-4" prio=5 tid=0x7f8bdb39f000 nid=0xa107 in Object.wait() 
[0x00011ea89000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.$$YJP$$wait(Native Method)
at java.lang.Object.wait(Object.java)
at java.lang.Thread.join(Thread.java:1280)
- locked <0x000705c2f650> (a 
org.apache.kafka.common.utils.KafkaThread)
at java.lang.Thread.join(Thread.java:1354)
at 
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:322)
at 

"kafka-producer-network-thread | error" daemon prio=5 tid=0x7f8bd814e000 
nid=0x7403 runnable [0x00011e6c]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.KQueueArrayWrapper.$$YJP$$kevent0(Native Method)
at sun.nio.ch.KQueueArrayWrapper.kevent0(KQueueArrayWrapper.java)
at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:200)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:103)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x000705c109f8> (a sun.nio.ch.Util$2)
- locked <0x000705c109e8> (a java.util.Collections$UnmodifiableSet)
- locked <0x000705c105c8> (a sun.nio.ch.KQueueSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.kafka.common.network.Selector.select(Selector.java:322)
at org.apache.kafka.common.network.Selector.poll(Selector.java:212)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
at java.lang.Thread.run(Thread.java:744)
{code}


> producer record can stay in RecordAccumulator forever if leader is no 
> available
> ---
>
> Key: KAFKA-1788
> URL: https://issues.apache.org/jira/browse/KAFKA-1788
>   

[jira] [Comment Edited] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-23 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257686#comment-14257686
 ] 

Bhavesh Mistry edited comment on KAFKA-1642 at 12/24/14 12:01 AM:
--

[~ewencp],

The patch indeed solves the high CPU problem reported by this bug.  I have tested
with all brokers down, one broker down, and two brokers down (except for the last
case, where one of the brokers runs out of socket file descriptors, a rare case).
I am sorry for the late response; I got busy with other work, so testing got
delayed.

Here are some interesting Observations from YourKit:

0)  Overall, the patch has also brought down overall CPU consumption in the normal,
healthy case where everything is up and running.  With the old code (without the
patch), I used to see the I/O threads (4 in my case) account for about 10% of the
process's overall CPU; with the patch that has dropped to 5% or less.

1)  When two brokers are down, I occasionally see the I/O thread blocked.
(I did not see this when only one broker is down.)

{code}
kafka-producer-network-thread | rawlog [BLOCKED] [DAEMON]
org.apache.kafka.clients.producer.internals.Metadata.fetch() Metadata.java:70
java.lang.Thread.run() Thread.java:744
{code}

2)  The record-error-rate metric remains zero despite the following firewall rules.
In my opinion, the producer should have invoked
org.apache.kafka.clients.producer.Callback, but I did not see that happening with
either one or two brokers down.  Should I file another issue for this?  Please confirm.

{code}

sudo ipfw add reject tcp from me to b1.ip dst-port 9092
sudo ipfw add reject tcp from me to b2.ip dst-port 9092

00100 reject tcp from me to b1.ip dst-port 9092
00200 reject tcp from me to b2.ip dst-port 9092
{code}

{code}
class LoggingCallBaHandler implements Callback {

    /**
     * A callback method the user can implement to provide asynchronous
     * handling of request completion. This method will be called when the
     * record sent to the server has been acknowledged. Exactly one of the
     * arguments will be non-null.
     *
     * @param metadata
     *            The metadata for the record that was sent (i.e. the
     *            partition and offset). Null if an error occurred.
     * @param exception
     *            The exception thrown during processing of this record.
     *            Null if no error occurred.
     */
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            exception.printStackTrace();
        }
    }
}
{code}

I do not see any exceptions at all on the console... not sure why?

3)  The application does NOT shut down gracefully when one or more brokers
are down.  (The I/O thread never exits; this is a known issue.)

{code}
"SIGTERM handler" daemon prio=5 tid=0x7f8bd79e4000 nid=0x17907 waiting for 
monitor entry [0x00011e906000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bd5159000 nid=0x1cb0b waiting for 
monitor entry [0x00011e803000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdd147800 nid=0x15d0b waiting for 
monitor entry [0x00011e30a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:744)

"SIGTERM handler" daemon prio=5 tid=0x7f8bdf82 nid=0x770b waiting for 
monitor entry [0x00011e207000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Shutdown.exit(Shutdown.java:212)
- waiting to lock <0x00070008f7c0> (a java.lang.Class for 
java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at 
