Why does the Kafka producer API use so much CPU?

2014-05-11 Thread
I wrote some very simple code, like this:
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class LogProducer {

    private final Producer<String, String> inner;

    public LogProducer() throws Exception {
        // Load producer settings (broker list, serializer, compression, ...)
        // from producer.properties on the classpath.
        Properties properties = new Properties();
        properties.load(ClassLoader.getSystemResourceAsStream("producer.properties"));
        ProducerConfig config = new ProducerConfig(properties);
        inner = new Producer<String, String>(config);
    }

    public void send(String topicName, String message) {
        if (topicName == null || message == null) {
            return;
        }
        KeyedMessage<String, String> km = new KeyedMessage<String, String>(topicName, message);
        inner.send(km);
    }

    public void close() {
        inner.close();
    }

    public static void main(String[] args) {
        LogProducer producer = null;
        try {
            producer = new LogProducer();
            // Sends the same message forever, as fast as the loop can spin.
            while (true) {
                producer.send("test", "this is a sample");
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (producer != null) {
                producer.close();
            }
        }
    }
}
~~
And producer.properties looks like this:
metadata.broker.list=127.0.0.1:9092
producer.type=async
serializer.class=kafka.serializer.StringEncoder
batch.num.messages=200
compression.codec=snappy

I run this program on Linux, on a machine with a 4-core CPU and 16 GB of memory.
I find that this program uses one CPU core completely; this is the output of the "top" command:


[root@localhost ~]# top
top - 13:51:09 up 5 days, 13:27,  3 users,  load average: 0.96, 0.48, 0.35
Tasks: 367 total,   3 running, 364 sleeping,   0 stopped,   0 zombie
Cpu0  :  7.0%us,  0.3%sy,  0.0%ni, 92.0%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  5.0%us,  0.0%sy,  0.0%ni, 95.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  5.0%us,  0.0%sy,  0.0%ni, 95.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16307528k total,  9398376k used,  6909152k free,   249952k buffers
Swap:  8224760k total,0k used,  8224760k free,  6071348k cached

Why does the producer API use so much CPU? Or am I doing something wrong?

By the way, the Kafka version is 0.8.0.

Re: Why does the Kafka producer API use so much CPU?

2014-05-11 Thread Timothy Chen
What compression configuration are you using for your producer?

One of the biggest sources of CPU usage in the producer is compression,
and also checksumming.
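To get a rough feel for the checksumming part of that cost in isolation, a sketch like the following times `java.util.zip.CRC32` over a 1 KB payload. It assumes CRC32 is representative of the per-message checksum in the 0.8 message format; the payload size and iteration count are arbitrary, and a naive timing loop like this is only indicative (a harness such as JMH gives more trustworthy numbers).

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChecksumCost {
    // Computes a CRC32 over one payload, the kind of checksum assumed
    // to be attached to each message.
    static long crc32(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] record = new byte[1024];   // assumed ~1 KB log record
        Arrays.fill(record, (byte) 'x');
        int iterations = 1_000_000;       // arbitrary sample size

        long sink = 0;                    // consumed below so the work is not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += crc32(record);
        }
        long elapsedNs = System.nanoTime() - start;

        System.out.printf("%.1f ns per 1 KB checksum (sink=%d)%n",
                (double) elapsedNs / iterations, sink);
    }
}
```

Compression (for example snappy) runs on top of this, so the two costs add up per batch.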

Tim

On Sun, May 11, 2014 at 12:24 AM,   wrote:


Re: Why does the Kafka producer API use so much CPU?

2014-05-11 Thread cac...@gmail.com
This code says to send this message forever, as fast as the machine can,
thereby consuming as much of one CPU as possible. You may want to consider
an alternate test, perhaps one that records the number of messages sent in
a given time period.
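A minimal sketch of that kind of test: run for a fixed time window and count how many sends complete, so the result is a rate rather than an open-ended loop. `MessageSender` is a hypothetical interface standing in for the poster's `LogProducer.send(topic, message)`; the no-op lambda lets the sketch run without a broker.

```java
import java.util.concurrent.TimeUnit;

public class ThroughputTest {
    // Hypothetical stand-in for LogProducer.send(topic, message);
    // in real use you could pass producer::send.
    interface MessageSender {
        void send(String topic, String message);
    }

    // Sends the same message repeatedly until the window elapses and
    // returns how many send() calls completed.
    static long countSendsFor(MessageSender sender, long duration, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(duration);
        long sent = 0;
        while (System.nanoTime() < deadline) {
            sender.send("test", "this is a sample");
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        // No-op sender so the sketch runs without a broker.
        MessageSender noop = (topic, message) -> { };
        long sent = countSendsFor(noop, 1, TimeUnit.SECONDS);
        System.out.println("messages sent in 1 s: " + sent);
    }
}
```

Within the window it still sends as fast as possible, so it measures peak throughput rather than CPU use at a fixed load.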



Re: Re: Why does the Kafka producer API use so much CPU?

2014-05-11 Thread
Because my app can generate 50 MB of logs every second, and each log record
is about 1 KB, I must send the logs as fast as the machine can.

This is very difficult: on one hand I want to send logs as fast as possible;
on the other hand I want the Kafka producer API to use as little CPU as possible.

If the Kafka API uses this much CPU, it will impact my app.

So can Kafka solve this problem, sending 50 MB of logs to the Kafka server
every second while using little CPU?





From: cac...@gmail.com
Date: 2014-05-11 16:52
To: users
Subject: Re: Why does the Kafka producer API use so much CPU?

Re: Re: Why does the Kafka producer API use so much CPU?

2014-05-11 Thread
I use snappy for compression.
But even without compression, this program still uses 50% of one core.

When using snappy, it uses 100% of one core.





From: Timothy Chen
Date: 2014-05-11 15:53
To: users@kafka.apache.org
Subject: Re: Why does the Kafka producer API use so much CPU?

Re: Re: Why does the Kafka producer API use so much CPU?

2014-05-11 Thread Eric Sammer
If a process is CPU bound (which this producer almost certainly will be),
it's going to consume as much CPU as it can to do what it does. The
test is flawed. Because there's no end state, the while loop is just going
to burn CPU and, because it's single-threaded, it will take a single core.
A better test is to find out a rough number of events per second your
process needs to produce and write the test accordingly. That will tell
you how much CPU the producer chews up when producing ~50MB/sec worth
of events.
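A rough sketch of a test paced to a target load. Taking the poster's numbers in binary units (an assumption), 50 MB/s ÷ 1 KB per record = 50 × 1024 = 51,200 records per second, or one record about every 19.5 µs. The loop gives each record a fixed due time and parks while it is ahead of schedule, so CPU use during the run reflects the send work rather than spinning. The no-op sender is a placeholder, and CPU would be observed externally, e.g. with `top` as in the original post.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Consumer;

public class RateLimitedTest {
    // Assumed workload from this thread: ~50 MB/s of ~1 KB records,
    // in binary units (1 MB = 1024 * 1024 bytes, 1 KB = 1024 bytes).
    static final long BYTES_PER_SECOND = 50L * 1024 * 1024;
    static final int RECORD_SIZE = 1024;
    static final long RECORDS_PER_SECOND = BYTES_PER_SECOND / RECORD_SIZE; // 51,200

    // Calls send.accept(record) about `perSecond` times per second for
    // `seconds` seconds. Each record has a due time on a fixed schedule;
    // the thread parks while it is ahead of schedule and sends
    // immediately when it is behind.
    static long sendAtRate(Consumer<String> send, String record, long perSecond, long seconds) {
        long intervalNs = TimeUnit.SECONDS.toNanos(1) / perSecond;
        long total = perSecond * seconds;
        long start = System.nanoTime();
        for (long i = 0; i < total; i++) {
            long due = start + i * intervalNs;
            long ahead;
            // parkNanos may return early, so re-check until the due time.
            while ((ahead = due - System.nanoTime()) > 0) {
                LockSupport.parkNanos(ahead);
            }
            send.accept(record);
        }
        return total;
    }

    public static void main(String[] args) {
        char[] chars = new char[RECORD_SIZE];
        Arrays.fill(chars, 'x');
        String record = new String(chars);
        // No-op sender so the sketch runs without a broker; swap in something
        // like msg -> producer.send("test", msg) to exercise a real producer.
        long sent = sendAtRate(msg -> { }, record, RECORDS_PER_SECOND, 10);
        System.out.println("sent " + sent + " records in ~10 s");
    }
}
```

Timer granularity means parking for ~20 µs is approximate on most systems; because due times are absolute, the loop catches up after oversleeping, so the average rate stays close to the target.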

The other thing worth pointing out is that sending a single event at a time
comes with a fair bit of overhead which, in turn, naturally drives up CPU
time. If you use the list form of send(), you're going to amortize the
cost of the RPC and other internal bits, leading to more efficient use of
system resources. Again, it may still burn a full core because what you're
doing is CPU bound, but it will do more during that time.
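A minimal sketch of collecting records yourself and handing them over as a list. `Batcher` is a hypothetical helper, not part of Kafka. In the poster's code the flush target could plausibly be the 0.8 Java producer's list overload of `send` (for example `inner::send` with a `Batcher<KeyedMessage<String, String>>`), but that wiring is an assumption to check against the API docs for your version. The batch size of 200 simply mirrors `batch.num.messages` from the poster's config.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Buffers items and hands them to `flushTo` in lists of up to `batchSize`.
public class Batcher<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> flushTo;
    private List<T> buffer;

    public Batcher(int batchSize, Consumer<List<T>> flushTo) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flushTo = flushTo;
        this.buffer = new ArrayList<>(batchSize);
    }

    // Adds one item and flushes once a full batch has accumulated.
    public void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Hands over whatever is buffered; a fresh list is allocated so the
    // consumer may keep the one it was given.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>(batchSize);
        flushTo.accept(batch);
    }

    @Override
    public void close() {
        flush(); // send any partial batch left at shutdown
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (Batcher<String> b = new Batcher<>(200, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 450; i++) {
                b.add("this is a sample");
            }
        }
        System.out.println(batchSizes); // prints [200, 200, 50]
    }
}
```

Handing off a fresh list on each flush means the callee can hold on to a batch without it being mutated afterwards; `close()` makes sure the final partial batch is not lost.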


On Sun, May 11, 2014 at 1:04 AM,  wrote:




-- 
E. Sammer
CTO - ScalingData