Re: block when multi-thread send msg using a single async producer on kafka 0.7.1

2013-03-05 Thread Neha Narkhede
That is because the ZooKeeper read is the slowest thing happening on the
producer. I think creating a new producer per thread is a better model.

Thanks,
Neha


On Tue, Mar 5, 2013 at 1:51 AM, mmLiu  wrote:

> I notice that if we call the *send* method of an async producer in a
> multi-threaded environment, many of these threads will block at
> kafka.producer.ZKBrokerPartitionInfo.getBrokerInfo (ZKBrokerPartitionInfo.scala:119,
> https://github.com/apache/kafka/blob/0.7.1/core/src/main/scala/kafka/producer/Producer.scala#L121
> ).
>
> Should I create a new Producer for each message (or each thread) I send?
>
> --
> Best Regards
>
> --
> 刘明敏 | mmLiu
>
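
For what it's worth, here is a minimal sketch of the producer-per-thread model Neha
suggests, assuming the 0.7 Java producer API; the ZooKeeper connect string, topic,
and class name below are placeholders rather than anything from this thread:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class PerThreadProducer {

    // One producer per thread, so concurrent send() calls never contend on
    // the broker/partition lookup inside a single shared async producer.
    private static final ThreadLocal<Producer<String, String>> PRODUCER =
        new ThreadLocal<Producer<String, String>>() {
            @Override
            protected Producer<String, String> initialValue() {
                Properties props = new Properties();
                props.put("zk.connect", "localhost:2181"); // placeholder ZooKeeper address
                props.put("serializer.class", "kafka.serializer.StringEncoder");
                props.put("producer.type", "async");
                return new Producer<String, String>(new ProducerConfig(props));
            }
        };

    public static void send(String topic, String message) {
        PRODUCER.get().send(new ProducerData<String, String>(topic, message));
    }
}

Each thread lazily builds its own Producer the first time it sends, so there is no
per-message construction cost; the producers should still be closed when their
threads shut down.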


Re: 0.8.0 HEAD 3/4/2013 performance jump?

2013-03-05 Thread Neha Narkhede
My take on this is that since 0.8 is very new, most people are going to be
on 0.7 for a while. When those people try out Kafka 0.8, it is best if they
see performance/guarantees similar to 0.7. Gradually, people are going to
want to move to 0.8, which is when we can revisit changing the default
number of acks to 1.

Thanks,
Neha


On Tue, Mar 5, 2013 at 8:30 AM, Chris Curtin  wrote:

> What about making it explicit in the Producer Constructor? So in addition
> to passing the Config object you set the ACK rule?
>
> Someone with a working 0.7.x application is going to have to make a number
> of changes anyway, so this shouldn't significantly impact the upgrade
> process.
>
> I know you're pushing for 0.8 stability, but it would make the impact of
> this important new feature obvious to everyone.
>
> Chris
>
>
> On Tue, Mar 5, 2013 at 11:13 AM, Jun Rao  wrote:
>
> > Chris, Joe,
> >
> > Yes, the default ack is currently 0. Let me explain the ack modes a bit
> > more so that we are on the same page (details are covered in my ApacheCon
> > presentation:
> > http://www.slideshare.net/junrao/kafka-replication-apachecon2013).
> > There are only three ack modes that make sense.
> >
> > ack=0: the producer waits until the message is in the producer's socket
> > buffer
> > ack=1: the producer waits until the message is received by the leader
> > ack=-1: the producer waits until the message is committed
> >
> > The tradeoffs are:
> >
> > ack=0: lowest latency; some data loss possible during broker failure
> > ack=1: higher latency; little data loss during broker failure
> > ack=-1: highest latency; no data loss during broker failure
> >
> > All cases work with replication factor 1, which is the default setting
> > out of the box. With ack=1/-1, the producer may see some errors while a
> > leader hasn't been elected yet. However, the number of errors should be
> > small since leaders are typically elected very quickly.
> >
> > The argument for making the default ack 0 is that (1) this is the same
> > behavior you get in 0.7 and (2) the producer runs fastest in this mode.
> >
> > The argument for making the default ack 1 or -1 is that they give you
> > better reliability.
> >
> > I am not sure what the best thing to do here is, since the correct
> > setting really depends on the application. What do people feel?
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
>
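
For reference, a minimal sketch of how the ack mode being debated is set on the
producer, assuming the 0.8 Java producer API; only the "request.required.acks" key
comes from this thread, and the broker list, serializer, and topic below are
placeholder assumptions:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AckModeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // assumed broker list key/value
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // 0 = don't wait for the broker, 1 = wait for the leader, -1 = wait until committed.
        props.put("request.required.acks", "1");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("test-topic", "hello"));
        producer.close();
    }
}

With "request.required.acks" left at its default of 0, send() returns as soon as the
message is in the socket buffer, which is where the performance difference discussed
above comes from.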


Re: SimpleConsumer error conditions and handling

2013-03-05 Thread Neha Narkhede
Chris,

First of all, thanks for running these tests and reporting the issues. We
appreciate the help in testing Kafka 0.8.

>
> First test: connect to a Broker that is a 'copy' of the topic/partition but
> not the leader. I get error '5', which maps to
> 'ErrorMapping.LeaderNotAvailableCode'.


Can you please file a JIRA for this? Ideally, you should get back
ErrorMapping.NotLeaderForPartitionCode.

>
> Knowing it was a clean shutdown would also allow me to treat the clean
> shutdown as a normal occurrence vs. an exception when something goes wrong.
>
Kafka handles controlled/clean shutdown by changing the leaders without
introducing downtime for any partition. The expectation
from the client is that you build the logic that will, on any failure,
fetch the metadata and refetch the data. It actually doesn't matter if
the connection was refused or if Kafka gives back an error code.

Thanks,
Neha
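
A rough sketch of that client-side logic — on any fetch error or connection failure,
pause briefly, re-fetch the topic metadata to find the current leader, and
reconnect — assuming the 0.8 SimpleConsumer Java API; the topic, partition, client
id, and seed broker address are placeholders:

import java.util.Collections;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.PartitionMetadata;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class LeaderAwareFetcher {

    private static final String TOPIC = "test-topic";   // placeholder
    private static final int PARTITION = 0;             // placeholder
    private static final String CLIENT = "demoClient";  // placeholder

    // Ask a reachable broker for the partition's metadata (the leader may be
    // null while an election is still in progress).
    static PartitionMetadata findLeader(String host, int port) {
        SimpleConsumer md = new SimpleConsumer(host, port, 100000, 64 * 1024, CLIENT);
        try {
            TopicMetadataResponse resp =
                md.send(new TopicMetadataRequest(Collections.singletonList(TOPIC)));
            for (TopicMetadata tm : resp.topicsMetadata()) {
                for (PartitionMetadata pm : tm.partitionsMetadata()) {
                    if (pm.partitionId() == PARTITION) {
                        return pm;
                    }
                }
            }
            return null;
        } finally {
            md.close();
        }
    }

    // One fetch attempt; returns false whenever the leader should be re-discovered.
    static boolean fetchOnce(String host, int port, long offset) {
        SimpleConsumer consumer = new SimpleConsumer(host, port, 100000, 64 * 1024, CLIENT);
        try {
            FetchRequest req = new FetchRequestBuilder()
                .clientId(CLIENT)
                .addFetch(TOPIC, PARTITION, offset, 100000)
                .build();
            FetchResponse resp = consumer.fetch(req);
            if (resp.hasError()) {
                // e.g. NotLeaderForPartition or LeaderNotAvailable after a leader change.
                System.out.println("Fetch error code " + resp.errorCode(TOPIC, PARTITION));
                return false;
            }
            // ... iterate resp.messageSet(TOPIC, PARTITION) and advance the offset here ...
            return true;
        } catch (Exception e) {
            // Connection refused / IOException during a broker shutdown lands here and is
            // handled exactly like an explicit error code: refresh metadata and reconnect.
            System.out.println("Fetch failed: " + e.getMessage());
            return false;
        } finally {
            consumer.close();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String host = "localhost"; // placeholder seed broker
        int port = 9092;
        long offset = 0L;

        while (true) {
            if (!fetchOnce(host, port, offset)) {
                Thread.sleep(1000); // brief pause, then re-discover the leader
                PartitionMetadata pm = findLeader(host, port);
                if (pm != null && pm.leader() != null) {
                    host = pm.leader().host();
                    port = pm.leader().port();
                }
            }
        }
    }
}

Whether the broker went away cleanly or not, the recovery path is identical:
metadata lookup, then reconnect to whichever broker is now the leader. A real
consumer would keep a list of seed brokers for the metadata query rather than
reusing the broker that just failed.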


Re: Kafka 0.8.0

2013-03-05 Thread Neha Narkhede
Snehalata,

Can you be more specific about the tools and configs you used? For example,
how did you set up Kafka, and which producer tool did you run with what
command line options? Also, it would be helpful if you can attach the
server/producer logs.

Thanks,
Neha


On Mon, Mar 4, 2013 at 9:27 PM, Snehalata Nagaje <
snehalata.nag...@harbingergroup.com> wrote:

> Hello,
>
>
>
> I am using kafka 0.8.0 from git repository.
>
>
>
> And I am trying to post some messages to the server. It gives me this error:
>
>
>
> Failed send messages after 3 tries.
>
>
>
> But when I look at the log files, I can see the message was posted 3 times.
>
>
>
> And I am also getting valid offsets showing the same number of messages, i.e. 3.
>
>
>
> Can you please provide any input on this?
>
>
>
> Thanks,
>
> Snehalata
>
>


Re: 0.8.0 HEAD 3/4/2013 performance jump?

2013-03-05 Thread Chris Curtin
What about making it explicit in the Producer Constructor? So in addition
to passing the Config object you set the ACK rule?

Someone with a working 0.7.x application is going to have to make a number
of changes anyway, so this shouldn't significantly impact the upgrade
process.

I know you're pushing for 0.8 stability, but it would make the impact of
this important new feature obvious to everyone.

Chris


On Tue, Mar 5, 2013 at 11:13 AM, Jun Rao  wrote:

> Chris, Joe,
>
> Yes, the default ack is currently 0. Let me explain the ack modes a bit
> more so that we are on the same page (details are covered in my ApacheCon
> presentation:
> http://www.slideshare.net/junrao/kafka-replication-apachecon2013).
> There are only three ack modes that make sense.
>
> ack=0: the producer waits until the message is in the producer's socket
> buffer
> ack=1: the producer waits until the message is received by the leader
> ack=-1: the producer waits until the message is committed
>
> The tradeoffs are:
>
> ack=0: lowest latency; some data loss possible during broker failure
> ack=1: higher latency; little data loss during broker failure
> ack=-1: highest latency; no data loss during broker failure
>
> All cases work with replication factor 1, which is the default setting
> out of the box. With ack=1/-1, the producer may see some errors while a
> leader hasn't been elected yet. However, the number of errors should be
> small since leaders are typically elected very quickly.
>
> The argument for making the default ack 0 is that (1) this is the same
> behavior you get in 0.7 and (2) the producer runs fastest in this mode.
>
> The argument for making the default ack 1 or -1 is that they give you
> better reliability.
>
> I am not sure what the best thing to do here is, since the correct
> setting really depends on the application. What do people feel?
>
> Thanks,
>
> Jun
>
>
>


Re: 0.8.0 HEAD 3/4/2013 performance jump?

2013-03-05 Thread Colin Blower

I vote for ack=1.

It is a reasonable tradeoff between performance and reliability.

On 03/05/2013 08:13 AM, Jun Rao wrote:

Chris, Joe,

Yes, the default ack is currently 0. Let me explain the ack modes a bit
more so that we are on the same page (details are covered in my ApacheCon
presentation:
http://www.slideshare.net/junrao/kafka-replication-apachecon2013).
There are only three ack modes that make sense.

ack=0: the producer waits until the message is in the producer's socket buffer
ack=1: the producer waits until the message is received by the leader
ack=-1: the producer waits until the message is committed

The tradeoffs are:

ack=0: lowest latency; some data loss possible during broker failure
ack=1: higher latency; little data loss during broker failure
ack=-1: highest latency; no data loss during broker failure

All cases work with replication factor 1, which is the default setting out
of the box. With ack=1/-1, the producer may see some errors while a leader
hasn't been elected yet. However, the number of errors should be small
since leaders are typically elected very quickly.

The argument for making the default ack 0 is that (1) this is the same
behavior you get in 0.7 and (2) the producer runs fastest in this mode.

The argument for making the default ack 1 or -1 is that they give you
better reliability.

I am not sure what the best thing to do here is, since the correct setting
really depends on the application. What do people feel?

Thanks,

Jun





--
*Colin Blower*
/Software Engineer/
Barracuda Networks Inc.
+1 408-342-5576 (o)



Re: 0.8.0 HEAD 3/4/2013 performance jump?

2013-03-05 Thread Jun Rao
Chris, Joe,

Yes, the default ack is currently 0. Let me explain the ack modes a bit
more so that we are on the same page (details are covered in my ApacheCon
presentation:
http://www.slideshare.net/junrao/kafka-replication-apachecon2013).
There are only three ack modes that make sense.

ack=0: the producer waits until the message is in the producer's socket buffer
ack=1: the producer waits until the message is received by the leader
ack=-1: the producer waits until the message is committed

The tradeoffs are:

ack=0: lowest latency; some data loss possible during broker failure
ack=1: higher latency; little data loss during broker failure
ack=-1: highest latency; no data loss during broker failure

All cases work with replication factor 1, which is the default setting out
of the box. With ack=1/-1, the producer may see some errors while a leader
hasn't been elected yet. However, the number of errors should be small
since leaders are typically elected very quickly.

The argument for making the default ack 0 is that (1) this is the same
behavior you get in 0.7 and (2) the producer runs fastest in this mode.

The argument for making the default ack 1 or -1 is that they give you
better reliability.

I am not sure what the best thing to do here is, since the correct setting
really depends on the application. What do people feel?

Thanks,

Jun


On Tue, Mar 5, 2013 at 6:36 AM, Joe Stein  wrote:

> Hi Chris, setting the ack default to 1 would mean folks would have to have
> a replica set up and configured; otherwise, starting a server from scratch
> from a download would mean an error message for the user. I hear your
> concern about the risk of not replicating, though perhaps such a use case
> would be solved through auto discovery or some other feature/contribution
> for 0.9.
>
> I would be -1 on changing the default right now, because new folks coming
> in on a build, whether as new users or migrations, might simply leave
> because they got an error, even when just running git clone, ./sbt package,
> and starting up (fewer steps in 0.8). There are already expectations on
> 0.8; we should try to let things settle too.
>
> Lastly, when folks run and go live they will often have a Chef, CFEngine,
> Puppet, etc. script for configuration.
>
> Perhaps through some more operations documentation, comments, and general
> communication to the community we can reduce the risk.
>
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop 
> */
>
> On Tue, Mar 5, 2013 at 8:30 AM, Chris Curtin 
> wrote:
>
> > Hi Jun,
> >
> > I wasn't explicitly setting the ack anywhere.
> >
> > Am I reading the code correctly that in SyncProducerConfig.scala the
> > DefaultRequiredAcks is 0? Thus not waiting on the leader?
> >
> > Setting:  props.put("request.required.acks", "1"); causes the writes to
> go
> > back to the performance I was seeing before yesterday.
> >
> > Are you guys open to changing the default to be 1? The MongoDB
> Java-driver
> > guys made a similar default change at the end of last year because many
> > people didn't understand the risk that the default value of no-ack was
> > putting them in until they had a node failure. So they default to 'safe'
> > and let you decide what your risk level is vs. assuming you can lose
> data.
> >
> > Thanks,
> >
> > Chris
> >
> >
> >
> > On Tue, Mar 5, 2013 at 1:00 AM, Jun Rao  wrote:
> >
> > > Chris,
> > >
> > > On the producer side, are you using ack=0? Earlier, ack=0 behaved the
> > > same as ack=1, which meant that the producer had to wait for the message
> > > to be received by the leader. More recently, we did the actual
> > > implementation of ack=0, which means the producer doesn't wait for the
> > > message to reach the leader and is therefore much faster.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Mar 4, 2013 at 12:01 PM, Chris Curtin wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm definitely not complaining, but after upgrading to HEAD today my
> > > > producers are running much, much faster.
> > > >
> > > > Don't have any measurements, but last release I was able to tab
> windows
> > > to
> > > > stop a Broker before I could generate 500 partitioned messages. Now
> it
> > > > completes before I can get the Broker shutdown!
> > > >
> > > > Anything in particular you guys fixed?
> > > >
> > > > (I did remove all the files on disk per the email thread last week
> and
> > > > reset the ZooKeeper meta, but that shouldn't matter right?)
> > > >
> > > > Very impressive!
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > >
> >
>


Re: 0.8.0 HEAD 3/4/2013 performance jump?

2013-03-05 Thread Chris Curtin
Great points Joe.

What about something being written to INFO at startup about the replication
level being used?

Chris


On Tue, Mar 5, 2013 at 9:36 AM, Joe Stein  wrote:

> Hi Chris, setting the ack default to 1 would mean folks would have to have
> a replica set up and configured; otherwise, starting a server from scratch
> from a download would mean an error message for the user. I hear your
> concern about the risk of not replicating, though perhaps such a use case
> would be solved through auto discovery or some other feature/contribution
> for 0.9.
>
> I would be -1 on changing the default right now, because new folks coming
> in on a build, whether as new users or migrations, might simply leave
> because they got an error, even when just running git clone, ./sbt package,
> and starting up (fewer steps in 0.8). There are already expectations on
> 0.8; we should try to let things settle too.
>
> Lastly, when folks run and go live they will often have a Chef, CFEngine,
> Puppet, etc. script for configuration.
>
> Perhaps through some more operations documentation, comments, and general
> communication to the community we can reduce the risk.
>
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop 
> */
>
> On Tue, Mar 5, 2013 at 8:30 AM, Chris Curtin 
> wrote:
>
> > Hi Jun,
> >
> > I wasn't explicitly setting the ack anywhere.
> >
> > Am I reading the code correctly that in SyncProducerConfig.scala the
> > DefaultRequiredAcks is 0? Thus not waiting on the leader?
> >
> > Setting:  props.put("request.required.acks", "1"); causes the writes to
> go
> > back to the performance I was seeing before yesterday.
> >
> > Are you guys open to changing the default to be 1? The MongoDB
> Java-driver
> > guys made a similar default change at the end of last year because many
> > people didn't understand the risk that the default value of no-ack was
> > putting them in until they had a node failure. So they default to 'safe'
> > and let you decide what your risk level is vs. assuming you can lose
> data.
> >
> > Thanks,
> >
> > Chris
> >
> >
> >
> > On Tue, Mar 5, 2013 at 1:00 AM, Jun Rao  wrote:
> >
> > > Chris,
> > >
> > > On the producer side, are you using ack=0? Earlier, ack=0 behaved the
> > > same as ack=1, which meant that the producer had to wait for the message
> > > to be received by the leader. More recently, we did the actual
> > > implementation of ack=0, which means the producer doesn't wait for the
> > > message to reach the leader and is therefore much faster.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Mar 4, 2013 at 12:01 PM, Chris Curtin wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm definitely not complaining, but after upgrading to HEAD today my
> > > > producers are running much, much faster.
> > > >
> > > > Don't have any measurements, but last release I was able to tab
> windows
> > > to
> > > > stop a Broker before I could generate 500 partitioned messages. Now
> it
> > > > completes before I can get the Broker shutdown!
> > > >
> > > > Anything in particular you guys fixed?
> > > >
> > > > (I did remove all the files on disk per the email thread last week
> and
> > > > reset the ZooKeeper meta, but that shouldn't matter right?)
> > > >
> > > > Very impressive!
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > >
> >
>


SimpleConsumer error conditions and handling

2013-03-05 Thread Chris Curtin
Hi,

0.8.0 HEAD from 3/4/2013.

As I think through building a robust SimpleConsumer I ran some failure
tests today and want to make sure I understand what is going on.

FYI I know that I should be doing a metadata lookup to find the leader, but
I wanted to see what happens if things are going well and the leader
changes between requests or I've cached the leader and try to connect
without the cost of a leader lookup.

First test: connect to a Broker that is a 'copy' of the topic/partition but
not the leader. I get error '5', which maps to
'ErrorMapping.LeaderNotAvailableCode'.

Why didn't I get ErrorMapping.NotLeaderForPartitionCode or something else
to tell me I'm not talking to the leader? 'Not available' implies something
is wrong with replication, yet when I connect to the leader Broker everything
works fine.

Second test: connect to a Broker that isn't the leader or a copy and I get
error 3, unknown topic or partition. Makes sense.

Third test: connect to the leader and, while reading data, shut down the
leader Broker via the command line: I get some IOExceptions, then Connection
Refused on the reconnect. (Note that Connection Refused is the exception
raised; the IOException was written to the logs but not raised to my code.)

I'm not sure of the best way to recover from this without assuming the
worst every time. Could there be some notice from Kafka that the connection
to the leader was closed due to a shutdown, vs. getting Connection Refused
errors, so I can respond differently? Something like 'Broker has closed
connection due to shutdown', so I know to sleep for a second before going
through the leader lookup logic again? Or, ideally, have Kafka know it was a
clean shutdown and automatically transition to the new leader.

Knowing it was a clean shutdown would also allow me to treat the clean
shutdown as a normal occurrence vs. an exception when something goes wrong.

Thanks,

Chris


Re: 0.8.0 HEAD 3/4/2013 performance jump?

2013-03-05 Thread Joe Stein
Hi Chris, setting the ack default to 1 would mean folks would have to have
a replica set up and configured; otherwise, starting a server from scratch
from a download would mean an error message for the user. I hear your
concern about the risk of not replicating, though perhaps such a use case
would be solved through auto discovery or some other feature/contribution
for 0.9.

I would be -1 on changing the default right now, because new folks coming
in on a build, whether as new users or migrations, might simply leave
because they got an error, even when just running git clone, ./sbt package,
and starting up (fewer steps in 0.8). There are already expectations on
0.8; we should try to let things settle too.

Lastly, when folks run and go live they will often have a Chef, CFEngine,
Puppet, etc. script for configuration.

Perhaps through some more operations documentation, comments, and general
communication to the community we can reduce the risk.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop 
*/

On Tue, Mar 5, 2013 at 8:30 AM, Chris Curtin  wrote:

> Hi Jun,
>
> I wasn't explicitly setting the ack anywhere.
>
> Am I reading the code correctly that in SyncProducerConfig.scala the
> DefaultRequiredAcks is 0? Thus not waiting on the leader?
>
> Setting:  props.put("request.required.acks", "1"); causes the writes to go
> back to the performance I was seeing before yesterday.
>
> Are you guys open to changing the default to be 1? The MongoDB Java-driver
> guys made a similar default change at the end of last year because many
> people didn't understand the risk that the default value of no-ack was
> putting them in until they had a node failure. So they default to 'safe'
> and let you decide what your risk level is vs. assuming you can lose data.
>
> Thanks,
>
> Chris
>
>
>
> On Tue, Mar 5, 2013 at 1:00 AM, Jun Rao  wrote:
>
> > Chris,
> >
> > On the producer side, are you using ack=0? Earlier, ack=0 behaved the same
> > as ack=1, which meant that the producer had to wait for the message to be
> > received by the leader. More recently, we did the actual implementation of
> > ack=0, which means the producer doesn't wait for the message to reach the
> > leader and is therefore much faster.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Mar 4, 2013 at 12:01 PM, Chris Curtin wrote:
> >
> > > Hi,
> > >
> > > I'm definitely not complaining, but after upgrading to HEAD today my
> > > producers are running much, much faster.
> > >
> > > Don't have any measurements, but last release I was able to tab windows
> > to
> > > stop a Broker before I could generate 500 partitioned messages. Now it
> > > completes before I can get the Broker shutdown!
> > >
> > > Anything in particular you guys fixed?
> > >
> > > (I did remove all the files on disk per the email thread last week and
> > > reset the ZooKeeper meta, but that shouldn't matter right?)
> > >
> > > Very impressive!
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> >
>


Re: 0.8.0 HEAD 3/4/2013 performance jump?

2013-03-05 Thread Chris Curtin
Hi Jun,

I wasn't explicitly setting the ack anywhere.

Am I reading the code correctly that in SyncProducerConfig.scala the
DefaultRequiredAcks is 0? Thus not waiting on the leader?

Setting:  props.put("request.required.acks", "1"); causes the writes to go
back to the performance I was seeing before yesterday.

Are you guys open to changing the default to be 1? The MongoDB Java-driver
guys made a similar default change at the end of last year because many
people didn't understand the risk that the default value of no-ack was
putting them in until they had a node failure. So they default to 'safe'
and let you decide what your risk level is vs. assuming you can lose data.

Thanks,

Chris



On Tue, Mar 5, 2013 at 1:00 AM, Jun Rao  wrote:

> Chris,
>
> On the producer side, are you using ack=0? Earlier, ack=0 behaved the same as
> ack=1, which meant that the producer had to wait for the message to be
> received by the leader. More recently, we did the actual implementation of
> ack=0, which means the producer doesn't wait for the message to reach the
> leader and is therefore much faster.
>
> Thanks,
>
> Jun
>
> On Mon, Mar 4, 2013 at 12:01 PM, Chris Curtin wrote:
>
> > Hi,
> >
> > I'm definitely not complaining, but after upgrading to HEAD today my
> > producers are running much, much faster.
> >
> > Don't have any measurements, but last release I was able to tab windows
> to
> > stop a Broker before I could generate 500 partitioned messages. Now it
> > completes before I can get the Broker shutdown!
> >
> > Anything in particular you guys fixed?
> >
> > (I did remove all the files on disk per the email thread last week and
> > reset the ZooKeeper meta, but that shouldn't matter right?)
> >
> > Very impressive!
> >
> > Thanks,
> >
> > Chris
> >
>