Re: Retrieve most-recent-n messages from kafka topic

2013-07-22 Thread Shane Moriah
Thanks Johan,

I converted your code to vanilla Java with a few small modifications
(included below in case anyone wants to use it) and ran it a few times.
Seems like it works OK for the quick-peek use case, but I wouldn't
recommend anyone rely on its accuracy, since I find, at least in our
case, anywhere between 1-10% of the result lines to be corrupt on each
call. It looks like in those cases there are a few special chars at the
beginning, probably just a function of the header regex being imprecise, as
you mentioned before.

private List<String> getCurrentLinesFromKafka(String topicName, int linesToFetch)
        throws UnsupportedEncodingException {
    int bytesToFetch = linesToFetch * AVG_LINE_SIZE_IN_BYTES;
    SimpleConsumer sConsumer = new SimpleConsumer(BROKER_NAME, 9092, 1000, 1024000);
    long[] currentOffset = sConsumer.getOffsetsBefore(topicName, PARTITION_ID, -1, 3);

    long offset = Math.max(currentOffset[0] - bytesToFetch,
            currentOffset[currentOffset.length - 1]);
    FetchRequest fetchRequest = new FetchRequest(topicName, 0, offset, bytesToFetch);
    ByteBufferMessageSet msgBuffer = sConsumer.fetch(fetchRequest);

    sConsumer.close();

    String decStr = decodeBuffer(msgBuffer.getBuffer(), "UTF-8");

    String header = "\u0000\u0000.?.?.?.?.?.?.?.?";
    String[] strLst = decStr.split(header);

    if (strLst.length > linesToFetch + 2) {
        // take only the last linesToFetch entries; also ignore the
        // first and last since they may be corrupted
        int end = strLst.length - 1;  // end is excluded in copyOfRange
        int start = end - linesToFetch;
        return Lists.newArrayList(Arrays.copyOfRange(strLst, start, end));
    } else if (strLst.length > 2) {
        // we can at least return something since we have more than the
        // corrupt first and last values
        int end = strLst.length - 1;  // end is excluded in copyOfRange
        int start = 1;                // ignore the probably corrupt first value
        return Lists.newArrayList(Arrays.copyOfRange(strLst, start, end));
    } else {
        return Lists.newArrayList();
    }
}

private String decodeBuffer(ByteBuffer buffer, String encoding)
        throws UnsupportedEncodingException {
    int size;
    try {
        size = buffer.getInt();
    } catch (Exception e) {
        size = -1;
    }

    if (size < 0) {
        return "No recent messages in topic";
    }

    byte[] bytes = buffer.array();
    return new String(bytes, encoding);
}


On Fri, Jul 19, 2013 at 1:26 PM, Johan Lundahl johan.lund...@gmail.com wrote:

 Here is my current (very hacky) piece of code handling this part:

   def getLastMessages(fetchSize: Int = 1): List[String] = {
     val sConsumer = new SimpleConsumer(clusterip, 9092, 1000, 1024000)
     val currentOffset = sConsumer.getOffsetsBefore(topic, 0, -1, 3)

     val fetchRequest = new FetchRequest(topic, 0, (currentOffset(0) -
       fetchSize).max(currentOffset(currentOffset.length - 1)), fetchSize)
     val msgBuffer = sConsumer.fetch(fetchRequest)

     sConsumer.close()

     def decodeBuffer(buffer: ByteBuffer, encoding: String, arrSize: Int =
       msgBuffer.sizeInBytes.toInt - 6): String = {
       val size: Int = Option(try { buffer.getInt } catch { case e: Throwable => -1 }).getOrElse(-1)
       if (size < 0) return s"No recent messages in topic $topic"
       val bytes = new Array[Byte](arrSize)
       buffer.get(bytes)
       new String(bytes, encoding)
     }
     val decStr = decodeBuffer(msgBuffer.getBuffer, "UTF-8")

     val header = "\u0000\u0000.?.?.?.?.?.?.?.?"
     val strLst = decStr.split(header).toList

     if (strLst.size > 1) strLst.tail else strLst
   }


 On Fri, Jul 19, 2013 at 10:02 PM, Shane Moriah shanemor...@gmail.com
 wrote:

  I have a similar use-case to Johan.  We do stream processing off the
  topics in the backend, but I'd like to expose a recent sample of a
  topic's data to a front-end web-app (just in a synchronous,
  click-a-button-and-see-results fashion).  If I can only start from the
  last file offset 500MB behind current and not (current - n bytes), then
  the data might be very stale depending on how fast that topic is being
  filled. I could iterate from the last offset and keep only the final n
  (see the sketch below), but that might mean processing 500MB each time
  just to grab 10 messages.
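
  (The keep-only-the-final-n bookkeeping is cheap by itself; it's the
  fetching and decoding of 500MB that hurts. A minimal sketch of that part,
  assuming messages arrive as already-decoded strings in fetch order; the
  names here are illustrative, not Kafka API:

  // Needs java.util.ArrayDeque / java.util.Deque.
  Deque<String> lastN = new ArrayDeque<String>();
  for (String line : decodedLines) {  // decodedLines: hypothetical iterable of messages
      if (lastN.size() == n) {
          lastN.removeFirst();        // drop the oldest once we already hold n
      }
      lastN.addLast(line);
  }
  )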
 
  Johan, are you using just the simple FetchRequest?  Did you get around
  the InvalidMessageSizeError when you try to force a fetch offset
  different from those returned by getOffsetsBefore?  Or are you also
  starting from that last known offset and iterating forwards by the
  desired amount?
 
 
  On Fri, Jul 19, 2013 at 11:33 AM, Johan Lundahl johan.lund...@gmail.com
  wrote:
 
   I've had a similar use case where we want to browse and display the
  latest
   few messages in different topics in a webapp.
  
   This kind of works by doing as you 

Replacing brokers in a cluster (0.8)

2013-07-22 Thread Jason Rosenberg
I'm planning to upgrade a 0.8 cluster from 2 old nodes, to 3 new ones
(better hardware).  I'm using a replication factor of 2.

I'm thinking the plan should be to spin up the 3 new nodes, and operate as
a 5 node cluster for a while.  Then first remove 1 of the old nodes, and
wait for the partitions on the removed node to get replicated to the other
nodes.  Then, do the same for the other old node.

Does this sound sensible?

How does the cluster decide when to re-replicate partitions that are on a
node that is no longer available?  Does it only happen if/when new messages
arrive for that partition?  Is it on a partition by partition basis?

Or is it a cluster-level decision that a broker is no longer valid, in
which case all affected partitions would immediately get replicated to new
brokers as needed?

I'm just wondering how I will know when it will be safe to take down my
second old node, after the first one is removed, etc.

Thanks,

Jason


Apache Kafka Question

2013-07-22 Thread anantha.murugan
Hi,



I am planning to use Apache Kafka 0.8 to handle millions of messages per day.
Now I need to plan out the environment:



(i) How many Topics to be created?
(ii) How many partitions/replications to be created?
(iii) How many Brokers to be created?
(iv) How many consumer instances in consumer group?

(v) Topic or Queue? If topic, whether we need to create multiple group Ids as 
opposed to a single one?



How can we go about it? Please clarify.

Thanks & Regards,
Anantha



Re: Apache Kafka Question

2013-07-22 Thread Yavar Husain
Millions of messages per day (with each message being a few bytes) is not
really 'Big Data'. Kafka has been tested at a million messages per second.

The answer to all your questions, IMO, is "it depends".

You can start with a single instance (single-machine installation). Let
your producer send messages. Keep one broker. Increase to N brokers. When
you touch the upper limit, add a server and repeat.

Benchmarking and scalability are aspects you should try on your own
by playing with Kafka. Every use case is different, so one setup's
performance metrics are not a global answer.

For your question on Topic or Queue, please read up on distributed
pub/sub, message queues and other messaging patterns; these are generic
concepts and have nothing to do with Kafka. It again depends on your
use case.

Please read up on what topics in Kafka are. If you just go through the
definition of topics, you will answer your own question within a
minute.

Replications and all would be next steps once you are done with a single
running instance of Kafka. So go ahead and get your hands dirty. You will
love Kafka :)

And yes, the most important thing: please read the documentation first (a
bit of theory) and then dive in. There is no silver bullet.

Cheers,
Yavar
http://lnkd.in/GRrrDJ

On Mon, Jul 22, 2013 at 4:27 PM, anantha.muru...@wipro.com wrote:

 Hi,



 I am planning to use Apache Kafka 0.8 to handle millions of messages per
 day. Now I need to plan out the environment:



 (i) How many Topics to be created?
 (ii) How many partitions/replications to be created?
 (iii) How many Brokers to be created?
 (iv) How many consumer instances in consumer group?

 (v) Topic or Queue? If topic, whether we need to create multiple group Ids
 as opposed to a single one?



 How can we go about it? Please clarify.

 Thanks & Regards,
 Anantha




Re: Replacing brokers in a cluster (0.8)

2013-07-22 Thread Glenn Nethercutt
This seems like the type of behavior I'd ultimately want from the 
controlled shutdown tool 
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-1.ControlledShutdown.
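
(For reference, the invocation described on that wiki page looks roughly
like the following; the ZooKeeper string and broker id are placeholders,
so check the page for the exact flags on your build:

bin/kafka-run-class.sh kafka.admin.ShutdownBroker --zookeeper localhost:2181 --broker 0
)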


Currently, I believe the ShutdownBroker causes new leaders to be 
selected for any partition the dying node was leading, but I don't think 
it explicitly forces a rebalance for topics in which the dying node was 
just an ISR (in-sync replica set) member. Ostensibly, leadership 
elections are what we want to avoid, due to the Zookeeper chattiness 
that would ensue for ensembles with lots of partitions, but I'd wager 
we'd benefit from a reduction in rebalances too.  The preferred 
replica election tool also seems to have some similar level of 
control (manual selection of the preferred replicas), but still doesn't 
let you add/remove brokers from the ISR directly.  I know the 
kafka-reassign-partitions tool lets you specify a full list of 
partitions and replica assignment, but I don't know how easily 
integrated that will be with the lifecycle you described.


Anyone know if controlled shutdown is the right tool for this? Our 
devops team will certainly be interested in the canonical answer as well.


--glenn

On 07/22/2013 05:14 AM, Jason Rosenberg wrote:

I'm planning to upgrade a 0.8 cluster from 2 old nodes, to 3 new ones
(better hardware).  I'm using a replication factor of 2.

I'm thinking the plan should be to spin up the 3 new nodes, and operate as
a 5 node cluster for a while.  Then first remove 1 of the old nodes, and
wait for the partitions on the removed node to get replicated to the other
nodes.  Then, do the same for the other old node.

Does this sound sensible?

How does the cluster decide when to re-replicate partitions that are on a
node that is no longer available?  Does it only happen if/when new messages
arrive for that partition?  Is it on a partition by partition basis?

Or is it a cluster-level decision that a broker is no longer valid, in
which case all affected partitions would immediately get replicated to new
brokers as needed?

I'm just wondering how I will know when it will be safe to take down my
second old node, after the first one is removed, etc.

Thanks,

Jason





Re: Logo

2013-07-22 Thread Jay Kreps
Yeah, good point. I hadn't seen that before.

-Jay


On Mon, Jul 22, 2013 at 10:20 AM, Radek Gruchalski 
radek.gruchal...@portico.io wrote:

 296 looks familiar: https://www.nodejitsu.com/

 Kind regards,
 Radek Gruchalski
 radek.gruchal...@technicolor.com (mailto:radek.gruchal...@technicolor.com)
 | radek.gruchal...@portico.io (mailto:radek.gruchal...@portico.io) |
 ra...@gruchalski.com (mailto:ra...@gruchalski.com)
 00447889948663



 On Monday, 22 July 2013 at 18:51, Jay Kreps wrote:

  Hey guys,
 
  We need a logo!
 
  I got a few designs from a 99 designs contest that I would like to put
  forward:
  https://issues.apache.org/jira/browse/KAFKA-982
 
  If anyone else would like to submit a design that would be great.
 
  Let's do a vote to choose one.
 
  -Jay




Re: Logo

2013-07-22 Thread Radek Gruchalski
296 looks familiar: https://www.nodejitsu.com/  

Kind regards,

Radek Gruchalski
radek.gruchal...@technicolor.com (mailto:radek.gruchal...@technicolor.com) | 
radek.gruchal...@portico.io (mailto:radek.gruchal...@portico.io) | 
ra...@gruchalski.com (mailto:ra...@gruchalski.com)
00447889948663


On Monday, 22 July 2013 at 18:51, Jay Kreps wrote:

 Hey guys,
  
 We need a logo!
  
 I got a few designs from a 99 designs contest that I would like to put
 forward:
 https://issues.apache.org/jira/browse/KAFKA-982
  
 If anyone else would like to submit a design that would be great.
  
 Let's do a vote to choose one.
  
 -Jay  



Re: Replacing brokers in a cluster (0.8)

2013-07-22 Thread Jason Rosenberg
Is the kafka-reassign-partitions tool something I can experiment with now
(this will only be staging data, in the first go-round)?  How does it work?
 Do I manually have to specify each replica I want to move?  This would be
cumbersome, as I have on the order of 100's of topics... Or does the tool
have the ability to specify all replicas on a particular broker?  How can I
easily check whether a partition has all its replicas in the ISR?

For some reason, I had thought there would be a default behavior, whereby a
replica could automatically be declared dead after a configurable timeout
period.

Re-assigning broker id's would not be ideal, since I have a scheme
currently whereby broker id's are auto-generated, from a hostname/ip, etc.
 I could make it work, but it's not my preference to override that!

Jason


On Mon, Jul 22, 2013 at 11:50 AM, Jun Rao jun...@gmail.com wrote:

 A replica's data won't be automatically moved to another broker where there
 are failures. This is because we don't know if the failure is transient or
 permanent. The right tool to use is the kafka-reassign-partitions tool. It
 hasn't been thoroughly tested though. We hope to harden it in the final
 0.8.0 release.

 You can also replace a broker with a new server by keeping the same broker
 id. When the new server starts up, it will replicate data from the leader.
 You know the data is fully replicated when both replicas are in ISR.

 Thanks,

 Jun


 On Mon, Jul 22, 2013 at 2:14 AM, Jason Rosenberg j...@squareup.com wrote:

  I'm planning to upgrade a 0.8 cluster from 2 old nodes, to 3 new ones
  (better hardware).  I'm using a replication factor of 2.
 
  I'm thinking the plan should be to spin up the 3 new nodes, and operate
 as
  a 5 node cluster for a while.  Then first remove 1 of the old nodes, and
  wait for the partitions on the removed node to get replicated to the
 other
  nodes.  Then, do the same for the other old node.
 
  Does this sound sensible?
 
  How does the cluster decide when to re-replicate partitions that are on a
  node that is no longer available?  Does it only happen if/when new
 messages
  arrive for that partition?  Is it on a partition by partition basis?
 
  Or is it a cluster-level decision that a broker is no longer valid, in
  which case all affected partitions would immediately get replicated to
 new
  brokers as needed?
 
  I'm just wondering how I will know when it will be safe to take down my
  second old node, after the first one is removed, etc.
 
  Thanks,
 
  Jason
 



Re: Replacing brokers in a cluster (0.8)

2013-07-22 Thread Scott Clasen
Here's a Ruby CLI that you can use to replace brokers... it shells out to
the kafka-reassign-partitions.sh tool after figuring out broker lists from
ZK. Hope it's useful.


#!/usr/bin/env ruby

require 'excon'
require 'json'
require 'zookeeper'

def replace(arr, o, n)
  arr.map { |v| v == o ? n : v }
end

if ARGV.length != 4
  puts "Usage: bundle exec bin/replace-instance zkstr topic-name old-broker-id new-broker-id"
else
  zkstr = ARGV[0]
  zk = Zookeeper.new(zkstr)
  topic = ARGV[1]
  old = ARGV[2].to_i
  new = ARGV[3].to_i
  puts "Replacing broker #{old} with #{new} on all partitions of topic #{topic}"

  current = JSON.parse(zk.get(:path => "/brokers/topics/#{topic}")[:data])
  replacements_array = []
  replacements = {"partitions" => replacements_array}
  current["partitions"].each { |partition, brokers|
    replacements_array.push({"topic" => topic, "partition" => partition.to_i,
                             "replicas" => replace(brokers, old, new)}) }

  replacement_json = JSON.generate(replacements)

  file = "/tmp/replace-#{topic}-#{old}-#{new}"
  if File.exist?(file)
    File.delete file
  end
  File.open(file, 'w') { |f| f.write(replacement_json) }

  puts "./bin/kafka-reassign-partitions.sh --zookeeper #{zkstr} --path-to-json-file #{file}"
  system "./bin/kafka-reassign-partitions.sh --zookeeper #{zkstr} --path-to-json-file #{file}"
end
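
Example invocation, per the usage string above (values illustrative):

bundle exec bin/replace-instance localhost:2181 my-topic 3 7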





On Mon, Jul 22, 2013 at 10:40 AM, Jason Rosenberg j...@squareup.com wrote:

 Is the kafka-reassign-partitions tool something I can experiment with now
 (this will only be staging data, in the first go-round).  How does it work?
  Do I manually have to specify each replica I want to move?  This would be
 cumbersome, as I have on the order of 100's of topics... Or does the tool
 have the ability to specify all replicas on a particular broker?  How can I
 easily check whether a partition has all its replicas in the ISR?

 For some reason, I had thought there would be a default behavior, whereby a
 replica could automatically be declared dead after a configurable timeout
 period.

 Re-assigning broker id's would not be ideal, since I have a scheme
 currently whereby broker id's are auto-generated, from a hostname/ip, etc.
  I could make it work, but it's not my preference to override that!

 Jason


 On Mon, Jul 22, 2013 at 11:50 AM, Jun Rao jun...@gmail.com wrote:

  A replica's data won't be automatically moved to another broker where
 there
  are failures. This is because we don't know if the failure is transient
 or
  permanent. The right tool to use is the kafka-reassign-partitions tool.
 It
  hasn't been thoroughly tested though. We hope to harden it in the final
  0.8.0 release.
 
  You can also replace a broker with a new server by keeping the same
 broker
  id. When the new server starts up, it will replicate data from the leader.
  You know the data is fully replicated when both replicas are in ISR.
 
  Thanks,
 
  Jun
 
 
  On Mon, Jul 22, 2013 at 2:14 AM, Jason Rosenberg j...@squareup.com
 wrote:
 
   I'm planning to upgrade a 0.8 cluster from 2 old nodes, to 3 new ones
   (better hardware).  I'm using a replication factor of 2.
  
   I'm thinking the plan should be to spin up the 3 new nodes, and operate
  as
   a 5 node cluster for a while.  Then first remove 1 of the old nodes,
 and
   wait for the partitions on the removed node to get replicated to the
  other
   nodes.  Then, do the same for the other old node.
  
   Does this sound sensible?
  
   How does the cluster decide when to re-replicate partitions that are
 on a
   node that is no longer available?  Does it only happen if/when new
  messages
   arrive for that partition?  Is it on a partition by partition basis?
  
   Or is it a cluster-level decision that a broker is no longer valid, in
   which case all affected partitions would immediately get replicated to
  new
   brokers as needed?
  
   I'm just wondering how I will know when it will be safe to take down my
   second old node, after the first one is removed, etc.
  
   Thanks,
  
   Jason
  
 



Re: Logo

2013-07-22 Thread S Ahmed
Similar, yet different.  I like it!


On Mon, Jul 22, 2013 at 1:25 PM, Jay Kreps jay.kr...@gmail.com wrote:

 Yeah, good point. I hadn't seen that before.

 -Jay


 On Mon, Jul 22, 2013 at 10:20 AM, Radek Gruchalski 
 radek.gruchal...@portico.io wrote:

  296 looks familiar: https://www.nodejitsu.com/
 
  Kind regards,
  Radek Gruchalski
  radek.gruchal...@technicolor.com (mailto:
 radek.gruchal...@technicolor.com)
  | radek.gruchal...@portico.io (mailto:radek.gruchal...@portico.io) |
  ra...@gruchalski.com (mailto:ra...@gruchalski.com)
  00447889948663
 
 
 
  On Monday, 22 July 2013 at 18:51, Jay Kreps wrote:
 
   Hey guys,
  
   We need a logo!
  
   I got a few designs from a 99 designs contest that I would like to put
   forward:
   https://issues.apache.org/jira/browse/KAFKA-982
  
   If anyone else would like to submit a design that would be great.
  
   Let's do a vote to choose one.
  
   -Jay
 
 



Re: Logo

2013-07-22 Thread David Harris
It should be a roach in honor of Franz Kafka's Metamorphosis.

On 7/22/2013 2:55 PM, S Ahmed wrote:

 Similar, yet different.  I like it!

 On Mon, Jul 22, 2013 at 1:25 PM, Jay Kreps jay.kr...@gmail.com wrote:

  Yeah, good point. I hadn't seen that before.

  -Jay

  On Mon, Jul 22, 2013 at 10:20 AM, Radek Gruchalski 
  radek.gruchal...@portico.io wrote:

   296 looks familiar: https://www.nodejitsu.com/

   Kind regards,
   Radek Gruchalski
   00447889948663

   On Monday, 22 July 2013 at 18:51, Jay Kreps wrote:

    Hey guys,

    We need a logo!

    I got a few designs from a 99 designs contest that I would like to put
    forward:
    https://issues.apache.org/jira/browse/KAFKA-982

    If anyone else would like to submit a design that would be great.

    Let's do a vote to choose one.

    -Jay

-- 
David Harris
Bridge Interactive Group
email: dhar...@big-llc.com
cell: 404-831-7015
office: 888-901-0150

Bridge Software Products:
www.big-llc.com
www.realvaluator.com
www.rvleadgen.com



Re: Logo

2013-07-22 Thread David Arthur
I actually did this the last time a logo was discussed :)

https://docs.google.com/drawings/d/11WHfjkRGbSiZK6rRkedCrgmgFoP_vQ-QuWNENd4u7UY/edit

As it turns out, it was a dung beetle in the book (I thought it was
a roach as well).

-David

On 7/22/13 2:59 PM, David Harris wrote:

 It should be a roach in honor of Franz Kafka's Metamorphosis.

 On 7/22/2013 2:55 PM, S Ahmed wrote:

  Similar, yet different.  I like it!

  On Mon, Jul 22, 2013 at 1:25 PM, Jay Kreps jay.kr...@gmail.com wrote:

   Yeah, good point. I hadn't seen that before.

   -Jay

   On Mon, Jul 22, 2013 at 10:20 AM, Radek Gruchalski 
   radek.gruchal...@portico.io wrote:

    296 looks familiar: https://www.nodejitsu.com/

    Kind regards,
    Radek Gruchalski
    00447889948663

    On Monday, 22 July 2013 at 18:51, Jay Kreps wrote:

     Hey guys,

     We need a logo!

     I got a few designs from a 99 designs contest that I would like to put
     forward:
     https://issues.apache.org/jira/browse/KAFKA-982

     If anyone else would like to submit a design that would be great.

     Let's do a vote to choose one.

     -Jay

 -- 
 David Harris
 Bridge Interactive Group
 email: dhar...@big-llc.com
 cell: 404-831-7015
 office: 888-901-0150

 Bridge Software Products:
 www.big-llc.com
 www.realvaluator.com
 www.rvleadgen.com


Messages TTL setting

2013-07-22 Thread arathi maddula
Hi,

We have a 3 node Kafka cluster. We want to increase the maximum amount of
time for which messages are saved in Kafka data logs.
Can we change the configuration on one node, stop it and start it and then
change the configuration of the next node?
Or should we stop all 3 nodes at a time, make configuration changes and
then restart all 3? Please suggest.

Thanks,
Arathi


Re: Messages TTL setting

2013-07-22 Thread Jay Kreps
Yes, all configuration changes should be possible to do one node at a time.

-Jay
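
(Concretely, the retention ceiling is a per-broker setting in
server.properties; a sketch, assuming 0.8's property name:

# server.properties: keep log segments for 14 days before deletion
log.retention.hours=336

Edit it on one broker, bounce that broker, confirm it rejoins the cluster,
then move on to the next.)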


On Mon, Jul 22, 2013 at 2:03 PM, arathi maddula arathimadd...@gmail.com wrote:

 Hi,

 We have a 3 node Kafka cluster. We want to increase the maximum amount of
 time for which messages are saved in Kafka data logs.
 Can we change the configuration on one node, stop it and start it and then
 change the configuration of the next node?
 Or should we stop all 3 nodes at a time, make configuration changes and
 then restart all 3? Please suggest.

 Thanks,
 Arathi



Recommended log level in prod environment.

2013-07-22 Thread Calvin Lei
The beta release comes with mostly trace level logging. Is this
recommended? I notice our cluster produce way too many logs. I set all the
level to info currently.


Re: Recommended log level in prod environment.

2013-07-22 Thread Calvin Lei
nah. We just changed it to INFO and will monitor the log. We had GBs of logs 
when it was at trace level. The kafka-request log was going crazy.


On Jul 22, 2013, at 10:54 PM, Jay Kreps jay.kr...@gmail.com wrote:

 We run at info too except when debugging stuff. Are you saying that info is
 too verbose?
 
 -Jay
 
 
 On Mon, Jul 22, 2013 at 6:43 PM, Calvin Lei ckp...@gmail.com wrote:
 
  The beta release comes with mostly trace-level logging. Is this
  recommended? I notice our cluster produces way too many logs. I set all the
  levels to info currently.
 



Re: Replacing brokers in a cluster (0.8)

2013-07-22 Thread Jun Rao
You can try kafka-reassign-partitions now. You do have to specify the new
replica assignment manually. We are improving that tool to make it more
automatic.
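
(For reference, the JSON file the tool takes has the same shape Scott's
script earlier in this thread generates; a hand-written sketch, with
topic/partition/replica ids purely illustrative:

{"partitions":
  [{"topic": "my-topic", "partition": 0, "replicas": [1, 3]},
   {"topic": "my-topic", "partition": 1, "replicas": [2, 3]}]}

One way to tell when a move is done is to check that each partition's new
replicas show up in the ISR again, e.g. with the 0.8 list-topic tool.)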

Thanks,

Jun


On Mon, Jul 22, 2013 at 10:40 AM, Jason Rosenberg j...@squareup.com wrote:

 Is the kafka-reassign-partitions tool something I can experiment with now
 (this will only be staging data, in the first go-round).  How does it work?
  Do I manually have to specify each replica I want to move?  This would be
  cumbersome, as I have on the order of 100's of topics... Or does the tool
 have the ability to specify all replicas on a particular broker?  How can I
 easily check whether a partition has all its replicas in the ISR?

 For some reason, I had thought there would be a default behavior, whereby a
 replica could automatically be declared dead after a configurable timeout
 period.

 Re-assigning broker id's would not be ideal, since I have a scheme
 currently whereby broker id's are auto-generated, from a hostname/ip, etc.
  I could make it work, but it's not my preference to override that!

 Jason


 On Mon, Jul 22, 2013 at 11:50 AM, Jun Rao jun...@gmail.com wrote:

  A replica's data won't be automatically moved to another broker where
 there
  are failures. This is because we don't know if the failure is transient
 or
  permanent. The right tool to use is the kafka-reassign-partitions tool.
 It
   hasn't been thoroughly tested though. We hope to harden it in the final
  0.8.0 release.
 
  You can also replace a broker with a new server by keeping the same
 broker
   id. When the new server starts up, it will replicate data from the leader.
  You know the data is fully replicated when both replicas are in ISR.
 
  Thanks,
 
  Jun
 
 
  On Mon, Jul 22, 2013 at 2:14 AM, Jason Rosenberg j...@squareup.com
 wrote:
 
   I'm planning to upgrade a 0.8 cluster from 2 old nodes, to 3 new ones
   (better hardware).  I'm using a replication factor of 2.
  
   I'm thinking the plan should be to spin up the 3 new nodes, and operate
  as
   a 5 node cluster for a while.  Then first remove 1 of the old nodes,
 and
   wait for the partitions on the removed node to get replicated to the
  other
   nodes.  Then, do the same for the other old node.
  
   Does this sound sensible?
  
   How does the cluster decide when to re-replicate partitions that are
 on a
   node that is no longer available?  Does it only happen if/when new
  messages
   arrive for that partition?  Is it on a partition by partition basis?
  
   Or is it a cluster-level decision that a broker is no longer valid, in
   which case all affected partitions would immediately get replicated to
  new
   brokers as needed?
  
   I'm just wondering how I will know when it will be safe to take down my
   second old node, after the first one is removed, etc.
  
   Thanks,
  
   Jason
  
 



Re: Recommended log level in prod environment.

2013-07-22 Thread Jun Rao
Yes, the kafka-request log logs every request (at TRACE). It's mostly for
debugging purposes. Other than that, there is no harm in turning it off.
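
(Concretely, that per-request logging is controlled by the broker's
log4j.properties; a sketch, assuming the stock logger and appender names:

# log4j.properties: raise the request logger above TRACE to quiet it
log4j.logger.kafka.request.logger=WARN, requestAppender
log4j.additivity.kafka.request.logger=false

Adjust to match whatever appender names your config actually uses.)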

Thanks,

Jun


On Mon, Jul 22, 2013 at 7:59 PM, Calvin Lei ckp...@gmail.com wrote:

 nah. We just changed it to INFO and will monitor the log. We had GBs of
 logs when it was at trace level. The kafka-request log was going crazy.


 On Jul 22, 2013, at 10:54 PM, Jay Kreps jay.kr...@gmail.com wrote:

  We run at info too except when debugging stuff. Are you saying that info
 is
  too verbose?
 
  -Jay
 
 
  On Mon, Jul 22, 2013 at 6:43 PM, Calvin Lei ckp...@gmail.com wrote:
 
   The beta release comes with mostly trace-level logging. Is this
   recommended? I notice our cluster produces way too many logs. I set all
   the levels to info currently.