Kafka running on Ceph

2016-05-23 Thread Connie Yang
Hi All,

Does anyone have any performance metrics running Kafka on Ceph?

I briefly gathered at the 2016 Kafka Summit that there's ongoing work
between the Kafka community and Red Hat to get Kafka running
successfully on Ceph.  Is this correct?  If so, what's the timeline for that?

Thanks
Connie


Re: Support customized security protocol

2016-01-19 Thread Connie Yang
@Ismael, what's the status of the SASL/PLAIN PR,
https://github.com/apache/kafka/pull/341?



On Tue, Jan 19, 2016 at 6:25 PM, tao xiao  wrote:

> The PR provides a new SASL mechanism, but it doesn't provide a pluggable way
> to implement a user's own authentication logic. So I don't think the PR
> will meet my needs.
>
> I will write a KIP to open the discussion.
>
> p.s. Ismael, can you grant me the permission to create a KIP in Kafka
> space?
>
>
> On Wed, 20 Jan 2016 at 10:08 Ismael Juma  wrote:
>
> > Hi Tao,
> >
> > The other way would be to implement a SASL provider:
> >
> > https://docs.oracle.com/javase/8/docs/technotes/guides/security/sasl/sasl-refguide.html#PROV
> >
> > This would still require Kafka to be changed; some of the changes are in
> > the following PR:
> >
> > https://github.com/apache/kafka/pull/341
> >
> > As per the discussion in the PR above, a KIP is also required.
> >
> > Ismael
> >
> > On Wed, Jan 20, 2016 at 1:48 AM, tao xiao  wrote:
> >
> > > Hi Ismael,
> > >
> > > BTW looks like I don't have the permission to add a KIP in Kafka space.
> > > Can you please grant me the permission?
> > >
> > > On Wed, 20 Jan 2016 at 09:40 tao xiao  wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > Thank you for your reply. I am happy to have a writeup on this.
> > > >
> > > > Can you think of any other ways to make security protocol pluggable
> > > > instead of extending ChannelBuilder?
> > > >
> > > > On Wed, 20 Jan 2016 at 02:14 Ismael Juma  wrote:
> > > >
> > > >> Hi Tao,
> > > >>
> > > >> As you say, security protocols are not currently pluggable.
> > > >> `ChannelBuilder` is already an interface, but `SecurityProtocol` is an
> > > >> enum, which makes it hard for users to add additional security
> > > >> protocols. Changing this would probably require a KIP:
> > > >>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > >>
> > > >> Ismael
> > > >>
> > > >> On Mon, Jan 18, 2016 at 3:15 AM, tao xiao 
> > wrote:
> > > >>
> > > >> > Hi Kafka team,
> > > >> >
> > > >> > I want to know if I can plug in my own security protocol to Kafka
> > > >> > to implement a project-specific authentication mechanism. The
> > > >> > currently supported authentication protocols, SASL/GSSAPI and SSL,
> > > >> > are not supported in my company, and we have our own security
> > > >> > protocol for authentication.
> > > >> >
> > > >> > Is it a good idea to make ChannelBuilder extensible so that I can
> > > >> > implement it with my own security channel?
> > > >> >
> > > >>
> > > >
> > >
> >
>
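
For reference, the SASL provider route Ismael points to above boils down to
registering a JCA provider that maps a custom mechanism name to a
SaslClientFactory; the Kafka-side wiring (along the lines of pull/341, plus a
KIP) would still be needed on top of it. A minimal sketch in Java, with a
hypothetical mechanism name, class names and token format:

import java.nio.charset.StandardCharsets;
import java.security.Provider;
import java.security.Security;
import java.util.Arrays;
import java.util.Map;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslClientFactory;
import javax.security.sasl.SaslException;

/**
 * Minimal sketch: a java.security.Provider that maps a custom SASL
 * mechanism name to a SaslClientFactory, per the Oracle SASL reference guide.
 */
public class MyCompanySaslProvider extends Provider {

    public MyCompanySaslProvider() {
        super("MyCompanySasl", 1.0, "SaslClientFactory for the MYCOMPANY-AUTH mechanism");
        // Key format "SaslClientFactory.<MECH>" is what Sasl.createSaslClient looks up.
        put("SaslClientFactory.MYCOMPANY-AUTH", MyFactory.class.getName());
    }

    /** Call once at startup so Sasl.createSaslClient can find the mechanism. */
    public static void install() {
        Security.addProvider(new MyCompanySaslProvider());
    }

    public static class MyFactory implements SaslClientFactory {
        @Override
        public SaslClient createSaslClient(String[] mechanisms, String authorizationId,
                                           String protocol, String serverName,
                                           Map<String, ?> props, CallbackHandler cbh) {
            for (String mech : mechanisms) {
                if ("MYCOMPANY-AUTH".equals(mech)) {
                    return new MyClient();
                }
            }
            return null; // none of the requested mechanisms are ours
        }

        @Override
        public String[] getMechanismNames(Map<String, ?> props) {
            return new String[] {"MYCOMPANY-AUTH"};
        }
    }

    /** Toy client: sends one opaque credential token and is then complete. */
    public static class MyClient implements SaslClient {
        private boolean complete = false;

        @Override public String getMechanismName() { return "MYCOMPANY-AUTH"; }
        @Override public boolean hasInitialResponse() { return true; }

        @Override
        public byte[] evaluateChallenge(byte[] challenge) throws SaslException {
            complete = true;
            // Placeholder credentials; a real client would talk to the company auth system.
            return "app-id\u0000app-secret".getBytes(StandardCharsets.UTF_8);
        }

        @Override public boolean isComplete() { return complete; }
        @Override public byte[] unwrap(byte[] incoming, int off, int len) { return Arrays.copyOfRange(incoming, off, off + len); }
        @Override public byte[] wrap(byte[] outgoing, int off, int len) { return Arrays.copyOfRange(outgoing, off, off + len); }
        @Override public Object getNegotiatedProperty(String propName) { return null; }
        @Override public void dispose() { }
    }
}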


Re: [VOTE] 0.8.2.1 Candidate 1

2015-02-18 Thread Connie Yang
+1
On Feb 18, 2015 7:23 PM, Matt Narrell matt.narr...@gmail.com wrote:

 +1

  On Feb 18, 2015, at 7:56 PM, Jun Rao j...@confluent.io wrote:
 
  This is the first candidate for release of Apache Kafka 0.8.2.1. This
  only fixes one critical issue (KAFKA-1952) in 0.8.2.0.
 
  Release Notes for the 0.8.2.1 release
 
 https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/RELEASE_NOTES.html
 
  *** Please download, test and vote by Saturday, Feb 21, 7pm PT
 
  Kafka's KEYS file, containing the PGP keys we use to sign the release:
  http://kafka.apache.org/KEYS. In addition, md5, sha1
  and sha2 (SHA256) checksums are provided.
 
  * Release artifacts to be voted upon (source and binary):
  https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/
 
  * Maven artifacts to be voted upon prior to release:
  https://repository.apache.org/content/groups/staging/
 
  * scala-doc
  https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/scaladoc/
 
  * java-doc
  https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/javadoc/
 
  * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.1 tag
 
 https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=c1b4c58531343dce80232e0122d085fc687633f6
 
  /***
 
  Thanks,
 
  Jun
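
For anyone testing the candidate, a small sketch of checking a downloaded
artifact against the published SHA-256 checksum; the file name below is just
a placeholder for whichever artifact you pulled from the candidate directory:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class VerifySha256 {
    public static void main(String[] args) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        // Stream the artifact through the digest so large files are not read into memory at once.
        try (InputStream in = Files.newInputStream(Paths.get("kafka-0.8.2.1-src.tgz"))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        // Compare this value with the .sha2 file published next to the artifact.
        System.out.println(hex);
    }
}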




Immediate Kafka security solution before 0.9 release

2015-02-17 Thread Connie Yang
Hi All,

Before the Kafka 0.9 release is available, is there an immediate security
solution that we can leverage?

I've come across https://github.com/relango/kafka/tree/kafka_security and
the IP address filter patch from Kafka 0.8.3, which does not yet have a set
release date.

Thanks,
Connie


Re: How to recover from a disk full situation in Kafka cluster?

2014-07-21 Thread Connie Yang
It looks like org.apache.kafka.clients.producer.KafkaProducer is not
available in the 0.8.1.1 client jar, so we'll stay with the
kafka.javaapi.producer.Producer implementation.

Thanks,
Connie
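
For reference, a minimal sketch of the new producer Neha suggests below,
assuming the 0.8.2+ client jar is on the classpath; the broker addresses and
topic name are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NewProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // The new producer refreshes metadata on errors, so it does not get
        // stuck on stale partition leadership the way the old producer can.
        producer.send(new ProducerRecord<>("myKafkaTopic", "key", "value"));
        producer.close();
    }
}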


On Fri, Jul 18, 2014 at 5:13 PM, Neha Narkhede neha.narkh...@gmail.com
wrote:

 One option is to reduce the value of topic.metadata.refresh.interval.ms, but
 the concern is that it may end up sending too many requests to the brokers,
 causing overhead. I'd suggest you use the new producer under
 org.apache.kafka.clients.producer.KafkaProducer, which does not have this
 problem. It is fairly new but has gone through some level of testing now, and
 we would appreciate any feedback/bugs that you can report back.

 Thanks,
 Neha


 On Fri, Jul 18, 2014 at 4:23 PM, Connie Yang cybercon...@gmail.com
 wrote:

  Sure, I will try to take a snapshot of the data distribution when it
  happens next time.
 
  Assuming topic.metadata.refresh.interval.ms is the concern, how should we
  unstick our producers?

  The important note in the documentation seems to suggest that the metadata
  refresh only happens AFTER a message is sent:

  "The producer generally refreshes the topic metadata from brokers when there
  is a failure (partition missing, leader not available...). It will also poll
  regularly (default: every 10min, so 600000ms). If you set this to a negative
  value, metadata will only get refreshed on failure. If you set this to zero,
  the metadata will get refreshed after each message sent (not recommended).
  Important note: the refresh happens only AFTER the message is sent, so if the
  producer never sends a message the metadata is never refreshed."
 
  Thanks,
  Connie
 
 
 
 
  On Fri, Jul 18, 2014 at 3:58 PM, Neha Narkhede neha.narkh...@gmail.com
  wrote:
 
    Does this mean that we should set auto.leader.rebalance.enable to true?
  
    I wouldn't recommend that just yet since it is not known to be very stable.
    You mentioned that only 2 brokers ever took the traffic and the replication
    factor is 2, which makes me think that the producer is sticking to 1 or a
    few partitions instead of distributing the data over all the partitions.
    This is a known problem in the old producer, where the default value of the
    config (topic.metadata.refresh.interval.ms) that controls how long a
    producer sticks to certain partitions is 10 mins. So it effectively does
    not distribute data evenly across all partitions.
  
   If you see the same behavior next time, try to take a snapshot of data
   distribution across all partitions to verify this theory.
  
   Thanks,
   Neha
  
  
   On Thu, Jul 17, 2014 at 5:43 PM, Connie Yang cybercon...@gmail.com
   wrote:
  
It might appear that the data is not balanced, but it could be a result of
the imbalanced leader assignment.

Does this mean that we should set auto.leader.rebalance.enable to true?
Is there any other configuration we need to change as well?  As I mentioned
before, we pretty much use the default settings.

All of our topics have a replication factor of 2 (i.e., 2 copies per message).

We don't have the topic output from when we had the problem, but here's our
topic output after we ran the kafka-preferred-replica-election.sh tool as
suggested:
   
$KAFKA_HOME/bin/kafka-topics.sh --zookeeper zkHost1:2181,zkHost2:2181,zkHost3:2181 --describe --topic=myKafkaTopic
Topic:myKafkaTopic PartitionCount:24 ReplicationFactor:2 Configs:retention.ms=4320
Topic: myKafkTopic Partition: 0 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: myKafkTopic Partition: 1 Leader: 3 Replicas: 3,2 Isr: 3,2
Topic: myKafkTopic Partition: 2 Leader: 4 Replicas: 4,3 Isr: 3,4
Topic: myKafkTopic Partition: 3 Leader: 5 Replicas: 5,4 Isr: 5,4
Topic: myKafkTopic Partition: 4 Leader: 6 Replicas: 6,5 Isr: 5,6
Topic: myKafkTopic Partition: 5 Leader: 7 Replicas: 7,6 Isr: 6,7
Topic: myKafkTopic Partition: 6 Leader: 8 Replicas: 8,7 Isr: 7,8
Topic: myKafkTopic Partition: 7 Leader: 9 Replicas: 9,8 Isr: 9,8
Topic: myKafkTopic Partition: 8 Leader: 10 Replicas: 10,9 Isr: 10,9
Topic: myKafkTopic Partition: 9 Leader: 11 Replicas: 11,10 Isr: 11,10
Topic: myKafkTopic Partition: 10 Leader: 12 Replicas: 12,11 Isr: 11,12
Topic: myKafkTopic Partition: 11 Leader: 13 Replicas: 13,12 Isr: 12,13
Topic: myKafkTopic Partition: 12 Leader: 14 Replicas: 14,13 Isr: 14,13
Topic: myKafkTopic Partition: 13 Leader: 15 Replicas: 15,14 Isr: 14,15
Topic: myKafkTopic Partition: 14 Leader: 16 Replicas: 16,15 Isr: 16,15
Topic: myKafkTopic Partition: 15 Leader: 17 Replicas: 17,16 Isr: 16,17
Topic: myKafkTopic Partition: 16 Leader: 18 Replicas: 18,17 Isr: 18,17
Topic: myKafkTopic Partition: 17 Leader: 19 Replicas: 19,18 Isr: 18,19
Topic: myKafkTopic Partition: 18 Leader: 20 Replicas: 20,19 Isr: 20,19
Topic: myKafkTopic Partition: 19 Leader: 21 Replicas: 21,20 Isr: 20,21
Topic: myKafkTopic Partition: 20 Leader: 22 Replicas: 22,21 Isr: 22,21
Topic: myKafkTopic Partition: 21 Leader: 23 Replicas: 23,22 Isr: 23,22
Topic: myKafkTopic Partition: 22 Leader: 24 Replicas: 24,23 Isr: 23,24
Topic: myKafkTopic Partition: 23 Leader: 1 Replicas: 1,24 Isr: 1,24

Re: How to recover from a disk full situation in Kafka cluster?

2014-07-18 Thread Connie Yang
Sure, I will try to take a snapshot of the data distribution when it
happens next time.

Assuming topic.metadata.refresh.interval.ms is the concern, how should we
unstick our producers?

The important note in the documentation seems to suggest that the metadata
refresh only happens AFTER a message is sent:

"The producer generally refreshes the topic metadata from brokers when there
is a failure (partition missing, leader not available...). It will also poll
regularly (default: every 10min, so 600000ms). If you set this to a negative
value, metadata will only get refreshed on failure. If you set this to zero,
the metadata will get refreshed after each message sent (not recommended).
Important note: the refresh happens only AFTER the message is sent, so if the
producer never sends a message the metadata is never refreshed."

Thanks,
Connie
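
For what it's worth, a sketch of what lowering topic.metadata.refresh.interval.ms
looks like on the old (0.8.1.x) producer; the broker list, topic name and the
60-second value are placeholders, and a smaller interval trades more metadata
requests for picking up new partitions and leaders sooner:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class OldProducerRefreshSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Default is 600000 (10 minutes); note the refresh still only happens
        // after a send, so an idle producer never refreshes its metadata.
        props.put("topic.metadata.refresh.interval.ms", "60000");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("myKafkaTopic", "key", "value"));
        producer.close();
    }
}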




On Fri, Jul 18, 2014 at 3:58 PM, Neha Narkhede neha.narkh...@gmail.com
wrote:

 Does this mean that we should set auto.leader.rebalance.enable to true?

 I wouldn't recommend that just yet since it is not known to be very stable.
 You mentioned that only 2 brokers ever took the traffic and the replication
 factor is 2, which makes me think that the producer is sticking to 1 or a few
 partitions instead of distributing the data over all the partitions. This is
 a known problem in the old producer, where the default value of the config
 (topic.metadata.refresh.interval.ms) that controls how long a producer sticks
 to certain partitions is 10 mins. So it effectively does not distribute data
 evenly across all partitions.

 If you see the same behavior next time, try to take a snapshot of data
 distribution across all partitions to verify this theory.

 Thanks,
 Neha


 On Thu, Jul 17, 2014 at 5:43 PM, Connie Yang cybercon...@gmail.com
 wrote:

  It might appear that the data is not balanced, but it could be a result of
  the imbalanced leader assignment.

  Does this mean that we should set auto.leader.rebalance.enable to true?
  Is there any other configuration we need to change as well?  As I mentioned
  before, we pretty much use the default settings.

  All of our topics have a replication factor of 2 (i.e., 2 copies per
  message).
 
  We don't have the topic output when we had the problem, but here's our
  topic output after we ran the kafka-preferred-replica-election.sh tool as
  suggested:
 
  $KAFKA_HOME/bin/kafka-topics.sh --zookeeper zkHost1:2181,zkHost2:2181,zkHost3:2181 --describe --topic=myKafkaTopic
  Topic:myKafkaTopic PartitionCount:24 ReplicationFactor:2 Configs:retention.ms=4320
  Topic: myKafkTopic Partition: 0 Leader: 2 Replicas: 2,1 Isr: 1,2
  Topic: myKafkTopic Partition: 1 Leader: 3 Replicas: 3,2 Isr: 3,2
  Topic: myKafkTopic Partition: 2 Leader: 4 Replicas: 4,3 Isr: 3,4
  Topic: myKafkTopic Partition: 3 Leader: 5 Replicas: 5,4 Isr: 5,4
  Topic: myKafkTopic Partition: 4 Leader: 6 Replicas: 6,5 Isr: 5,6
  Topic: myKafkTopic Partition: 5 Leader: 7 Replicas: 7,6 Isr: 6,7
  Topic: myKafkTopic Partition: 6 Leader: 8 Replicas: 8,7 Isr: 7,8
  Topic: myKafkTopic Partition: 7 Leader: 9 Replicas: 9,8 Isr: 9,8
  Topic: myKafkTopic Partition: 8 Leader: 10 Replicas: 10,9 Isr: 10,9
  Topic: myKafkTopic Partition: 9 Leader: 11 Replicas: 11,10 Isr: 11,10
  Topic: myKafkTopic Partition: 10 Leader: 12 Replicas: 12,11 Isr: 11,12
  Topic: myKafkTopic Partition: 11 Leader: 13 Replicas: 13,12 Isr: 12,13
  Topic: myKafkTopic Partition: 12 Leader: 14 Replicas: 14,13 Isr: 14,13
  Topic: myKafkTopic Partition: 13 Leader: 15 Replicas: 15,14 Isr: 14,15
  Topic: myKafkTopic Partition: 14 Leader: 16 Replicas: 16,15 Isr: 16,15
  Topic: myKafkTopic Partition: 15 Leader: 17 Replicas: 17,16 Isr: 16,17
  Topic: myKafkTopic Partition: 16 Leader: 18 Replicas: 18,17 Isr: 18,17
  Topic: myKafkTopic Partition: 17 Leader: 19 Replicas: 19,18 Isr: 18,19
  Topic: myKafkTopic Partition: 18 Leader: 20 Replicas: 20,19 Isr: 20,19
  Topic: myKafkTopic Partition: 19 Leader: 21 Replicas: 21,20 Isr: 20,21
  Topic: myKafkTopic Partition: 20 Leader: 22 Replicas: 22,21 Isr: 22,21
  Topic: myKafkTopic Partition: 21 Leader: 23 Replicas: 23,22 Isr: 23,22
  Topic: myKafkTopic Partition: 22 Leader: 24 Replicas: 24,23 Isr: 23,24
  Topic: myKafkTopic Partition: 23 Leader: 1 Replicas: 1,24 Isr: 1,24
 
  Thanks,
  Connie
 
 
 
  On Thu, Jul 17, 2014 at 4:20 PM, Neha Narkhede neha.narkh...@gmail.com
  wrote:
 
   Connie,
  
    After we freed up the cluster disk space and adjusted the broker data
    retention policy, we noticed that the cluster partitions were not balanced,
    based on the topic describe script that came with the Kafka 0.8.1.1
    distribution.
  
    When you say the cluster was not balanced, did you mean the leaders or the
    data? The describe topic tool does not give information about data sizes,
    so I'm assuming you are referring to leader imbalance. If so, the right
    tool to run is kafka-preferred-replica-election.sh, not partition
    reassignment. In general, assuming the partitions were

What happens to Kafka when ZK lost its quorum?

2014-05-13 Thread Connie Yang
Hi all,

Can Kafka producers, brokers and consumers still process messages and
function in their normal states if ZooKeeper loses its quorum?

Thanks,
Connie


What happens to Kafka when ZK lost its quorum or becomes unstable?

2014-05-13 Thread Connie Yang
Hi,

Can the producers, brokers and consumers still process messages when their
ZK cluster loses its quorum or becomes unstable?  I know this is a rather
general question, as it may depend on what configuration they use, so please
enumerate all of those combinations.

Thanks,
Connie