A kafka web monitor

2015-03-23 Thread Wan Wei
We have made a simple web console to monitor some Kafka information such as
consumer offsets and log size.

https://github.com/shunfei/DCMonitor


Hope you like it and offer your help to make it better :)
Regards
Flow


Re: Check topic exists after deleting it.

2015-03-23 Thread Harsha
DeleteTopic makes a node in zookeeper to let the controller know that there is a
topic up for deletion. This doesn’t immediately delete the topic; it can take
time depending on whether all the partitions of that topic are online and the
brokers are available as well. Once all the log files are deleted, the zookeeper
node gets deleted as well.
Also make sure you don’t have any producers or consumers running while the
topic deletion is going on.

-- 
Harsha


On March 23, 2015 at 1:29:50 AM, anthony musyoki (anthony.musy...@gmail.com) 
wrote:

On deleting a topic via TopicCommand.deleteTopic()  

I get Topic test-delete is marked for deletion.  

I follow up by checking if the topic exists by using  
AdminUtils.topicExists()  
which surprisingly returns true.

I expected AdminUtils.topicExists() to check both BrokerTopicsPath
and DeleteTopicsPath before returning a verdict, but it only checks
BrokerTopicsPath.

Shouldn't a topic marked for deletion return false for topicExists() ?  


kafka audit

2015-03-23 Thread sunil kalva
Hi
What is the best practice for adding an audit feature to Kafka? Is there any
framework available for enabling auditing at the producer and consumer
level, and any UI framework for monitoring?

tx
SunilKalva


Re: Post on running Kafka at LinkedIn

2015-03-23 Thread Todd Palino
Emmanuel, if it helps, here's a little more detail on the hardware spec we
are using at the moment:

12 CPU (HT enabled)
64 GB RAM
16 x 1TB SAS drives (2 are used as a RAID-1 set for the OS, 14 are a
RAID-10 set just for the Kafka log segments)

We don't colocate any other applications with Kafka except for a couple
monitoring agents. Zookeeper runs on completely separate nodes.

I suggest starting with looking at the basics - watch the CPU, memory, and
disk IO usage on the brokers as you are testing. You're likely going to
find one of these three is the constraint. Disk IO in particular can lead
to a significant increase in produce latency as utilization climbs, even
at only 10-15% utilization.

-Todd


On Fri, Mar 20, 2015 at 3:41 PM, Emmanuel ele...@msn.com wrote:

 This is why I'm confused because I'm trying to benchmark and I see numbers
 that seem pretty low to me...8000 events/sec on 2 brokers with 3 CPUs each
 and 5 partitions should be way faster than this and I don't know where to
 start to debug...
 the kafka-consumer-perf-test script gives me ridiculously low numbers
 (1000 events/sec/thread)

 So what could be causing this?
 From: jbringhu...@linkedin.com.INVALID
 To: users@kafka.apache.org
 Subject: Re: Post on running Kafka at LinkedIn
 Date: Fri, 20 Mar 2015 22:16:29 +

 Keep in mind that these brokers aren't really stressed too much at any
 given time -- we need to stay ahead of the capacity curve.
 Your message throughput will really just depend on what hardware you're
 using. However, in the past, we've benchmarked at 400,000 to more than
 800,000 messages / broker / sec, depending on configuration (
 https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
 ).

 -Jon
 On Mar 20, 2015, at 3:03 PM, Emmanuel ele...@msn.com wrote:
 800B messages / day = 9.26M messages / sec over 1100 brokers
 = ~8400 messages / broker / sec
 Do I get this right?
 Trying to benchmark my own test cluster and that's what I see with 2
 brokers...Just wondering if my numbers are good or bad...


 Subject: Re: Post on running Kafka at LinkedIn
 From: cl...@kafka.guru
 Date: Fri, 20 Mar 2015 14:27:58 -0700
 To: users@kafka.apache.org

 Yep! We are growing :)

 -Clark

 Sent from my iPhone

 On Mar 20, 2015, at 2:14 PM, James Cheng jch...@tivo.com wrote:

 Amazing growth numbers.

 At the meetup on 1/27, Clark Haskins presented their Kafka usage at the
 time. It was:

 Bytes in: 120 TB
 Messages In: 585 billion
 Bytes out: 540 TB
 Total brokers: 704

 In Todd's post, the current numbers:

 Bytes in: 175 TB (45% growth)
 Messages In: 800 billion (36% growth)
 Bytes out: 650 TB (20% growth)
 Total brokers: 1100 (56% growth)

 That much growth in just 2 months? Wowzers.

 -James

 On Mar 20, 2015, at 11:30 AM, James Cheng jch...@tivo.com wrote:

 For those who missed it:

 The Kafka Audit tool was also presented at the 1/27 Kafka meetup:
 http://www.meetup.com/http-kafka-apache-org/events/219626780/

 Recorded video is here, starting around the 40 minute mark:
 http://www.ustream.tv/recorded/58109076

 Slides are here:
 http://www.ustream.tv/recorded/58109076

 -James

 On Mar 20, 2015, at 9:47 AM, Todd Palino tpal...@gmail.com wrote:

 For those who are interested in detail on how we've got Kafka set up at
LinkedIn, I have just published a new post to our Engineering blog titled
 Running Kafka at Scale

   https://engineering.linkedin.com/kafka/running-kafka-scale

 It's a general overview of our current Kafka install, tiered architecture,
 audit, and the libraries we use for producers and consumers. You'll also be
 seeing more posts from the SRE team here in the coming weeks on deeper
 looks into both Kafka and Samza.

 Additionally, I'll be giving a talk at ApacheCon next month on running
 tiered Kafka architectures. If you're in Austin for that, please come by
 and check it out.

 -Todd






Re: kafka mirrormaker cross datacenter replication

2015-03-23 Thread Guozhang Wang
With MM, the source and destination clusters can choose different numbers of
partitions for the mirrored topic, and hence messages may be re-grouped in
the destination cluster. In addition, if you have two MMs piping data to the
same destination from two sources, the order in which messages from the two
source clusters arrive at the destination is non-deterministic as well.

Guozhang
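
For illustration, a minimal sketch of the ID-based deduplication approach
mentioned further down this thread (couchbase/memcached stood in for by an
in-memory map; the TTL value and ID scheme are assumptions, not anything
from the thread):

import scala.collection.mutable

// A sketch, not production code: remember each message's unique ID for a
// TTL slightly longer than the Kafka retention period and drop anything
// already seen. A real deployment would keep this state in couchbase or
// memcached rather than an in-memory map.
class DedupFilter(ttlMs: Long) {
  private val seen = mutable.Map[String, Long]()

  def isDuplicate(messageId: String): Boolean = {
    val now = System.currentTimeMillis()
    seen.retain((_, ts) => now - ts < ttlMs) // expire entries older than the TTL
    val dup = seen.contains(messageId)
    seen(messageId) = now
    dup
  }
}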

On Sun, Mar 22, 2015 at 9:40 PM, Kane Kim kane.ist...@gmail.com wrote:

 I thought that ordering is guaranteed within the partition, or does mirror
 maker not preserve partitions?



 On Fri, Mar 20, 2015 at 4:44 PM, Guozhang Wang wangg...@gmail.com wrote:

  I think 1) will work, but I am not sure about 2), since messages replicated
  to two clusters may be out of order as well; hence you may get messages
  1,2,3,4 in one cluster and 1,3,4,2 in another. If you remember that the
  latest message processed in the first cluster is 2, when you fail over to
  the other cluster you may skip and miss messages 3 and 4.
 
  Guozhang
 
  On Fri, Mar 20, 2015 at 1:07 PM, Kane Kim kane.ist...@gmail.com wrote:
 
   Also, as I understand it, we either have to mark all messages with unique
   IDs and then deduplicate them, or, if we just want to store the last
   message processed per partition, we will need exactly the same number of
   partitions in both clusters?
  
   On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang wangg...@gmail.com
   wrote:
  
    Not sure if transactional messaging will help in this case, as at least
    for now it is still targeted within a single DC, i.e. a transaction is
    only defined within a Kafka cluster, not across clusters.
   
Guozhang
   
On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst 
jbringhu...@linkedin.com.invalid wrote:
   
 Hey Kane,

 When mirrormakers lose offsets on catastrophic failure, you generally
 have two options. You can keep auto.offset.reset set to latest and handle
 the loss of messages, or you can have it set to earliest and handle the
 duplication of messages.

 Although we try to avoid duplicate messages overall, when failure happens,
 we (mostly) take the earliest path and deal with the duplication of
 messages.

 If your application doesn't treat messages as idempotent, you might be
 able to get away with something like couchbase or memcached with a TTL
 slightly higher than your Kafka retention time and use that to filter
 duplicates. Another pattern may be to deduplicate messages in Hadoop
 before taking action on them.

 -Jon

 P.S. An option in the future might be
 https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka

 On Mar 19, 2015, at 5:32 PM, Kane Kim kane.ist...@gmail.com
 wrote:

  Hello,
 
  What's the best strategy for failover when using mirror-maker to replicate
  across datacenters? As I understand it, offsets in both datacenters will be
  different, so how should consumers be reconfigured to continue reading from
  the same point where they stopped without data loss and/or duplication?
 
  Thanks.


   
   
--
-- Guozhang
   
  
 
 
 
  --
  -- Guozhang
 




-- 
-- Guozhang


Re: Check topic exists after deleting it.

2015-03-23 Thread anthony musyoki
Thanks for your prompt response.

In my check for topicExists i will add a check for topic in
DeleteTopicsPath.



On Mon, Mar 23, 2015 at 8:21 PM, Harsha ka...@harsha.io wrote:

 Just to be clear, one needs to stop the producers and consumers that are
 writing to or reading from a topic “test” if they are trying to delete that
 specific topic “test”. Not all producers and consumers.

 --
 Harsha

 On March 23, 2015 at 10:13:47 AM, Harsha (harsh...@fastmail.fm) wrote:

  Currently we have auto.create.topics.enable set to true by default. If
 this is set to true, anyone who makes a TopicMetadataRequest can create a
 topic, and both producers and consumers can send TopicMetadataRequests that
 will create a topic if the above config is true. So while a deletion is in
 progress, a running producer or consumer can re-create a topic that is in the
 deletion process. This issue is going to be addressed in upcoming versions.
 Meanwhile, if you are not creating topics via the producer, turn this config
 off, or stop producers and consumers while you are trying to delete a topic.
  --
 Harsha


 On March 23, 2015 at 9:57:53 AM, Grant Henke (ghe...@cloudera.com) wrote:

  What happens when producers or consumers are running while the topic
 deletion is going on?

 On Mon, Mar 23, 2015 at 10:02 AM, Harsha ka...@harsha.io wrote:

  DeleteTopic makes a node in zookeeper to let the controller know that there
  is a topic up for deletion. This doesn’t immediately delete the topic; it can
  take time depending on whether all the partitions of that topic are online
  and the brokers are available as well. Once all the log files are deleted,
  the zookeeper node gets deleted as well.
  Also make sure you don’t have any producers or consumers running while
  the topic deletion is going on.
 
  --
  Harsha
 
 
  On March 23, 2015 at 1:29:50 AM, anthony musyoki (
  anthony.musy...@gmail.com) wrote:
 
  On deleting a topic via TopicCommand.deleteTopic()
 
  I get Topic test-delete is marked for deletion.
 
  I follow up by checking if the topic exists by using
  AdminUtils.topicExists()
  which surprisingly returns true.

  I expected AdminUtils.topicExists() to check both BrokerTopicsPath
  and DeleteTopicsPath before returning a verdict, but it only checks
  BrokerTopicsPath.
 
  Shouldn't a topic marked for deletion return false for topicExists() ?
 



 --
 Grant Henke
 Solutions Consultant | Cloudera
 ghe...@cloudera.com | 920-980-8979
 twitter.com/ghenke http://twitter.com/gchenke |
 linkedin.com/in/granthenke




Kafka Sync Producer threads(ack=0) are blocked

2015-03-23 Thread ankit tyagi
Hi All,

Currently we are using kafka_2.8.0-0.8.0-beta1 in our production system. I
am using a sync producer with ack=0 to send the events to the broker.

but I am seeing that most of my producer threads are blocked.

jmsListnerTaskExecutor-818 prio=10 tid=0x7f3f5c05a800 nid=0x1719
waiting for monitor entry [0x7f405935e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:53)
        - waiting to lock 0x000602358ee8 (a java.lang.Object)
        at kafka.producer.Producer.send(Producer.scala:74)
        at kafka.javaapi.producer.Producer.send(Producer.scala:32)
        at com.snapdeal.coms.kafka.KafkaProducer.send(KafkaProducer.java:135)
        at com.snapdeal.coms.kafka.KafkaEventPublisher.publishCOMSEventOnESB(KafkaEventPublisher.java:61)
        at com.snapdeal.coms.service.EventsPublishingService.publishStateChangeEvent(EventsPublishingService.java:88)
        at com.snapdeal.coms.publisher.core.PublisherEventPublishingService.publishUploadIdState(PublisherEventPublishingService.java:46)
        at com.snapdeal.coms.publisher.splitter.VendorProductUpdateSplitter.split(VendorProductUpdateSplitter.java:112)
        at sun.reflect.GeneratedMethodAccessor227.invoke(Unknown Source)




JVisualVM also shows that the producer threads are in a blocked state most of
the time, though I don't see any exceptions in the Kafka server logs. Any insight?
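
A sketch of one mitigation, assuming the 0.8-era Scala producer config names
(broker address and topic below are placeholders): switching producer.type to
async hands messages to a background queue so application threads do not
contend on the synchronized send path. This works around the blocking rather
than fixing the lock contention itself.

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

// Async producer sketch: sends are buffered and flushed by a background
// thread instead of blocking inside DefaultEventHandler.handle().
val props = new Properties()
props.put("metadata.broker.list", "broker1:9092") // placeholder broker list
props.put("serializer.class", "kafka.serializer.StringEncoder")
props.put("producer.type", "async")               // queue + background sender
props.put("request.required.acks", "0")           // matches the ack=0 setup above

val producer = new Producer[String, String](new ProducerConfig(props))
producer.send(new KeyedMessage[String, String]("example-topic", "hello"))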


Re: [ANN] sqlstream: Simple MySQL binlog to Kafka stream

2015-03-23 Thread James Cheng
I created a wiki page that lists all the MySQL replication options that people 
posted, plus a couple others. People may/may not find it useful.

https://github.com/wushujames/mysql-cdc-projects/wiki

I wasn't sure where to host it, so I put it up on a Github Wiki.

-James

On Mar 17, 2015, at 11:09 PM, Xiao lixiao1...@gmail.com wrote:

 Linkedin Gobblin compaction tool is using Hive to perform the compaction. 
 Does it mean Lumos is replaced?
 
 Confused…
 
 On Mar 17, 2015, at 10:00 PM, Xiao lixiao1...@gmail.com wrote:
 
 Hi, all,
 
 Do you know whether Linkedin plans to open source Lumos in the near future?
 
 I found the answer in Qiao Lin’s post about replication from Oracle/mySQL 
 to Hadoop.
 
  - https://engineering.linkedin.com/data-ingestion/gobblin-big-data-ease
 
 At the source side, it can be DataBus-based or file based.
 
 At the target side, it is Lumos that rebuilds the snapshots, due to the
 inability to do an update/delete in Hadoop.
 
 The slides about Lumos:
  http://www.slideshare.net/Hadoop_Summit/th-220p230-cramachandranv1
 The talk about Lumos:
  https://www.youtube.com/watch?v=AGlRjlrNDYk
 
 Event publishing is different from database replication. Kafka is used for 
 change publishing or maybe also used for sending changes (recorded in files).
 
 Thanks,
 
 Xiao Li
 
 On Mar 17, 2015, at 7:26 PM, Arya Ketan ketan.a...@gmail.com wrote:
 
 AFAIK, linkedin uses databus to do the same. Aesop is built on top of
 databus, extending its beautiful capabilities to MySQL and HBase.
 On Mar 18, 2015 7:37 AM, Xiao lixiao1...@gmail.com wrote:
 
 Hi, all,
 
 Do you know how the Linkedin team publishes changed rows in Oracle to Kafka?
 I believe they already know the whole problem very well.
 
 Using triggers? or directly parsing the log? or using any Oracle
 GoldenGate interfaces?
 
 Any lesson or any standard message format? Could the Linkedin people share
 it with us? I believe it can help us a lot.
 
 Thanks,
 
 Xiao Li
 
 
 On Mar 17, 2015, at 12:26 PM, James Cheng jch...@tivo.com wrote:
 
 This is a great set of projects!
 
 We should put this list of projects on a site somewhere so people can
 more easily see and refer to it. These aren't Kafka-specific, but most seem
 to be MySQL CDC. Does anyone have a place where they can host a page?
 Preferably a wiki, so we can keep it up to date easily.
 
 -James
 
 On Mar 17, 2015, at 8:21 AM, Hisham Mardam-Bey 
 hisham.mardam...@gmail.com wrote:
 
 Pretty much a hijack / plug as well (=
 
 https://github.com/mardambey/mypipe
 
 MySQL binary log consumer with the ability to act on changed rows and
 publish changes to different systems with emphasis on Apache Kafka.
 
 Mypipe currently encodes events using Avro before pushing them into Kafka,
 and is Avro schema repository aware. The project is young, and patches for
 improvements are appreciated (=
 
 On Mon, Mar 16, 2015 at 10:35 PM, Arya Ketan ketan.a...@gmail.com
 wrote:
 
 Great work.
 Sorry for kinda hijacking this thread, but we had built something on a
 mysql bin log event propagator and I wanted to share it.
 You guys can also look into Aesop ( https://github.com/Flipkart/aesop
 ).
 It's a change propagation framework. It has relays which listen to the bin
 logs of Mysql and keep track of SCNs, and it has consumers which can then
 (transform/map or interpret as is) the bin log-event to a destination.
 Consumers also keep track of SCNs, and a slow consumer can go back to a
 previous SCN if it wants to re-listen to events (similar to kafka's
 consumer view).
 
 All the producers/consumers are extensible and you can write your own
 custom consumer and feed the data to it.
 
 Common use-cases:
 a) Archive mysql based data into say hbase
 b) Move mysql based data to say a search store for serving reads.
 
 It has a decent ( not an awesome :) ) console too which gives a nice
 human
 readable view of where the producers and consumers are.
 
 Current supported producers are mysql bin logs, hbase wall-edits.
 
 
 Further insights/reviews/feature reqs/pull reqs/advices are all
 welcome.
 
 --
 Arya
 
 Arya
 
 On Tue, Mar 17, 2015 at 1:48 AM, Gwen Shapira gshap...@cloudera.com
 wrote:
 
 Really really nice!
 
 Thank you.
 
 On Mon, Mar 16, 2015 at 7:18 AM, Pierre-Yves Ritschard 
 p...@spootnik.org
 
 wrote:
 Hi kafka,
 
 I just wanted to mention I published a very simple project which can
 connect as a MySQL replication client and stream replication events to
 kafka: https://github.com/pyr/sqlstream
 
 When you don't have control over an application, it can provide a
 simple
 way of consolidating SQL data in kafka.
 
 This is an early release and there are a few caveats (mentioned in the
 README), mostly the poor partitioning which I'm going to evolve quickly
 and the reconnection strategy which doesn't try to keep track of binlog
 position; other than that, it should work as advertised.
 
 Cheers,
 - pyr
 
 
 
 
 
 --
 Hisham Mardam-Bey
 http://hisham.cc/
 
 
 
 
 



Kafka Meetup @ LinkedIn 3/24

2015-03-23 Thread Clark Haskins
Hey Everyone –

Just a reminder about the Meetup tomorrow night @ LinkedIn.


There will be 3 talks:

Offset management - 6:35PM - Joel Koshy (LinkedIn)

The Netflix Data Pipeline - 7:05PM - Allen Wang & Steven Wu (Netflix)

Best Practices - 7:50PM - Jay Kreps (Confluent)


If you are interested in attending please sign up at the link below:

http://www.meetup.com/http-kafka-apache-org/events/220355031/


We will also be recording & streaming the event live at:

http://www.ustream.tv/linkedin-events

-Clark


*Clark Elliott Haskins III*

LinkedIn DDS Site Reliability Engineering

Kafka, Zookeeper, Samza SRE Manager

https://www.linkedin.com/in/clarkhaskins

*There is no place like 127.0.0.1*


Re: kafka audit

2015-03-23 Thread Todd Palino
We've talked about it a little bit, but part of the problem is that it is
pretty well integrated into our infrastructure, and as such it's hard to
pull it out. I illustrated this a little differently than Jon did in my
latest blog post (http://engineering.linkedin.com/kafka/running-kafka-scale),
showing how the producer (and consumer) bits that handle audit are integrated into
our internal libraries that wrap the open source libraries. Between the
schema-registry, the publishing of the audit data back into Kafka, the
audit consumers, and the database that is needed for storing the audit
data, it gets woven in pretty tightly.

Confluent has made a start on this by releasing a stack with schemas
integrated in. This is probably a good place to start as far as building an
open source audit service.

-Todd


On Mon, Mar 23, 2015 at 12:47 AM, Navneet Gupta (Tech - BLR) 
navneet.gu...@flipkart.com wrote:

 Are there any plans to open source the same? What alternatives do we have
 here?

 We are building an internal auditing framework for our entire big data
 pipeline. Kafka is one of the data sources we have (ingested data).

 On Mon, Mar 23, 2015 at 1:03 PM, tao xiao xiaotao...@gmail.com wrote:

  Linkedin has an excellent tool that monitors lag, data loss, data
 duplication, etc. Here is the reference
 
 
 
 http://www.slideshare.net/JonBringhurst/kafka-audit-kafka-meetup-january-27th-2015
 
  it is not open sourced though.
 
  On Mon, Mar 23, 2015 at 3:26 PM, sunil kalva kalva.ka...@gmail.com
  wrote:
 
   Hi
   What is the best practice for adding an audit feature to Kafka? Is there any
   framework available for enabling auditing at the producer and consumer
   level, and any UI framework for monitoring?
  
   tx
   SunilKalva
  
 
 
 
  --
  Regards,
  Tao
 



 --
 Thanks & Regards,
 Navneet Gupta



Re: Mirror maker fetcher thread unexpectedly stopped

2015-03-23 Thread tao xiao
I think I worked out the answer to question 1.
java.lang.IllegalMonitorStateException
was thrown because the calling thread did not own the ReentrantLock when
calling await() on the lock condition.

Here is the code snippet from the AbstractFetcherThread.scala in trunk

partitionMapLock synchronized {
  partitionsWithError ++= partitionMap.keys
  // there is an error occurred while fetching partitions, sleep a while
  partitionMapCond.await(fetchBackOffMs, TimeUnit.MILLISECONDS)
}

As shown above, "partitionMapLock synchronized" only acquires the lock
object's JVM monitor; the ReentrantLock itself is never acquired before
calling partitionMapCond.await.

We can fix this by explicitly acquiring the ReentrantLock, e.g. via
partitionMapLock.lock() or the inLock helper. The code block below should work:

inLock(partitionMapLock) {
  partitionsWithError ++= partitionMap.keys
  // there is an error occurred while fetching partitions, sleep a while
  partitionMapCond.await(fetchBackOffMs, TimeUnit.MILLISECONDS)
}
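
For reference, inLock is a small helper along these lines (a sketch of the
one in kafka.utils; the key point is that it acquires the ReentrantLock
itself, which is what makes the later await() legal):

import java.util.concurrent.locks.Lock

// Acquire the lock itself (not the lock object's monitor), run the body,
// and always release the lock afterwards.
def inLock[T](lock: Lock)(body: => T): T = {
  lock.lock()
  try body
  finally lock.unlock()
}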

On Mon, Mar 23, 2015 at 1:50 PM, tao xiao xiaotao...@gmail.com wrote:

 Hi,

 I was running a mirror maker and got
  java.lang.IllegalMonitorStateException that caused the underlying fetcher
 thread completely stopped. Here is the log from mirror maker.

 [2015-03-21 02:11:53,069] INFO Reconnect due to socket error:
 java.io.EOFException: Received -1 when reading from channel, socket has
 likely been closed. (kafka.consumer.SimpleConsumer)
 [2015-03-21 02:11:53,081] WARN
 [ConsumerFetcherThread-localhost-1426853968478-4ebaa354-0-6], Error in
 fetch Name: FetchRequest; Version: 0; CorrelationId: 2398588; ClientId:
 phx-slc-mm-user-3; ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes;
 RequestInfo: [test.topic,0] - PartitionFetchInfo(3766065,1048576).
 Possible cause: java.nio.channels.ClosedChannelException
 (kafka.consumer.ConsumerFetcherThread)
 [2015-03-21 02:11:53,083] ERROR
 [ConsumerFetcherThread-localhost-1426853968478-4ebaa354-0-6], Error due to
  (kafka.consumer.ConsumerFetcherThread)
 java.lang.IllegalMonitorStateException
 at
 java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:155)
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.fullyRelease(AbstractQueuedSynchronizer.java:1723)
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2166)
 at
 kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:106)
 at
 kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:90)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
 [2015-03-21 02:11:53,083] INFO
 [ConsumerFetcherThread-localhost-1426853968478-4ebaa354-0-6], Stopped
  (kafka.consumer.ConsumerFetcherThread)

 I am still investigating what caused the connection error on the server side,
 but I have a couple of questions related to mirror maker itself:

 1. What is the root cause of java.lang.IllegalMonitorStateException? As shown
 in the AbstractFetcherThread source, the fetcher thread should catch the
 java.io.EOFException thrown from the underlying SimpleConsumer and sleep a
 while before the next run.
 2. Mirror maker is unaware of the termination of the fetcher thread. That
 makes it unable to detect the failure and trigger rebalancing. I have 3
 mirror maker instances running on 3 different machines listening to the
 same topic. I would expect the mirror maker to release the partition
 ownership when the underlying fetcher thread terminates so that rebalancing
 can be triggered, but in fact this is not the case. Is this expected
 behavior, or did I misconfigure anything?

 I am running the trunk version as of commit
 82789e75199fdc1cae115c5c2eadfd0f1ece4d0d

 --
 Regards,
 Tao




-- 
Regards,
Tao


Check topic exists after deleting it.

2015-03-23 Thread anthony musyoki
On deleting a topic via TopicCommand.deleteTopic()

I get Topic test-delete is marked for deletion.

I follow up by checking if the topic exists by using
AdminUtils.topicExists()
which surprisingly returns true.

I expected AdminUtils.topicExists() to check both BrokerTopicsPath
and DeleteTopicsPath before returning a verdict, but it only checks
BrokerTopicsPath.

Shouldn't a topic marked for deletion return false for topicExists() ?


Re: kafka audit

2015-03-23 Thread Navneet Gupta (Tech - BLR)
Are there any plans to open source the same? What alternatives do we have
here?

We are building an internal auditing framework for our entire big data
pipeline. Kafka is one of the data sources we have (ingested data).

On Mon, Mar 23, 2015 at 1:03 PM, tao xiao xiaotao...@gmail.com wrote:

  Linkedin has an excellent tool that monitors lag, data loss, data
  duplication, etc. Here is the reference


 http://www.slideshare.net/JonBringhurst/kafka-audit-kafka-meetup-january-27th-2015

 it is not open sourced though.

 On Mon, Mar 23, 2015 at 3:26 PM, sunil kalva kalva.ka...@gmail.com
 wrote:

  Hi
  What is the best practice for adding an audit feature to Kafka? Is there any
  framework available for enabling auditing at the producer and consumer
  level, and any UI framework for monitoring?
 
  tx
  SunilKalva
 



 --
 Regards,
 Tao




-- 
 Thanks & Regards,
Navneet Gupta


Re: kafka audit

2015-03-23 Thread tao xiao
Linkedin has an excellent tool that monitors lag, data loss, data duplication,
etc. Here is the reference

http://www.slideshare.net/JonBringhurst/kafka-audit-kafka-meetup-january-27th-2015

it is not open sourced though.

On Mon, Mar 23, 2015 at 3:26 PM, sunil kalva kalva.ka...@gmail.com wrote:

 Hi
 What is the best practice for adding an audit feature to Kafka? Is there any
 framework available for enabling auditing at the producer and consumer
 level, and any UI framework for monitoring?

 tx
 SunilKalva




-- 
Regards,
Tao


Re: Check topic exists after deleting it.

2015-03-23 Thread Harsha
Currently we have auto.create.topics.enable set to true by default. If this is
set to true, anyone who makes a TopicMetadataRequest can create a topic, and
both producers and consumers can send TopicMetadataRequests that will create a
topic if the above config is true. So while a deletion is in progress, a
running producer or consumer can re-create a topic that is in the deletion
process. This issue is going to be addressed in upcoming versions. Meanwhile,
if you are not creating topics via the producer, turn this config off, or stop
producers and consumers while you are trying to delete a topic.
-- 
Harsha
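
The setting in question can be turned off in the broker's server.properties
(a sketch; treat the operational detail that the broker needs a restart for
it to take effect as an assumption for your version):

# Prevent TopicMetadataRequests from auto-creating topics, so a topic
# that is mid-deletion cannot be re-created by a running client.
auto.create.topics.enable=false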


On March 23, 2015 at 9:57:53 AM, Grant Henke (ghe...@cloudera.com) wrote:

What happens when producers or consumers are running while the topic
deletion is going on?

On Mon, Mar 23, 2015 at 10:02 AM, Harsha ka...@harsha.io wrote:  

 DeleteTopic makes a node in zookeeper to let the controller know that there is
 a topic up for deletion. This doesn’t immediately delete the topic; it can
 take time depending on whether all the partitions of that topic are online and
 the brokers are available as well. Once all the log files are deleted, the
 zookeeper node gets deleted as well.
 Also make sure you don’t have any producers or consumers running while
 the topic deletion is going on.
  
 --  
 Harsha  
  
  
 On March 23, 2015 at 1:29:50 AM, anthony musyoki (  
 anthony.musy...@gmail.com) wrote:  
  
 On deleting a topic via TopicCommand.deleteTopic()  
  
 I get Topic test-delete is marked for deletion.  
  
 I follow up by checking if the topic exists by using  
 AdminUtils.topicExists()  
 which surprisingly returns true.

 I expected AdminUtils.topicExists() to check both BrokerTopicsPath
 and DeleteTopicsPath before returning a verdict, but it only checks
 BrokerTopicsPath.
  
 Shouldn't a topic marked for deletion return false for topicExists() ?  
  



--  
Grant Henke  
Solutions Consultant | Cloudera  
ghe...@cloudera.com | 920-980-8979  
twitter.com/ghenke http://twitter.com/gchenke | linkedin.com/in/granthenke  


Re: Check topic exists after deleting it.

2015-03-23 Thread Grant Henke
What happens when producers or consumers are running while the topic
deletion is going on?

On Mon, Mar 23, 2015 at 10:02 AM, Harsha ka...@harsha.io wrote:

 DeleteTopic makes a node in zookeeper to let the controller know that there is
 a topic up for deletion. This doesn’t immediately delete the topic; it can
 take time depending on whether all the partitions of that topic are online and
 the brokers are available as well. Once all the log files are deleted, the
 zookeeper node gets deleted as well.
 Also make sure you don’t have any producers or consumers running while
 the topic deletion is going on.

 --
 Harsha


 On March 23, 2015 at 1:29:50 AM, anthony musyoki (
 anthony.musy...@gmail.com) wrote:

 On deleting a topic via TopicCommand.deleteTopic()

 I get Topic test-delete is marked for deletion.

 I follow up by checking if the topic exists by using
 AdminUtils.topicExists()
 which surprisingly returns true.

 I expected AdminUtils.topicExists() to check both BrokerTopicsPath
 and DeleteTopicsPath before returning a verdict, but it only checks
 BrokerTopicsPath.

 Shouldn't a topic marked for deletion return false for topicExists() ?




-- 
Grant Henke
Solutions Consultant | Cloudera
ghe...@cloudera.com | 920-980-8979
twitter.com/ghenke http://twitter.com/gchenke | linkedin.com/in/granthenke


Re: Check topic exists after deleting it.

2015-03-23 Thread Harsha
Just to be clear, one needs to stop the producers and consumers that are
writing to or reading from a topic “test” if they are trying to delete that
specific topic “test”. Not all producers and consumers.

-- 
Harsha
On March 23, 2015 at 10:13:47 AM, Harsha (harsh...@fastmail.fm) wrote:

Currently we have auto.create.topics.enable set to true by default. If this is
set to true, anyone who makes a TopicMetadataRequest can create a topic, and
both producers and consumers can send TopicMetadataRequests that will create a
topic if the above config is true. So while a deletion is in progress, a
running producer or consumer can re-create a topic that is in the deletion
process. This issue is going to be addressed in upcoming versions. Meanwhile,
if you are not creating topics via the producer, turn this config off, or stop
producers and consumers while you are trying to delete a topic.
-- 
Harsha


On March 23, 2015 at 9:57:53 AM, Grant Henke (ghe...@cloudera.com) wrote:

What happens when producers or consumers are running while the topic
deletion is going on?

On Mon, Mar 23, 2015 at 10:02 AM, Harsha ka...@harsha.io wrote:

 DeleteTopic makes a node in zookeeper to let the controller know that there is
 a topic up for deletion. This doesn’t immediately delete the topic; it can
 take time depending on whether all the partitions of that topic are online and
 the brokers are available as well. Once all the log files are deleted, the
 zookeeper node gets deleted as well.
 Also make sure you don’t have any producers or consumers running while
 the topic deletion is going on.

 --
 Harsha


 On March 23, 2015 at 1:29:50 AM, anthony musyoki (
 anthony.musy...@gmail.com) wrote:

 On deleting a topic via TopicCommand.deleteTopic()

 I get Topic test-delete is marked for deletion.

 I follow up by checking if the topic exists by using
 AdminUtils.topicExists()
 which surprisingly returns true.

 I expected AdminUtils.topicExists() to check both BrokerTopicsPath
 and DeleteTopicsPath before returning a verdict, but it only checks
 BrokerTopicsPath.

 Shouldn't a topic marked for deletion return false for topicExists() ?




--
Grant Henke
Solutions Consultant | Cloudera
ghe...@cloudera.com | 920-980-8979
twitter.com/ghenke http://twitter.com/gchenke | linkedin.com/in/granthenke


Re: KafkaSpout forceFromStart Issue

2015-03-23 Thread Harsha
Hi Francois,
           Looks like this belongs on the Storm mailing lists. Can you please send this
question to the Storm mailing lists.

Thanks,
Harsha


On March 23, 2015 at 11:17:47 AM, François Méthot (fmetho...@gmail.com) wrote:

Hi,  

We have a storm topology that uses Kafka to read a topic with 6  
partitions. ( Kafka 0.8.2, Storm 0.9.3 )  

Recently, we had to set the KafkaSpout to read from the beginning, so we  
temporary configured our KafkaConfig this way:  

kafkaConfig.forceFromStart=true  
kafkaConfig.startOffsetTime = OffsetRequest.EarliestTime()  

It worked well, but afterward, setting those parameters back to false and  
to LatestTime respectively had no effect. In fact the topology won't read  
from our topic anymore.  

When the topology starts, the spout successfully logs the offset and
consumer group's cursor position for each partition in the worker log. But
nothing is read.  

The only way we can read back from our Topic is to give our SpoutConfig a  
new Kafka ConsumerGroup Id.  

So it looks like, if we don't want to modify any KafkaSpout/Kafka code, the
only way to read from the beginning would be to write the position we want
to read from into Zookeeper for our Consumer Group, where offsets are stored,
and to restart our topology.
Would anyone know if this is a bug in the KafkaSpout or an issue inherited
from a bug in Kafka?
Thanks  
Francois  
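
For the Zookeeper workaround described above, a hedged sketch. It assumes the
standard 0.8 high-level-consumer layout /consumers/<group>/offsets/<topic>/<partition>
and a ZkClient constructed with a string serializer (as Kafka's own tooling
uses); group, topic, partition count and offset here are placeholders.

import org.I0Itec.zkclient.ZkClient

// Overwrite the stored offset for each partition of a consumer group,
// then restart the topology so the spout picks up the new position.
def resetOffsets(zk: ZkClient, group: String, topic: String,
                 partitions: Int, offset: Long): Unit = {
  (0 until partitions).foreach { p =>
    val path = s"/consumers/$group/offsets/$topic/$p"
    if (!zk.exists(path)) zk.createPersistent(path, true)
    zk.writeData(path, offset.toString)
  }
}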


Re: Updates To cwiki For Producer

2015-03-23 Thread Gwen Shapira
If you have feedback, don't hesitate to comment on the JIRA.

On Mon, Mar 23, 2015 at 4:19 PM, Pete Wright pwri...@rubiconproject.com
wrote:

 Hi Gwen - thanks for sending this along.  I'll patch my local checkout
 and take a look at this.

 Cheers,
 -pete

 On 03/20/15 21:16, Gwen Shapira wrote:
  We have a patch with examples:
  https://issues.apache.org/jira/browse/KAFKA-1982
 
  Unfortunately, it's not committed yet.
 
  Gwen
 
  On Fri, Mar 20, 2015 at 11:24 AM, Pete Wright 
 pwri...@rubiconproject.com
  wrote:
 
  Thanks, that's helpful.  I am working on an example producer using the
  new API; if I have any helpful notes or examples I'll share them.
 
  I was basically trying to be lazy and poach some example code as a
  starting point for our internal tests :)
 
  Cheers,
  -pete
 
  On 03/20/15 10:59, Guozhang Wang wrote:
  For the new java producer, its java doc can be found here:
 
 
 
 http://kafka.apache.org/083/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
 
  We can update the wiki if there are some examples that are still
 missing
  from this java doc.
 
  Guozhang
 
  On Thu, Mar 19, 2015 at 4:37 PM, Pete Wright 
 pwri...@rubiconproject.com
 
  wrote:
 
  Hi,
  Is there a plan to update the producer documentation on the wiki
 located
  here:
 
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
 
  This would be helpful for people working on implementing the new
  producer class deployed in 0.8.2.x.  If there are any patches
 available
  for this or other docs available for reference that would be helpful
 as
  well.
 
  Thanks!
  -pete
 
  --
  Pete Wright
  Lead Systems Architect
  Rubicon Project
  pwri...@rubiconproject.com
  310.309.9298
 
 
 
 
 
  --
  Pete Wright
  Lead Systems Architect
  Rubicon Project
  pwri...@rubiconproject.com
  310.309.9298
 
 

 --
 Pete Wright
 Lead Systems Architect
 Rubicon Project
 pwri...@rubiconproject.com
 310.309.9298
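
While the wiki examples are pending, a minimal sketch against the new 0.8.2
producer API discussed above (broker address and topic are placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// The new org.apache.kafka.clients producer introduced in 0.8.2.
val props = new Properties()
props.put("bootstrap.servers", "broker1:9092") // placeholder broker list
props.put("key.serializer",
  "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer",
  "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("example-topic", "key", "value"))
producer.close()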



Re: Updates To cwiki For Producer

2015-03-23 Thread Pete Wright
Hi Gwen - thanks for sending this along.  I'll patch my local checkout
and take a look at this.

Cheers,
-pete

On 03/20/15 21:16, Gwen Shapira wrote:
 We have a patch with examples:
 https://issues.apache.org/jira/browse/KAFKA-1982
 
 Unfortunately, it's not committed yet.
 
 Gwen
 
 On Fri, Mar 20, 2015 at 11:24 AM, Pete Wright pwri...@rubiconproject.com
 wrote:
 
 Thanks, that's helpful.  I am working on an example producer using the
 new API; if I have any helpful notes or examples I'll share them.

 I was basically trying to be lazy and poach some example code as a
 starting point for our internal tests :)

 Cheers,
 -pete

 On 03/20/15 10:59, Guozhang Wang wrote:
 For the new java producer, its java doc can be found here:


 http://kafka.apache.org/083/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html

 We can update the wiki if there are some examples that are still missing
 from this java doc.

 Guozhang

 On Thu, Mar 19, 2015 at 4:37 PM, Pete Wright pwri...@rubiconproject.com

 wrote:

 Hi,
 Is there a plan to update the producer documentation on the wiki located
 here:


 https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example

 This would be helpful for people working on implementing the new
 producer class deployed in 0.8.2.x.  If there are any patches available
 for this or other docs available for reference that would be helpful as
 well.

 Thanks!
 -pete

 --
 Pete Wright
 Lead Systems Architect
 Rubicon Project
 pwri...@rubiconproject.com
 310.309.9298





 --
 Pete Wright
 Lead Systems Architect
 Rubicon Project
 pwri...@rubiconproject.com
 310.309.9298

 

-- 
Pete Wright
Lead Systems Architect
Rubicon Project
pwri...@rubiconproject.com
310.309.9298


Re: Anyone interested in speaking at Bay Area Kafka meetup @ LinkedIn on March 24?

2015-03-23 Thread Jason Rosenberg
Hi Jon,

The link for the 1/27 meetup you posted works for me, but I haven't
found that same link on the meetup site (there are links that
point to the live stream, which of course is no longer happening!).

Thoughts?

Thanks,

Jason

On Mon, Mar 2, 2015 at 11:31 AM, Jon Bringhurst 
jbringhu...@linkedin.com.invalid wrote:

 The meetups are recorded. For example, here's a link to the January meetup:

 http://www.ustream.tv/recorded/58109076

 The links to the recordings are usually posted to the comments for each
 meetup on http://www.meetup.com/http-kafka-apache-org/

 -Jon

 On Feb 23, 2015, at 3:24 PM, Ruslan Khafizov ruslan.khafi...@gmail.com
 wrote:

 +1 For recording sessions.
 On 24 Feb 2015 07:22, Jiangjie Qin j...@linkedin.com.invalid wrote:

 +1, I’m very interested.

 On 2/23/15, 3:05 PM, Jay Kreps jay.kr...@gmail.com wrote:

 +1

 I think something like Kafka on AWS at Netflix would be hugely
 interesting to a lot of people.

 -Jay

 On Mon, Feb 23, 2015 at 3:02 PM, Allen Wang aw...@netflix.com.invalid
 wrote:

 We (Steven Wu and Allen Wang) can talk about Kafka use cases and operations
 in Netflix. Specifically, we can talk about how we scale and operate Kafka
 clusters in AWS and how we migrate our data pipeline to Kafka.

 Thanks,
 Allen


 On Mon, Feb 23, 2015 at 12:15 PM, Ed Yakabosky 
 eyakabo...@linkedin.com.invalid wrote:

 Hi Kafka Open Source -

 LinkedIn will host another Bay Area Kafka meetup in Mountain View on March
 24.  We are planning to present on Offset Management but are looking for
 additional speakers.  If you’re interested in presenting a use case,
 operational plan, or your experience with a particular feature (REST
 interface, WebConsole), please reply-all to let us know.

 [BCC: Open Source lists]

 Thanks,
 Ed