data archiving
Hi all, I'm looking for a way of archiving data. The data is hot for a few days in our system; after that it is rarely used. Speed is not so important for the archive. Let's say we have a Kafka cluster and a storage system. It would be great if Kafka supported moving data to the storage system instead of evicting it, and the end user could specify which storage system is used (DynamoDB, S3, Hadoop, etc...). Is it possible to implement? What other solutions can you advise? Regards, Anatoly
Re: data archiving
You should do this as a consumer (i.e. an archiveDataConsumer). Take a look at the AWS section of the ecosystem page https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem (e.g. https://github.com/pinterest/secor ). The tools page is also a good place to check out: https://cwiki.apache.org/confluence/display/KAFKA/System+Tools (e.g. https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-MirrorMaker ). If there isn't a consumer that does what you need, you could write one (most often what folks do), or google for it and, if you find one, let the community know. Thanks!

/*
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/

On Mon, Jun 16, 2014 at 6:02 AM, Anatoly Deyneka adeyn...@gmail.com wrote:
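The shape of such an archiving consumer can be sketched without the Kafka dependency. In this sketch the Kafka consumer stream is replaced by a plain iterator and the storage system by an `ArchiveSink` trait; all names here are illustrative, not part of any Kafka or AWS API. A real implementation would read from a Kafka consumer stream and commit offsets only after a successful write to S3/DynamoDB/HDFS.

```scala
import scala.collection.mutable

// Stand-in for the pluggable storage backend (S3, DynamoDB, HDFS, ...).
trait ArchiveSink { def write(record: String): Unit }

// In-memory sink so the loop can be demonstrated end to end.
class InMemorySink extends ArchiveSink {
  val stored = mutable.Buffer[String]()
  def write(record: String): Unit = stored += record
}

// The archiving "consumer": drain messages and hand each to the sink,
// returning the number of records archived.
def archive(messages: Iterator[String], sink: ArchiveSink): Int = {
  var n = 0
  for (m <- messages) { sink.write(m); n += 1 }
  n
}
```

Because the sink is a trait, the end user can pick the storage system by supplying a different implementation, which is essentially the pluggability Anatoly asked for, just living in a consumer rather than in the broker.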
Re: kafka producer, one per web app?
Yes, the producer is thread safe, and sharing instances will be more efficient if you are producing in async mode. -Jay

On Mon, Jun 16, 2014 at 9:12 AM, S Ahmed sahmed1...@gmail.com wrote:

In my web application, I should be creating a single instance of a producer, correct? So in Scala I should be doing something like:

object KafkaProducer {
  // props...
  val producer = new Producer[AnyRef, AnyRef](new ProducerConfig(props))
}

And then, say in my QueueService, I would do:

class QueueService {
  def send(topic: String, message: Array[Byte], partition: Array[Byte]): Unit = {
    try {
      KafkaProducer.producer.send(new KeyedMessage(topic, message, partition))
    } catch {
      case e: Exception =>
        e.printStackTrace()
        System.exit(1)
    }
  }
}

Threading-wise, is this correct?
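The pattern in question — one shared producer behind a Scala `object` — can be demonstrated without the Kafka jars. Here `FakeProducer` is a stand-in for the real `Producer` (it just counts sends, thread-safely); the names are illustrative, not Kafka's API. The point is that a Scala `object` is a lazily-initialized, JVM-wide singleton, so every thread calling `QueueService.send` shares the same instance:

```scala
// Stand-in for kafka.javaapi.producer.Producer: a thread-safe send
// that counts messages so the sharing can be observed.
class FakeProducer {
  private val sent = new java.util.concurrent.atomic.AtomicInteger(0)
  def send(topic: String, message: Array[Byte]): Unit = sent.incrementAndGet()
  def sentCount: Int = sent.get()
}

// A Scala `object` gives one lazily-created instance per JVM.
object SharedProducer {
  val producer = new FakeProducer
}

object QueueService {
  def send(topic: String, message: Array[Byte]): Unit =
    SharedProducer.producer.send(topic, message)
}
```

With the real producer the structure is the same; since the producer is thread safe, no additional locking is needed around `send`.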
linkedin and pageview producer + when kafka is down
I'd love to get some insights into how things work at LinkedIn in terms of your web servers and Kafka producers. You guys probably connect to multiple Kafka clusters, so let's assume you are only connecting to a single cluster. 1. Do you use a single producer for all message types/topics? 2. For your pageview topic, i.e. a message sent per page request (albeit batched): what happens when your Kafka cluster is down? Will your web application behave as normal, or will it really slow things down? Locally on my laptop I shut down the Vagrant VM that runs Kafka, and pages render very slowly while the producer is down. Or do you use some smart circuit-breaker logic that stops trying to send producer messages while Kafka is down?
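The circuit-breaker idea mentioned at the end can be sketched as follows. This is a minimal, hypothetical implementation, not taken from any Kafka client: after `maxFailures` consecutive send failures the breaker "opens" and further sends are skipped immediately for `cooldownMillis`, so a dead broker costs each page render nothing instead of a connection timeout. The clock is injectable so the behavior can be tested deterministically.

```scala
// Minimal circuit breaker around a send operation. All names and
// thresholds are illustrative.
class CircuitBreaker(maxFailures: Int, cooldownMillis: Long,
                     now: () => Long = () => System.currentTimeMillis()) {
  private var failures = 0
  private var openedAt = -1L

  // Open = we recently gave up and are still inside the cooldown window.
  def isOpen: Boolean = synchronized {
    openedAt >= 0 && (now() - openedAt) < cooldownMillis
  }

  // Run `send`; returns true on success, false if it failed or was skipped.
  def attempt(send: () => Unit): Boolean = synchronized {
    if (isOpen) false
    else {
      try { send(); failures = 0; openedAt = -1L; true }
      catch {
        case _: Exception =>
          failures += 1
          if (failures >= maxFailures) openedAt = now()
          false
      }
    }
  }
}
```

A web app would wrap each producer `send` in `attempt`; once the cooldown elapses, the next call probes the broker again and a success resets the breaker.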
Re: Broken download link?
Ack! Thanks for pointing that out. Should be fixed now. -Jay On Mon, Jun 16, 2014 at 11:22 AM, William Borg Barthet w.borgbart...@onehippo.com wrote: Hi there, as I was excitedly reading through the introductory documentation, I came across a download link [1] in the quickstart section [2]. It seems like the file isn't there anymore. It didn't take me very long to find an alternate source for download [3], so that's all good. I apologise if this is something you guys are already aware of, but I thought I'd let you know anyway. Cheers, William [1] - https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.1/kafka_2.9.2-0.8.1.tgz [2] - http://kafka.apache.org/documentation.html#quickstart [3] - http://kafka.apache.org/downloads.html -- Amsterdam - Oosteinde 11, 1017 WT Amsterdam Boston - 101 Main Street, Cambridge, MA 02142 US +1 877 414 4776 (toll free) Europe +31(0)20 522 4466 www.onehippo.com
Newly added partitions in 0.8.1.1 dont see data immediately
Hi, I used the add-partition functionality of the kafka-topics tool to alter an existing topic and increase its partition count. I noticed that after the new partitions were added, they don't receive data from the producer immediately, unless a new producer is started or the old producer is restarted. Here is the sequence of steps:
1) Start a cluster with 2 brokers.
2) Create a topic with 2 partitions and replication factor 2.
3) Start 4 consumers in a group. 2 consumers are redundant.
4) Start the performance producer to send messages.
5) Now alter the topic to have 4 partitions.
6) Zookeeper immediately shows the new partitions.
7) Watching the partitions through Kafka Offset Monitor, I don't see any messages going into the 2 new partitions.
8) However, if I start a new performance producer or restart the one from 4), I see that all the partitions are written to.
Why does 7) happen? Is this again related to the sticky partitioning behavior change in 0.8.1.1? My producer.properties has topic.metadata.refresh.interval.ms=1000. Thanks, Prakash
mirrormaker's configuration to minimize/prevent data loss
As I read it, the consumer and producer in mirrormaker are independent and use a queue to communicate. Therefore the consumers keep consuming and committing offsets to ZK even if the producer is failing. Is that still the way it works in 0.8.0, and are there any plans to change it? Is there any way to minimize data loss in this case? I am OK with not using async mode on the producer, but will that help? Can I configure mirrormaker to exit immediately if the producer fails? If this should be the responsibility of an external process, what should I monitor the log for in order to kill the mirroring process on error? -- Andrey Yegorov
ISR not updating
Hi, team. I'm using Kafka 0.8.1.1. I'm running 8 brokers on 4 machines (2 brokers per machine), and I have 3 topics, each with 16 partitions and 3 replicas. kafka-topics describe shows:

Topic:topicCDR  PartitionCount:16  ReplicationFactor:3  Configs:retention.ms=360
  Topic: topicCDR  Partition: 0   Leader: 3  Replicas: 3,1,2  Isr: 3,2
  Topic: topicCDR  Partition: 1   Leader: 4  Replicas: 4,2,3  Isr: 3,4,2
  Topic: topicCDR  Partition: 2   Leader: 5  Replicas: 5,3,4  Isr: 3,4,5
  Topic: topicCDR  Partition: 3   Leader: 6  Replicas: 6,4,5  Isr: 4,5,6
  Topic: topicCDR  Partition: 4   Leader: 7  Replicas: 7,5,6  Isr: 5,6,7
  Topic: topicCDR  Partition: 5   Leader: 8  Replicas: 8,6,7  Isr: 6,7,8
  Topic: topicCDR  Partition: 6   Leader: 1  Replicas: 1,7,8  Isr: 1,7,8
  Topic: topicCDR  Partition: 7   Leader: 2  Replicas: 2,8,1  Isr: 8,2
  Topic: topicCDR  Partition: 8   Leader: 3  Replicas: 3,2,4  Isr: 3,4,2
  Topic: topicCDR  Partition: 9   Leader: 4  Replicas: 4,3,5  Isr: 3,4,5
  Topic: topicCDR  Partition: 10  Leader: 5  Replicas: 5,4,6  Isr: 4,5,6
  Topic: topicCDR  Partition: 11  Leader: 6  Replicas: 6,5,7  Isr: 5,6,7
  Topic: topicCDR  Partition: 12  Leader: 7  Replicas: 7,6,8  Isr: 6,7,8
  Topic: topicCDR  Partition: 13  Leader: 8  Replicas: 8,7,1  Isr: 7,8
  Topic: topicCDR  Partition: 14  Leader: 8  Replicas: 1,8,2  Isr: 8,2
  Topic: topicCDR  Partition: 15  Leader: 2  Replicas: 2,1,3  Isr: 3,2

Topic:topicDEBUG  PartitionCount:16  ReplicationFactor:3  Configs:retention.ms=360
  Topic: topicDEBUG  Partition: 0   Leader: 4  Replicas: 4,3,5  Isr: 3,4,5
  Topic: topicDEBUG  Partition: 1   Leader: 5  Replicas: 5,4,6  Isr: 4,5,6
  Topic: topicDEBUG  Partition: 2   Leader: 6  Replicas: 6,5,7  Isr: 5,6,7
  Topic: topicDEBUG  Partition: 3   Leader: 7  Replicas: 7,6,8  Isr: 6,7,8
  Topic: topicDEBUG  Partition: 4   Leader: 8  Replicas: 8,7,1  Isr: 7,8
  Topic: topicDEBUG  Partition: 5   Leader: 8  Replicas: 1,8,2  Isr: 8,2
  Topic: topicDEBUG  Partition: 6   Leader: 2  Replicas: 2,1,3  Isr: 3,2
  Topic: topicDEBUG  Partition: 7   Leader: 3  Replicas: 3,2,4  Isr: 3,4,2
  Topic: topicDEBUG  Partition: 8   Leader: 4  Replicas: 4,5,6  Isr: 4,5,6
  Topic: topicDEBUG  Partition: 9   Leader: 5  Replicas: 5,6,7  Isr: 5,6,7
  Topic: topicDEBUG  Partition: 10  Leader: 6  Replicas: 6,7,8  Isr: 6,7,8
  Topic: topicDEBUG  Partition: 11  Leader: 7  Replicas: 7,8,1  Isr: 7,8,1
  Topic: topicDEBUG  Partition: 12  Leader: 8  Replicas: 8,1,2  Isr: 8,2
  Topic: topicDEBUG  Partition: 13  Leader: 3  Replicas: 1,2,3  Isr: 3,2
  Topic: topicDEBUG  Partition: 14  Leader: 2  Replicas: 2,3,4  Isr: 3,4,2
  Topic: topicDEBUG  Partition: 15  Leader: 3  Replicas: 3,4,5  Isr: 3,4,5

Topic:topicTRACE  PartitionCount:16  ReplicationFactor:3  Configs:retention.ms=360
  Topic: topicTRACE  Partition: 0   Leader: 5  Replicas: 5,8,1  Isr: 5,8,1
  Topic: topicTRACE  Partition: 1   Leader: 6  Replicas: 6,1,2  Isr: 6,1,2
  Topic: topicTRACE  Partition: 2   Leader: 7  Replicas: 7,2,3  Isr: 3,7,2
  Topic: topicTRACE  Partition: 3   Leader: 8  Replicas: 8,3,4  Isr: 3,4,8
  Topic: topicTRACE  Partition: 4   Leader: 1  Replicas: 1,4,5  Isr: 1,5,4
  Topic: topicTRACE  Partition: 5   Leader: 2  Replicas: 2,5,6  Isr: 5,6,2
  Topic: topicTRACE  Partition: 6   Leader: 3  Replicas: 3,6,7  Isr: 3,6,7
  Topic: topicTRACE  Partition: 7   Leader: 4  Replicas: 4,7,8  Isr: 4,7,8
  Topic: topicTRACE  Partition: 8   Leader: 5  Replicas: 5,1,2  Isr: 5,1,2
  Topic: topicTRACE  Partition: 9   Leader: 6  Replicas: 6,2,3  Isr: 3,6,2
  Topic: topicTRACE  Partition: 10  Leader: 7  Replicas: 7,3,4  Isr: 3,4,7
  Topic: topicTRACE  Partition: 11  Leader: 8  Replicas: 8,4,5  Isr: 4,5,8

The problem is that one of my topics' ISR is not updating, and the preferred replica election keeps failing. In more detail: broker 1 is missing from topicDEBUG's ISRs and never rejoins, yet the log of broker 1 is completely normal and shows no errors. Is this expected? What do I have to do to get the ISR updated? Thanks in advance.

--
*Sincerely,*
*Bongyeon Kim*
Java Developer Engineer
Seoul, Korea
Mobile: +82-10-9369-1314
Email: bongyeon...@gmail.com
Twitter: http://twitter.com/tigerby
Facebook: http://facebook.com/tigerby
Wiki: http://tigerby.com
Unit testing with Kafka
Are there unit-testing libs in Kafka that we can include to test our producers/consumers? I found the following, but the Maven libs mentioned there seem to be missing: http://grokbase.com/t/kafka/users/13ck94p302/writing-unit-tests-for-kafka-code Has anyone else tackled this issue? Thanks, -Vinay
Building Kafka on Mac OS X
Hi team, I'm a newcomer to Kafka, but I'm having some trouble getting it to run on OS X. Basically, building Kafka on OS X with 'gradlew jar' gets stuck forever without any progress (indeed, I tried leaving it building all night to no avail). Any advice will be greatly appreciated. Thanks in advance.
Re: data archiving
Have you looked at Pinterest Secor? ( http://engineering.pinterest.com/post/84276775924/introducing-pinterest-secor ) Cheers, Robert On Mon, Jun 16, 2014 at 5:17 AM, Mark Godfrey msg...@gmail.com wrote: There is Bifrost, which archives Kafka data to S3: https://github.com/uswitch/bifrost Obviously that's a fairly specific archive solution, but it might work for you. Mark. On Mon, Jun 16, 2014 at 11:02 AM, Anatoly Deyneka adeyn...@gmail.com wrote:
Re: Building Kafka on Mac OS X
What output was it stuck on? Tim On Mon, Jun 16, 2014 at 6:39 PM, Jorge Marizan jorge.mari...@gmail.com wrote: Hi team, I’m a newcomer to Kafka, but I’m having some troubles trying to get it to run on OS X. Basically building Kafka on OS X with 'gradlew jar’ gets stuck forever without any progress (Indeed I tried to leave it building all night with no avail). Any advices will be greatly appreciated. Thanks in advance.
Re: Newly added partitions in 0.8.1.1 dont see data immediately
With topic.metadata.refresh.interval.ms=1000, the producer should refresh metadata and pick up the new partitions after 1 sec. Do you see metadata being refreshed? You may have to turn on the debug level logging. Thanks, Jun On Mon, Jun 16, 2014 at 3:18 PM, Prakash Gowri Shankor prakash.shan...@gmail.com wrote:
Re: mirrormaker's configuration to minimize/prevent data loss
Currently, mirrormaker only logs the error if the producer fails. You can potentially increase # retries to deal with producer failures. Thanks, Jun On Mon, Jun 16, 2014 at 3:53 PM, Andrey Yegorov andrey.yego...@gmail.com wrote:
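The retry knobs Jun mentions live in the producer config that mirrormaker is pointed at. A sketch of a loss-averse producer.properties for the 0.8 sync producer might look like this (the specific values are illustrative, not recommendations):

```properties
# Use the synchronous producer so a send failure surfaces immediately
# instead of being buffered in the async queue.
producer.type=sync

# Retry each failed send several times before giving up (default is 3).
message.send.max.retries=10

# Wait between retries to give the target cluster time to recover.
retry.backoff.ms=500
```

This doesn't make mirrormaker exit on producer failure; it only narrows the window in which the consumer's committed offsets can run ahead of what was actually produced.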
Re: Building Kafka on Mac OS X
It just hangs there without any output at all. Jorge. On Jun 16, 2014, at 11:27 PM, Timothy Chen tnac...@gmail.com wrote: What output was it stuck on? Tim
Re: Building Kafka on Mac OS X
Can you try running it in debug mode? (./gradlew jar -d) Tim On Mon, Jun 16, 2014 at 8:44 PM, Jorge Marizan jorge.mari...@gmail.com wrote: It just hangs there without any output at all. Jorge.
Re: Building Kafka on Mac OS X
I will try it and let you know. Jorge. On Jun 17, 2014, at 12:02 AM, Timothy Chen tnac...@gmail.com wrote: Can you try running it in debug mode? (./gradlew jar -d) Tim
Re: Building Kafka on Mac OS X
For some weird reason, it is compiling correctly now. Time to play with Kafka. Thanks :) On Jun 17, 2014, at 12:12 AM, Jorge Marizan jorge.mari...@gmail.com wrote: I will try it and let you know. Jorge.
Re: Building Kafka on Mac OS X
I have seen it hang if there is a write with zero data. On Jun 16, 2014, at 21:02, Timothy Chen tnac...@gmail.com wrote: Can you try running it in debug mode? (./gradlew jar -d) Tim