data archiving

2014-06-16 Thread Anatoly Deyneka
Hi all,

I'm looking for a way to archive data.
The data is hot for a few days in our system.
After that it is rarely used. Speed is not so important for the archive.

Let's say we have a kafka cluster and a storage system.
It would be great if kafka supported moving data to the storage system instead
of evicting it, and the end user could specify which storage system is used
(dynamo, s3, hadoop, etc.).
Is it possible to implement?

What other solutions can you advise?

Regards,
Anatoly


Re: data archiving

2014-06-16 Thread Joe Stein
You should do this as a consumer (i.e. archiveDataConsumer)

Take a look at the AWS section of the ecosystem page
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem (e.g.
https://github.com/pinterest/secor ).

The system tools page is also a good place to check
https://cwiki.apache.org/confluence/display/KAFKA/System+Tools (e.g.
https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-MirrorMaker
).

If there isn't a consumer you need you could write one (most often what
folks do) or google and maybe find it and let the community know.
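
A minimal sketch of what such an archiving consumer loop might look like. It
deliberately uses no Kafka classes: ArchiveSink, InMemorySink,
ArchiveDataConsumer.archive and the commit callback are all hypothetical names
for illustration. In a real consumer the iterator would presumably come from a
Kafka consumer stream, and commit would commit offsets; both are assumptions
here.

```scala
// Hypothetical sink abstraction; a real one would wrap S3, HDFS, DynamoDB, etc.
trait ArchiveSink {
  def write(batch: Seq[Array[Byte]]): Unit
}

// In-memory sink, used only to make the sketch runnable.
class InMemorySink extends ArchiveSink {
  val batches = scala.collection.mutable.ArrayBuffer.empty[Seq[Array[Byte]]]
  def write(batch: Seq[Array[Byte]]): Unit = { batches += batch; () }
}

object ArchiveDataConsumer {
  // Drains `messages` in batches of at most `batchSize`, writes each batch to
  // the sink, and calls `commit` only after the write has returned, so that
  // progress is never recorded for data that was not archived.
  // Returns the number of messages archived.
  def archive(messages: Iterator[Array[Byte]], sink: ArchiveSink, batchSize: Int)
             (commit: () => Unit): Int = {
    var archived = 0
    messages.grouped(batchSize).foreach { batch =>
      sink.write(batch)
      commit()
      archived += batch.size
    }
    archived
  }
}
```

Committing after the write means a crash can re-archive a batch but should not
skip one, so the sink needs to tolerate duplicates.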

Thanks!

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/





Re: kafka producer, one per web app?

2014-06-16 Thread Jay Kreps
Yes, the producer is thread safe, and sharing instances will be more
efficient if you are producing in async mode.

-Jay


On Mon, Jun 16, 2014 at 9:12 AM, S Ahmed sahmed1...@gmail.com wrote:

 In my web application, I should be creating a single instance of a producer
 correct?

 So in scala I should be doing something like:

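 import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

 object KafkaProducer {
   // props...
   val producer = new Producer[AnyRef, AnyRef](new ProducerConfig(props))
 }

 And then say in my QueueService I would do:

 class QueueService {

   // `partition` is the key used to choose the partition.
   def send(topic: String, message: Array[Byte], partition: Array[Byte]): Unit = {
     try {
       // KeyedMessage takes (topic, key, message), in that order.
       KafkaProducer.producer.send(
         new KeyedMessage[AnyRef, AnyRef](topic, partition, message))
     } catch {
       case e: Exception =>
         e.printStackTrace()
         System.exit(1)
     }
   }

 }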

 Threading wise, is this correct?



linkedin and pageview producer + when kafka is down

2014-06-16 Thread S Ahmed
I'd love to get some insights on how things work at linkedin in terms of
your web servers and kafka producers.

You guys probably connect to multiple kafka clusters, so let's assume you
are only connecting to a single cluster.

1. do you use a single producer for all message types/topics?

2. Your pageview topic, for example, gets a message on every page request
(albeit batched):

What happens when your kafka cluster is down? Will your web application
behave as normal, or will it really slow things down?

Locally on my laptop I shut down the vagrant box that is running kafka, and
the page renders very slowly while kafka is down.

Or do you use some smart circuit breaker logic that will stop trying to
send producer messages if kafka is down?
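
To make the circuit breaker idea concrete, here is a rough sketch of the kind
of logic meant. It is not taken from any Kafka client: SendCircuitBreaker,
maxFailures, resetMillis and the injectable clock are hypothetical names, and
the send itself is passed in as a block.

```scala
// Hypothetical circuit breaker: after `maxFailures` consecutive failures it
// "opens" and skips sends until `resetMillis` have elapsed, then lets one
// attempt through again.
class SendCircuitBreaker(maxFailures: Int,
                         resetMillis: Long,
                         now: () => Long = () => System.currentTimeMillis()) {
  private var failures = 0
  private var openedAt = 0L

  def isOpen: Boolean = synchronized {
    failures >= maxFailures && now() - openedAt < resetMillis
  }

  // Runs `send` unless the breaker is open; returns true if the send succeeded.
  def trySend(send: => Unit): Boolean = synchronized {
    if (isOpen) false
    else
      try {
        send
        failures = 0 // any success closes the breaker again
        true
      } catch {
        case scala.util.control.NonFatal(_) =>
          failures += 1
          if (failures >= maxFailures) openedAt = now()
          false
      }
  }
}
```

While the breaker is open the page request would skip the send right away
instead of waiting on the producer, at the cost of dropping those messages.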


Re: Broken download link?

2014-06-16 Thread Jay Kreps
Ack! Thanks for pointing that out. Should be fixed now.
-Jay


On Mon, Jun 16, 2014 at 11:22 AM, William Borg Barthet 
w.borgbart...@onehippo.com wrote:

 Hi there,

 as I was excitedly reading through the introductory documentation, I came
 across a download link [1] in the quickstart section [2]. It seems like the
 file isn't there anymore.

 It didn't take me very long to find an alternate source for download [3],
 so that's all good. I apologise if this is something you guys are already
 aware of, but I thought I'd let you know anyway.

 Cheers,

 William


 [1] -

 https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.1/kafka_2.9.2-0.8.1.tgz
 [2] - http://kafka.apache.org/documentation.html#quickstart
 [3] - http://kafka.apache.org/downloads.html
 --
 Amsterdam - Oosteinde 11, 1017 WT Amsterdam
 Boston - 101 Main Street, Cambridge, MA 02142

 US +1 877 414 4776 (toll free)
 Europe +31(0)20 522 4466
 www.onehippo.com



Newly added partitions in 0.8.1.1 dont see data immediately

2014-06-16 Thread Prakash Gowri Shankor
Hi,

I used the add partition functionality in create-topics to alter an existing
topic and increase the partitions. I noticed that after the new partitions
were added, they don't receive data immediately from the producer unless a
new producer is started up or the old producer is restarted.
Here is the sequence of steps:
1) start cluster with 2 brokers.
2) create topic with 2 partitions and replication factor 2.
3) start 4 consumers in a group. 2 consumers are redundant.
4) Start the performance producer to send messages.
5) Now alter the topic to have 4 partitions.
6) zookeeper immediately shows the new partitions
7) On watching the partitions through Kafka Offset Monitor, I don't see any
messages going into the 2 new partitions.
8) However, if I start a new performance producer or restart the one in 4),
I see that all the partitions are written to.

Why does 7) happen? Is this again related to the sticky partitioning
behavior change in 0.8.1.1?
My producer.properties has topic.metadata.refresh.interval.ms=1000

Thanks,
Prakash


mirrormaker's configuration to minimize/prevent data loss

2014-06-16 Thread Andrey Yegorov
As I understand it, the consumer and producer in mirrormaker are independent
and use a queue to communicate. Therefore consumers keep on
consuming/committing offsets to zk even if the producer is failing. Is that
still the way it works in 0.8.0, and are there any plans to change it?

Is there any way to minimize data loss in this case? I am ok with not using
async mode on the producer, but will it help? Can I configure mirrormaker to
exit immediately if the producer fails? If this should be the responsibility
of an external process, what should I look for in the log to decide to kill
the mirroring process in case of error?

--
Andrey Yegorov


ISR not updating

2014-06-16 Thread Bongyeon Kim
Hi, team.

I'm using Kafka 0.8.1.1.
I'm running 8 brokers on 4 machines (2 brokers per machine), and I have 3
topics, each with 16 partitions and 3 replicas.

The output of kafka-topics describe is:

Topic:topicCDR PartitionCount:16 ReplicationFactor:3 Configs:retention.ms=360
Topic: topicCDR Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,2
Topic: topicCDR Partition: 1 Leader: 4 Replicas: 4,2,3 Isr: 3,4,2
Topic: topicCDR Partition: 2 Leader: 5 Replicas: 5,3,4 Isr: 3,4,5
Topic: topicCDR Partition: 3 Leader: 6 Replicas: 6,4,5 Isr: 4,5,6
Topic: topicCDR Partition: 4 Leader: 7 Replicas: 7,5,6 Isr: 5,6,7
Topic: topicCDR Partition: 5 Leader: 8 Replicas: 8,6,7 Isr: 6,7,8
Topic: topicCDR Partition: 6 Leader: 1 Replicas: 1,7,8 Isr: 1,7,8
Topic: topicCDR Partition: 7 Leader: 2 Replicas: 2,8,1 Isr: 8,2
Topic: topicCDR Partition: 8 Leader: 3 Replicas: 3,2,4 Isr: 3,4,2
Topic: topicCDR Partition: 9 Leader: 4 Replicas: 4,3,5 Isr: 3,4,5
Topic: topicCDR Partition: 10 Leader: 5 Replicas: 5,4,6 Isr: 4,5,6
Topic: topicCDR Partition: 11 Leader: 6 Replicas: 6,5,7 Isr: 5,6,7
Topic: topicCDR Partition: 12 Leader: 7 Replicas: 7,6,8 Isr: 6,7,8
Topic: topicCDR Partition: 13 Leader: 8 Replicas: 8,7,1 Isr: 7,8
Topic: topicCDR Partition: 14 Leader: 8 Replicas: 1,8,2 Isr: 8,2
Topic: topicCDR Partition: 15 Leader: 2 Replicas: 2,1,3 Isr: 3,2
Topic:topicDEBUG PartitionCount:16 ReplicationFactor:3 Configs:retention.ms=360
Topic: topicDEBUG Partition: 0 Leader: 4 Replicas: 4,3,5 Isr: 3,4,5
Topic: topicDEBUG Partition: 1 Leader: 5 Replicas: 5,4,6 Isr: 4,5,6
Topic: topicDEBUG Partition: 2 Leader: 6 Replicas: 6,5,7 Isr: 5,6,7
Topic: topicDEBUG Partition: 3 Leader: 7 Replicas: 7,6,8 Isr: 6,7,8
Topic: topicDEBUG Partition: 4 Leader: 8 Replicas: 8,7,1 Isr: 7,8
Topic: topicDEBUG Partition: 5 Leader: 8 Replicas: 1,8,2 Isr: 8,2
Topic: topicDEBUG Partition: 6 Leader: 2 Replicas: 2,1,3 Isr: 3,2
Topic: topicDEBUG Partition: 7 Leader: 3 Replicas: 3,2,4 Isr: 3,4,2
Topic: topicDEBUG Partition: 8 Leader: 4 Replicas: 4,5,6 Isr: 4,5,6
Topic: topicDEBUG Partition: 9 Leader: 5 Replicas: 5,6,7 Isr: 5,6,7
Topic: topicDEBUG Partition: 10 Leader: 6 Replicas: 6,7,8 Isr: 6,7,8
Topic: topicDEBUG Partition: 11 Leader: 7 Replicas: 7,8,1 Isr: 7,8,1
Topic: topicDEBUG Partition: 12 Leader: 8 Replicas: 8,1,2 Isr: 8,2
Topic: topicDEBUG Partition: 13 Leader: 3 Replicas: 1,2,3 Isr: 3,2
Topic: topicDEBUG Partition: 14 Leader: 2 Replicas: 2,3,4 Isr: 3,4,2
Topic: topicDEBUG Partition: 15 Leader: 3 Replicas: 3,4,5 Isr: 3,4,5
Topic:topicTRACE PartitionCount:16 ReplicationFactor:3 Configs:retention.ms=360
Topic: topicTRACE Partition: 0 Leader: 5 Replicas: 5,8,1 Isr: 5,8,1
Topic: topicTRACE Partition: 1 Leader: 6 Replicas: 6,1,2 Isr: 6,1,2
Topic: topicTRACE Partition: 2 Leader: 7 Replicas: 7,2,3 Isr: 3,7,2
Topic: topicTRACE Partition: 3 Leader: 8 Replicas: 8,3,4 Isr: 3,4,8
Topic: topicTRACE Partition: 4 Leader: 1 Replicas: 1,4,5 Isr: 1,5,4
Topic: topicTRACE Partition: 5 Leader: 2 Replicas: 2,5,6 Isr: 5,6,2
Topic: topicTRACE Partition: 6 Leader: 3 Replicas: 3,6,7 Isr: 3,6,7
Topic: topicTRACE Partition: 7 Leader: 4 Replicas: 4,7,8 Isr: 4,7,8
Topic: topicTRACE Partition: 8 Leader: 5 Replicas: 5,1,2 Isr: 5,1,2
Topic: topicTRACE Partition: 9 Leader: 6 Replicas: 6,2,3 Isr: 3,6,2
Topic: topicTRACE Partition: 10 Leader: 7 Replicas: 7,3,4 Isr: 3,4,7
Topic: topicTRACE Partition: 11 Leader: 8 Replicas: 8,4,5 Isr: 4,5,8


The problem is that the ISR for one of my topics is not updating, and a
broker keeps failing to become the preferred replica leader. In more detail,
broker 1 is not being added back to the ISR for topicDEBUG.
The log of broker 1 looks completely normal and has no errors.

Is this an expected situation? What do I have to do to get the ISR updated?


Thanks in advance.





-- 
Sincerely,
Bongyeon Kim

Java Developer  Engineer
Seoul, Korea
Mobile:  +82-10-9369-1314
Email:  bongyeon...@gmail.com
Twitter:  http://twitter.com/tigerby
Facebook:  http://facebook.com/tigerby
Wiki: http://tigerby.com


Unit testing with Kafka

2014-06-16 Thread vinay gupta
Are there unit testing libs in kafka which we can include to test our
producers/consumers?

I found the following, but the maven libs mentioned there seem to be missing.

http://grokbase.com/t/kafka/users/13ck94p302/writing-unit-tests-for-kafka-code


Has anyone else tackled this issue?

Thanks,
-Vinay

Building Kafka on Mac OS X

2014-06-16 Thread Jorge Marizan
Hi team, I'm a newcomer to Kafka, and I'm having some trouble getting it
to run on OS X.
Basically, building Kafka on OS X with 'gradlew jar' gets stuck forever without
any progress (I even tried leaving it building all night, to no avail).

Any advice will be greatly appreciated. Thanks in advance.

Re: data archiving

2014-06-16 Thread Robert Hodges
Have you looked at Pinterest Secor?  (
http://engineering.pinterest.com/post/84276775924/introducing-pinterest-secor
)

Cheers, Robert


On Mon, Jun 16, 2014 at 5:17 AM, Mark Godfrey msg...@gmail.com wrote:

 There is Bifrost, which archives Kafka data to S3:
 https://github.com/uswitch/bifrost

 Obviously that's a fairly specific archive solution, but it might work for
 you.


 Mark.




Re: Building Kafka on Mac OS X

2014-06-16 Thread Timothy Chen
What output was it stuck on?

Tim

On Mon, Jun 16, 2014 at 6:39 PM, Jorge Marizan jorge.mari...@gmail.com wrote:
 Hi team, I’m a newcomer to Kafka, but I’m having some troubles trying to get 
 it to run on OS X.
 Basically building Kafka on OS X with 'gradlew jar’  gets stuck forever 
 without any progress (Indeed I tried to leave it building all night with no 
 avail).

 Any advices will be greatly appreciated. Thanks in advance.


Re: Newly added partitions in 0.8.1.1 dont see data immediately

2014-06-16 Thread Jun Rao
With topic.metadata.refresh.interval.ms=1000, the producer should refresh
metadata and pick up the new partitions after 1 second. Do you see metadata
being refreshed? You may have to turn on debug level logging.

Thanks,

Jun





Re: mirrormaker's configuration to minimize/prevent data loss

2014-06-16 Thread Jun Rao
Currently, mirrormaker only logs the error if the producer fails. You can
potentially increase the number of retries to deal with producer failures.
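
A sketch of producer settings along those lines, assuming the 0.8 producer
config names producer.type, request.required.acks, message.send.max.retries
and retry.backoff.ms. The object name MirrorProducerSettings and the values
passed in are hypothetical; the same keys would go into whatever producer
config file mirrormaker is started with.

```scala
import java.util.Properties

object MirrorProducerSettings {
  // Builds producer properties that favor delivery over throughput.
  // The retry count and backoff are caller-supplied; no value here is a
  // recommendation.
  def build(retries: Int, backoffMs: Long): Properties = {
    val props = new Properties()
    // Sync mode: a failed send surfaces as an error instead of being queued.
    props.setProperty("producer.type", "sync")
    // Wait for acknowledgement from all in-sync replicas.
    props.setProperty("request.required.acks", "-1")
    // Retry a failed send this many times, pausing between attempts.
    props.setProperty("message.send.max.retries", retries.toString)
    props.setProperty("retry.backoff.ms", backoffMs.toString)
    props
  }
}
```

Even with more retries, messages can still be lost once the retries are
exhausted, since the consumer side keeps committing offsets.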

Thanks,

Jun





Re: Building Kafka on Mac OS X

2014-06-16 Thread Jorge Marizan
It just hangs there without any output at all.

Jorge.




Re: Building Kafka on Mac OS X

2014-06-16 Thread Timothy Chen
Can you try running it in debug mode? (./gradlew jar -d)

Tim




Re: Building Kafka on Mac OS X

2014-06-16 Thread Jorge Marizan
I will try it and let you know.

Jorge. 




Re: Building Kafka on Mac OS X

2014-06-16 Thread Jorge Marizan
For some weird reason, it is compiling correctly now.

Time to play with Kafka. Thanks :)




Re: Building Kafka on Mac OS X

2014-06-16 Thread Steve Morin
I have seen that if you have a write with zero data, it will hang.
