Re: What is the best way to write Kafka data into HDFS?

2016-02-10 Thread Steve Morin
R P, happy to walk you through https://github.com/DemandCube/Scribengin if
you're interested.

On Wed, Feb 10, 2016 at 5:09 PM, R P <hadoo...@outlook.com> wrote:

> Hello All,
>   New Kafka user here. What is the best way to write Kafka data into HDFS?
> I have looked into following options and found that Flume is quickest and
> easiest to setup.
>
> 1. Flume
> 2. KaBoom
> 3. Kafka Hadoop Loader
> 4. Camus -> Gobblin
>
> Although Flume can result into small file problems when your data is
> partitioned and some partitions generate sporadic data.
>
> What are some best practices and options to write data from Kafka to HDFS?
>
> Thanks,
> R P
>
>
>
>
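
For anyone comparing options: a rough sketch of the Flume route, since it came
up as the quickest to set up. This assumes the Flume 1.6-era Kafka source and
HDFS sink (the agent name, topic, and paths are made up, and property names
vary by Flume release, so check them against your version). Rolling files by
size or interval rather than per event is what keeps the small-file problem in
check:

    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Kafka source (1.6-era property names; newer releases use kafka.bootstrap.servers)
    a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
    a1.sources.r1.zookeeperConnect = zk1:2181,zk2:2181
    a1.sources.r1.topic = events
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000

    # HDFS sink: roll on size or time, never on event count, to avoid tiny files
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /data/events/%Y-%m-%d
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.rollSize = 134217728
    a1.sinks.k1.hdfs.rollInterval = 600
    a1.sinks.k1.hdfs.rollCount = 0

Even so, partitions that only see sporadic data will still produce one small
file per roll interval, which is where the Camus/Gobblin style of batch pull
tends to do better.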


-- 
*Steve Morin | Managing Partner - CTO*

*Nvent*

O 800-407-1156 ext 803 | M 347-453-5579

smo...@nventdata.com

*Enabling the Data Driven Enterprise*
*(Ask us how we can setup scalable open source realtime billion+ event/data
collection/analytics infrastructure in weeks)*

Service Areas: Management & Strategy Consulting | Data Engineering | Data
Science & Visualization

BigData Technologies: Hadoop & Ecosystem | NoSql | HBase | Cassandra | Storm
| Spark | Kafka | Mesos | Docker | & More

Industries: IoT | Advertising | Retail | Manufacturing | TV & Cable |
Energy | Oil & Gas | Insurance | Finance | Telecom


Re: Kafka question about starting kafka with same broker id

2015-02-18 Thread Steve Morin
Why would you want to ever do that?

 On Feb 18, 2015, at 15:16, Deepak Dhakal ddha...@salesforce.com wrote:
 
 Hi,
 
 My name is Deepak and I work for Salesforce. We are using Kafka 0.8.1.1 and
 have a question about starting Kafka with the same broker id.
 
 Steps:
 
 Start a kafka broker with broker id = 1 - it starts fine with external ZK
 Start another kafka broker with the same broker id = 1 .. it does not start,
 which is expected, but I am seeing the following log and it keeps retrying
 forever.
 
 Is there a way to control how many times a broker tries to start itself with
 the same broker id?
 
 
 Thanks
 Deepak
 
 [2015-02-18 14:47:20,713] INFO conflict in /controller data:
 {"version":1,"brokerid":19471,"timestamp":"1424299100135"} stored data:
 {"version":1,"brokerid":19471,"timestamp":"1424288444314"}
 (kafka.utils.ZkUtils$)
 
 [2015-02-18 14:47:20,716] INFO I wrote this conflicted ephemeral node
 [{"version":1,"brokerid":19471,"timestamp":"1424299100135"}] at /controller
 a while back in a different session, hence I will backoff for this node to
 be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
 
 [2015-02-18 14:47:30,719] INFO conflict in /controller data:
 {"version":1,"brokerid":19471,"timestamp":"1424299100135"} stored data:
 {"version":1,"brokerid":19471,"timestamp":"1424288444314"}
 (kafka.utils.ZkUtils$)
 
 [2015-02-18 14:47:30,722] INFO I wrote this conflicted ephemeral node
 [{"version":1,"brokerid":19471,"timestamp":"1424299100135"}] at /controller
 a while back in a different session, hence I will backoff for this node to
 be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)


Re: New Producer - ONLY sync mode?

2015-02-09 Thread Steve Morin
Jay,
Thanks I'll look at that more closely.

On Sat, Feb 7, 2015 at 1:23 PM, Jay Kreps jay.kr...@gmail.com wrote:

 Steve

 In terms of mimicking the sync behavior, I think that is what .get() does,
 no?

 We are always returning the offset and error information. The example I
 gave didn't make use of it, but you definitely can make use of it if you
 want to.

 -Jay

 On Wed, Feb 4, 2015 at 9:58 AM, Steve Morin st...@stevemorin.com wrote:

  Looking at this thread, I would ideally want at least the right recipe to
  mimic sync behavior like Otis is talking about.
 
  In the second case, would like to be able to individually know if
 messages
  have failed even regardless if they are in separate batches, sort of like
  what Kinesis does as Pradeep mentioned.
  -Steve
 
  On Wed, Feb 4, 2015 at 11:19 AM, Jay Kreps jay.kr...@gmail.com wrote:
 
   Yeah totally. Using a callback is, of course, the Right Thing for this
  kind
   of stuff. But I have found that kind of asynchronous thinking can be
 hard
   for people. Even if you get out of the pre-java 8 syntactic pain that
   anonymous inner classes inflict just dealing with multiple threads of
   control without creating async spaghetti can be a challenge for complex
   stuff. That is really the only reason for the futures in the api, they
  are
   strictly less powerful than the callbacks, but at least using them you
  can
   just call .get() and pretend it is blocking.
  
   -Jay
  
   On Wed, Feb 4, 2015 at 7:19 AM, Joe Stein joe.st...@stealth.ly
 wrote:
  
 Now that 0.8.2.0 is in the wild I look forward to working with it more and
 seeing what folks start to do with this function
 https://dist.apache.org/repos/dist/release/kafka/0.8.2.0/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#send(org.apache.kafka.clients.producer.ProducerRecord,org.apache.kafka.clients.producer.Callback)
 and keeping it fully non-blocking.
   
One sprint I know of coming up is going to have the new producer as a
component in their reactive calls and handling bookkeeping and
 retries
through that type of call back approach. Should work well (haven't
  tried
but don't see why not) with Akka, ScalaZ, RxJava, Finagle, etc, etc,
  etc
   in
functional languages and frameworks.
   
 I think as JDK 8 starts to get out into the wild more (maybe after the JDK 7
 EOL) the use of .get will be reduced (imho) and folks will be thinking more
 about non-blocking vs blocking and not so much sync vs async, but my
 crystal ball is just back from the shop so we'll see =8^)
   
/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/
   
On Tue, Feb 3, 2015 at 10:45 PM, Jay Kreps jay.kr...@gmail.com
  wrote:
   
 Hey guys,

 I guess the question is whether it really matters how many
 underlying
 network requests occur? It is very hard for an application to
 depend
  on
 this even in the old producer since it depends on the partitions
placement
 (a send to two partitions may go to either one machine or two and
 so
  it
 will send either one or two requests). So when you send a batch in
  one
call
 you may feel that is all at once, but that is only actually
   guaranteed
if
 all messages have the same partition.

 The challenge is allowing even this in the presence of bounded
  request
 sizes which we have in the new producer. The user sends a list of
   objects
 and the serialized size that will result is not very apparent to
  them.
   If
 you break it up into multiple requests then that is kind of further
ruining
 the illusion of a single send. If you don't then you have to just
  error
out
 which is equally annoying to have to handle.

 But I'm not sure if from your description you are saying you
 actually
care
 how many physical requests are issued. I think it is more like it
 is
   just
 syntactically annoying to send a batch of data now because it
 needs a
   for
 loop.

 Currently to do this you would do:

  List<Future<RecordMetadata>> responses = new ArrayList<>();
  for (ProducerRecord input : recordBatch)
      responses.add(producer.send(input));
  for (Future<RecordMetadata> response : responses)
      response.get();

 If you don't depend on the offset/error info we could add a flush
  call
   so
 you could instead do
  for (ProducerRecord input : recordBatch)
      producer.send(input);
  producer.flush();

 But if you do want the error/offset then you are going to be back
 to
   the
 original case.
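
 To make the two options concrete, here is a self-contained sketch against the
 new (0.8.2) Java producer. The broker address, topic, and string payloads are
 placeholders, and since flush() is only a proposed addition at this point, the
 sketch sticks to send() with Future.get() and with a Callback, both of which
 exist today:

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import java.util.Properties;
  import java.util.concurrent.Future;

  import org.apache.kafka.clients.producer.Callback;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.clients.producer.RecordMetadata;

  public class BatchSendExample {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
          props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
          KafkaProducer<String, String> producer = new KafkaProducer<>(props);

          List<String> recordBatch = Arrays.asList("a", "b", "c");  // placeholder payloads

          // Option 1: collect the futures, then block on each one for the offset/error
          List<Future<RecordMetadata>> responses = new ArrayList<>();
          for (String value : recordBatch)
              responses.add(producer.send(new ProducerRecord<String, String>("my-topic", value)));
          for (Future<RecordMetadata> response : responses)
              System.out.println("offset = " + response.get().offset()); // throws if that send failed

          // Option 2: stay non-blocking and get per-message results via a callback
          for (String value : recordBatch)
              producer.send(new ProducerRecord<String, String>("my-topic", value), new Callback() {
                  public void onCompletion(RecordMetadata metadata, Exception exception) {
                      if (exception != null)
                          exception.printStackTrace();   // per-message error handling goes here
                  }
              });

          producer.close();
      }
  }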

 Thoughts?

 -Jay

 On Mon, Feb 2, 2015 at 1:48 PM, Gwen Shapira 
 gshap...@cloudera.com
 wrote:

  I've been thinking about that too, since both Flume and Sqoop

Re: [ANNOUNCEMENT] Apache Kafka 0.8.2.0 Released

2015-02-05 Thread Steve Morin
Congratz team it's a big accomplishment

 On Feb 5, 2015, at 14:22, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
 
 Big thanks to Jun and everyone else involved!  We're on 0.8.2 as of today.
 :)
 
 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
  Solr & Elasticsearch Support * http://sematext.com/
 
 
 On Tue, Feb 3, 2015 at 8:37 PM, Jun Rao j...@confluent.io wrote:
 
 The Apache Kafka community is pleased to announce the release for Apache
 Kafka 0.8.2.0.
 
 The 0.8.2.0 release introduces many new features, improvements and fixes
 including:
 - A new Java producer for ease of implementation and enhanced performance.
 - A Kafka-based offset storage.
 - Delete topic support.
 - Per topic configuration of preference for consistency over availability.
 - Scala 2.11 support and dropping support for Scala 2.8.
 - LZ4 Compression.
 
 All of the changes in this release can be found:
 https://archive.apache.org/dist/kafka/0.8.2.0/RELEASE_NOTES.html
 
  Apache Kafka is a high-throughput, publish-subscribe messaging system
  rethought of as a distributed commit log.
 
 ** Fast = A single Kafka broker can handle hundreds of megabytes of reads
 and
 writes per second from thousands of clients.
 
 ** Scalable = Kafka is designed to allow a single cluster to serve as the
 central data backbone
 for a large organization. It can be elastically and transparently expanded
 without downtime.
 Data streams are partitioned and spread over a cluster of machines to allow
 data streams
 larger than the capability of any single machine and to allow clusters of
 co-ordinated consumers.
 
 ** Durable = Messages are persisted on disk and replicated within the
 cluster to prevent
 data loss. Each broker can handle terabytes of messages without performance
 impact.
 
 ** Distributed by Design = Kafka has a modern cluster-centric design that
 offers
 strong durability and fault-tolerance guarantees.
 
 You can download the release from: http://kafka.apache.org/downloads.html
 
 We welcome your help and feedback. For more information on how to
 report problems, and to get involved, visit the project website at
 http://kafka.apache.org/
 
 Thanks,
 
 Jun
 


Re: Client Offset Storage

2014-12-12 Thread Steve Morin
Suren,
  Like out of the box storage or roll your own?
-Steve
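
If it ends up being roll-your-own: with the SimpleConsumer it mostly comes
down to persisting (group, topic, partition) -> next offset somewhere durable
and reading it back on startup. A minimal sketch of such a store backed by a
JDBC table (the table name, schema, and MySQL-style REPLACE are made up for
illustration; use the equivalent upsert for your database):

  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;

  // Hypothetical offset store: one row per (group, topic, partition).
  public class JdbcOffsetStore {
      private final Connection conn;

      public JdbcOffsetStore(Connection conn) { this.conn = conn; }

      public void commit(String group, String topic, int partition, long nextOffset) throws Exception {
          // Assumes: CREATE TABLE offsets (group_id VARCHAR(255), topic VARCHAR(255),
          //          partition_id INT, next_offset BIGINT,
          //          PRIMARY KEY (group_id, topic, partition_id))
          try (PreparedStatement ps = conn.prepareStatement(
                  "REPLACE INTO offsets (group_id, topic, partition_id, next_offset) VALUES (?, ?, ?, ?)")) {
              ps.setString(1, group);
              ps.setString(2, topic);
              ps.setInt(3, partition);
              ps.setLong(4, nextOffset);
              ps.executeUpdate();
          }
      }

      public long fetch(String group, String topic, int partition, long defaultOffset) throws Exception {
          try (PreparedStatement ps = conn.prepareStatement(
                  "SELECT next_offset FROM offsets WHERE group_id = ? AND topic = ? AND partition_id = ?")) {
              ps.setString(1, group);
              ps.setString(2, topic);
              ps.setInt(3, partition);
              try (ResultSet rs = ps.executeQuery()) {
                  return rs.next() ? rs.getLong(1) : defaultOffset;
              }
          }
      }
  }

Commit after (or atomically with) processing a batch and you get at-least-once
semantics without ZK in the read path.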

On Fri, Dec 12, 2014 at 6:33 AM, Surendranauth Hiraman 
suren.hira...@velos.io wrote:

 My team is using Kafka 0.8.1 and we may not be able to upgrade to 0.8.2 to
 take advantage of the broker-side commit of client offsets.

 Is anyone aware of a Java/Scala library for client offset storage outside
 of ZK?


 -- Suren


 SUREN HIRAMAN, VP TECHNOLOGY
 Velos
 Accelerating Machine Learning

 440 NINTH AVENUE, 11TH FLOOR
 NEW YORK, NY 10001
 O: (917) 525-2466 ext. 105
 F: 646.349.4063
  E: suren.hira...@velos.io
 W: www.velos.io



Re: Programmatic Kafka version detection/extraction?

2014-11-11 Thread Steve Morin
That would be great!

On Mon, Nov 10, 2014 at 9:45 PM, Jun Rao jun...@gmail.com wrote:

 Otis,

  We don't have an api for that now. We can probably expose this via JMX as
  part of KAFKA-1481.

 Thanks,

 Jun

 On Mon, Nov 10, 2014 at 7:17 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

  Hi,
 
  Is there a way to detect which version of Kafka one is running?
  Is there an API for that, or a constant with this value, or maybe an
 MBean
  or some other way to get to this info?
 
  Thanks,
  Otis
  --
  Monitoring * Alerting * Anomaly Detection * Centralized Log Management
   Solr & Elasticsearch Support * http://sematext.com/
 



Re: migrating log data to new locations

2014-10-07 Thread Steve Morin
Neha,
 Is it one log volume, or can it be volumes, plural?
-Steve

On Tue, Oct 7, 2014 at 6:41 AM, Neha Narkhede neha.narkh...@gmail.com
wrote:

 Is it possible to perform this migration without losing the data currently
 stored in the kafka cluster?

 Though I haven't tested this, the way this is designed should allow you to
 shut down a broker, move some partition directories over to the new log
 volume and restart the broker. You will have to do this manually per broker
 though.
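
 Roughly, per broker (paths and topic names below are placeholders, and with
 replication enabled the broker will re-fetch anything it is missing from the
 other replicas after the restart):

  # 1. stop the broker cleanly
  bin/kafka-server-stop.sh

  # 2. move the existing partition directories onto the new volumes,
  #    spreading them across the new log dirs however you like
  mv /tmp/kafka-logs/mytopic-0 /srv/data/1/
  mv /tmp/kafka-logs/mytopic-1 /srv/data/2/

  # 3. point the broker at the new dirs in config/server.properties:
  #      log.dirs=/srv/data/1,/srv/data/2,/srv/data/3

  # 4. restart and wait for this broker's partitions to rejoin the ISR
  #    before moving on to the next broker
  bin/kafka-server-start.sh config/server.properties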

 Thanks,
 Neha

 On Tue, Oct 7, 2014 at 3:31 AM, Javier Alba m...@fjavieralba.com wrote:

  Hi,
 
  I have a Kafka 0.8.1.1 cluster consisting in 4 servers with several
 topics
  on it.
 
  The cluster was initially configured to store kafka log data in a single
  directory on each server (log.dirs = /tmp/kafka-logs)
 
   Now, I have assigned 3 new disks to each server and I would like to use
   them to store log data instead of the old directory. (log.dirs =
   /srv/data/1,/srv/data/2,/srv/data/3)
 
  What would be the recommended way of doing such a migration?
 
  Is it possible to perform this migration without losing the data
 currently
  stored in the kafka cluster?
 
  Would it be possible to achieve that kind of change without having to
 stop
  the cluster and losing service?
 
 
  Thanks,
 
 



Re: Right Tool

2014-09-12 Thread Steve Morin
What record format are you writing to Kafka with?

 On Sep 12, 2014, at 17:45, Patrick Barker patrickbarke...@gmail.com wrote:
 
 O, I'm not trying to use it for persistence, I'm wanting to sync 3
 databases: sql, mongo, graph. I want to publish to kafka and then have it
 update the db's. I'm wanting to keep this as efficient as possible.
 
 On Fri, Sep 12, 2014 at 6:39 PM, cac...@gmail.com cac...@gmail.com wrote:
 
 I would say that it depends upon what you mean by persistence. I don't
 believe Kafka is intended to be your permanent data store, but it would
 work if you were basically write once with appropriate query patterns. It
 would be an odd way to describe it though.
 
 Christian
 
 On Fri, Sep 12, 2014 at 4:05 PM, Stephen Boesch java...@gmail.com wrote:
 
  Hi Patrick,   Kafka can be used at any scale including small ones
  (initially anyway). The issues I ran into personally were various issues with
  ZooKeeper management and a bug in deleting topics (is that fixed yet?). In
  any case you might try out Kafka - given its highly performant, scalable,
  and flexible backbone.   After that you will have little worry about scale
  - given Kafka's use within massive web-scale deployments.
 
 2014-09-12 15:18 GMT-07:00 Patrick Barker patrickbarke...@gmail.com:
 
 Hey, I'm new to kafka and I'm trying to get a handle on how it all
 works. I
 want to integrate polyglot persistence into my application. Kafka looks
 like exactly what I want just on a smaller scale. I am currently only
 dealing with about 2,000 users, which may grow,  but is kafka a good
 use
 case here, or is there another technology thats better suited?
 
 Thanks
 


Re: Right Tool

2014-09-12 Thread Steve Morin
You would need to make sure they were all persisted down properly to each
database. Why are you persisting it to three different databases (sql,
mongo, graph)?
-Steve

On Fri, Sep 12, 2014 at 7:35 PM, Patrick Barker patrickbarke...@gmail.com
wrote:

  I'm just getting familiar with kafka; currently I just save everything to
  all my db's in a single transaction, and if any of them fail I roll them all
  back. However, this is slowing my app down. So, as I understand it, I could
  write to kafka, close the transaction, and then it would keep on publishing
  out to my databases. I'm not sure what format I would write it in yet; I
  guess json.
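
  For illustration, the write side of that could be as small as this sketch:
  one JSON change event per entity update, keyed by the entity id so changes
  to the same record stay in order, with each database applying events through
  its own consumer group. The topic name and event shape are made up, and it
  uses the newer Java producer:

   import org.apache.kafka.clients.producer.KafkaProducer;
   import org.apache.kafka.clients.producer.ProducerRecord;

   // Hypothetical publisher: every store (sql, mongo, graph) consumes the same
   // topic with its own consumer group and applies the change independently.
   public class ChangeEventPublisher {
       private final KafkaProducer<String, String> producer;  // configured elsewhere

       public ChangeEventPublisher(KafkaProducer<String, String> producer) {
           this.producer = producer;
       }

       public void publish(String entityId, String entityType, String jsonPayload) {
           String event = "{\"id\":\"" + entityId + "\",\"type\":\"" + entityType
                        + "\",\"payload\":" + jsonPayload + "}";
           // Keying by entity id keeps all changes to one entity in one partition, in order.
           producer.send(new ProducerRecord<String, String>("entity-changes", entityId, event));
       }
   }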

 On Fri, Sep 12, 2014 at 7:00 PM, Steve Morin steve.mo...@gmail.com
 wrote:

  What record format are you writing to Kafka with?
 
   On Sep 12, 2014, at 17:45, Patrick Barker patrickbarke...@gmail.com
  wrote:
  
   O, I'm not trying to use it for persistence, I'm wanting to sync 3
   databases: sql, mongo, graph. I want to publish to kafka and then have
 it
   update the db's. I'm wanting to keep this as efficient as possible.
  
   On Fri, Sep 12, 2014 at 6:39 PM, cac...@gmail.com cac...@gmail.com
  wrote:
  
   I would say that it depends upon what you mean by persistence. I don't
   believe Kafka is intended to be your permanent data store, but it
 would
   work if you were basically write once with appropriate query patterns.
  It
   would be an odd way to describe it though.
  
   Christian
  
   On Fri, Sep 12, 2014 at 4:05 PM, Stephen Boesch java...@gmail.com
  wrote:
  
   Hi Patrick,   Kafka can be used at any scale including small ones
   (initially anyways). The issues I ran into personally various issues
  with
   ZooKeeper management and a bug in deleting topics (is that fixed
 yet?)
   In
   any case you might try out Kafka  - given its highly performant,
   scalable,
   and flexible backbone.   After that you will have little worry about
   scale
   - given Kafka's use within massive web scale deployments.
  
   2014-09-12 15:18 GMT-07:00 Patrick Barker patrickbarke...@gmail.com
 :
  
   Hey, I'm new to kafka and I'm trying to get a handle on how it all
   works. I
   want to integrate polyglot persistence into my application. Kafka
  looks
   like exactly what I want just on a smaller scale. I am currently
 only
   dealing with about 2,000 users, which may grow,  but is kafka a good
   use
   case here, or is there another technology thats better suited?
  
   Thanks
  
 



Re: Consume more than produce

2014-08-01 Thread Steve Morin
You have to remember statsd uses UDP, which is possibly lossy, and that might
account for the errors.
-Steve


On Fri, Aug 1, 2014 at 1:28 AM, Guy Doulberg guy.doulb...@perion.com
wrote:

 Hey,


 After a year or so I have Kafka as my streaming layer in my production, I
 decided it is time to audit, and to test how many events do I lose, if I
 lose events at all.


 I discovered something interesting which I can't explain.


  The producer produces fewer events than the consumer group consumes.


 It is not much more, it is about 0.1% more events


 I use the Consumer API (not the simple consumer API)


  I was thinking I might have had rebalancing going on in my system, but it
  doesn't look like that.


 Did anyone see such a behaviour


  In order to audit, I calculated for each event the minute it arrived, and
  assigned this value to the event. I used statsd to count all events from
  all my producer cluster and all my consumer group cluster.


  I must say that it is not happening for every minute.


 Thanks, Guy





Re: kafka support in collectd and syslog-ng

2014-07-25 Thread Steve Morin
Cool


On Fri, Jul 25, 2014 at 9:25 AM, Joe Stein joe.st...@stealth.ly wrote:

 Awesome!

 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /


 On Fri, Jul 25, 2014 at 12:08 PM, Jay Kreps jay.kr...@gmail.com wrote:

  This is great!
 
  -Jay
 
  On Fri, Jul 25, 2014 at 8:55 AM, Pierre-Yves Ritschard p...@spootnik.org
 
  wrote:
   Hi list,
  
   Just a quick note to let you know that kafka support has now been
 merged
  in
   collectd, which means that system and application metrics can directly
 be
   produced on a topic from the collectd daemon.
  
   Additionally, syslog-ng will soon ship with a kafka producing module as
   well, it will be part of the next release of the syslog-ng-incubator
  module
   collection: https://github.com/balabit/syslog-ng-incubator.
  
    What this means is that there is now a very lightweight way to create an
    infrastructure event stream on top of kafka, with known tools in the ops
    world.
  
   I relied on the librdkafka library to provide kafka producing, which I
  can
   recommend for C needs.
  
 - pyr
 



Re: Kafka on yarn

2014-07-23 Thread Steve Morin
Kam,
  Give it some time and I think it's getting better as a real possibility for
Kafka on YARN.  There are new capabilities coming out in YARN/HDFS to allow
for node groups/labels that can work with locality, and secondarily new
functionality in HDFS that, depending on the use-case, can be very
interesting with in-memory files.
-Steve


On Wed, Jul 23, 2014 at 4:44 PM, Kam Kasravi kamkasr...@yahoo.com.invalid
wrote:

 Thanks Joe for the input related to Mesos as well as acknowledging the
 need for YARN to support this type of cluster allocation - long running
 services with node locality priority.

 Thanks Jay - That's an interesting fact that I wasn't aware of - though I
 imagine there could possibly be a long latency for the replica data to be
 transferred to the new broker (depending on #/size of partitions). It does
 open up some possibilities to restart brokers on app master restart using
 different containers  (as well as some complications if an old container
 with old data were reallocated on restart). I had used zookeeper to store
 broker locations so the app master on restart would look for this
 information and attempt to reallocate containers on these nodes.  All this
 said, would this be part of kafka or some other framework? I can see kafka
 benefitting from this at the same time kafka's appeal IMO is it's
 simplicity. Spark has chosen to include YARN within its distribution, not
 sure what the kafka team thinks.



 On Wednesday, July 23, 2014 4:19 PM, Jay Kreps jay.kr...@gmail.com
 wrote:



 Hey Kam,

 It would be nice to have a way to get a failed node back with it's
 original data, but this isn't strictly necessary, it is just a good
 optimization. As long as you run with replication you can restart a
 broker elsewhere with no data, and it will restore it's state off the
 other replicas.

 -Jay


 On Wed, Jul 23, 2014 at 3:47 PM, Kam Kasravi
 kamkasr...@yahoo.com.invalid wrote:
  Hi
 
  Kafka-on-yarn requires YARN to consistently allocate a kafka broker at a
 particular resource since the broker needs to always use its local data.
 YARN doesn't do this well, unless you provide (override) the default
  scheduler (CapacityScheduler or FairScheduler). SequenceIQ did something
 along these lines for a different use case. Unfortunately replacing the
 scheduler is a global operation which would affect all App masters.
 Additionally one could argue that the broker should be run as an OS service
 and auto restarted on failure if necessary. Slider (incubating) did some of
 this groundwork but YARN still has lots of limitations in providing
 guarantees to consistently allocate a container on a particular node
 especially on appmaster restart (eg ResourceManager dies). That said, it
 might be worthwhile to enumerate all of this here with some possible
 solutions. If there is interest I could certainly list the relevant JIRA's
 along with some additional JIRA's
   required IMO.
 
  Thanks
  Kam
 
 
  On Wednesday, July 23, 2014 2:37 PM, hsy...@gmail.com 
 hsy...@gmail.com wrote:
 
 
 
  Hi guys,
 
   Kafka is getting more and more popular and in most cases people run kafka
   as a long-term service in the cluster. Is there a discussion of running kafka
   on a YARN cluster where we can utilize the convenient configuration/resource
   management and HA?  I think there is a big potential and requirement for
   that.
   I found a project https://github.com/kkasravi/kafka-yarn. But is there an
   official roadmap/plan for this?
 
  Thank you very much!
 
  Best,
  Siyuan



Re: Performance/Stress tools

2014-07-16 Thread Steve Morin
We are working on this Yarn based tool , but it's still in alpha
https://github.com/DemandCube/DemandSpike


On Wed, Jul 16, 2014 at 11:30 AM, Dayo Oliyide dayo.oliy...@gmail.com
wrote:

 Hi,

 I'm setting up a Kafka Cluster and would like to carry out some
 performance/stress tests on different configurations.

 Other than the performance testing scripts that come with Kafka, are there
 any other tools that anyone would recommend?

 Regards,
 Dayo



Re: kafka producer pulling from custom restAPI

2014-06-27 Thread Steve Morin
The answer is no, it doesn't work that way.  You would have to write a process
to consume from the REST API back end and have that process write to Kafka.
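
A bare-bones sketch of such a bridge process: poll the REST endpoint, push
each response onto a topic. The URL, topic, and polling interval are
placeholders, and it uses the newer Java producer for brevity:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;
  import java.util.Properties;

  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class RestToKafkaBridge {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("bootstrap.servers", "broker1:9092");  // placeholder broker
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
          KafkaProducer<String, String> producer = new KafkaProducer<>(props);

          while (true) {
              // pull the latest JSON from the (hypothetical) REST API
              HttpURLConnection conn =
                  (HttpURLConnection) new URL("http://example.com/api/records").openConnection();
              StringBuilder body = new StringBuilder();
              try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                  String line;
                  while ((line = in.readLine()) != null)
                      body.append(line);
              }
              // push the payload onto a topic; a real bridge would split it into individual records
              producer.send(new ProducerRecord<String, String>("rest-events", body.toString()));
              Thread.sleep(10000);  // poll every 10 seconds
          }
      }
  }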

 On Jun 27, 2014, at 19:35, sonali.parthasara...@accenture.com wrote:
 
 Hi,
 
 I have a quick question. Say I have a custom REST API with data in JSON 
 format. Can the Kafka Producer read directly from the REST API and write to a 
 topic?
 
 If so, can you point me to the code base?
 
 Thanks,
 Sonali
 
 Sonali Parthasarathy
  R&D Developer, Data Insights
 Accenture Technology Labs
 703-341-7432
 
 
 
 


Re: Building Kafka on Mac OS X

2014-06-16 Thread Steve Morin
I have seen that if you have a write with zero data it will hang.

 On Jun 16, 2014, at 21:02, Timothy Chen tnac...@gmail.com wrote:
 
 Can you try running it in debug mode? (./gradlew jar -d)
 
 Tim
 
 On Mon, Jun 16, 2014 at 8:44 PM, Jorge Marizan jorge.mari...@gmail.com 
 wrote:
 It just hangs there without any output at all.
 
 Jorge.
 
 On Jun 16, 2014, at 11:27 PM, Timothy Chen tnac...@gmail.com wrote:
 
 What output was it stuck on?
 
 Tim
 
 On Mon, Jun 16, 2014 at 6:39 PM, Jorge Marizan jorge.mari...@gmail.com 
 wrote:
  Hi team, I’m a newcomer to Kafka, but I’m having some trouble trying to
  get it to run on OS X.
  Basically, building Kafka on OS X with 'gradlew jar' gets stuck forever
  without any progress (indeed I tried to leave it building all night to
  no avail).
  
  Any advice will be greatly appreciated. Thanks in advance.
 


Re: Use Kafka To Send Files

2014-06-15 Thread Steve Morin
You also wouldn't have any metadata about the file, so I would avoid doing this.

 On Jun 15, 2014, at 20:51, Mark Roberts wiz...@gmail.com wrote:
 
 You would ship the contents of the file across as a message. In general this 
 would mean that your maximum file size must be smaller than your maximum 
 message size. It would generally be a better choice to put a pointer to the 
 file in some shared location on the queue.
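
  A sketch of the pointer approach: copy the file to shared storage first,
  then publish a small JSON message describing where it lives. The topic,
  field names, and path are made up, and it assumes the newer Java producer:

   import org.apache.kafka.clients.producer.KafkaProducer;
   import org.apache.kafka.clients.producer.ProducerRecord;

   public class FilePointerSender {
       // Publishes a pointer to a file already copied to shared storage, plus the
       // metadata consumers need -- never the file contents themselves.
       public static void sendPointer(KafkaProducer<String, String> producer,
                                      String sharedPath, long sizeBytes, String contentType) {
           String pointer = "{\"path\":\"" + sharedPath + "\","
                          + "\"sizeBytes\":" + sizeBytes + ","
                          + "\"contentType\":\"" + contentType + "\"}";
           producer.send(new ProducerRecord<String, String>("file-events", sharedPath, pointer));
       }
   }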
 
 -Mark
 
 On Jun 15, 2014, at 19:41, huqiming0...@live.com wrote:
 
  Hi. Sometimes we want to use Kafka to send files (like text files, XML, ...), but
  I didn't see this in the documentation.
  Can Kafka be used to transfer files? If it can, how do I do it?
     Thanks


Re: Make kafka storage engine pluggable and provide a HDFS plugin?

2014-05-20 Thread Steve Morin
Hangjun,
  Would having Kafka in YARN be a big architectural change from where
it is now?  From what I have seen, in most typical setups you want machines
optimized for Kafka, not just Kafka on top of HDFS.
-Steve


On Tue, May 20, 2014 at 8:37 PM, Hangjun Ye yehang...@gmail.com wrote:

 Thanks Jun and Francois.

  We used Kafka 0.8.0 previously. We got some weird error when expanding the
  cluster and it couldn't be finished.
  Now we use 0.8.1.1; I will give cluster expansion a try sometime.

 I read the discussion on that jira issue and I agree with points raised
 there.
 HDFS was also improved a lot since then and many issues have been resolved
 (e.g. SPOF).

 We have a team for building and providing storage/computing platform for
 our company and we have already provided a Hadoop cluster.
 If Kafka has an option to store data on HDFS, we just need to allocate some
 space quota for it on our cluster (and increase it on demand) and it might
 reduce our operational cost a lot.

 Another (and maybe more aggressive) thought is about the deployment. Jun
 has a good point: HDFS only provides data redundancy, but not
 computational redundancy. If Kafka could be deployed on YARN, it could
 offload some computational resource management to YARN and we don't have to
 allocate machines physically. Kafka still needs to take care of load
 balance and partition assignment among brokers by itself.
 Many computational frameworks like spark/samza have such an option and it's
 a big attractive point for us.

 Best,
 Hangjun


 2014-05-20 21:00 GMT+08:00 François Langelier f.langel...@gmail.com:

  Take a look at Camus https://github.com/linkedin/camus/
 
 
 
  François Langelier
  Étudiant en génie Logiciel - École de Technologie
  Supérieurehttp://www.etsmtl.ca/
  Capitaine Club Capra http://capra.etsmtl.ca/
  VP-Communication - CS Games http://csgames.org 2014
  Jeux de Génie http://www.jdgets.com/ 2011 à 2014
  Argentier Fraternité du Piranha http://fraternitedupiranha.com/
  2012-2014
  Comité Organisateur Olympiades ÉTS 2012
  Compétition Québécoise d'Ingénierie 2012 - Compétition Senior
 
 
  On 19 May 2014 05:28, Hangjun Ye yehang...@gmail.com wrote:
 
   Hi there,
  
   I recently started to use Kafka for our data analysis pipeline and it
  works
   very well.
  
   One problem to us so far is expanding our cluster when we need more
  storage
   space.
   Kafka provides some scripts for helping do this but the process wasn't
   smooth.
  
   To make it work perfectly, seems Kafka needs to do some jobs that a
   distributed file system has already done.
   So just wondering if any thoughts to make Kafka work on top of HDFS?
  Maybe
   make the Kafka storage engine pluggable and HDFS is one option?
  
   The pros might be that HDFS has already handled storage management
   (replication, corrupted disk/machine, migration, load balance, etc.)
 very
   well and it frees Kafka and the users from the burden, and the cons
 might
   be performance degradation.
   As Kafka does very well on performance, possibly even with some degree
 of
   degradation, it's still competitive for the most situations.
  
   Best,
   --
   Hangjun Ye
  
 



 --
 Hangjun Ye



Re: Announcing Kafka Offset Monitor 0.1

2014-03-07 Thread Steve Morin
Very nice

 On Mar 7, 2014, at 11:55, Pierre Andrews pie...@quantifind.com wrote:
 
 Claude, we should join forces ;)
 
 
 On Fri, Mar 7, 2014 at 4:45 PM, Claude Mamo claude.m...@gmail.com wrote:
 
 Awesome!!! ;-)
 
 Claude
 
 
 On Fri, Mar 7, 2014 at 4:03 PM, Pierre Andrews pie...@quantifind.com
 wrote:
 
 Great! Thanks!
 
 
 On Fri, Mar 7, 2014 at 3:59 PM, Jay Kreps jay.kr...@gmail.com wrote:
 
 This is really useful! I added it to the ecosystem page:
 https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
 
 -Jay
 
 
 On Fri, Mar 7, 2014 at 10:49 AM, Pierre Andrews pie...@quantifind.com
 wrote:
 
 Hello everyone,
 
 at Quantifind, we are big users of Kafka and we like it a lot!
 In a few use cases, we had to figure out if a queue was growing and
 how
 its
 consumers were behaving. There are a few command-line tools to try to
 figure out what's going on, but it's not always easy to debug and to
 see
 what has happened while everyone was sleeping.
 
 To be able to monitor our kafka queues and consumers, we thus
 developed a
 tiny web app that could tell us the log size of each topic in the
 brokers
 and the offsets of each consumers. That's very similar to what the
 kafka
  ConsumerOffsetChecker tool* is doing, but instead of having a one-off
  snapshot, our app keeps a history and displays a nice graph of what's
  going on.
 
 You can find screenshots and more details here:
 http://quantifind.github.io/KafkaOffsetMonitor/
 
 the code is on github:
 https://github.com/quantifind/KafkaOffsetMonitor
 
 If you have kafka 0.8 setup, it's very easy to use:
 
  1- download the current jar
 http://quantifind.github.io/KafkaOffsetMonitor/dist/KafkaOffsetMonitor-assembly-0.1.0-SNAPSHOT.jar
  2- run it, pointing at your kafka brokers:
java -cp KafkaOffsetMonitor-assembly-0.1.0-SNAPSHOT.jar \
 com.quantifind.kafka.offsetapp.OffsetGetterWeb \
 --zk zk-server1,zk-server2 \
 --port 8080 \
 --refresh 10.seconds \
 --retain 2.days
   3- open your browser and point it to localhost:8080
 
 You can run it locally or host it on a server if you prefer.
 
 It's all open source and we'll be happy to receive issue report and
 pull
 requests.
 
 I hope that you like it and that it can find some uses in the rest of
 the
 Kafka community.
 
 Best
 
 Pierre
 
  PS: we are aware of https://github.com/claudemamo/kafka-web-console but are
  currently offering slightly different features. Hopefully we can merge the
  projects in the future.
 
 * here:
 https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/tools/ConsumerOffsetChecker.scala
 


Re: Logging in new clients

2014-02-03 Thread Steve Morin
My vote would be with log4j, I don't have that much experience with log4j2
or a  good feel for how much the industry is moving towards it.


On Mon, Feb 3, 2014 at 11:17 AM, Joel Koshy jjkosh...@gmail.com wrote:

 We are already using other libraries in various parts of our code
 (e.g., metrics, zkclient, joptsimple, etc) some of which pull in these
 other logging dependencies anyway. i.e., what do we gain by using jul?
 There may be a good reason why people don't use jul so I think we
 should fully understand that before going with jul. So it may be
 simpler to just stick with log4j for the client rewrites and
 investigate logging later.

 log4j2 is becoming more widespread and many users seem to be favorable
 toward logback. slf4j would cover all of these very easily. From what
 I understand jul does not make it very easy to plug in with these
 various options but I could be wrong.
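
 For what it's worth, under slf4j the client codes only against the facade;
 the actual backend (log4j, logback, or the JDK logger via one of the slf4j
 binding jars) is picked by whatever the user puts on the classpath. A minimal
 sketch of what that looks like from the client's side (class and method names
 are made up):

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class SenderLogging {
      // The client codes against the slf4j facade only...
      private static final Logger log = LoggerFactory.getLogger(SenderLogging.class);

      void onDisconnect(String node) {
          // ...and whichever backend the user bundles decides where this actually goes.
          log.warn("Connection to node {} was broken, will refresh metadata and retry", node);
      }
  }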

 I completely agree on the need to fix our client logging as that will
 go a long way in usability for end-users unless we want to keep
 getting asked the Why do I see this ERROR in my logs..? questions.

 Joel

 On Mon, Feb 03, 2014 at 11:08:39AM -0800, Neha Narkhede wrote:
   Basically my preference would be java.util.logging unless there is
 some
  known problem with it, otherwise I guess slf4j, and if not that then
 log4j.
 
  +1. My preference is to use java.util.logging to avoid adding an external
  dependency,
  but I'm not too sure about what's the standard out there, so open to
  suggestions
  on picking a different library.
 
 
 
  On Mon, Feb 3, 2014 at 11:00 AM, Jay Kreps jay.kr...@gmail.com wrote:
 
   We probably need to add a small amount of logging in the new producer
 and
   (soon) consumer clients. I wanted to have a quick discussion on logging
   libraries before I start adding this in the producer.
  
   Previously we have been pretty verbose loggers and I think we should
 stop
   that. For clients you mostly don't need to log: if there is an error
 you
   should throw it back not log it, so you don't need ERROR logging.
 Likewise
   I think it is rude to pollute peoples logs with the details of client
   initialization (establishing connections, etc), so you don't need INFO
   logging. However perhaps there is an argument to be made for WARN and
   DEBUG. I think it is perhaps useful to log a WARN when a server breaks
 a
   connection or metadata initialization fails. It can sometimes also be
   useful to be able to enable debug logging to see step by step
 processing in
   the client, which is the case for DEBUG logging.
  
   Here is my knowledge about the state of Java logging:
   1. Most people still use log4j. The server is using log4j.
    2. Second runner-up is slf4j. I personally consider slf4j pretty silly
 but
   this is perhaps the more flexible choice since people can plug in
 different
   stuff.
   3. java.util.logging ships with the jdk, but for some reason no one
 uses
   it.
   4. There is no critical mass around any other logging library.
  
   The context for how to think about this is the following. We are not
 trying
   to pick the best logging library. Fundamentally logging is pretty
   straight-forward and for our simple use case it is inconceivable that
 any
   particular library could be much better than any other in terms of
 feature
   set. We want the most standard library. My goal is to minimize the
   dependencies of the client and make our basic logging cases work for
 most
   cases.
  
   Is there a reason not to just the java.util.logging? It comes with the
 jdk
   and supports pluggable appenders so people who have some other library
 can
   plug that in right?
  
   Basically my preference would be java.util.logging unless there is some
   known problem with it, otherwise I guess slf4j, and if not that then
 log4j.
  
   Thougts?
  
   -Jay
  




Re: There is insufficient memory for the Java Runtime Environment to continue.

2014-01-31 Thread Steve Morin
Do you have anything like Graphite or Ganglia monitoring the box to see
exactly what's going on?
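
Also, in case it helps: a quick way to check whether the heap setting is
actually reaching the JVM (paths follow the 0.8.0 source layout used in this
thread). If the ps output shows a different -Xmx than the one you exported,
the variable isn't being picked up by the start script and you'd edit the
value in kafka-server-start.sh / kafka-run-class.sh directly:

  # export a small heap before starting the broker
  export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
  /var/lib/kafka-0.8.0-src/bin/kafka-server-start.sh \
      /var/lib/kafka-0.8.0-src/config/server.properties &

  # verify the flags the JVM really got -- look for -Xmx256M here
  ps -ef | grep kafka.Kafka | grep -v grep

  # and check how much memory the box actually has free
  free -m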


On Fri, Jan 31, 2014 at 1:45 PM, David Montgomery davidmontgom...@gmail.com
 wrote:

  Well...I did get kafka to run on a DigitalOcean box with 4 gigs of ram.  All
  great, but now I am paying 40 USD a month for dev servers when I was paying
  5.  I have 5 dev servers around the world.  It would be great to get back to
  5 USD for boxes that just need to start up rather than do anything
  substantial.

  Going back to the original issue: is this the only lever I have to get
  kafka to work on a box with 512 MB of ram, KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"?
  Any other trick?

 Thanks



 On Sat, Feb 1, 2014 at 5:21 AM, Benjamin Black b...@b3k.us wrote:

  Sorry, was looking at pre-release 0.8 code. No idea now why they are not
  being set as expected.
 
 
  On Fri, Jan 31, 2014 at 1:20 PM, Benjamin Black b...@b3k.us wrote:
 
   kafka-run-class.sh in 0.8 does not define KAFKA_HEAP_OPTS. i think you
   want KAFKA_OPTS.
  
  
   On Fri, Jan 31, 2014 at 1:14 PM, David Montgomery 
   davidmontgom...@gmail.com wrote:
  
   What d you mean?
  
   This is appended to kafka-run-class.sh
  
    KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcom.sun.management.jmxremote.ssl=false"
   
    prior to running the below in the shell I run:  export
    KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
  
   I start kafka like this:
   /var/lib/kafka-0.8.0-src/bin/kafka-server-start.sh
   /var/lib/kafka-0.8.0-src/config/server.properties
  
   What else is there do?  I did not have an issue with kafka 7.  Only 8.
  
   Thanks
  
  
  
  
  
  
  
  
   On Fri, Jan 31, 2014 at 4:47 AM, Benjamin Black b...@b3k.us wrote:
  
are you sure the java opts are being set as you expect?
On Jan 30, 2014 12:41 PM, David Montgomery 
  davidmontgom...@gmail.com
   
wrote:
   
 Hi,

 This is a dedicated machine on DO.But i can say I did not
 have a
 problem with kafka 7.
 I just upgraded the macine to 1gig on digi ocean.  Same error.

  export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
 root@do-kafka-sf-development-20140130195343
 :/etc/supervisor/conf.d#
 /var/lib/kafka-0.8.0-src/bin/kafka-server-start.sh
 /var/lib/kafka-0.8.0-src/config/server.properties
 OpenJDK 64-Bit Server VM warning: INFO:
 os::commit_memory(0xbad3, 986513408, 0) failed;
   error='Cannot
 allocate memory' (errno=12)
 #
 # There is insufficient memory for the Java Runtime Environment to
 continue.
 # Native memory allocation (malloc) failed to allocate 986513408
  bytes
for
 committing reserved memory.
 # An error report file with more information is saved as:
 # /etc/supervisor/conf.d/hs_err_pid11635.log



 Below is how I install via chef

 version = '0.8.0'
 tar -xzf kafka-#{version}-src.tgz
 cd kafka-#{version}-src
 ./sbt update
 ./sbt package
 ./sbt assembly-package-dependency
 echo 'KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote=true
 -Dcom.sun.management.jmxremote.authenticate=false
 -Dcom.sun.management.jmxremote.ssl=false' | tee -a
 /var/lib/kafka-#{version}-src/bin/kafka-run-class.sh
 echo 'export JMX_PORT=${JMX_PORT:-}' | tee -a
 /var/lib/kafka-#{version}-src/bin/kafka-server-start.sh













 On Fri, Jan 31, 2014 at 4:04 AM, Jay Kreps jay.kr...@gmail.com
   wrote:

  I think this may be more a general java thing. Can you try
 running
   any
 java
  class with the same command line options you are using for kafka
  and
  confirm that that also doesn't work.
 
  -Jay
 
  On Thu, Jan 30, 2014 at 11:23 AM, David Montgomery 
  davidmontgom...@gmail.com wrote:
 
   Hi, %20us...@kafka.apache.org
  
   Why oh whhy can I nt start kafka 8?  I am on a machine with
 512
   megs
of
  ram
   on digi ocean.  What does one have to do to get kafka to work?
  
  
  
    root@do-kafka-sf-development-20140130051956: export
    KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
   root@do-kafka-sf-development-20140130051956
   :/etc/supervisor/conf.d#
   /var/lib/kafka-0.8.0-src/bin/kafka-server-start.sh
   /var/lib/kafka-0.8.0-src/config/server.properties
   OpenJDK 64-Bit Server VM warning: INFO:
   os::commit_memory(0xbad3, 986513408, 0) failed;
 error='Cannot
   allocate memory' (errno=12)
   #
   # There is insufficient memory for the Java Runtime
 Environment
  to
   continue.
   # Native memory allocation (malloc) failed to allocate
 986513408
bytes
  for
   committing reserved memory.
   # An error report file with more information is saved as:
   # /etc/supervisor/conf.d/hs_err_pid17533.log
  
  
   Thanks
  
 

   
  
  
  
 



Re: New Producer Public API

2014-01-29 Thread Steve Morin
Is the new producer API going to maintain protocol compatibility with the old
version of the API under the hood?

 On Jan 29, 2014, at 10:15, Jay Kreps jay.kr...@gmail.com wrote:
 
 The challenge of directly exposing ProduceRequestResult is that the offset
 provided is just the base offset and there is no way to know for a
 particular message where it was in relation to that base offset because the
 batching is transparent and non-deterministic. So I think we do need some
 kind of per-message result.
 
 I started with FutureRequestResult, I think for the same reason you
 prefer it but then when I actually looked at some code samples it wasn't
 too great--checked exceptions, methods that we can't easily implement, etc.
 I moved away from that for two reasons:
 1. When I actually wrote out some code samples of usage they were a little
 ugly for the reasons I described--checked exceptions, methods we can't
 implement, no helper methods, etc.
 2. I originally intended to make the result send work like a
 ListenableFuture so that you would register the callback on the result
 rather than as part of the call. I moved away from this primarily because
 the implementation complexity was a little higher.
 
 Whether or not the code prettiness on its own outweighs the familiarity of
 a normal Future I don't know, but that was the evolution of my thinking.
 
 -Jay
 
 
 On Wed, Jan 29, 2014 at 10:06 AM, Jay Kreps jay.kr...@gmail.com wrote:
 
 Hey Neha,
 
 Error handling in RecordSend works as in Future you will get the exception
 if there is one from any of the accessor methods or await().
 
 The purpose of hasError was that you can write things slightly more simply
 (which some people expressed preference for):
   if (send.hasError()) {
       // do something
   }
   long offset = send.offset();
  
  Instead of the slightly longer:
  
   try {
       long offset = send.offset();
   } catch (KafkaException e) {
       // do something
   }
 
 
 On Wed, Jan 29, 2014 at 10:01 AM, Neha Narkhede 
 neha.narkh...@gmail.comwrote:
 
 Regarding the use of Futures -
 
 Agree that there are some downsides to using Futures but both approaches
 have some tradeoffs.
 
 - Standardization and usability
 Future is a widely used and understood Java API and given that the
 functionality that RecordSend hopes to provide is essentially that of
 Future, I think it makes sense to expose a widely understood public API
 for
 our clients. RecordSend, on the other hand, seems to provide some APIs
 that
 are very similar to that of Future, in addition to exposing a bunch of
 APIs
 that belong to ProduceRequestResult. As a user, I would've really
 preferred
 to deal with ProduceRequestResult directly -
  Future<ProduceRequestResult> send(...)
 
 - Error handling
 RecordSend's error handling is quite unintuitive where the user has to
 remember to invoke hasError and error, instead of just throwing the
 exception. Now there are
 some downsides regarding error handling with the Future as well, where the
 user has to catch InterruptedException when we would never run into it.
 However, it seems like a price worth paying for supporting a standard API
 and error handling
 
 - Unused APIs
 This is a downside of using Future, where the cancel() operation would
 always return false and mean nothing. But we can mention that caveat in
 our
 Java docs.
 
 To summarize, I would prefer to expose a well understood and widely
 adopted
 Java API and put up with the overhead of catching one unnecessary checked
 exception, rather than wrap the useful ProduceRequestResult in a custom
 async object (RecordSend) and explain that to our many users.
 
 Thanks,
 Neha
 
 
 
 
 On Tue, Jan 28, 2014 at 8:10 PM, Jay Kreps jay.kr...@gmail.com wrote:
 
 Hey Neha,
 
 Can you elaborate on why you prefer using Java's Future? The downside
 in my
 mind is the use of the checked InterruptedException and
 ExecutionException.
 ExecutionException is arguable, but forcing you to catch
 InterruptedException, often in code that can't be interrupted, seems
 perverse. It also leaves us with the cancel() method which I don't
 think we
 really can implement.
 
 Option 1A, to recap/elaborate, was the following. There is no
 Serializer or
 Partitioner api. We take a byte[] key and value and an optional integer
 partition. If you specify the integer partition it will be used. If you
 do
 not specify a key or a partition the partition will be chosen in a round
 robin fashion. If you specify a key but no partition we will chose a
 partition based on a hash of the key. In order to let the user find the
 partition we will need to given them access to the Cluster instance
 directly from the producer.
 
 -Jay
 
 
 On Tue, Jan 28, 2014 at 6:25 PM, Neha Narkhede neha.narkh...@gmail.com
 wrote:
 
 Here are more thoughts on the public APIs -
 
 - I suggest we use java's Future instead of custom Future especially
 since
 it is part of the public API
 
 - Serialization: I like the simplicity of the producer APIs with the
 absence of 

Re: Kafka @ apachecon Denver?

2014-01-27 Thread Steve Morin
Are you using mesos?

 On Jan 27, 2014, at 8:39, Joe Stein joe.st...@stealth.ly wrote:
 
 I was going to submit a talk on Kafka and Mesos.  I still am trying to nail 
 down the dates in my schedule though.  Anyone else going? Maybe we could do a 
 meetup or bof or something?
 
 
 /***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
 /
 
 
 On Jan 27, 2014, at 9:54 AM, David Corley davidcor...@gmail.com wrote:
 
 Kafka Devs,
 Just wondering if there'll be anything in the line of Kafka presentations
 and/or tutorials at ApacheCon in Denver in April?


Re: Anyone working on a Kafka book?

2013-12-13 Thread Steve Morin
Shafaq,
  That's pretty cool, have you already connected Kafka to Spark RDD/DStream
or is that something you have to figure out?
-Steve


On Tue, Dec 10, 2013 at 7:10 PM, Shafaq s.abdullah...@gmail.com wrote:

 Hi Steve,

The first phase would be pretty simple, essentially hooking up the
 Kafka-DStream-consumer to perform KPI aggregation over the streamed data
 from Kafka Broker cluster in real-time.

    We would like to maximize the throughput by choosing the right message
  payload size, the correct kafka topic/partition mapping to Spark RDD/DStream,
  and minimizing I/O.

  Next, we would be featurizing the stream to be able to develop
  machine learning models using SVMs (Support Vector Machines) etc. to provide
  rich insights.

 Would be soon giving talk on this one, so keep tuned.

 Regards,
 S.Abdullah


 On Tue, Dec 10, 2013 at 10:22 AM, Steve Morin st...@stevemorin.comwrote:

 Shafaq,
   What does the architecture of what your building look like?
 -Steve


 On Tue, Dec 10, 2013 at 10:19 AM, Shafaq s.abdullah...@gmail.com wrote:

 Hey Guys,
I would love to contribute to the book specially in the portion of
 Kafka-Spark integration or parts of kafka in general.
Am building a Kafka-Spark Real-time framework here at Gree Intl Inc
 processing order of MBs of data per second.

 My profile:
www.linkedin.com/in/shafaqabdullah/


 Let me know, Im open in whatever ways.


 Regards,
 S.Abdullah



 On Tue, Dec 10, 2013 at 9:15 AM, chetan conikee coni...@gmail.comwrote:

 Hey Guys

 Yes, Ben Lorica (Oreilly) and I are planning to pen a Beginning Kafka
 book.
 We only finalized this late October are hoping to start this mid-month

 Chetan


 On Tue, Dec 10, 2013 at 8:45 AM, Steve Morin st...@stevemorin.com
 wrote:

  I'll let chetan comment if he's up for it.
  -Steve
 
 
 
  On Tue, Dec 10, 2013 at 8:40 AM, David Arthur mum...@gmail.com
 wrote:
 
  There was some talk a few months ago, not sure what the current
 status is.
 
 
  On 12/10/13 10:01 AM, S Ahmed wrote:
 
  Is there a book or this was just an idea?
 
 
  On Mon, Mar 25, 2013 at 12:42 PM, Chris Curtin 
 curtin.ch...@gmail.com
  wrote:
 
   Thanks Jun,
 
  I've updated the example with this information.
 
  I've also removed some of the unnecessary newlines.
 
  Thanks,
 
  Chris
 
 
  On Mon, Mar 25, 2013 at 12:04 PM, Jun Rao jun...@gmail.com
 wrote:
 
   Chris,
 
  This looks good. One thing about partitioning. Currently, if a
 message
  doesn't have a key, we always use the random partitioner
 (regardless of
  what partitioner.class is set to).
 
  Thanks,
 
  Jun
 
 
 
 
 
 




 --
 Kind Regards,
 Shafaq





 --
 Kind Regards,
 Shafaq




Re: Anyone working on a Kafka book?

2013-12-13 Thread Steve Morin
Would you mind sharing your connection setup?

 On Dec 13, 2013, at 10:36, Shafaq s.abdullah...@gmail.com wrote:
 
 Thats already done.
 
 On Dec 13, 2013 9:28 AM, Steve Morin st...@stevemorin.com wrote:
 Shafaq,
   That's pretty cool, have you already connected Kafka to spark RRD/DStream 
 or is that something you have to figure out?
 -Steve
 
 
 On Tue, Dec 10, 2013 at 7:10 PM, Shafaq s.abdullah...@gmail.com wrote:
 Hi Steve,

The first phase would be pretty simple, essentially hooking up the 
 Kafka-DStream-consumer to perform KPI aggregation over the streamed data 
 from Kafka Broker cluster in real-time.
 
   We would like to maximize the throughput by choosing the right message 
 payload size, correct kafka topic/partition mapped to  spark RRD/DStream 
 and minimizing I/O.
 
 Next, we would be using featurizing the stream to be able to develop 
 machine learning models using SVMs (Support Vector Machines) etc to provide 
 rich insights. 
 
 Would be soon giving talk on this one, so keep tuned.
 
 Regards,
 S.Abdullah
 
 
 On Tue, Dec 10, 2013 at 10:22 AM, Steve Morin st...@stevemorin.com wrote:
 Shafaq,
   What does the architecture of what your building look like?
 -Steve
 
 
 On Tue, Dec 10, 2013 at 10:19 AM, Shafaq s.abdullah...@gmail.com wrote:
 
 Hey Guys,
I would love to contribute to the book specially in the portion of 
 Kafka-Spark integration or parts of kafka in general. 
Am building a Kafka-Spark Real-time framework here at Gree Intl Inc 
 processing order of MBs of data per second.
   
 My profile:
www.linkedin.com/in/shafaqabdullah/
 
 
 Let me know, Im open in whatever ways.
 
 
 Regards,
 S.Abdullah
 
 
 
 On Tue, Dec 10, 2013 at 9:15 AM, chetan conikee coni...@gmail.com 
 wrote:
 Hey Guys
 
 Yes, Ben Lorica (Oreilly) and I are planning to pen a Beginning Kafka
 book.
 We only finalized this late October are hoping to start this mid-month
 
 Chetan
 
 
 On Tue, Dec 10, 2013 at 8:45 AM, Steve Morin st...@stevemorin.com 
 wrote:
 
  I'll let chetan comment if he's up for it.
  -Steve
 
 
 
  On Tue, Dec 10, 2013 at 8:40 AM, David Arthur mum...@gmail.com wrote:
 
  There was some talk a few months ago, not sure what the current 
  status is.
 
 
  On 12/10/13 10:01 AM, S Ahmed wrote:
 
  Is there a book or this was just an idea?
 
 
  On Mon, Mar 25, 2013 at 12:42 PM, Chris Curtin 
  curtin.ch...@gmail.com
  wrote:
 
   Thanks Jun,
 
  I've updated the example with this information.
 
  I've also removed some of the unnecessary newlines.
 
  Thanks,
 
  Chris
 
 
  On Mon, Mar 25, 2013 at 12:04 PM, Jun Rao jun...@gmail.com wrote:
 
   Chris,
 
  This looks good. One thing about partitioning. Currently, if a 
  message
  doesn't have a key, we always use the random partitioner 
  (regardless of
  what partitioner.class is set to).
 
  Thanks,
 
  Jun
 
 
 
 
 
 
 
 
 
 -- 
 Kind Regards,
 Shafaq
 
 
 
 -- 
 Kind Regards,
 Shafaq


Re: Anyone working on a Kafka book?

2013-12-10 Thread Steve Morin
I forget, but I think Chetan was with O'Reilly.

 On Dec 10, 2013, at 7:01, S Ahmed sahmed1...@gmail.com wrote:
 
 Is there a book or this was just an idea?
 
 
 On Mon, Mar 25, 2013 at 12:42 PM, Chris Curtin curtin.ch...@gmail.comwrote:
 
 Thanks Jun,
 
 I've updated the example with this information.
 
 I've also removed some of the unnecessary newlines.
 
 Thanks,
 
 Chris
 
 
 On Mon, Mar 25, 2013 at 12:04 PM, Jun Rao jun...@gmail.com wrote:
 
 Chris,
 
 This looks good. One thing about partitioning. Currently, if a message
 doesn't have a key, we always use the random partitioner (regardless of
 what partitioner.class is set to).
 
 Thanks,
 
 Jun
 


Re: Anyone working on a Kafka book?

2013-12-10 Thread Steve Morin
I'll let chetan comment if he's up for it.
-Steve


On Tue, Dec 10, 2013 at 8:40 AM, David Arthur mum...@gmail.com wrote:

 There was some talk a few months ago, not sure what the current status is.


 On 12/10/13 10:01 AM, S Ahmed wrote:

 Is there a book or this was just an idea?


 On Mon, Mar 25, 2013 at 12:42 PM, Chris Curtin curtin.ch...@gmail.com
 wrote:

  Thanks Jun,

 I've updated the example with this information.

 I've also removed some of the unnecessary newlines.

 Thanks,

 Chris


 On Mon, Mar 25, 2013 at 12:04 PM, Jun Rao jun...@gmail.com wrote:

  Chris,

 This looks good. One thing about partitioning. Currently, if a message
 doesn't have a key, we always use the random partitioner (regardless of
 what partitioner.class is set to).

 Thanks,

 Jun







Re: Anyone working on a Kafka book?

2013-12-10 Thread Steve Morin
Shafaq,
  What does the architecture of what your building look like?
-Steve


On Tue, Dec 10, 2013 at 10:19 AM, Shafaq s.abdullah...@gmail.com wrote:

 Hey Guys,
I would love to contribute to the book specially in the portion of
 Kafka-Spark integration or parts of kafka in general.
Am building a Kafka-Spark Real-time framework here at Gree Intl Inc
 processing order of MBs of data per second.

 My profile:
www.linkedin.com/in/shafaqabdullah/


 Let me know, Im open in whatever ways.


 Regards,
 S.Abdullah



 On Tue, Dec 10, 2013 at 9:15 AM, chetan conikee coni...@gmail.com wrote:

 Hey Guys

 Yes, Ben Lorica (Oreilly) and I are planning to pen a Beginning Kafka
 book.
 We only finalized this late October are hoping to start this mid-month

 Chetan


 On Tue, Dec 10, 2013 at 8:45 AM, Steve Morin st...@stevemorin.com
 wrote:

  I'll let chetan comment if he's up for it.
  -Steve
 
 
 
  On Tue, Dec 10, 2013 at 8:40 AM, David Arthur mum...@gmail.com wrote:
 
  There was some talk a few months ago, not sure what the current status
 is.
 
 
  On 12/10/13 10:01 AM, S Ahmed wrote:
 
  Is there a book or this was just an idea?
 
 
  On Mon, Mar 25, 2013 at 12:42 PM, Chris Curtin 
 curtin.ch...@gmail.com
  wrote:
 
   Thanks Jun,
 
  I've updated the example with this information.
 
  I've also removed some of the unnecessary newlines.
 
  Thanks,
 
  Chris
 
 
  On Mon, Mar 25, 2013 at 12:04 PM, Jun Rao jun...@gmail.com wrote:
 
   Chris,
 
  This looks good. One thing about partitioning. Currently, if a
 message
  doesn't have a key, we always use the random partitioner
 (regardless of
  what partitioner.class is set to).
 
  Thanks,
 
  Jun
 
 
 
 
 
 




 --
 Kind Regards,
 Shafaq




Re: Using Kafka 0.8 from Scala and Akka

2013-12-04 Thread Steve Morin
Chetan,
  Are you also releasing a Scala RxJava producer?
-Steve


On Tue, Dec 3, 2013 at 10:42 PM, Richard Rodseth rrods...@gmail.com wrote:

 Any update on this, Chetan? Thanks.


 On Thu, Oct 31, 2013 at 4:11 PM, chetan conikee coni...@gmail.com wrote:

  I am in the process of releasing out Scala and RxJava consumer(s) on
  github. Will be releasing it soon. Keep an eye out.
 
 
  On Thu, Oct 31, 2013 at 3:49 PM, Richard Rodseth rrods...@gmail.com
  wrote:
 
   So I have the 0.8. Beta 1 consumer Java example running now.
  
   Is there a Scala API documented somewhere? What about Akka examples?
   RxJava? When I Google for that I get old links to using Kafka as a
  durable
   Akka mailbox. I don't think that's what I'm looking for. I just want
   something reactive that doesn't use java.util.concurrent.
  
   Any pointers?
  
   Thanks.
  
 



Re: Loggly's use of Kafka on AWS

2013-12-02 Thread Steve Morin
Philip this is definitely useful

 On Dec 2, 2013, at 14:55, Surendranauth Hiraman suren.hira...@sociocast.com 
 wrote:
 
 S Ahmed,
 
 This combination of Kafka and Storm to process streaming data is becoming
 pretty common. Definitely worth looking at.
 
 The throughput will vary depending on your workload (cpu usage, etc.) and
 if you're talking to a backend, of course. But it scales very well.
 
 -Suren
 
 
 
 On Mon, Dec 2, 2013 at 5:49 PM, S Ahmed sahmed1...@gmail.com wrote:
 
  Interesting. So Twitter Storm is used to basically process the messages on
  Kafka? I'll have to read up on Storm because I always thought the use case
  was a bit different.
 
 
 
 On Sun, Dec 1, 2013 at 9:59 PM, Joe Stein joe.st...@stealth.ly wrote:
 
 Awesome Philip, thanks for sharing!
 
 On Sun, Dec 1, 2013 at 9:17 PM, Philip O'Toole phi...@loggly.com
 wrote:
 
  A couple of us here at Loggly recently spoke at AWS re:Invent on how we
  use Kafka 0.7.2 in our ingestion pipeline. The slides are at the link
  below,
 and may be of interest to people on this list.
 
 
 
 
 http://www.slideshare.net/AmazonWebServices/infrastructure-at-scale-apache-kafka-twitter-storm-elastic-search-arc303-aws-reinvent-2013
 
 Any questions, let me know, though I can't promise I can answer
 everything. Can't give the complete game away. :-)
 
 As always, Kafka rocks!
 
 Philip
 
 
 
 
 
 
 
 
 
 -- 
 ___
 Available at these partners:
 
 SUREN HIRAMAN, VP TECHNOLOGY
 SOCIOCAST
 Simple. Powerful. Predictions.
 
 96 SPRING STREET, 7TH FLOOR
 NEW YORK, NY 10012
 O: (917) 525-2466 ext. 105
 F: 646.349.4063
 E: suren.hira...@sociocast.com
 W: www.sociocast.com
 
 Increase Conversion Rates up to 500%. Go to www.sociocast.com and enter
 your URL for a free trial!


Re: kafka producer - retry messages

2013-11-28 Thread Steve Morin
Demian,
  I have been looking at building that into Sparkngin (
https://github.com/DemandCube/Sparkngin)  What kind of window are you
looking for?
-Steve



On Thu, Nov 28, 2013 at 7:23 AM, Demian Berjman dberj...@despegar.comwrote:

  Joe, I meant that the whole Kafka cluster is down, even the replicas of that
  topic.

 Thanks,


 On Thu, Nov 28, 2013 at 12:10 PM, Joe Stein joe.st...@stealth.ly wrote:

  Have you tried the replication feature added to 0.8
  http://kafka.apache.org/documentation.html#replication
 
 
  /***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop
  /
 
 
  On Nov 28, 2013, at 9:37 AM, Demian Berjman dberj...@despegar.com
 wrote:
 
   Hi.
  
    Has anyone built a durable retry system for the producer in case the
    Kafka cluster is down? We have certain messages that must be sent
    because they come after a transaction that cannot be undone.
   
    We could set the property message.send.max.retries, but if the
    producer goes down, we lose the messages.
  
   Thanks,
 



Re: kafka producer - retry messages

2013-11-28 Thread Steve Morin
What I mean by that is: are you looking to have the Kafka cluster able to
be down for something like 5 minutes, or up to a day? The problem is estimating
how long it will take to recover.

Is this work you're doing for a consulting project? Or are you doing
something on behalf of an employer? Basically, I'd like to know more
about the use case. You can email me directly at st...@stevemorin.com so
we don't clog the message board.


On Thu, Nov 28, 2013 at 8:05 AM, Demian Berjman dberj...@despegar.comwrote:

 Steve, thanks for the response! I don't understand what you mean by "what
 kind of window".

 I am looking for something like what I think you did in Sparkngin: log
 persistence if the log producer connection is down.



Re: kafka producer - retry messages

2013-11-28 Thread Steve Morin
Philip,
  Do you do that at Loggly?

Otis,
  How was your retry code structured?  Have you open sourced it?

 On Nov 28, 2013, at 16:08, Philip O'Toole phi...@loggly.com wrote:
 
 By FS I guess you mean file system. 
 
 In that case, if one is that concerned, why not run a single Kafka broker on 
 the same machine, and connect to it over localhost? And disable ZK mode too, 
 perhaps. 
 
 I may be missing something, but I never fully understand why people try 
 really hard to build a stream-to-disk backup approach, when they might be 
 able to couple tightly to Kafka, which, well, just streams to disk. 
 
 Philip
 
 On Nov 28, 2013, at 3:58 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
 wrote:
 
 Hi,
 
 We've done this at Sematext, where we use Kafka in all 3 products/services
 you see in my signature.  When we fail to push a message into Kafka we
 store it in the FS and from there we can process it later.
 
 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/
 
 
 On Thu, Nov 28, 2013 at 9:37 AM, Demian Berjman dberj...@despegar.comwrote:
 
 Hi.
 
 Has anyone built a durable retry system for the producer in case the
 Kafka cluster is down? We have certain messages that must be sent because
 they come after a transaction that cannot be undone.
 
 We could set the property message.send.max.retries, but if the producer
 goes down, we lose the messages.
 
 Thanks,
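
A rough sketch of the Sematext-style fallback Otis describes above — try Kafka first and append the payload to a local spool file on failure so it can be replayed later. The broker list, file handling, and retry settings here are assumptions, not anyone's actual implementation:

    import java.io.{File, FileWriter, PrintWriter}
    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    class SpoolingProducer(brokerList: String, spoolFile: File) {
      private val props = new Properties()
      props.put("metadata.broker.list", brokerList)
      props.put("serializer.class", "kafka.serializer.StringEncoder")
      props.put("request.required.acks", "-1")     // wait for in-sync replicas
      props.put("message.send.max.retries", "3")   // retry a few times before spooling

      private val producer = new Producer[String, String](new ProducerConfig(props))

      // Try Kafka first; on failure, append to a local file for later replay.
      def send(topic: String, message: String): Unit = {
        try {
          producer.send(new KeyedMessage[String, String](topic, message))
        } catch {
          case _: Throwable =>
            val out = new PrintWriter(new FileWriter(spoolFile, true))  // append mode
            try out.println(message) finally out.close()
        }
      }

      def close(): Unit = producer.close()
    }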
 


Re: kafka producer - retry messages

2013-11-28 Thread Steve Morin
Philip,
  How would you mirror this to a main Kafka instance?
-Steve
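
Kafka ships a MirrorMaker tool (kafka.tools.MirrorMaker) for exactly this: a consumer config pointed at the source cluster plus a producer config pointed at the target. Purely to illustrate the idea, a hand-rolled relay from a local broker to a main cluster might look like the sketch below; the addresses and topic name are assumptions:

    import java.util.Properties
    import kafka.consumer.{Consumer, ConsumerConfig}
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    object LocalToMainRelay {
      def main(args: Array[String]): Unit = {
        // Consume from the local single-node broker via its ZooKeeper (assumed address)
        val cProps = new Properties()
        cProps.put("zookeeper.connect", "localhost:2181")
        cProps.put("group.id", "relay")
        val connector = Consumer.create(new ConsumerConfig(cProps))

        // Produce to the main cluster (assumed broker list)
        val pProps = new Properties()
        pProps.put("metadata.broker.list", "main-kafka-1:9092,main-kafka-2:9092")
        val producer = new Producer[Array[Byte], Array[Byte]](new ProducerConfig(pProps))

        val stream = connector.createMessageStreams(Map("events" -> 1))("events").head
        for (m <- stream) {
          // Forward each payload as-is; keys are dropped in this simplified sketch
          producer.send(new KeyedMessage[Array[Byte], Array[Byte]]("events", m.message()))
        }
      }
    }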

 On Nov 28, 2013, at 16:14, Philip O'Toole phi...@loggly.com wrote:
 
 I should add in our custom producers we buffer in RAM if required, so Kafka 
 can be restarted etc. But I would never code streaming to disk now. I would 
 just run a Kafka instance on the same node. 
 
 Philip
 
 On Nov 28, 2013, at 4:08 PM, Philip O'Toole phi...@loggly.com wrote:
 
 By FS I guess you mean file system. 
 
 In that case, if one is that concerned, why not run a single Kafka broker on 
 the same machine, and connect to it over localhost? And disable ZK mode too, 
 perhaps. 
 
 I may be missing something, but I never fully understand why people try 
 really hard to build a stream-to-disk backup approach, when they might be 
 able to couple tightly to Kafka, which, well, just streams to disk. 
 
 Philip
 
 On Nov 28, 2013, at 3:58 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
 wrote:
 
 Hi,
 
 We've done this at Sematext, where we use Kafka in all 3 products/services
 you see in my signature.  When we fail to push a message into Kafka we
 store it in the FS and from there we can process it later.
 
 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
  Solr & Elasticsearch Support * http://sematext.com/
 
 
 On Thu, Nov 28, 2013 at 9:37 AM, Demian Berjman 
 dberj...@despegar.comwrote:
 
 Hi.
 
  Has anyone built a durable retry system for the producer in case the
  Kafka cluster is down? We have certain messages that must be sent because
  they come after a transaction that cannot be undone.
  
  We could set the property message.send.max.retries, but if the producer
  goes down, we lose the messages.
 
 Thanks,
 


Re: producer exceptions when broker dies

2013-10-25 Thread Steve Morin
Kane and Aniket,
  I am interested in knowing what pattern/solution people usually
use to implement exactly-once as well.
-Steve


On Fri, Oct 25, 2013 at 11:39 AM, Kane Kane kane.ist...@gmail.com wrote:

 Guozhang, but I've posted a piece from the Kafka documentation above:
 "So effectively Kafka guarantees at-least-once delivery by default and
 allows the user to implement at most once delivery by disabling retries on
 the producer."

 What I want is at-most-once, and the docs claim it's possible with certain
 settings. Did I miss anything here?


 On Fri, Oct 25, 2013 at 11:35 AM, Guozhang Wang wangg...@gmail.com
 wrote:

   Aniket is exactly right. In general, Kafka provides an at-least-once
   guarantee instead of exactly-once.
 
  Guozhang
 
 
  On Fri, Oct 25, 2013 at 11:13 AM, Aniket Bhatnagar 
  aniket.bhatna...@gmail.com wrote:
 
    As per my understanding, if the broker says the msg is committed, it's
    guaranteed to have been committed as per your ack config. If it says it
    did not get committed, then it's very hard to figure out if this was just
    a false error. Since there is no concept of unique IDs for messages, a
    replay of the same message will result in duplication. I think it's a
    reasonable behaviour considering Kafka prefers to append data to
    partitions for performance reasons.
    The best way right now to deal with duplicate messages is to build the
    processing engine (the layer where your consumer sits) to handle the
    at-least-once semantics of the broker.
   On 25 Oct 2013 23:23, Kane Kane kane.ist...@gmail.com wrote:
  
Or, to rephrase it more generally, is there a way to know exactly whether a
message was committed or not?
   
   
On Fri, Oct 25, 2013 at 10:43 AM, Kane Kane kane.ist...@gmail.com
   wrote:
   
 Hello Guozhang,

 My partitions are split almost evenly between brokers, so, yes - the
 broker that I shut down is the leader for some of them. Does it mean I can
 get an exception and data is still being written? Is there any setting on
 the broker where I can control this? I.e. can I make the broker replication
 timeout shorter than the producer timeout, so I can ensure that if I get an
 exception the data is not being committed?

 Thanks.


 On Fri, Oct 25, 2013 at 10:36 AM, Guozhang Wang 
 wangg...@gmail.com
wrote:

 Hello Kane,

 As discussed in the other thread, even if a timeout response is sent
 back to the producer, the message may still be committed.

 Did you shut down the leader broker of the partition or a follower
 broker?

 Guozhang

 On Fri, Oct 25, 2013 at 8:45 AM, Kane Kane kane.ist...@gmail.com
 
wrote:

   I have a cluster of 3 Kafka brokers. With the following script I send
   some data to Kafka and in the middle do a controlled shutdown of 1 broker.
   All 3 brokers are in the ISR before I start sending. When I shut down the
   broker I get a couple of exceptions, and I expect that data shouldn't be
   written. Say, I send 1500 lines and get 50 exceptions: I expect to consume
   1450 lines, but instead I always consume more, i.e. 1480 or 1490. I want
   to decide whether to retry sending myself, not using
   message.send.max.retries. But it looks like if I retry sending when there
   is an exception, I will end up with duplicates. Is there anything I'm
   doing wrong, or am I making wrong assumptions about Kafka?
 
  Thanks.
 
   import java.util.Properties
   import scala.collection.mutable.ArrayBuffer
   import scala.io.Source
   import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
   import kafka.serializer.StringEncoder

   // `file` is the path to the input data, defined elsewhere in the script
   val prod = new MyProducer("10.80.42.147:9092,10.80.42.154:9092," +
     "10.80.42.156:9092")
   var count = 0
   for (line <- Source.fromFile(file).getLines()) {
     try {
       prod.send("benchmark", List(line))
       count += 1
       println("sent %s".format(count))
     } catch {
       case _: Throwable => println("Exception!")
     }
   }

   class MyProducer(brokerList: String) {
     val sync = true
     val requestRequiredAcks = "-1"

     val props = new Properties()
     props.put("metadata.broker.list", brokerList)
     props.put("producer.type", if (sync) "sync" else "async")
     // acks=-1: the broker acknowledges only after all in-sync replicas have the message
     props.put("request.required.acks", requestRequiredAcks)
     props.put("key.serializer.class", classOf[StringEncoder].getName)
     props.put("serializer.class", classOf[StringEncoder].getName)
     // retries disabled: the producer gives up after the first failure
     props.put("message.send.max.retries", "0")
     props.put("request.timeout.ms", "2000")

     val producer = new Producer[AnyRef, AnyRef](new ProducerConfig(props))

     def send(topic: String, messages: List[String]) = {
       val requests = new ArrayBuffer[KeyedMessage[AnyRef, AnyRef]]
       for (message <- messages) {
         requests += new KeyedMessage(topic, null, message, message)
       }
       producer.send(requests: _*)
     }
   }
 



 --
 -- Guozhang



   
  
 
 
 
  --
  -- Guozhang
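
Following Aniket's suggestion above to handle the broker's at-least-once semantics in the consumer layer, a minimal de-duplication sketch — assuming the producer embeds its own unique ID in each message, since Kafka does not assign one for you — might look like:

    import scala.collection.mutable

    // Drops messages whose application-level ID has already been seen.
    // A real implementation would bound this set (e.g. by time window) and
    // persist it alongside consumer offsets; this is only a sketch.
    class Deduplicator {
      private val seen = mutable.HashSet[String]()

      // Returns true the first time an ID is seen, false for duplicates.
      def firstTime(messageId: String): Boolean = seen.add(messageId)
    }

    object DeduplicatorExample extends App {
      val dedup = new Deduplicator()
      // Payloads of the form "<id>|<body>", where the producer assigns the id
      val incoming = Seq("42|hello", "43|world", "42|hello")  // a retry produced a duplicate
      for (msg <- incoming) {
        val Array(id, body) = msg.split("\\|", 2)
        if (dedup.firstTime(id)) println("processing " + body)
        else println("skipping duplicate id " + id)
      }
    }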
 



Re: Flush configuration per topic

2013-10-17 Thread Steve Morin
Is there only a time delay, or can it be set to flush for every message, with
the obvious performance hit?


On Wed, Oct 16, 2013 at 9:49 AM, Jay Kreps jay.kr...@gmail.com wrote:

 Yes, the change in trunk is that all log configurations are automatically
 available at both the log level and the global default level and can be set
 at topic creation time or changed later without bouncing any servers.

 -Jay
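
As a concrete sketch of the per-topic override Jay describes on trunk, assuming the 0.8.1 AdminUtils.createTopic overload that accepts a topic-level Properties (the ZooKeeper address, topic name, and partition/replica counts here are assumptions), a topic that flushes after every message could be created like this:

    import java.util.Properties
    import kafka.admin.AdminUtils
    import kafka.utils.ZKStringSerializer
    import org.I0Itec.zkclient.ZkClient

    object CreateFlushyTopic {
      def main(args: Array[String]): Unit = {
        val zkClient = new ZkClient("localhost:2181", 30000, 30000, ZKStringSerializer)

        // Topic-level override: flush to disk after every message
        // (the "obvious performance hit" mentioned above)
        val topicConfig = new Properties()
        topicConfig.put("flush.messages", "1")

        AdminUtils.createTopic(zkClient, "critical-events", 3, 2, topicConfig)
        zkClient.close()
      }
    }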


 On Tue, Oct 15, 2013 at 5:47 PM, Simon Hørup Eskildsen
 si...@sirupsen.comwrote:

  Do you mean that it's possible to override log configurations per topic
 in
  trunk?
 
  Yeah, you're right. :-) I wasn't sure what to call it if not consistency,
  even though I know that sort of has another meaning in this context.
 
 
  On Tue, Oct 15, 2013 at 6:53 PM, Jay Kreps jay.kr...@gmail.com wrote:
 
   Yeah, looks like you are right, we don't have the per-topic override in
  0.8
   :-(
  
   All log configurations are overridable in trunk which will be 0.8.1.
  
   Just to be totally clear this setting does not impact consistency (i.e.
  all
   replicas will have the same messages in the same order), nor even
    durability (if you have replication > 1), but just recoverability on a
   single server in the event of a hard machine crash.
  
   -Jay
  
  
   On Tue, Oct 15, 2013 at 2:07 PM, Simon Hørup Eskildsen
   si...@sirupsen.comwrote:
  
0.8, we're not on master, but we definitely can be.
   
   
On Tue, Oct 15, 2013 at 5:03 PM, Jay Kreps jay.kr...@gmail.com
  wrote:
   
 Hey Simon,

 What version of Kafka are you using?

 -Jay


 On Tue, Oct 15, 2013 at 9:56 AM, Simon Hørup Eskildsen
 si...@sirupsen.comwrote:

  Hi Kafkas!
 
   Reading through the documentation and code of Kafka, it seems there is
   no feature to set the flushing interval (messages/time) for a specific
   topic. I am interested in this to get consistency for certain topics by
   flushing after every message, while having eventual consistency for
   other topics (default).
  
   Would there be interest in such a feature? Then I might be curious to
   give the Log module a dive and see if I can foster enough Scala to add
   this.
 
  Thanks!
 
  --
  Simon
  http://sirupsen.com/
 

   
   
   
--
Simon
http://sirupsen.com/
   
  
 
 
 
  --
  Simon
  http://sirupsen.com/
 



Re: Kafka and Zookeeper node removal from two nodes Kafka cluster

2013-10-15 Thread Steve Morin
If you have a double broker failure with a replication factor of 2 and only
have 2 brokers in the cluster, wouldn't every partition be unavailable?


On Tue, Oct 15, 2013 at 8:48 AM, Jun Rao jun...@gmail.com wrote:

 If you have double broker failures with a replication factor of 2, some
 partitions will not be available. When one of the brokers comes back, the
 partition is made available again (there is potential data loss), but in an
 under replicated mode. After the second broker comes back, it will catch up
 from the other replica and the partition will eventually be fully
 replicated. There is no need to change the replication factor during this
 process.

 As for ZK, you can always use the full connection string. ZK will pick live
 servers to establish connections.

 Thanks,

 Jun


 On Tue, Oct 15, 2013 at 3:46 AM, Monika Garg gargmon...@gmail.com wrote:

  I have a 2-node Kafka cluster with default.replication.factor=2 set in the
  server.properties file.
 
  I removed one node: in removing that node, I killed the Kafka process and
  removed all the kafka-logs and the bundle from that node.
 
  Then I stopped my remaining running node in the cluster and started it
  again (default.replication.factor is still set to 2 in this node's
  server.properties file).
  I was expecting some error/exception, as now I don't have two nodes in my
  cluster. But I didn't get any error/exception, and my node successfully
  started and I am able to create topics on it.
 
  So should default.replication.factor be updated from
  default.replication.factor=2 to default.replication.factor=1 on the
  remaining running node?
 
  Similarly, if there are two external ZooKeeper
  nodes (zookeeper.connect=host1:port1,host2:port1) in my cluster and I
  have now removed one ZooKeeper node (host1:port1) from the cluster, should
  the property zookeeper.connect be updated from
  (zookeeper.connect=host1:port1,host2:port1) to
  (zookeeper.connect=host2:port1)?
 
  --
  *Moniii*