username related weirdness with kafka 0.8.1.1 (scala 2.10.1)

2014-06-15 Thread Max Kovgan
hi.
As the subject states, I'm trying to use kafka-0.8.1.1 with samza (0.7).
Initially the work was done as a development user (e.g. devuser); Kafka
ran as that user, and everything was fine.

Then, for the deployment, we tried to run Kafka under a different user,
kafka.
With that set up, for some reason we get a "no leader" problem;
that is, the topics are leaderless.

This is consistent whether we use the Java library or the command-line tools.
Our zookeeper comes from Cloudera's Hadoop (CDH4 and 5 behave the same way).

When run as devuser, however, the problem does not happen, so I don't
think the problem is zookeeper's version, and we did not enable any
zookeeper authentication mechanisms.

I thought that communication with both Kafka and zookeeper happens over
TCP sockets, so as long as the connections succeed and the corresponding
services work, the username of the consumer, producer, or service should
not matter.


Maybe I'm missing something fundamental, but is it supposed to behave
like that?

I can provide more solid info (command excerpts, logs, etc.) upon request.



Thanks in advance,
Max.


Re: username related weirdness with kafka 0.8.1.1 (scala 2.10.1)

2014-06-15 Thread Joe Stein
Off the top of my head, this sounds like a directory permission/ownership
issue with the data.
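One quick way to test that theory is to check that every directory in the broker's log.dirs is owned and writable by the user Kafka now runs as. A minimal sketch; the directory path and user name below are placeholders for your own setup, not values from this thread:

```python
import os
import pwd

def check_log_dirs(dirs, expected_user):
    """Return a list of ownership/access problems for Kafka data dirs."""
    problems = []
    for d in dirs:
        if not os.path.isdir(d):
            problems.append("%s: missing" % d)
            continue
        owner = pwd.getpwuid(os.stat(d).st_uid).pw_name
        if owner != expected_user:
            problems.append("%s: owned by %s, not %s" % (d, owner, expected_user))
        if not os.access(d, os.W_OK):
            problems.append("%s: not writable by the current user" % d)
    return problems

# /tmp/kafka-logs is the 0.8 default for log.dirs; substitute your own.
for p in check_log_dirs(["/tmp/kafka-logs"], "kafka"):
    print(p)
```

If this reports problems after the switch from devuser to kafka, a recursive chown of the data directories to the new user is the usual fix.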



Re: username related weirdness with kafka 0.8.1.1 (scala 2.10.1)

2014-06-15 Thread Max Kovgan
Thanks for the response, Joe.

In the non-working setup, the kafka user had all the permissions for its own
directories and log files.
I ran normal tests via the command line as user kafka, and everything worked.

I think my question is:
is there any non-AF_INET socket communication between Kafka clients and
servers? Something like shared memory, or (god forbid) access to the same
files?






 




-- 


*MAXIM KOVGAN*

DevOps Engineer



*www.fortscale.com* http://www.fortscale.com/



US.  1745 Broadway, 17th floor, New York, NY 10019 | Tel: 1 (212) 519-9889
ISRAEL. 22A Raoul Wallenberg St., Tel Aviv 6971918 | Tel: 972 (3) 600-6078


Re: Help in processing huge data through Kafka-storm cluster

2014-06-15 Thread pushkar priyadarshi
What throughput are you getting from your Kafka cluster alone? Storm
throughput can depend on what processing you are actually doing inside
it, so you must look at each component, starting with Kafka.
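One way to get that per-component number is to time the consume path in isolation. A rough, framework-agnostic sketch: pass it any message iterator, such as the stream returned by a Kafka consumer, with no Storm processing attached:

```python
import time

def measure_throughput(messages):
    """Consume an iterator of messages and return (count, msgs/sec)."""
    count = 0
    start = time.time()
    for _ in messages:
        count += 1   # a real test would also hand the message to the bolt logic
    elapsed = time.time() - start
    return count, (count / elapsed) if elapsed > 0 else float("inf")

# Example with a synthetic stream; substitute a real consumer stream to
# measure Kafka's drain rate by itself.
n, rate = measure_throughput(iter(range(100000)))
print("%d messages at %.0f msgs/sec" % (n, rate))
```

Comparing this number with and without the downstream bolt logic attached tells you whether Kafka or the topology is the bottleneck.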

Regards,
Pushkar


On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed rnsr.sha...@gmail.com wrote:

 Hi,

 Daily we download 28 million messages, and monthly it adds up to
 800+ million.

 We want to process this amount of data through our Kafka and Storm cluster
 and store it in an HBase cluster.

 We are targeting to process one month of data in one day. Is that possible?

 We set up our cluster expecting to process a million messages per second,
 as mentioned on the web. Unfortunately, we ended up processing only
 1200-1700 messages per second. If we continue at this speed it will take
 at least 10 days to process 30 days of data, which is not a workable
 solution in our case.

 I suspect that we have to change some configuration to achieve this goal,
 and I am looking for help from experts in achieving it.

 *Kafka Cluster:*
 Kafka is running on two dedicated machines, each with 48 GB of RAM and 2 TB
 of storage. We have an 11-node Kafka cluster in total, spread across these
 two servers.

 *Kafka Configuration:*
 producer.type=async
 compression.codec=none
 request.required.acks=-1
 serializer.class=kafka.serializer.StringEncoder
 queue.buffering.max.ms=10
 batch.num.messages=1
 queue.buffering.max.messages=10
 default.replication.factor=3
 controlled.shutdown.enable=true
 auto.leader.rebalance.enable=true
 num.network.threads=2
 num.io.threads=8
 num.partitions=4
 log.retention.hours=12
 log.segment.bytes=536870912
 log.retention.check.interval.ms=6
 log.cleaner.enable=false

 *Storm Cluster:*
 Storm is running with 5 supervisors and 1 nimbus on IBM servers with 48 GB
 of RAM and 8 TB of storage. These servers are shared with the HBase cluster.

 *Kafka spout configuration*
 kafkaConfig.bufferSizeBytes = 1024*1024*8;
 kafkaConfig.fetchSizeBytes = 1024*1024*4;
 kafkaConfig.forceFromStart = true;

 *Topology: StormTopology*
 Spout   - Partition: 4
 First Bolt -  parallelism hint: 6 and Num tasks: 5
 Second Bolt -  parallelism hint: 5
 Third Bolt -   parallelism hint: 3
 Fourth Bolt   -  parallelism hint: 3 and Num tasks: 4
 Fifth Bolt  -  parallelism hint: 3
 Sixth Bolt -  parallelism hint: 3

 *Supervisor configuration:*

 storm.local.dir: /app/storm
 storm.zookeeper.port: 2181
 storm.cluster.mode: distributed
 storm.local.mode.zmq: false
 supervisor.slots.ports:
 - 6700
 - 6701
 - 6702
 - 6703
 supervisor.worker.start.timeout.secs: 180
 supervisor.worker.timeout.secs: 30
 supervisor.monitor.frequency.secs: 3
 supervisor.heartbeat.frequency.secs: 5
 supervisor.enable: true

 storm.messaging.netty.server_worker_threads: 2
 storm.messaging.netty.client_worker_threads: 2
 storm.messaging.netty.buffer_size: 52428800 #50MB buffer
 storm.messaging.netty.max_retries: 25
 storm.messaging.netty.max_wait_ms: 1000
 storm.messaging.netty.min_wait_ms: 100


 supervisor.childopts: -Xmx1024m -Djava.net.preferIPv4Stack=true
 worker.childopts: -Xmx2048m -Djava.net.preferIPv4Stack=true


 Please let me know if more information is needed.

 Thanks in advance.

 Regards,
 Riyaz



Re: Help in processing huge data through Kafka-storm cluster

2014-06-15 Thread pushkar priyadarshi
And one more thing: using Kafka metrics you can easily monitor the rate at
which you are able to publish to Kafka, and the speed at which your
consumer (in this case your spout) is able to drain messages out of it.
It's possible that slow draining will, in the worst case, affect even the
publishing rate: if the consumer lags too far behind, it will cause disk
seeks while consuming the older messages.
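The lag described here is just the gap between the broker's log-end offset and the consumer's offset, per partition. A small sketch of the arithmetic; the offset numbers are made up, and in practice they would come from Kafka's JMX metrics or the offset-checker tooling:

```python
def consumer_lag(log_end_offsets, consumer_offsets):
    """Per-partition lag: how far the consumer trails the broker's log end.

    Both arguments map partition id -> offset.
    """
    return {p: log_end_offsets[p] - consumer_offsets.get(p, 0)
            for p in log_end_offsets}

lag = consumer_lag({0: 1000, 1: 2500}, {0: 900, 1: 800})
# Partition 1 trails by 1700 messages; a consumer that far behind may force
# the broker to read older segments from disk rather than the page cache.
print(lag)
```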







Re: Help in processing huge data through Kafka-storm cluster

2014-06-15 Thread Robert Hodges
Hi Riyaz,

There are a number of reasons you may be getting low performance.
Here are some questions to get started:

1. How big are your messages?  To meet your throughput requirement you need
a minimum of 10K messages per second continuously.  You specified a
replication factor of 3, so at a message length of 500 bytes (for example)
you would need to write a minimum of about 15 MB/second continuously across
both hosts.  That is a small amount or a large amount depending on your
storage configuration.
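For reference, the arithmetic behind those figures, using the 800+ million messages per month from the original post and the example 500-byte message size:

```python
messages_per_month = 800 * 10**6      # "800+ million" from the original post
seconds_per_day = 24 * 60 * 60

# Processing one month of data in one day requires roughly:
required_rate = messages_per_month / float(seconds_per_day)   # ~9,260 msgs/sec

# At 500-byte messages with replication factor 3, the write volume is about:
bytes_per_sec = required_rate * 500 * 3                       # ~14 MB/sec

# And at the observed 1200-1700 msgs/sec, a month of data takes:
days_at_observed = messages_per_month / 1500.0 / seconds_per_day  # ~6 days

print("need %.0f msgs/sec (%.1f MB/sec of writes); at 1500 msgs/sec: %.1f days"
      % (required_rate, bytes_per_sec / 1e6, days_at_observed))
```

The roughly 6 days at the observed rate is a little more optimistic than the 10-day estimate in the original post, but the gap to the required ~9,300 msgs/sec is the same either way.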

2. How did you determine the throughput rate? Is the throughput number
end-to-end including Storm and HBase or do you see the low throughput for
Kafka itself?  In either case can you isolate the rates of ingress and
egress to Kafka?

Assuming the problem is in Kafka here are some more questions.

3. Are you running VMs?  If so what kind and how many CPUs are allocated to
each VM?

4. What kind of storage do you have?  According to your description you
have 11 nodes over two hosts?   At the level you are attempting to reach
anything less than SSDs or very performant RAID may be an issue due to
random I/O. If you have network attached storage this can be a huge
bottleneck.

5. What kind of network cards do you have?

6. What kind of stats do you see on the hosts when your tests are running?

- What is the I/O wait?  Anything above a few percent indicates problems.
 (Top gives good numbers)
- What is the run queue length?  CPU starvation could be a problem
especially if you have VMs.  (Top and uptime give good numbers.)
- How much memory is in the OS page cache?  This has a big impact on I/O
efficiency if you are short of memory.  (free -g gives useful numbers)
- On a related topic, are you reading from storage, or are your reads served
from memory? (iostat should ideally show no reads from storage, only writes,
because all reads should be served from the OS page cache.)
- Are you swapping?

7. What is the memory size for your JVMs and are you using Java 7?  Do you
have G1 GC enabled according to current Kafka recommendations?

8. Where is zookeeper running?  It can be a bottleneck at high transaction
rates.

9. How many topics do you have?

10. How many producers do you have and where are they running?

11. How many consumers are you running?  I don't know Storm so it's hard to
tell from the configuration you have listed how many would run or where
they would operate.

It seems possible you need to spread processing across more independent
hosts but that is a guess pending other information.  It is hard to
evaluate your Kafka settings without this.

Best regards, Robert




Re: Help in processing huge data through Kafka-storm cluster

2014-06-15 Thread Robert Hodges
+1 for detailed examination of metrics.  You can see the main metrics here:

https://kafka.apache.org/documentation.html#monitoring

JConsole is very helpful for a quick look at what is going on.

Cheers, Robert


 



Re: username related weirdness with kafka 0.8.1.1 (scala 2.10.1)

2014-06-15 Thread Joe Stein
None. What client(s) are you using? Maybe send along more logs for the
error and other info about the setup.  If you revert back, does everything
work?



Re: LeaderNotAvailableException in 0.8.1.1

2014-06-15 Thread Prakash Gowri Shankor
yes, I gave it several minutes.


On Sat, Jun 14, 2014 at 2:18 PM, Michael G. Noll mich...@michael-noll.com
wrote:

 Have you given Kafka some time to re-elect a new leader for the
 missing partition when you re-try steps 1-5?

 See here:
  If you do, you should be able to go through steps
  1-8 without seeing LeaderNotAvailableExceptions (you may need to give
  Kafka some time to re-elect the remaining, second broker as the new
  leader for the first broker's partitions though).

 Best,
 Michael



 On 06/12/2014 08:43 PM, Prakash Gowri Shankor wrote:
  So going back to the 2-broker case, I tried your suggestion with
  replication-factor 2:
 
  ./kafka-topics.sh  --topic test2 --create  --partitions 3 --zookeeper
  localhost:2181 --replication-factor
 
  When I repeat steps 1-5 I still see the exception. When I go to step 8
  (back to 2 brokers), I don't see it.
  Here is my topic description:
 
  ./kafka-topics.sh --describe --topic test2 --zookeeper localhost:2181
 
  Topic:test2 PartitionCount:3 ReplicationFactor:2 Configs:
 
  Topic: test2 Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
 
  Topic: test2 Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
 
  Topic: test2 Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
 
 
  On Wed, Jun 11, 2014 at 3:20 PM, Michael G. Noll 
  michael+st...@michael-noll.com wrote:
 
  In your second case (1-broker cluster and putting your laptop to sleep)
  these exceptions should be transient and disappear after a while.
 
  In the logs you should see ZK session expirations (hence the
  initial/transient exceptions, which in this case are expected and ok),
  followed by new ZK sessions being established.
 
  So this case is (should?) be very different from your case number 1.
 
  --Michael
 
 
  On 11.06.2014, at 23:13, Prakash Gowri Shankor 
  prakash.shan...@gmail.com wrote:
 
  Thanks for your response Michael.
 
  In step 3, I am actually stopping the entire cluster and restarting it
  without the 2nd broker. But I see your point. When I look in
  /tmp/kafka-logs-2 (the log dir for the 2nd broker) I see it holds
  test2-1 (i.e. the 1st partition of the test2 topic).
  For /tmp/kafka-logs (the log dir for the first broker) I see it holds
  test2-0 and test2-2 (the 0th and 2nd partitions of the test2 topic).
  So it would seem that Kafka is missing the leader for partition 1 and
  hence throwing the exception on the producer side.
  Let me try your replication suggestion.
 
  While all of the above might explain the exception in the case of 2
  brokers, there are still times when I see it with just a single broker.
  In this case, I start from a normal working cluster with 1 broker only.
  Then I put my machine into sleep/hibernation. On wake, I shut down
  the cluster (for sanity) and restart it.
  On restart, I start seeing this exception. In this case I only have one
  broker. I still create the topic the way I described earlier.
  I understand this is not the ideal production topology, but it's annoying
  to see it during development.
 
  Thanks
 
 
  On Wed, Jun 11, 2014 at 1:40 PM, Michael G. Noll 
  mich...@michael-noll.com
  wrote:
 
  Prakash,
 
  You are configuring the topic with a replication factor of only 1, i.e.
  no additional replica beyond the original one.  This replication setting
  of 1 means that only one of the two brokers will ever host the (single)
  replica -- which is implied to also be the leader in-sync replica -- of
  a given partition.
 
  In step 3 you are disabling one of the two brokers.  Because this
  stopped broker is the only broker that hosts one or more of the 3
  partitions you configured (I can't tell which partition(s) it is, but
  you can find out by --describe'ing the topic), your Kafka cluster --
  which is now running in a degraded state -- will miss the leader of
  those affected partitions.  And because you set the replication factor
  to 1, the remaining, second broker will never take over the leadership
  of those partitions from the stopped broker.  Hence you will keep
  getting LeaderNotAvailableExceptions until you restart the stopped
  broker in step 7.
 
  So to me it looks as if the behavior of Kafka is actually correct and
  as expected.
 
  If you want to rectify your test setup, try increasing the replication
  factor from 1 to 2.  If you do, you should be able to go through steps
  1-8 without seeing LeaderNotAvailableExceptions (you may need to give
  Kafka some time to re-elect the remaining, second broker as the new
  leader for the first broker's partitions though).
 
  Hope this helps,
  Michael
 
 
 
  On 06/11/2014 07:49 PM, Prakash Gowri Shankor wrote:
  yes,
  here are the steps:
 
  Create topic as : ./kafka-topics.sh  --topic test2 --create
  --partitions 3
  --zookeeper localhost:2181 --replication-factor 1
 
  1) Start cluster with 2 brokers, 3 consumers.
  2) Don't start any producer
  3) Shutdown cluster and disable one broker from starting
  4) restart 
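Michael's replication-factor point can be illustrated with a toy model of replica placement. This is only an illustration of the concept, not Kafka's actual assignment or election logic:

```python
def available_leaders(assignment, live_brokers):
    """Map each partition to the first live broker in its replica list,
    or None when every replica is down (LeaderNotAvailable)."""
    return {p: next((b for b in replicas if b in live_brokers), None)
            for p, replicas in assignment.items()}

# Replication factor 1: each of the 3 partitions lives on exactly one of
# the 2 brokers (ids 0 and 1).
rf1 = {0: [0], 1: [1], 2: [0]}
# Replication factor 2: every partition has a replica on both brokers.
rf2 = {0: [0, 1], 1: [1, 0], 2: [0, 1]}

# Stop broker 1 (only broker 0 stays alive):
print(available_leaders(rf1, {0}))   # partition 1 is left with no leader
print(available_leaders(rf2, {0}))   # every partition can fail over to broker 0
```

With replication factor 1 there is simply no second copy to elect, which is why the exception persists until the stopped broker returns.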

Use Kafka To Send Files

2014-06-15 Thread huqiming0403
Hi. Sometimes we want to use Kafka to send files (like text files, XML, ...),
but I didn't see this in the documentation.
Can Kafka be used to transfer files? If so, how can I do it?
Thanks

Re: Use Kafka To Send Files

2014-06-15 Thread Guozhang Wang
We have seen people send files of tens of MB as messages (i.e. one file as
a single message) using Kafka.

Guozhang





-- 
-- Guozhang


Re: Use Kafka To Send Files

2014-06-15 Thread Steve Morin
You also wouldn't have any metadata about the file, so I would avoid doing this.

 On Jun 15, 2014, at 20:51, Mark Roberts wiz...@gmail.com wrote:
 
 You would ship the contents of the file across as a message. In general this
 means that your maximum file size must be smaller than your maximum
 message size. It would generally be a better choice to put on the queue a
 pointer to the file in some shared location.
 
 -Mark
 
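Mark's pointer approach -- publish a small reference message and leave the payload in shared storage -- can be sketched as follows. The field names and JSON encoding are my own assumptions, not a Kafka convention:

```python
import hashlib
import json
import os

def file_pointer_message(path):
    """Build a small Kafka message describing a file stored elsewhere
    (e.g. NFS, HDFS, S3), instead of shipping the bytes through Kafka."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    pointer = {
        "path": path,                   # shared location consumers can read
        "size": os.path.getsize(path),
        "md5": digest,                  # lets the consumer verify its copy
    }
    return json.dumps(pointer).encode("utf-8")

# A consumer would decode the message and fetch the file itself:
# meta = json.loads(message.decode("utf-8")); open(meta["path"], "rb") ...
```

This also addresses Steve's metadata concern, since the pointer message can carry whatever metadata (size, checksum, content type) the consumers need.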