username related weirdness with kafka 0.8.1.1 (scala 2.10.1)
Hi. As the subject states, I'm trying to use Kafka 0.8.1.1 with Samza (0.7). Initially the work started as a dev user, e.g. devuser, Kafka ran as that user, and everything was fine. Then, during the deployment attempt, we tried to run Kafka under a different user, kafka. After setting that up, for some reason we get a "no leader" problem; that is, the topics are leaderless. This is consistent whether we use the library from Java or the command line tools.

Our ZooKeeper comes from Cloudera's Hadoop (CDH4 and CDH5 behave the same way), but when Kafka runs as devuser the problem does not happen, so I don't think the problem is ZooKeeper's version, and we did not enable any ZooKeeper authentication mechanisms.

I thought the communication with both Kafka and ZooKeeper is done over TCP sockets, so as long as the connections succeed and the corresponding services work, there should be no relation to the username of the consumer, the producer, or the service. Maybe I'm missing something fundamental, but is it supposed to behave like that? I can provide more solid info (command excerpts, logs, etc.) upon request.

Thanks in advance,
Max.
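A quick way to check that side of things is to look directly at what Kafka 0.8 has registered in ZooKeeper: the broker ids under /brokers/ids and the per-partition state znode into which the controller writes the elected leader. The Java sketch below is only a diagnostic aid for this situation; the ZooKeeper connect string and topic name are placeholders, and it assumes a plain, unauthenticated ZooKeeper as described above.

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class LeaderStateCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder connect string; use the same zookeeper.connect the broker uses.
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() {
                public void process(WatchedEvent event) { }
            });

            // Brokers register ephemeral znodes here when they join the cluster.
            System.out.println("registered brokers: " + zk.getChildren("/brokers/ids", false));

            // The controller records the elected leader per partition; a missing znode
            // or "leader":-1 matches the "leaderless topic" symptom described above.
            byte[] state = zk.getData("/brokers/topics/my-topic/partitions/0/state", false, null);
            System.out.println("partition 0 state: " + new String(state, "UTF-8"));

            zk.close();
        }
    }

If the brokers show up and the state znodes carry a valid leader, the problem is on the client side; if not, the broker running as the kafka user never completed registration or leader election, regardless of which user the clients run as.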
Re: username related weirdness with kafka 0.8.1.1 (scala 2.10.1)
Off the top of my head, this sounds like a directory permission/ownership issue for the data.

On Jun 15, 2014 5:05 AM, Max Kovgan m...@fortscale.com wrote:
Hi. As the subject states, I'm trying to use Kafka 0.8.1.1 with Samza (0.7). [...]
Re: username related weirdness with kafka 0.8.1.1 (scala 2.10.1)
Thanks for the response, Joe. In the non-working setup the kafka user had all the permissions for its own directories and log files, and I ran the usual tests from the command line as user kafka and everything worked. I think my question boils down to: is there any non-AF_INET socket communication between Kafka clients and servers? Something like shared memory or, god forbid, access to the same files?

On Sun, Jun 15, 2014 at 5:17 PM, Joe Stein joe.st...@stealth.ly wrote:
Off the top of my head, this sounds like a directory permission/ownership issue for the data. [...]
Re: Help in processing huge data through Kafka-Storm cluster
What throughput are you getting from your Kafka cluster alone? Storm throughput can depend on what processing you are actually doing inside it, so you should look at each component, starting with Kafka first.

Regards,
Pushkar

On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed rnsr.sha...@gmail.com wrote:

Hi,

We download 28 million messages daily, and monthly it goes up to 800+ million. We want to process this amount of data through our Kafka and Storm cluster and store it in an HBase cluster. We are targeting to process one month of data in one day. Is that possible?

We set up our cluster thinking that we could process millions of messages per second, as mentioned on the web. Unfortunately, we have ended up processing only 1200-1700 messages per second. If we continue at this speed it will take at least 10 days to process 30 days of data, which is not a workable solution in our case. I suspect that we have to change some configuration to achieve this goal. Looking for help from experts to support me in achieving this task.

*Kafka Cluster:*
Kafka is running on two dedicated machines with 48 GB of RAM and 2 TB of storage. We have an 11-node Kafka cluster spread across these two servers.

*Kafka Configuration:*
producer.type=async
compression.codec=none
request.required.acks=-1
serializer.class=kafka.serializer.StringEncoder
queue.buffering.max.ms=10
batch.num.messages=1
queue.buffering.max.messages=10
default.replication.factor=3
controlled.shutdown.enable=true
auto.leader.rebalance.enable=true
num.network.threads=2
num.io.threads=8
num.partitions=4
log.retention.hours=12
log.segment.bytes=536870912
log.retention.check.interval.ms=6
log.cleaner.enable=false

*Storm Cluster:*
Storm is running with 5 supervisors and 1 nimbus on IBM servers with 48 GB of RAM and 8 TB of storage. These servers are shared with the HBase cluster.

*Kafka spout configuration:*
kafkaConfig.bufferSizeBytes = 1024*1024*8;
kafkaConfig.fetchSizeBytes = 1024*1024*4;
kafkaConfig.forceFromStart = true;

*Topology: StormTopology*
Spout - Partition: 4
First Bolt - parallelism hint: 6 and Num tasks: 5
Second Bolt - parallelism hint: 5
Third Bolt - parallelism hint: 3
Fourth Bolt - parallelism hint: 3 and Num tasks: 4
Fifth Bolt - parallelism hint: 3
Sixth Bolt - parallelism hint: 3

*Supervisor configuration:*
storm.local.dir: /app/storm
storm.zookeeper.port: 2181
storm.cluster.mode: distributed
storm.local.mode.zmq: false
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
supervisor.worker.start.timeout.secs: 180
supervisor.worker.timeout.secs: 30
supervisor.monitor.frequency.secs: 3
supervisor.heartbeat.frequency.secs: 5
supervisor.enable: true
storm.messaging.netty.server_worker_threads: 2
storm.messaging.netty.client_worker_threads: 2
storm.messaging.netty.buffer_size: 52428800 # 50MB buffer
storm.messaging.netty.max_retries: 25
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100
supervisor.childopts: -Xmx1024m -Djava.net.preferIPv4Stack=true
worker.childopts: -Xmx2048m -Djava.net.preferIPv4Stack=true

Please let me know if more information is needed. Thanks in advance.

Regards,
Riyaz
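To get that Kafka-only baseline before looking at Storm, a standalone async producer with larger batches is often enough to show how far the brokers themselves can go. The sketch below only illustrates the Kafka 0.8 Java producer settings that matter most for throughput; the broker list, topic, batch size, and acks choice are placeholder/assumed values (and snappy compression needs the snappy-java jar on the classpath), not tuned recommendations for this particular cluster.

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ProducerThroughputCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");
            props.put("request.required.acks", "1");    // acks=-1 waits for all replicas and is much slower
            props.put("batch.num.messages", "500");     // send many messages per request
            props.put("queue.buffering.max.ms", "100"); // give batches time to fill
            props.put("compression.codec", "snappy");   // fewer bytes over the network and on disk

            Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
            long start = System.currentTimeMillis();
            int count = 1000000;
            for (int i = 0; i < count; i++) {
                producer.send(new KeyedMessage<String, String>("test-topic", Integer.toString(i), "payload-" + i));
            }
            producer.close(); // flushes the async queue
            long elapsedMs = System.currentTimeMillis() - start;
            System.out.println(count * 1000L / elapsedMs + " messages/sec");
        }
    }

Comparing this number against the end-to-end 1200-1700 msg/s figure shows quickly whether the bottleneck is Kafka itself or the Storm/HBase side.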
Re: Help in processing huge data through Kafka-Storm cluster
And one more thing: using Kafka metrics you can easily monitor the rate at which you are able to publish to Kafka and the speed at which your consumer (in this case your spout) is able to drain messages out of Kafka. In the worst case slow draining can even affect the publishing rate: if the consumer lags too far behind, it will cause disk seeks while consuming the older messages.

On Sun, Jun 15, 2014 at 8:16 PM, pushkar priyadarshi priyadarshi.push...@gmail.com wrote:
What throughput are you getting from your Kafka cluster alone? [...]
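One way to put a number on that lag from code is to compare the offsets the spout has committed against each partition's log-end offset. The sketch below only covers the log-end-offset side using the Kafka 0.8 SimpleConsumer; the broker host, topic, and partition are placeholders, in practice the request should go to each partition's leader, and the committed-offset side (typically stored by the spout in ZooKeeper) still has to be read separately.

    import java.util.HashMap;
    import java.util.Map;
    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class LogEndOffsetCheck {
        public static void main(String[] args) {
            // Placeholder broker/topic/partition; query the partition leader in a real check.
            SimpleConsumer consumer = new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "lag-check");
            TopicAndPartition tp = new TopicAndPartition("test-topic", 0);

            Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
                    new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
            requestInfo.put(tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));

            kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
                    requestInfo, kafka.api.OffsetRequest.CurrentVersion(), "lag-check");
            OffsetResponse response = consumer.getOffsetsBefore(request);

            long logEndOffset = response.offsets("test-topic", 0)[0];
            System.out.println("log end offset for test-topic/0: " + logEndOffset);
            // Lag for this partition = logEndOffset - offset committed by the spout.
            consumer.close();
        }
    }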
Re: Help in processing huge data through Kafka-Storm cluster
Hi Riyaz,

There are a number of reasons that you may be getting low performance. Here are some questions to get started:

1. How big are your messages? To meet your throughput requirement you need a minimum of roughly 10K messages per second continuously. You specified a replication factor of 3, so at a message length of 500 bytes (for example) you would need to write a minimum of 15 MB/second continuously across both hosts. That is a small amount or a large amount depending on your storage configuration. (The arithmetic is sketched after this message.)
2. How did you determine the throughput rate? Is the throughput number end-to-end, including Storm and HBase, or do you see the low throughput for Kafka itself? In either case, can you isolate the rates of ingress to and egress from Kafka?

Assuming the problem is in Kafka, here are some more questions:

3. Are you running VMs? If so, what kind, and how many CPUs are allocated to each VM?
4. What kind of storage do you have? According to your description you have 11 nodes over two hosts. At the level you are attempting to reach, anything less than SSDs or very performant RAID may be an issue due to random I/O. If you have network-attached storage this can be a huge bottleneck.
5. What kind of network cards do you have?
6. What kind of stats do you see on the hosts when your tests are running?
   - What is the I/O wait? Anything above a few percent indicates problems (top gives good numbers).
   - What is the run queue length? CPU starvation could be a problem, especially if you have VMs (top and uptime give good numbers).
   - How much memory is in the OS page cache? This has a big impact on I/O efficiency if you are short of memory (free -g gives useful numbers).
   - On a related topic, are you reading from storage or are your reads served from memory? (iostat should ideally show no reads from storage, only writes, because all reads are served from the OS page cache.)
   - Are you swapping?
7. What is the memory size for your JVMs, and are you using Java 7? Do you have G1 GC enabled according to current Kafka recommendations?
8. Where is ZooKeeper running? It can be a bottleneck at high transaction rates.
9. How many topics do you have?
10. How many producers do you have and where are they running?
11. How many consumers are you running? I don't know Storm, so it's hard to tell from the configuration you have listed how many would run or where they would operate.

It seems possible you need to spread processing across more independent hosts, but that is a guess pending other information. It is hard to evaluate your Kafka settings without this.

Best regards,
Robert

On Sat, Jun 14, 2014 at 8:14 AM, Shaikh Ahmed rnsr.sha...@gmail.com wrote:
Hi, we download 28 million messages daily, and monthly it goes up to 800+ million. [...]
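To make the arithmetic behind Robert's question 1 concrete, the small sketch below just restates his example numbers: roughly 800 million messages pushed through in one day, an assumed 500-byte message, and replication factor 3. The message size is an assumption from the question, not a measured value.

    public class ThroughputEstimate {
        public static void main(String[] args) {
            long messagesPerDay = 800_000_000L;        // one month of data processed in one day
            long secondsPerDay = 24L * 60 * 60;
            long messagesPerSec = messagesPerDay / secondsPerDay;  // ~9,260 msg/s, i.e. roughly 10K/s

            int messageBytes = 500;                    // example size from question 1 (assumption)
            int replicationFactor = 3;
            double mbPerSec = messagesPerSec * (double) messageBytes * replicationFactor / 1_000_000.0;

            System.out.printf("%d msg/s, about %.0f MB/s written across the cluster%n",
                    messagesPerSec, mbPerSec);         // ~14-15 MB/s, matching Robert's estimate
        }
    }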
Re: Help in processing huge data through Kafka-Storm cluster
+1 for detailed examination of metrics. You can see the main metrics here: https://kafka.apache.org/documentation.html#monitoring

JConsole is very helpful for looking quickly at what is going on.

Cheers,
Robert

On Sun, Jun 15, 2014 at 7:49 AM, pushkar priyadarshi priyadarshi.push...@gmail.com wrote:
And one more thing: using Kafka metrics you can easily monitor the rate at which you are able to publish to Kafka and the speed at which your consumer (in this case your spout) is able to drain messages out of Kafka. [...]
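The same broker metrics that JConsole shows can also be read programmatically over JMX, which is handy for scripted checks during a load test. The sketch below assumes JMX remote access is enabled on the broker (e.g. by setting JMX_PORT before starting it) and uses a placeholder host and port; the MBean name shown is the 0.8.x-style name and should be verified in JConsole first, since it differs between Kafka versions.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BrokerMetricsProbe {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; the broker must be started with JMX remote enabled.
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Assumed MBean name for the broker-wide incoming message rate;
            // confirm the exact name for your Kafka version in JConsole.
            ObjectName messagesIn = new ObjectName(
                    "\"kafka.server\":type=\"BrokerTopicMetrics\",name=\"AllTopicsMessagesInPerSec\"");
            Object oneMinuteRate = mbs.getAttribute(messagesIn, "OneMinuteRate");
            System.out.println("messages in per sec (1 min rate): " + oneMinuteRate);

            connector.close();
        }
    }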
Re: username related weirdness with kafka 0.8.1.1 (scala 2.10.1)
None. What client(s) are you using? Maybe send along more logs for the error, and other info about the setup. If you revert back, does everything work?

On Jun 15, 2014 10:33 AM, Max Kovgan m...@fortscale.com wrote:
Thanks for the response, Joe. In the non-working setup the kafka user had all the permissions for its own directories and log files. [...]
Re: LeaderNotAvailableException in 0.8.1.1
Yes, I gave it several minutes.

On Sat, Jun 14, 2014 at 2:18 PM, Michael G. Noll mich...@michael-noll.com wrote:

Have you given Kafka some time to re-elect a new leader for the missing partition when you re-try steps 1-5? See here: "If you do, you should be able to go through steps 1-8 without seeing LeaderNotAvailableExceptions (you may need to give Kafka some time to re-elect the remaining, second broker as the new leader for the first broker's partitions though)."

Best,
Michael

On 06/12/2014 08:43 PM, Prakash Gowri Shankor wrote:

So if we go back to the 2-broker case, I tried your suggestion with replication-factor 2:

    ./kafka-topics.sh --topic test2 --create --partitions 3 --zookeeper localhost:2181 --replication-factor 2

When I repeat steps 1-5 I still see the exception. When I go to step 8 (back to 2 brokers), I don't see it. Here is my topic description:

    ./kafka-topics.sh --describe --topic test2 --zookeeper localhost:2181
    Topic:test2  PartitionCount:3  ReplicationFactor:2  Configs:
        Topic: test2  Partition: 0  Leader: 1  Replicas: 1,0  Isr: 1,0
        Topic: test2  Partition: 1  Leader: 0  Replicas: 0,1  Isr: 0,1
        Topic: test2  Partition: 2  Leader: 1  Replicas: 1,0  Isr: 1,0

On Wed, Jun 11, 2014 at 3:20 PM, Michael G. Noll michael+st...@michael-noll.com wrote:

In your second case (1-broker cluster and putting your laptop to sleep) these exceptions should be transient and disappear after a while. In the logs you should see ZK session expirations (hence the initial/transient exceptions, which in this case are expected and ok), followed by new ZK sessions being established. So this case is (or should be) very different from your case number 1.

--Michael

On 11.06.2014, at 23:13, Prakash Gowri Shankor prakash.shan...@gmail.com wrote:

Thanks for your response, Michael. In step 3 I am actually stopping the entire cluster and restarting it without the 2nd broker. But I see your point. When I look in /tmp/kafka-logs-2 (which is the log dir for the 2nd broker) I see it holds test2-1 (i.e. the 1st partition of the test2 topic). In /tmp/kafka-logs (which is the log dir for the first broker) I see it holds test2-0 and test2-2 (the 0th and 2nd partitions of the test2 topic). So it would seem that Kafka is missing the leader for partition 1 and hence throwing the exception on the producer side. Let me try your replication suggestion.

While all of the above might explain the exception in the case of 2 brokers, there are still times when I see it with just a single broker. In this case I start from a normal working cluster with 1 broker only. Then I put my machine into sleep/hibernation. On wake, I shut down the cluster (for sanity) and restart. On restart, I start seeing this exception. In this case I only have one broker, and I still create the topic the way I described earlier. I understand this is not the ideal production topology, but it's annoying to see it during development. Thanks

On Wed, Jun 11, 2014 at 1:40 PM, Michael G. Noll mich...@michael-noll.com wrote:

Prakash, you are configuring the topic with a replication factor of only 1, i.e. no additional replica beyond the original one. This replication setting of 1 means that only one of the two brokers will ever host the (single) replica -- which is implied to also be the leader in-sync replica -- of a given partition. In step 3 you are disabling one of the two brokers. Because this stopped broker is the only broker that hosts one or more of the 3 partitions you configured (I can't tell which partition(s) it is, but you can find out by --describe'ing the topic), your Kafka cluster -- which is now running in a degraded state -- will miss the leader of those affected partitions. And because you set the replication factor to 1, the remaining, second broker will not and will never take over the leadership of those partitions from the stopped broker. Hence you will keep getting LeaderNotAvailableExceptions until you restart the stopped broker in step 7.

So to me it looks as if the behavior of Kafka is actually correct and as expected. If you want to rectify your test setup, try increasing the replication factor from 1 to 2. If you do, you should be able to go through steps 1-8 without seeing LeaderNotAvailableExceptions (you may need to give Kafka some time to re-elect the remaining, second broker as the new leader for the first broker's partitions though).

Hope this helps,
Michael

On 06/11/2014 07:49 PM, Prakash Gowri Shankor wrote:

Yes, here are the steps. Create the topic as:

    ./kafka-topics.sh --topic test2 --create --partitions 3 --zookeeper localhost:2181 --replication-factor 1

1) Start cluster with 2 brokers, 3 consumers.
2) Don't start any producer.
3) Shutdown cluster and disable one broker from starting.
4) restart
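For the transient single-broker case Michael describes (exceptions right after a restart, until the ZooKeeper session is re-established and the leader re-elected), the 0.8 producer can also be configured to retry through that window instead of failing the first send. The sketch below is illustrative only: the broker address and topic are placeholders, and the retry count and backoff are arbitrary values rather than recommendations.

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class RetryingSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092");            // placeholder broker
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Retry the send (and the metadata refresh behind it) so a transient
            // LeaderNotAvailableException right after broker restart does not fail the call.
            props.put("message.send.max.retries", "10");
            props.put("retry.backoff.ms", "1000");

            Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("test2", "hello after restart"));
            producer.close();
        }
    }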
Use Kafka To Send Files
Hi,

Sometimes we want to use Kafka to send files (like text files, XML, ...), but I didn't see this in the documentation. Can Kafka be used to transfer files? If it can, how can I do it?

Thanks
Re: Use Kafka To Send Files
We have seen people send files of tens of MB in size as messages (i.e. one file as a single message) using Kafka.

Guozhang

On Sun, Jun 15, 2014 at 7:41 PM, huqiming0...@live.com wrote:
Hi, sometimes we want to use Kafka to send files (like text files, XML, ...), but I didn't see this in the documentation. [...]
Re: Use Kafka To Send Files
You also wouldn't have any metadata about the file, so I would avoid doing this.

On Jun 15, 2014, at 20:51, Mark Roberts wiz...@gmail.com wrote:
You would ship the contents of the file across as a message. In general this means that your maximum file size must be smaller than your maximum message size. It would generally be a better choice to put a pointer to the file in some shared location on the queue. -Mark
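If the file-as-message route is taken anyway, a minimal sketch with the 0.8 Java producer would look roughly like the following. The broker list, topic, and file path are placeholders, and the broker-side message.max.bytes (and the consumers' fetch size) must be at least as large as the biggest file; putting the file name in the message key is only one ad-hoc way to keep a bit of metadata, as Mark notes the better design is usually to send a pointer to the file instead.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class FileSender {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");                 // placeholder broker
            props.put("serializer.class", "kafka.serializer.DefaultEncoder");  // raw byte[] payload
            props.put("key.serializer.class", "kafka.serializer.StringEncoder");

            Producer<String, byte[]> producer = new Producer<String, byte[]>(new ProducerConfig(props));

            // File name goes in the key so consumers keep at least some metadata.
            byte[] contents = Files.readAllBytes(Paths.get("/tmp/example.xml"));  // placeholder path
            producer.send(new KeyedMessage<String, byte[]>("files", "example.xml", contents));
            producer.close();
        }
    }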