Re: Consuming Kafka Messages Inside of EC2 Instances
Thank you Dillian and Guozhang for the responses. Yes, Dillian, you are understanding my issue correctly. I am not sure what the best approach to this is: whether there's a way to whitelist certain IPs, create a VPC, or use the cluster launcher as the Kafka zookeeper/broker. I guess this is more of an AWS question, but I thought this is a problem some Kafka users must have solved already. Edit: I just tried using the cluster launcher as an intermediary. I started Zookeeper/Kafka Server on my cluster launcher and then created a topic/produced messages. I set up a Kafka consumer on one of my private EC2 instances, but I got a No Route to Host error. I pinged the cluster launcher <-> private instance and it works fine. I was hoping I could use this as a temporary solution...any suggestions on this issue would also be greatly appreciated. Thanks! Best, Su On Wed, Jan 28, 2015 at 9:11 PM, Guozhang Wang wrote: > Su, > > Does this help for your case? > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ > > Guozhang > > On Wed, Jan 28, 2015 at 3:36 PM, Dillian Murphey > wrote: > > > Am I understanding your question correctly... You're asking how do you > > establish connectivity to an instance in a private subnet from the > outside > > world? Are you thinking in terms of zookeeper or just general aws > network > > connectivity? > > > > On Wed, Jan 28, 2015 at 11:03 AM, Su She wrote: > > > > > Hello All, > > > > > > I have set up a cluster of EC2 instances using this method: > > > > > > > > > > > > http://blogs.aws.amazon.com/bigdata/post/Tx2D0J7QOVRJBRX/Deploying-Cloudera-s-Enterprise-Data-Hub-on-AWS > > > > > > As you can see the instances are w/in a private subnet. I was wondering > > if > > > anyone has any advice on how I can set up a Kafka zookeeper/server on > an > > > instance that receives messages from a Kafka Producer outside of the > > > private subnet. I have tried using the cluster launcher, but I feel > like > > it > > > is not a best practice and only a temporary situation. > > > > > > Thank you for the help! > > > > > > Best, > > > > > > Su > > > > > > > > > -- > -- Guozhang >
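For anyone hitting the same wall: besides security groups and routing, one broker setting that commonly comes up for Kafka on EC2 is the advertised address, since clients reconnect using whatever host the broker registers in ZooKeeper rather than the address they first dialed. The following server.properties sketch shows the relevant 0.8.x property names only; the hostnames and port are placeholders, not values from this thread.

# server.properties (sketch; placeholder values)
# Address the broker binds to inside the private subnet:
host.name=ip-10-0-0-12.ec2.internal
port=9092
# Address handed back to clients in metadata responses; a producer outside
# the subnet must be able to reach this one:
advertised.host.name=broker1.example.com
advertised.port=9092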
Re: Error writing to highwatermark file
Hi, Could you check if the specified file: /nfs/trust-machine/machine11/kafka-logs/replication-offset-checkpoint.tmp is not deleted beforehand? Guozhang On Tue, Jan 27, 2015 at 10:54 PM, wan...@act.buaa.edu.cn < wan...@act.buaa.edu.cn> wrote: > Hi, > I use 2 brokers as a cluster, and write data into nfs. > > I encountered this problem several days after starting the brokers : > FATAL [Replica Manager on Broker 1]: Error writing to highwatermark file: > (kafka.server.ReplicaManager) > java.io.IOException: File rename from > /nfs/trust-machine/machine11/kafka-logs/replication-offset-checkpoint.tmp > to /nfs/trust-machine/machine11/kafka-logs/replication-offset-checkpoint > failed. > > why? > My kafka version is 0.8.1. > > Thanks! -- -- Guozhang
[VOTE] 0.8.2.0 Candidate 3
This is the third candidate for release of Apache Kafka 0.8.2.0.

Release Notes for the 0.8.2.0 release:
https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/RELEASE_NOTES.html

*** Please download, test and vote by Saturday, Jan 31, 11:30pm PT ***

Kafka's KEYS file containing PGP keys we use to sign the release (in addition to the md5, sha1 and sha2 (SHA256) checksums):
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/

* Maven artifacts to be voted upon prior to release:
https://repository.apache.org/content/groups/staging/

* scala-doc:
https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/scaladoc/

* java-doc:
https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/javadoc/

* The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag:
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=223ac42a7a2a0dab378cc411f4938a9cea1eb7ea
(commit 7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4)

Thanks,
Jun
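For reference, checking a candidate before voting usually amounts to verifying the PGP signature against the KEYS file and comparing the published checksums. A rough sketch follows; the artifact file names are illustrative rather than the exact names under the URL above.

# Sketch: verifying a release candidate (file names are illustrative)
gpg --import KEYS    # KEYS from http://kafka.apache.org/KEYS
wget https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/kafka-0.8.2.0-src.tgz
wget https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/kafka-0.8.2.0-src.tgz.asc
gpg --verify kafka-0.8.2.0-src.tgz.asc kafka-0.8.2.0-src.tgz
# Compare the output of these against the published .md5 / .sha1 / .sha2 files:
md5sum kafka-0.8.2.0-src.tgz
sha1sum kafka-0.8.2.0-src.tgz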
Re: Consuming Kafka Messages Inside of EC2 Instances
Su, Does this help for your case? https://cwiki.apache.org/confluence/display/KAFKA/FAQ Guozhang On Wed, Jan 28, 2015 at 3:36 PM, Dillian Murphey wrote: > Am I understanding your question correctly... You're asking how do you > establish connectivity to an instance in a private subnet from the outside > world? Are you thinking in terms of zookeeper or just general aws network > connectivity? > > On Wed, Jan 28, 2015 at 11:03 AM, Su She wrote: > > > Hello All, > > > > I have set up a cluster of EC2 instances using this method: > > > > > > > http://blogs.aws.amazon.com/bigdata/post/Tx2D0J7QOVRJBRX/Deploying-Cloudera-s-Enterprise-Data-Hub-on-AWS > > > > As you can see the instances are w/in a private subnet. I was wondering > if > > anyone has any advice on how I can set up a Kafka zookeeper/server on an > > instance that receives messages from a Kafka Producer outside of the > > private subnet. I have tried using the cluster launcher, but I feel like > it > > is not a best practice and only a temporary situation. > > > > Thank you for the help! > > > > Best, > > > > Su > > > -- -- Guozhang
Re: Poll RESULTS: Producer/Consumer languages
Thanks for sharing this Otis, I am also quite surprised by the Python / Go popularity. At LinkedIn we use a REST proxy server for our non-Java clients, but introducing a second hop also brings more overhead as well as complexity, such as producer acking and offset committing. So I think at the end of the day non-Java clients will still be around. Guozhang On Wed, Jan 28, 2015 at 8:54 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi, > > I promised to share the results of this poll, and here they are: > > http://blog.sematext.com/2015/01/28/kafka-poll-results-producer-consumer/ > > List of "surprises" is there. I wonder if anyone else is surprised by any > aspect of the breakdown, or is the breakdown just as you expected? > > Otis > -- > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > Solr & Elasticsearch Support * http://sematext.com/ > -- -- Guozhang
Re: How to mark a message as needing to retry in Kafka?
noodles, Without an external mechanism you won't be able to mark individual messages/offsets as needing to be retried at a later time. Guozhang is describing a way to get the offset of a message that's been received so that you can find it later. You would need to save that into a 'failed messages' store somewhere else and have code that looks in there to make retries happen (assuming you want the failure/retry to persist beyond the lifetime of the process). Christian On Wed, Jan 28, 2015 at 7:00 PM, Guozhang Wang wrote: > I see. If you are using the high-level consumer, once the message is > returned to the application it is considered "consumed", and current it is > not supported to "re-wind" to a previously consumed message. > > With the new consumer coming in 0.8.3 release, we have an api for you to > get the offset of each message and do the rewinding based on offsets. For > example, you can do sth. like > > > > message = // get one message from consumer > > try { > // process message > } catch { > consumer.seek(message.offset) > } > > > > Guozhang > > On Wed, Jan 28, 2015 at 6:26 PM, noodles wrote: > > > I did not describe my problem clearly. In my case, I got the message from > > Kakfa, but I could not handle this message because of some reason, for > > example the external server is down. So I want to mark the message as not > > being consumed directly. > > > > 2015-01-28 23:26 GMT+08:00 Guozhang Wang : > > > > > Hi, > > > > > > Which consumer are you using? If you are using a high level consumer > then > > > retry would be automatic upon network exceptions. > > > > > > Guozhang > > > > > > On Wed, Jan 28, 2015 at 1:32 AM, noodles > wrote: > > > > > > > Hi group: > > > > > > > > I'm working for building a webhook notification service based on > > Kafka. I > > > > produce all of the payloads into Kafka, and consumers consume these > > > > payloads by offset. > > > > > > > > Sometimes some payloads cannot be consumed because of network > exception > > > or > > > > http server exception. So I want to mark the failed payloads and > retry > > > them > > > > by timers. But I have no idea if I don't use a storage (like MySQL) > > > except > > > > kafka and zookeeper. > > > > > > > > > > > > -- > > > > *noodles!* > > > > > > > > > > > > > > > > -- > > > -- Guozhang > > > > > > > > > > > -- > > *Yeah, I'm noodles!* > > > > > > -- > -- Guozhang >
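One concrete shape for the "failed messages" store Christian describes, without introducing a new datastore, is a dedicated retry topic that a timer-driven consumer drains later. The sketch below is one possible approach (not something prescribed in this thread), written against the 0.8.2 Java producer; the class, topic name, and configs are placeholders.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: on a webhook delivery failure, park the payload in a retry topic
// instead of an external database. Names and configs are placeholders.
public class RetryPublisher {
    private final KafkaProducer<byte[], byte[]> producer;
    private final String retryTopic;

    public RetryPublisher(String bootstrapServers, String retryTopic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        this.producer = new KafkaProducer<>(props);
        this.retryTopic = retryTopic;
    }

    /** Called by the webhook consumer when the downstream HTTP call fails. */
    public void parkForRetry(byte[] key, byte[] payload) {
        producer.send(new ProducerRecord<>(retryTopic, key, payload));
    }

    public void close() {
        producer.close();
    }
}

A separate consumer on the retry topic can then re-attempt delivery on whatever schedule the timers dictate.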
Re: How to mark a message as needing to retry in Kafka?
I did not describe my problem clearly. In my case, I got the message from Kafka, but I could not handle this message for some reason, for example because the external server is down. So I want to mark the message as not being consumed directly. 2015-01-28 23:26 GMT+08:00 Guozhang Wang : > Hi, > > Which consumer are you using? If you are using a high level consumer then > retry would be automatic upon network exceptions. > > Guozhang > > On Wed, Jan 28, 2015 at 1:32 AM, noodles wrote: > > > Hi group: > > > > I'm working for building a webhook notification service based on Kafka. I > > produce all of the payloads into Kafka, and consumers consume these > > payloads by offset. > > > > Sometimes some payloads cannot be consumed because of network exception > or > > http server exception. So I want to mark the failed payloads and retry > them > > by timers. But I have no idea if I don't use a storage (like MySQL) > except > > kafka and zookeeper. > > > > > > -- > > *noodles!* > > > > > > -- > -- Guozhang > -- *Yeah, I'm noodles!*
Re: How to mark a message as needing to retry in Kafka?
I see. If you are using the high-level consumer, once the message is returned to the application it is considered "consumed", and current it is not supported to "re-wind" to a previously consumed message. With the new consumer coming in 0.8.3 release, we have an api for you to get the offset of each message and do the rewinding based on offsets. For example, you can do sth. like message = // get one message from consumer try { // process message } catch { consumer.seek(message.offset) } Guozhang On Wed, Jan 28, 2015 at 6:26 PM, noodles wrote: > I did not describe my problem clearly. In my case, I got the message from > Kakfa, but I could not handle this message because of some reason, for > example the external server is down. So I want to mark the message as not > being consumed directly. > > 2015-01-28 23:26 GMT+08:00 Guozhang Wang : > > > Hi, > > > > Which consumer are you using? If you are using a high level consumer then > > retry would be automatic upon network exceptions. > > > > Guozhang > > > > On Wed, Jan 28, 2015 at 1:32 AM, noodles wrote: > > > > > Hi group: > > > > > > I'm working for building a webhook notification service based on > Kafka. I > > > produce all of the payloads into Kafka, and consumers consume these > > > payloads by offset. > > > > > > Sometimes some payloads cannot be consumed because of network exception > > or > > > http server exception. So I want to mark the failed payloads and retry > > them > > > by timers. But I have no idea if I don't use a storage (like MySQL) > > except > > > kafka and zookeeper. > > > > > > > > > -- > > > *noodles!* > > > > > > > > > > > -- > > -- Guozhang > > > > > > -- > *Yeah, I'm noodles!* > -- -- Guozhang
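Guozhang's snippet above is pseudocode against the then-unreleased new consumer. A slightly fuller sketch of the same idea, written against the new Java consumer API as it eventually shipped (seek takes a TopicPartition plus an offset), is below; the group id, topic name, configs, and the process() call are placeholders, and the offset-commit strategy is omitted.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Sketch only: rewind to a failed record's offset so it is re-fetched later.
public class RewindingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "webhook-notifier");
        props.put("enable.auto.commit", "false");  // commit only what was actually handled
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("webhook-payloads"));

        while (true) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
            for (ConsumerRecord<byte[], byte[]> record : records) {
                try {
                    process(record);  // deliver the webhook (placeholder)
                } catch (Exception e) {
                    // Rewind this partition to the failed record; it will be
                    // returned again by a later poll().
                    consumer.seek(new TopicPartition(record.topic(), record.partition()),
                                  record.offset());
                    break;  // skip the rest of this batch for simplicity
                }
            }
        }
    }

    private static void process(ConsumerRecord<byte[], byte[]> record) {
        // placeholder for the HTTP delivery
    }
}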
Error writing to highwatermark file
Hi, I use 2 brokers as a cluster, and write data into nfs. I encountered this problem several days after starting the brokers : FATAL [Replica Manager on Broker 1]: Error writing to highwatermark file: (kafka.server.ReplicaManager) java.io.IOException: File rename from /nfs/trust-machine/machine11/kafka-logs/replication-offset-checkpoint.tmp to /nfs/trust-machine/machine11/kafka-logs/replication-offset-checkpoint failed. why? My kafka version is 0.8.1. Thanks!
Re: Poll: Producer/Consumer impl/language you use?
Yeah Joe is exactly right. Let's not confuse scala apis with the existing Scala clients There are a ton of downsides to those clients. They aren't going away any time in the forceable future, so don't stress, but I think we can kind of "deprecate" them and try to shame people into upgrading. For the Java producer I actually think that API is no worse in Scala than the existing Scala api, so I'm not sure if there is much we can do to improve it for scala, but if there is there would be no harm in adding a scala wrapper. For the new Java consumer there are some Java-isms in the client (e.g. a couple methods take java maps as arguments). I actually think the existing scala implicit conversions for java collections might be totally sufficient but someone would have to try. If not we can add a scala wrapper. I actually think there is room for lots of wrappers that experiment with different styles of data access, especially for the consumer. The reactive people have a bunch of stream related things, Joe mentioned scalaz streams, etc. Actually the iterator api in the existing consumer was an attempt to make stream processing easy (since you can apply some of the scala collections things to the resulting iterator) but I think it wasn't thought all the way through. I think having this simpler, more flexible base api will let people experiment with this stuff in any way they want. -Jay On Wed, Jan 28, 2015 at 10:45 AM, Joe Stein wrote: > I kind of look at the Storm, Spark, Samza, etc integrations as > producers/consumers too. > > Not sure if that maybe was getting lumped also into other. > > I think Jason's 90/10 80/20 70/30 would be found to be typical. > > As far as the Scala API goes, I think we should have a wrapper around the > shiny new Java Consumer. Folks I know use things like scalaz streams > https://github.com/scalaz/scalaz-stream which the new consumer can work > nicely with I think. It would be great if we could come up with a new Scala > layer on top of the Java consumer that we release in the project. One of my > engineers is taking a look at that now unless someone is already working on > that? We are using the partition static assignment in the new consumer and > just using Mesos for handling re-balance for us. When he gets further along > and if it makes sense we will shoot a KIP around people can chat about it > on dev. > > - Joe Stein > > On Wed, Jan 28, 2015 at 10:33 AM, Otis Gospodnetic < > otis.gospodne...@gmail.com> wrote: > > > Good point, Jason. Not sure how we could account for that easily. But > > maybe that is at least a partial explanation of the Java % being under > 50% > > when Java in general is more popular than that... > > > > Otis > > -- > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > On Wed, Jan 28, 2015 at 1:22 PM, Jason Rosenberg > wrote: > > > > > I think the results could be a bit skewed, in cases where an > organization > > > uses multiple languages, but not equally. In our case, we > overwhelmingly > > > use java clients (>90%). But we also have ruby and Go clients too. > But > > in > > > the poll, these come out as equally used client languages. > > > > > > Jason > > > > > > On Wed, Jan 28, 2015 at 12:05 PM, David McNelis < > > > dmcne...@emergingthreats.net> wrote: > > > > > > > I agree with Stephen, it would be really unfortunate to see the Scala > > api > > > > go away. 
> > > > > > > > On Wed, Jan 28, 2015 at 11:57 AM, Stephen Boesch > > > > wrote: > > > > > > > > > The scala API going away would be a minus. As Koert mentioned we > > could > > > > use > > > > > the java api but it is less .. well .. functional. > > > > > > > > > > Kafka is included in the Spark examples and external modules and is > > > > popular > > > > > as a component of ecosystems on Spark (for which scala is the > primary > > > > > language). > > > > > > > > > > 2015-01-28 8:51 GMT-08:00 Otis Gospodnetic < > > otis.gospodne...@gmail.com > > > >: > > > > > > > > > > > Hi, > > > > > > > > > > > > I don't have a good excuse here. :( > > > > > > I thought about including Scala, but for some reason didn't do > > it. I > > > > see > > > > > > 12-13% of people chose "Other". Do you think that is because I > > > didn't > > > > > > include Scala? > > > > > > > > > > > > Also, is the Scala API reeally going away? > > > > > > > > > > > > Otis > > > > > > -- > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log > > > Management > > > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > > > > > > > > > > On Tue, Jan 20, 2015 at 4:43 PM, Koert Kuipers < > ko...@tresata.com> > > > > > wrote: > > > > > > > > > > > > > no scala? although scala can indeed use the java api, its > > ugly > > > we > > > > > > > prefer to use the scala api (which i believe will go away > > > > > unfortunately) > > > > > > > > > > > > > > On Tue, Jan 20, 2015 at 2:52 PM, Otis G
Re: One or multiple instances of MM to aggregate kafka data to one hadoop
Hi Mingjie I would recommend the first option of running one mirrormaker instance pulling from multiple DC's. A single MM instance will be able to make more efficient use of the machine resources in two ways: 1. You will only have to run one process which will be able to be allocated the full amount of resources 2. Within the process, if you run enough consumer threads, I think that they should be able to rebalance and pick up the load if they don't have anything to do. I'm not 100% sure on this, but 1 still holds. A single MM instance should handle connectivity issues with one DC without affecting the rest of the consumer threads for other DC's. You would gain process isolation running a MM per DC, but this would raise the operational burden and resource requirements. I'm not sure what benefit you'd actually get from process isolation, so I'd recommend against it. However I'd be interested to hear if others do things differently. Daniel. On Thu Jan 29 2015 at 11:14:29 AM Mingjie Lai wrote: > Hi. > > We have a pretty typical data ingestion use case that we use mirrormaker at > one hadoop data center, to mirror kafka data from multiple remote > application data centers. I know mirrormaker can support to consume kafka > data from multiple kafka source, by one instance at one physical node. By > this, we can give one instance of mm multiple consumer config files, so it > can consume data from muti places. > > Another option is to have multiple mirrormaker instances at one node, each > mm instance is dedicated to grab data from one single source data center. > Certainly there will be multiple mm nodes to balance the load. > > The second option looks better since it kind of has an isolation for > different data centers. > > Any recommendation for this kind of data aggregation cases? > > Still new to kafka and mirrormaker. Welcome any information. > > Thanks, > Mingjie >
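For concreteness, the single-instance option recommended above, one MirrorMaker pulling from several source DCs, looks roughly like the following with the 0.8.x tooling; the config file names, whitelist, and stream count are placeholders.

# Sketch: one MirrorMaker instance consuming from two source data centers
bin/kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config consumer-dc1.properties \
  --consumer.config consumer-dc2.properties \
  --producer.config producer-hadoop.properties \
  --num.streams 8 \
  --whitelist '.*'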
Re: Consuming Kafka Messages Inside of EC2 Instances
Am I understanding your question correctly... You're asking how do you establish connectivity to an instance in a private subnet from the outside world? Are you thinking in terms of zookeeper or just general aws network connectivity? On Wed, Jan 28, 2015 at 11:03 AM, Su She wrote: > Hello All, > > I have set up a cluster of EC2 instances using this method: > > > http://blogs.aws.amazon.com/bigdata/post/Tx2D0J7QOVRJBRX/Deploying-Cloudera-s-Enterprise-Data-Hub-on-AWS > > As you can see the instances are w/in a private subnet. I was wondering if > anyone has any advice on how I can set up a Kafka zookeeper/server on an > instance that receives messages from a Kafka Producer outside of the > private subnet. I have tried using the cluster launcher, but I feel like it > is not a best practice and only a temporary situation. > > Thank you for the help! > > Best, > > Su >
Re: Resilient Producer
We have been using Flume to solve a very similar use case. Our servers write the log files to a local file system, and then we have a Flume agent which ships the data to Kafka. Flume can be used with an exec source running tail. Though the exec source runs well with tail, there are issues if the agent goes down or the file channel starts building up. If the agent goes down, you can ask the Flume exec tail source to go back n lines or to read from the beginning of the file. The challenge is that we roll our log files on a daily basis. What if the agent goes down in the evening? We would need to go back over the entire day's worth of data for reprocessing, which slows down the data flow. We can also go back an arbitrary number of lines, but then we don't know what the right number to go back is. This is a challenge for us. We have tried the spooling directory source, which works, but we need a different log file rotation policy for it. We considered even rotating the file every minute, but that would still delay the real-time data flow in our kafka--->storm-->Elastic search pipeline by a minute. We are going to do a PoC on Logstash to see how it handles these problems compared to Flume. On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. wrote: > Hi all, > I'm evaluating using Kafka. > > I liked this thing of Facebook scribe that you log to your own machine and > then there's a separate process that forwards messages to the central > logger. > > With Kafka it seems that I have to embed the publisher in my app, and deal > with any communication problem managing that on the producer side. > > I googled quite a bit trying to find a project that would basically use > daemon that parses a log file and send the lines to the Kafka cluster > (something like a tail file.log but instead of redirecting the output to > the console: send it to kafka) > > Does anyone knows about something like that? > > > Thanks! > Fernando. >
Re: Routing modifications at runtime
Hi Toni, A couple of thoughts. 1. Kafka's behaviour need not be changed at run time. Your producers, which push your MAC data into Kafka, should know which topic to write to. Your producer can be Flume, Logstash, or your own custom-written Java producer. As long as your producers know which topic to write to, they can keep creating new topics as new MAC data comes through your pipeline. On Wed, Jan 28, 2015 at 12:10 PM, Toni Cebrián wrote: > Hi, > > I'm starting to weight different alternatives for data ingestion and > I'd like to know whether Kafka meets the problem I have. > Say we have a set of devices each with its own MAC and then we receive > data in Kafka. There is a dictionary defined elsewhere that says each MAC > to which topic must publish. So I have basically 2 questions: > New MACs keep comming and the dictionary must be updated accordingly. How > could I change this Kafka behaviour during runtime? > A problem for the future. Say that dictionaries are so big that they don't > fit in memory. Are there any patterns for bookkeeping internal data > structures and how route to them? > > T. >
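To make the "producer decides the topic" point concrete, here is a minimal sketch in which the MAC-to-topic dictionary lives in the producer process and can be swapped at runtime, with no broker-side change needed (new topics appear either through auto topic creation, if enabled on the brokers, or via the admin tooling). It assumes the 0.8.2 Java producer; the class, topic names, and configs are placeholders.

import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: route each MAC's data to the topic named by an updatable dictionary.
public class MacRouter {
    private final Map<String, String> macToTopic = new ConcurrentHashMap<>();
    private final KafkaProducer<String, byte[]> producer;
    private final String defaultTopic;

    // producerProps must include bootstrap.servers and key/value serializers.
    public MacRouter(Properties producerProps, String defaultTopic) {
        this.producer = new KafkaProducer<>(producerProps);
        this.defaultTopic = defaultTopic;
    }

    /** Called whenever the external dictionary changes; no broker restart needed. */
    public void updateMapping(String mac, String topic) {
        macToTopic.put(mac, topic);
    }

    public void send(String mac, byte[] payload) {
        String topic = macToTopic.getOrDefault(mac, defaultTopic);
        producer.send(new ProducerRecord<>(topic, mac, payload));
    }
}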
One or multiple instances of MM to aggregate kafka data to one hadoop
Hi. We have a pretty typical data ingestion use case: we run mirrormaker at one hadoop data center to mirror kafka data from multiple remote application data centers. I know mirrormaker can consume kafka data from multiple kafka sources with one instance on one physical node; that is, we can give one instance of mm multiple consumer config files, so it can consume data from multiple places. Another option is to have multiple mirrormaker instances on one node, where each mm instance is dedicated to grabbing data from one single source data center. Certainly there will be multiple mm nodes to balance the load. The second option looks better since it gives some isolation between the different data centers. Any recommendations for this kind of data aggregation case? Still new to kafka and mirrormaker. Any information is welcome. Thanks, Mingjie
Re: Proper Relationship Between Partition and Threads
Thank you very much Christian. That's what I concluded too, I wanted just to double check. Best regards, Ricardo Ferreira On Wed, Jan 28, 2015 at 4:44 PM, Christian Csar wrote: > Ricardo, >The parallelism of each logical consumer (consumer group) is the number > of partitions. So with four partitions it could make sense to have one > logical consumer (application) have two processes on different machines > each with two threads, or one process with four. While with two logical > consumers (two different applications) you would want each to have 4 > threads (4*2 = 8 threads total). > > There are also considerations depending on which consumer code you are > using (which I'm decidedly not someone with good information on) > > Christian > > On Wed, Jan 28, 2015 at 1:28 PM, Ricardo Ferreira < > jricardoferre...@gmail.com> wrote: > > > Hi experts, > > > > I'm newbie in the Kafka world, so excuse me for such basic question. > > > > I'm in the process of designing a client for Kafka, and after few hours > of > > study, I was told that to achieve a proper level of parallelism, it is a > > best practice having one thread for each partition of an topic. > > > > My question is that this rule-of-thumb also applies for multiple consumer > > applications. For instance: > > > > Considering a topic with 4 partitions, it is OK to have one consumer > > application with 4 threads, just like would be OK to have two consumer > > applications with 2 threads each. But what about having two consumer > > applications with 4 threads each? It would break any load-balancing made > by > > Kafka brokers? > > > > Anyway, I'd like to understand if the proper number of threads that > should > > match the number of partitions is per application or if there is some > other > > best practice. > > > > Thanks in advance, > > > > Ricardo Ferreira > > >
Re: Proper Relationship Between Partition and Threads
Ricardo, The parallelism of each logical consumer (consumer group) is the number of partitions. So with four partitions it could make sense to have one logical consumer (application) have two processes on different machines each with two threads, or one process with four. While with two logical consumers (two different applications) you would want each to have 4 threads (4*2 = 8 threads total). There are also considerations depending on which consumer code you are using (which I'm decidedly not someone with good information on) Christian On Wed, Jan 28, 2015 at 1:28 PM, Ricardo Ferreira < jricardoferre...@gmail.com> wrote: > Hi experts, > > I'm newbie in the Kafka world, so excuse me for such basic question. > > I'm in the process of designing a client for Kafka, and after few hours of > study, I was told that to achieve a proper level of parallelism, it is a > best practice having one thread for each partition of an topic. > > My question is that this rule-of-thumb also applies for multiple consumer > applications. For instance: > > Considering a topic with 4 partitions, it is OK to have one consumer > application with 4 threads, just like would be OK to have two consumer > applications with 2 threads each. But what about having two consumer > applications with 4 threads each? It would break any load-balancing made by > Kafka brokers? > > Anyway, I'd like to understand if the proper number of threads that should > match the number of partitions is per application or if there is some other > best practice. > > Thanks in advance, > > Ricardo Ferreira >
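A sketch of the "one consumer group, threads equal to partitions" setup Christian describes, assuming the 0.8.x high-level consumer API; a second application would simply use a different group.id and independently receive the full stream. The group id, topic name, and ZooKeeper address are placeholders.

import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

// Sketch: 4 threads for a 4-partition topic, all in one consumer group.
public class FourThreadConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");
        props.put("group.id", "app-A");

        int numThreads = 4;  // match the topic's partition count
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        List<KafkaStream<byte[], byte[]>> streams = connector
            .createMessageStreams(Collections.singletonMap("my-topic", numThreads))
            .get("my-topic");

        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (final KafkaStream<byte[], byte[]> stream : streams) {
            pool.submit(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        byte[] payload = it.next().message();
                        // process payload ...
                    }
                }
            });
        }
    }
}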
Proper Relationship Between Partition and Threads
Hi experts, I'm a newbie in the Kafka world, so excuse me for such a basic question. I'm in the process of designing a client for Kafka, and after a few hours of study I was told that, to achieve a proper level of parallelism, it is a best practice to have one thread for each partition of a topic. My question is whether this rule of thumb also applies across multiple consumer applications. For instance: Considering a topic with 4 partitions, it is OK to have one consumer application with 4 threads, just as it would be OK to have two consumer applications with 2 threads each. But what about having two consumer applications with 4 threads each? Would it break any load balancing done by the Kafka brokers? Anyway, I'd like to understand whether the number of threads that should match the number of partitions is per application, or if there is some other best practice. Thanks in advance, Ricardo Ferreira
Routing modifications at runtime
Hi, I'm starting to weigh different alternatives for data ingestion and I'd like to know whether Kafka fits the problem I have. Say we have a set of devices, each with its own MAC, and we receive their data in Kafka. There is a dictionary defined elsewhere that says to which topic each MAC must publish. So I have basically 2 questions: New MACs keep coming and the dictionary must be updated accordingly. How could I change this Kafka behaviour at runtime? A problem for the future: say the dictionaries are so big that they don't fit in memory. Are there any patterns for bookkeeping such internal data structures and for how to route to them? T.
Re: Can't create a topic; can't delete it either
On Tue, Jan 27, 2015 at 10:54 PM, Joel Koshy wrote: > Do you still have the controller and state change logs from the time > you originally tried to delete the topic? > > If you can tell me where the find the logs I can check. I haven't restarted my brokers since the issue. Sumit > On Tue, Jan 27, 2015 at 03:11:48PM -0800, Sumit Rangwala wrote: > > I am using 0.8.2-beta on brokers 0.8.1.1 for client (producer and > > consumers). delete.topic.enable=true on all brokers. replication factor > is > > < number of brokers. I see this issue with just one single topic, all > other > > topics are fine (creation and deletion). Even after a day it is still in > > marked for deletion stage. Let me know what other information from the > > brokers or the zookeepers can help me debug this issue. > > > > On Tue, Jan 27, 2015 at 9:47 AM, Gwen Shapira > wrote: > > > > > Also, do you have delete.topic.enable=true on all brokers? > > > > > > The automatic topic creation can fail if the default number of > > > replicas is greater than number of available brokers. Check the > > > default.replication.factor parameter. > > > > > > Gwen > > > > > > On Tue, Jan 27, 2015 at 12:29 AM, Joel Koshy > wrote: > > > > Which version of the broker are you using? > > > > > > > > On Mon, Jan 26, 2015 at 10:27:14PM -0800, Sumit Rangwala wrote: > > > >> While running kafka in production I found an issue where a topic > wasn't > > > >> getting created even with auto topic enabled. I then went ahead and > > > created > > > >> the topic manually (from the command line). I then delete the topic, > > > again > > > >> manually. Now my broker won't allow me to either create *the* topic > or > > > >> delete *the* topic. (other topic creation and deletion is working > fine). > > > >> > > > >> The topic is in "marked for deletion" stage for more than 3 hours. > > > >> > > > >> $ bin/kafka-topics.sh --zookeeper zookeeper1:2181/replication/kafka > > > --list > > > >> --topic GRIFFIN-TldAdFormat.csv-1422321736886 > > > >> GRIFFIN-TldAdFormat.csv-1422321736886 - marked for deletion > > > >> > > > >> If this is a known issue, is there a workaround? > > > >> > > > >> Sumit > > > > > > > > >
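On finding the relevant state: with the stock log4j.properties, the controller and state-change logs typically land in controller.log and state-change.log under the broker's logs/ directory. The deletion queue itself can also be inspected in ZooKeeper; a sketch using the bundled shell follows, with the chroot taken from the --zookeeper string in this thread and the paths used by the 0.8.2 delete-topic mechanism.

# Sketch: inspecting delete-topic state (0.8.2 ZooKeeper layout)
bin/zookeeper-shell.sh zookeeper1:2181/replication/kafka
ls /admin/delete_topics      # topics queued for deletion
ls /brokers/topics           # topics the cluster still knows about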
Consuming Kafka Messages Inside of EC2 Instances
Hello All, I have set up a cluster of EC2 instances using this method: http://blogs.aws.amazon.com/bigdata/post/Tx2D0J7QOVRJBRX/Deploying-Cloudera-s-Enterprise-Data-Hub-on-AWS As you can see the instances are w/in a private subnet. I was wondering if anyone has any advice on how I can set up a Kafka zookeeper/server on an instance that receives messages from a Kafka Producer outside of the private subnet. I have tried using the cluster launcher, but I feel like it is not a best practice and only a temporary situation. Thank you for the help! Best, Su
Re: Resilient Producer
The big syslog daemons have supported Kafka for a while now. rsyslog: http://www.rsyslog.com/doc/master/configuration/modules/omkafka.html syslog-ng: https://czanik.blogs.balabit.com/2015/01/syslog-ng-kafka-destination-support/#more-1013 And Bruce might be of interest as well: https://github.com/tagged/bruce On the less daemon-y and more tool-y side of things there are: https://github.com/fsaintjacques/tail-kafka https://github.com/mguindin/tail-kafka https://github.com/edenhill/kafkacat 2015-01-28 19:47 GMT+01:00 Gwen Shapira : > It sounds like you are describing Flume, with SpoolingDirectory source > (or exec source running tail) and Kafka channel. > > On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. wrote: > > Hi all, > > I'm evaluating using Kafka. > > > > I liked this thing of Facebook scribe that you log to your own machine > and > > then there's a separate process that forwards messages to the central > > logger. > > > > With Kafka it seems that I have to embed the publisher in my app, and > deal > > with any communication problem managing that on the producer side. > > > > I googled quite a bit trying to find a project that would basically use > > daemon that parses a log file and send the lines to the Kafka cluster > > (something like a tail file.log but instead of redirecting the output to > > the console: send it to kafka) > > > > Does anyone knows about something like that? > > > > > > Thanks! > > Fernando. >
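As a minimal example of the tail-to-Kafka pattern with the last tool listed above, the broker address, file path, and topic below are placeholders.

# Ship new log lines to a topic; kafkacat produces one message per stdin line.
tail -F /var/log/myapp/app.log | kafkacat -P -b broker1:9092 -t app-logs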
Re: Resilient Producer
Logstash -- Colin Clark +1 612 859 6129 Skype colin.p.clark > On Jan 28, 2015, at 10:47 AM, Gwen Shapira wrote: > > It sounds like you are describing Flume, with SpoolingDirectory source > (or exec source running tail) and Kafka channel. > >> On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. wrote: >> Hi all, >>I'm evaluating using Kafka. >> >> I liked this thing of Facebook scribe that you log to your own machine and >> then there's a separate process that forwards messages to the central >> logger. >> >> With Kafka it seems that I have to embed the publisher in my app, and deal >> with any communication problem managing that on the producer side. >> >> I googled quite a bit trying to find a project that would basically use >> daemon that parses a log file and send the lines to the Kafka cluster >> (something like a tail file.log but instead of redirecting the output to >> the console: send it to kafka) >> >> Does anyone knows about something like that? >> >> >> Thanks! >> Fernando.
Re: Resilient Producer
It sounds like you are describing Flume, with SpoolingDirectory source (or exec source running tail) and Kafka channel. On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. wrote: > Hi all, > I'm evaluating using Kafka. > > I liked this thing of Facebook scribe that you log to your own machine and > then there's a separate process that forwards messages to the central > logger. > > With Kafka it seems that I have to embed the publisher in my app, and deal > with any communication problem managing that on the producer side. > > I googled quite a bit trying to find a project that would basically use > daemon that parses a log file and send the lines to the Kafka cluster > (something like a tail file.log but instead of redirecting the output to > the console: send it to kafka) > > Does anyone knows about something like that? > > > Thanks! > Fernando.
Re: Poll: Producer/Consumer impl/language you use?
I kind of look at the Storm, Spark, Samza, etc integrations as producers/consumers too. Not sure if that maybe was getting lumped also into other. I think Jason's 90/10 80/20 70/30 would be found to be typical. As far as the Scala API goes, I think we should have a wrapper around the shiny new Java Consumer. Folks I know use things like scalaz streams https://github.com/scalaz/scalaz-stream which the new consumer can work nicely with I think. It would be great if we could come up with a new Scala layer on top of the Java consumer that we release in the project. One of my engineers is taking a look at that now unless someone is already working on that? We are using the partition static assignment in the new consumer and just using Mesos for handling re-balance for us. When he gets further along and if it makes sense we will shoot a KIP around people can chat about it on dev. - Joe Stein On Wed, Jan 28, 2015 at 10:33 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Good point, Jason. Not sure how we could account for that easily. But > maybe that is at least a partial explanation of the Java % being under 50% > when Java in general is more popular than that... > > Otis > -- > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > Solr & Elasticsearch Support * http://sematext.com/ > > > On Wed, Jan 28, 2015 at 1:22 PM, Jason Rosenberg wrote: > > > I think the results could be a bit skewed, in cases where an organization > > uses multiple languages, but not equally. In our case, we overwhelmingly > > use java clients (>90%). But we also have ruby and Go clients too. But > in > > the poll, these come out as equally used client languages. > > > > Jason > > > > On Wed, Jan 28, 2015 at 12:05 PM, David McNelis < > > dmcne...@emergingthreats.net> wrote: > > > > > I agree with Stephen, it would be really unfortunate to see the Scala > api > > > go away. > > > > > > On Wed, Jan 28, 2015 at 11:57 AM, Stephen Boesch > > > wrote: > > > > > > > The scala API going away would be a minus. As Koert mentioned we > could > > > use > > > > the java api but it is less .. well .. functional. > > > > > > > > Kafka is included in the Spark examples and external modules and is > > > popular > > > > as a component of ecosystems on Spark (for which scala is the primary > > > > language). > > > > > > > > 2015-01-28 8:51 GMT-08:00 Otis Gospodnetic < > otis.gospodne...@gmail.com > > >: > > > > > > > > > Hi, > > > > > > > > > > I don't have a good excuse here. :( > > > > > I thought about including Scala, but for some reason didn't do > it. I > > > see > > > > > 12-13% of people chose "Other". Do you think that is because I > > didn't > > > > > include Scala? > > > > > > > > > > Also, is the Scala API reeally going away? > > > > > > > > > > Otis > > > > > -- > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log > > Management > > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > > > > > > > On Tue, Jan 20, 2015 at 4:43 PM, Koert Kuipers > > > > wrote: > > > > > > > > > > > no scala? 
although scala can indeed use the java api, its > ugly > > we > > > > > > prefer to use the scala api (which i believe will go away > > > > unfortunately) > > > > > > > > > > > > On Tue, Jan 20, 2015 at 2:52 PM, Otis Gospodnetic < > > > > > > otis.gospodne...@gmail.com> wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I was wondering which implementations/languages people use for > > > their > > > > > > Kafka > > > > > > > Producer/Consumers not everyone is using the Java APIs. So > > > > here's > > > > > a > > > > > > > 1-question poll: > > > > > > > > > > > > > > > > > > > > > > > http://blog.sematext.com/2015/01/20/kafka-poll-producer-consumer-client/ > > > > > > > > > > > > > > Will share the results in about a week when we have enough > votes. > > > > > > > > > > > > > > Thanks! > > > > > > > Otis > > > > > > > -- > > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log > > > > Management > > > > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Re: Resilient Producer
Something like Heka but lightweight :D On Wed, Jan 28, 2015 at 3:39 PM, Fernando O. wrote: > Hi all, > I'm evaluating using Kafka. > > I liked this thing of Facebook scribe that you log to your own machine and > then there's a separate process that forwards messages to the central > logger. > > With Kafka it seems that I have to embed the publisher in my app, and deal > with any communication problem managing that on the producer side. > > I googled quite a bit trying to find a project that would basically use > daemon that parses a log file and send the lines to the Kafka cluster > (something like a tail file.log but instead of redirecting the output to > the console: send it to kafka) > > Does anyone knows about something like that? > > > Thanks! > Fernando. >
Resilient Producer
Hi all, I'm evaluating using Kafka. I liked the Facebook Scribe approach where you log to your own machine and a separate process forwards the messages to the central logger. With Kafka it seems that I have to embed the publisher in my app and deal with any communication problems, managing all of that on the producer side. I googled quite a bit trying to find a project that would basically be a daemon that reads a log file and sends the lines to the Kafka cluster (something like tail file.log, but instead of redirecting the output to the console, sending it to Kafka). Does anyone know about something like that? Thanks! Fernando.
Re: Poll: Producer/Consumer impl/language you use?
Good point, Jason. Not sure how we could account for that easily. But maybe that is at least a partial explanation of the Java % being under 50% when Java in general is more popular than that... Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Wed, Jan 28, 2015 at 1:22 PM, Jason Rosenberg wrote: > I think the results could be a bit skewed, in cases where an organization > uses multiple languages, but not equally. In our case, we overwhelmingly > use java clients (>90%). But we also have ruby and Go clients too. But in > the poll, these come out as equally used client languages. > > Jason > > On Wed, Jan 28, 2015 at 12:05 PM, David McNelis < > dmcne...@emergingthreats.net> wrote: > > > I agree with Stephen, it would be really unfortunate to see the Scala api > > go away. > > > > On Wed, Jan 28, 2015 at 11:57 AM, Stephen Boesch > > wrote: > > > > > The scala API going away would be a minus. As Koert mentioned we could > > use > > > the java api but it is less .. well .. functional. > > > > > > Kafka is included in the Spark examples and external modules and is > > popular > > > as a component of ecosystems on Spark (for which scala is the primary > > > language). > > > > > > 2015-01-28 8:51 GMT-08:00 Otis Gospodnetic >: > > > > > > > Hi, > > > > > > > > I don't have a good excuse here. :( > > > > I thought about including Scala, but for some reason didn't do it. I > > see > > > > 12-13% of people chose "Other". Do you think that is because I > didn't > > > > include Scala? > > > > > > > > Also, is the Scala API reeally going away? > > > > > > > > Otis > > > > -- > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log > Management > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > > > > On Tue, Jan 20, 2015 at 4:43 PM, Koert Kuipers > > > wrote: > > > > > > > > > no scala? although scala can indeed use the java api, its ugly > we > > > > > prefer to use the scala api (which i believe will go away > > > unfortunately) > > > > > > > > > > On Tue, Jan 20, 2015 at 2:52 PM, Otis Gospodnetic < > > > > > otis.gospodne...@gmail.com> wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I was wondering which implementations/languages people use for > > their > > > > > Kafka > > > > > > Producer/Consumers not everyone is using the Java APIs. So > > > here's > > > > a > > > > > > 1-question poll: > > > > > > > > > > > > > > > > > > http://blog.sematext.com/2015/01/20/kafka-poll-producer-consumer-client/ > > > > > > > > > > > > Will share the results in about a week when we have enough votes. > > > > > > > > > > > > Thanks! > > > > > > Otis > > > > > > -- > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log > > > Management > > > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > > > > > > > > > > > > >
Re: Poll: Producer/Consumer impl/language you use?
I think the results could be a bit skewed, in cases where an organization uses multiple languages, but not equally. In our case, we overwhelmingly use java clients (>90%). But we also have ruby and Go clients too. But in the poll, these come out as equally used client languages. Jason On Wed, Jan 28, 2015 at 12:05 PM, David McNelis < dmcne...@emergingthreats.net> wrote: > I agree with Stephen, it would be really unfortunate to see the Scala api > go away. > > On Wed, Jan 28, 2015 at 11:57 AM, Stephen Boesch > wrote: > > > The scala API going away would be a minus. As Koert mentioned we could > use > > the java api but it is less .. well .. functional. > > > > Kafka is included in the Spark examples and external modules and is > popular > > as a component of ecosystems on Spark (for which scala is the primary > > language). > > > > 2015-01-28 8:51 GMT-08:00 Otis Gospodnetic : > > > > > Hi, > > > > > > I don't have a good excuse here. :( > > > I thought about including Scala, but for some reason didn't do it. I > see > > > 12-13% of people chose "Other". Do you think that is because I didn't > > > include Scala? > > > > > > Also, is the Scala API reeally going away? > > > > > > Otis > > > -- > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > On Tue, Jan 20, 2015 at 4:43 PM, Koert Kuipers > > wrote: > > > > > > > no scala? although scala can indeed use the java api, its ugly we > > > > prefer to use the scala api (which i believe will go away > > unfortunately) > > > > > > > > On Tue, Jan 20, 2015 at 2:52 PM, Otis Gospodnetic < > > > > otis.gospodne...@gmail.com> wrote: > > > > > > > > > Hi, > > > > > > > > > > I was wondering which implementations/languages people use for > their > > > > Kafka > > > > > Producer/Consumers not everyone is using the Java APIs. So > > here's > > > a > > > > > 1-question poll: > > > > > > > > > > > > > > http://blog.sematext.com/2015/01/20/kafka-poll-producer-consumer-client/ > > > > > > > > > > Will share the results in about a week when we have enough votes. > > > > > > > > > > Thanks! > > > > > Otis > > > > > -- > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log > > Management > > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > > > > > > >
question on the mailing list
Hi all, Sorry for asking, but is there some easier way to use the mailing list? Maybe a tool which makes reading and replying to messages more like google groups? I like the hadoop searcher, but the UI on that is really bad. tnx
Re: Poor performance running performance test
You could be right Ewen. I was starting to wonder about the load balancer too. Is using a load balancer a bad idea? How else do users know which kafka broker to connect to? I'm using one of the IPs directly and I don't see that error. I am seeing an occasional connection refused. What the heck. Maybe this is another aws specific thing. OR, I am running kafka brokers in a docker container. I think I will remove the docker component and see if that makes a difference. Thanks for the reply.
Re: Poll: Producer/Consumer impl/language you use?
I agree with Stephen, it would be really unfortunate to see the Scala api go away. On Wed, Jan 28, 2015 at 11:57 AM, Stephen Boesch wrote: > The scala API going away would be a minus. As Koert mentioned we could use > the java api but it is less .. well .. functional. > > Kafka is included in the Spark examples and external modules and is popular > as a component of ecosystems on Spark (for which scala is the primary > language). > > 2015-01-28 8:51 GMT-08:00 Otis Gospodnetic : > > > Hi, > > > > I don't have a good excuse here. :( > > I thought about including Scala, but for some reason didn't do it. I see > > 12-13% of people chose "Other". Do you think that is because I didn't > > include Scala? > > > > Also, is the Scala API reeally going away? > > > > Otis > > -- > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > On Tue, Jan 20, 2015 at 4:43 PM, Koert Kuipers > wrote: > > > > > no scala? although scala can indeed use the java api, its ugly we > > > prefer to use the scala api (which i believe will go away > unfortunately) > > > > > > On Tue, Jan 20, 2015 at 2:52 PM, Otis Gospodnetic < > > > otis.gospodne...@gmail.com> wrote: > > > > > > > Hi, > > > > > > > > I was wondering which implementations/languages people use for their > > > Kafka > > > > Producer/Consumers not everyone is using the Java APIs. So > here's > > a > > > > 1-question poll: > > > > > > > > > > http://blog.sematext.com/2015/01/20/kafka-poll-producer-consumer-client/ > > > > > > > > Will share the results in about a week when we have enough votes. > > > > > > > > Thanks! > > > > Otis > > > > -- > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log > Management > > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > > > > > >
Re: Poll: Producer/Consumer impl/language you use?
The scala API going away would be a minus. As Koert mentioned we could use the java api but it is less .. well .. functional. Kafka is included in the Spark examples and external modules and is popular as a component of ecosystems on Spark (for which scala is the primary language). 2015-01-28 8:51 GMT-08:00 Otis Gospodnetic : > Hi, > > I don't have a good excuse here. :( > I thought about including Scala, but for some reason didn't do it. I see > 12-13% of people chose "Other". Do you think that is because I didn't > include Scala? > > Also, is the Scala API reeally going away? > > Otis > -- > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > Solr & Elasticsearch Support * http://sematext.com/ > > > On Tue, Jan 20, 2015 at 4:43 PM, Koert Kuipers wrote: > > > no scala? although scala can indeed use the java api, its ugly we > > prefer to use the scala api (which i believe will go away unfortunately) > > > > On Tue, Jan 20, 2015 at 2:52 PM, Otis Gospodnetic < > > otis.gospodne...@gmail.com> wrote: > > > > > Hi, > > > > > > I was wondering which implementations/languages people use for their > > Kafka > > > Producer/Consumers not everyone is using the Java APIs. So here's > a > > > 1-question poll: > > > > > > > http://blog.sematext.com/2015/01/20/kafka-poll-producer-consumer-client/ > > > > > > Will share the results in about a week when we have enough votes. > > > > > > Thanks! > > > Otis > > > -- > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > > > Solr & Elasticsearch Support * http://sematext.com/ > > > > > >
Poll RESULTS: Producer/Consumer languages
Hi, I promised to share the results of this poll, and here they are: http://blog.sematext.com/2015/01/28/kafka-poll-results-producer-consumer/ List of "surprises" is there. I wonder if anyone else is surprised by any aspect of the breakdown, or is the breakdown just as you expected? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/
Re: Poll: Producer/Consumer impl/language you use?
Hi, I don't have a good excuse here. :( I thought about including Scala, but for some reason didn't do it. I see 12-13% of people chose "Other". Do you think that is because I didn't include Scala? Also, is the Scala API reeally going away? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Tue, Jan 20, 2015 at 4:43 PM, Koert Kuipers wrote: > no scala? although scala can indeed use the java api, its ugly we > prefer to use the scala api (which i believe will go away unfortunately) > > On Tue, Jan 20, 2015 at 2:52 PM, Otis Gospodnetic < > otis.gospodne...@gmail.com> wrote: > > > Hi, > > > > I was wondering which implementations/languages people use for their > Kafka > > Producer/Consumers not everyone is using the Java APIs. So here's a > > 1-question poll: > > > > http://blog.sematext.com/2015/01/20/kafka-poll-producer-consumer-client/ > > > > Will share the results in about a week when we have enough votes. > > > > Thanks! > > Otis > > -- > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > > Solr & Elasticsearch Support * http://sematext.com/ > > >
WARN Error in I/O with NetworkReceive.readFrom(NetworkReceive.java
Running the performance test. What is the nature of this error?? I'm running a very high-end cluster on AWS. Tried this even within the same subnet on AWS.

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance topic9 5000 100 -1 acks=1 bootstrap.servers=$IP:9092 buffer.memory=67108864 batch.size=8196

[2015-01-28 16:32:22,178] WARN Error in I/O with / (org.apache.kafka.common.network.Selector)
java.io.EOFException
  at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
  at org.apache.kafka.common.network.Selector.poll(Selector.java:248)
  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
  at java.lang.Thread.run(Thread.java:745)

65910 records sent, 13158.3 records/sec (1.25 MB/sec), 38525.9 ms avg latency, 39478.0 max latency.

Thanks for any ideas
Re: How to mark a message as needing to retry in Kafka?
Hi, Which consumer are you using? If you are using a high level consumer then retry would be automatic upon network exceptions. Guozhang On Wed, Jan 28, 2015 at 1:32 AM, noodles wrote: > Hi group: > > I'm working for building a webhook notification service based on Kafka. I > produce all of the payloads into Kafka, and consumers consume these > payloads by offset. > > Sometimes some payloads cannot be consumed because of network exception or > http server exception. So I want to mark the failed payloads and retry them > by timers. But I have no idea if I don't use a storage (like MySQL) except > kafka and zookeeper. > > > -- > *noodles!* > -- -- Guozhang
How to mark a message as needing to retry in Kafka?
Hi group: I'm working on building a webhook notification service based on Kafka. I produce all of the payloads into Kafka, and consumers consume these payloads by offset. Sometimes a payload cannot be consumed because of a network exception or an HTTP server exception. So I want to mark the failed payloads and retry them with timers. But I have no idea how to do this if I don't use storage (like MySQL) other than Kafka and ZooKeeper. -- *noodles!*
Re: Poor performance running performance test
That error indicates the broker closed the connection for some reason. Any useful logs from the broker? It looks like you're using ELB, which could also be the culprit. A connection timeout seems doubtful, but ELB can also close connections for other reasons, like failed health checks. -Ewen On Tue, Jan 27, 2015 at 11:21 PM, Dillian Murphey wrote: > I was running the performance command from a virtual box server, so that > seems like it was part of the problem. I'm getting better results running > this on a server on aws, but that's kind of expected. Can you look at > these results, and comment on the occasional warning I see? I appreciate > it! > > 1220375 records sent, 243928.6 records/sec (23.26 MB/sec), 2111.5 ms avg > latency, 4435.0 max latency. > 1195090 records sent, 239018.0 records/sec (22.79 MB/sec), 2203.1 ms avg > latency, 4595.0 max latency. > 1257165 records sent, 251433.0 records/sec (23.98 MB/sec), 2172.6 ms avg > latency, 4525.0 max latency. > 1230981 records sent, 246196.2 records/sec (23.48 MB/sec), 2173.5 ms avg > latency, 4465.0 max latency. > [2015-01-28 07:19:07,274] WARN Error in I/O with > (org.apache.kafka.common.network.Selector) > java.io.EOFException > at > > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62) > at org.apache.kafka.common.network.Selector.poll(Selector.java:248) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) > at java.lang.Thread.run(Thread.java:745) > 1090689 records sent, 218137.8 records/sec (20.80 MB/sec), 2413.6 ms avg > latency, 4829.0 max latency. > > On Tue, Jan 27, 2015 at 7:37 PM, Ewen Cheslack-Postava > wrote: > > > Where are you running ProducerPerformance in relation to ZK and the Kafka > > brokers? You should definitely see much higher performance than this. > > > > A couple of other things I can think of that might be going wrong: Are > all > > your VMs in the same AZ? Are you storing Kafka data in EBS or local > > ephemeral storage? If EBS, have you provisioned enough IOPS. > > > > > > On Tue, Jan 27, 2015 at 4:29 PM, Dillian Murphey < > crackshotm...@gmail.com> > > wrote: > > > > > I'm a new user/admin to kafka. I'm running a 3 node ZK and a 6 brokers > on > > > aws. > > > > > > The performance I'm seeing is shockingly bad. I need some advice! > > > > > > bin/kafka-run-class.sh > org.apache.kafka.clients.tools.ProducerPerformance > > > test2 5000 100 -1 acks=1 bootstrap.servers=5:9092 > > > buffer.memory=67108864 batch.size=8196 > > > > > > > > > > > > > > > 6097 records sent, 13198.3 records/sec (1.26 MB/sec), 2098.0 ms avg > > > latency, 4306.0 max latency. > > > 71695 records sent, 14339.0 records/sec (1.37 MB/sec), 6658.1 ms avg > > > latency, 9053.0 max latency. > > > 65195 records sent, 13028.6 records/sec (1.24 MB/sec), 11504.0 ms avg > > > latency, 13809.0 max latency. > > > 71955 records sent, 14391.0 records/sec (1.37 MB/sec), 16137.4 ms avg > > > latency, 18541.0 max latency. > > > > > > Thanks for any help! > > > > > > > > > > > -- > > Thanks, > > Ewen > > > -- Thanks, Ewen