Reading messages offset in Apache Kafka
I am very new to Kafka and we are using Kafka 0.8.1. What I need to do is consume messages from a topic. For that, I will have to write a consumer in Java which consumes a message from the topic and then saves it to a database. After a message is saved, an acknowledgement is sent back to the Java consumer. If the acknowledgement is true, the next message should be consumed from the topic. If the acknowledgement is false (meaning that, due to some error, the message read from the topic couldn't be saved into the database), that same message should be read again. I think I need to use the Simple Consumer, to have control over the message offset, and have gone through the Simple Consumer example given in this link https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example . In this example, the offset is evaluated in the run method as 'readOffset'. Do I need to play with that? For example, I could use LatestTime() instead of EarliestTime(), and in the false case reset the offset to the previous one using offset - 1. Is this how I should proceed? Or can the same be done using the high-level API?
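A minimal sketch of the retry logic described above, kept independent of the Kafka API so it can be run stand-alone (the `saveToDatabase` call and the in-memory topic log are hypothetical stand-ins): the read offset only advances when the save is acknowledged, so a failed save causes the same message to be fetched again. Whether you start from EarliestTime() or LatestTime() only affects the initial offset, not this loop.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the ack-driven offset logic: the read offset only advances
// when the database save is acknowledged, so a failed save causes the
// same message to be re-read on the next iteration.
public class AckDrivenOffset {

    // Returns the offset to fetch next: advance on success, retry on failure.
    static long nextOffset(long current, boolean ackOk) {
        return ackOk ? current + 1 : current;
    }

    public static void main(String[] args) {
        List<String> topicLog = Arrays.asList("m0", "m1", "m2");
        long offset = 0;                        // start from the earliest offset
        int attempts = 0;
        while (offset < topicLog.size()) {
            String message = topicLog.get((int) offset);
            boolean ackOk = saveToDatabase(message, attempts++);
            offset = nextOffset(offset, ackOk); // only move forward on ack
        }
        System.out.println("consumed " + offset + " messages in " + attempts + " attempts");
    }

    // Simulated save: fails the first time it sees m1, succeeds on retry.
    static boolean saveToDatabase(String message, int attempt) {
        return !(message.equals("m1") && attempt == 1);
    }
}
```

With the SimpleConsumer you would pass `offset` as the fetch offset on each iteration instead of indexing a list; if you simply never advance on failure, there is no need to reset to `offset - 1` after the fact.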
Delete message after consuming it
I want to delete the message from a Kafka broker after consuming it(Java consumer). How can I do that?
Re: Delete message after consuming it
Kafka is a log and not a queue. The client is remembering a position in the log rather than working with individual messages. On Fri, Aug 1, 2014 at 4:02 PM, anand jain anandjain1...@gmail.com wrote: I want to delete the message from a Kafka broker after consuming it (Java consumer). How can I do that?
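Since deletion is driven by retention rather than by consumption, the closest you can get is tightening the broker's retention settings. A sketch, assuming 0.8-era property names in server.properties:

```properties
# Broker-side retention: segments are deleted by age or size,
# never because a consumer has read them.
log.retention.hours=168
log.retention.bytes=1073741824
log.cleanup.interval.mins=10
```

Messages are removed when a log segment ages or grows past these thresholds, regardless of whether any consumer has read them.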
Data is being written to only 1 partition
Hi all! I think I already saw this question on the mailing list, but I'm not able to find it back... I'm using Kafka 0.8.1.1, I have 3 brokers, a default replication factor of 2 and a default partition count of 2. My partitions are distributed fairly across all brokers. My problem is that for every topic I have, the data is only sent to one of the two partitions (either partition 0 or partition 1), though it is correctly replicated. All my brokers are in sync and the data is on every broker (depending on where the partitions are). How can I make my producer / brokers write to the other partition? Thanks! François Langelier
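One likely cause, offered as a hypothesis: when messages are sent without a key, the 0.8 producer picks one random partition and sticks to it until the next metadata refresh (topic.metadata.refresh.interval.ms, ten minutes by default), so topics can look single-partitioned. Sending keyed messages spreads the writes. A rough, stand-alone sketch of the kind of hash-based spread a keyed partitioner gives (the class name and keys are illustrative, not Kafka's actual DefaultPartitioner):

```java
// Sketch of keyed partitioning: a non-null key is hashed to a partition,
// so different keys spread across partitions. With a null key, the 0.8
// producer instead sticks to one random partition between metadata
// refreshes, which matches the symptom described above.
public class HashPartitionSketch {

    static int partitionFor(Object key, int numPartitions) {
        // Mask the sign bit: Math.abs alone is unsafe for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 2;
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

In the 0.8 producer API the equivalent is sending a KeyedMessage with a non-null key, or supplying your own partitioner class.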
Request: Adding us to the Powered By list
Dear Kafka team, Would you mind adding us at https://cwiki.apache.org/confluence/display/KAFKA/Powered+By ? We're using it as part of the ticket sequencing system for our helpdesk software. -- Vitaliy Verbenko - Business Development at Helprace http://helprace.com/help-desk-software
Re: Consume more than produce
You have to remember statsd uses UDP, which is possibly lossy and might account for the discrepancy. -Steve On Fri, Aug 1, 2014 at 1:28 AM, Guy Doulberg guy.doulb...@perion.com wrote: Hey, after a year or so of having Kafka as the streaming layer in my production, I decided it was time to audit it and to test how many events I lose, if I lose events at all. I discovered something interesting which I can't explain: the producer produces fewer events than the consumer group consumes. It is not much more; it is about 0.1% more events. I use the Consumer API (not the SimpleConsumer API). I was thinking I might have had rebalancing going on in my system, but it doesn't look like that. Did anyone see such a behaviour? In order to audit, I calculated for each event the minute it arrived and assigned this value to the event; I used statsd to count all events from all my producer cluster and all my consumer group cluster. I must say that it is not happening for every minute. Thanks, Guy
Re: Most common kafka client comsumer implementations?
Thanks Guozhang, I was looking for actual real-world workflows. I realize you can commit after each message, but if you're using ZK for offsets, for instance, you'll put too much write load on the nodes and crush your throughput. So I was interested in batching strategies people have used that balance high/full throughput and fully committed events. On Thu, Jul 31, 2014 at 8:16 AM, Guozhang Wang wangg...@gmail.com wrote: Hi Jim, Whether to use the high-level or simple consumer depends on your use case. If you need to manually manage partition assignments among your consumers, or you need to commit your offsets somewhere other than ZK, or you do not want auto-rebalancing of consumers upon failures, etc., you will use the simple consumer; otherwise you use the high-level consumer. From your description of pulling a batch of messages it seems you are currently using the simple consumer. Supposing you are using the high-level consumer, to achieve at-least-once you can basically do something like: message = consumer.iter.next() process(message) consumer.commit() which is effectively the same as option 2 for using a simple consumer. Of course, doing so has the heavy overhead of one commit per message; you can also do option 1, at the cost of duplicates, which is tolerable for at-least-once. Guozhang On Wed, Jul 30, 2014 at 8:25 PM, Jim jimi...@gmail.com wrote: Curious on a couple of questions... Are most people (are you?) using the simple consumer vs the high-level consumer in production? What is the common processing paradigm for maintaining a full pipeline for Kafka consumers for at-least-once messaging? E.g. you pull a batch of 1000 messages and: option 1. you wait for the slowest worker to finish working on that message; when you get back 1000 acks internally, you commit your offset and pull another batch option 2. you feed your workers n msgs at a time in sequence and move your offset up as you work through your batch option 3.
you maintain a full stream of 1000 messages ideally, and as you get acks back from your workers you see if you can move your offset up in the stream to pull n more messages to fill up your pipeline, so you're not blocked by the slowest consumer (probability-wise). Any good docs or articles on the subject would be great, thanks!
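Option 3 above needs exactly one piece of bookkeeping: the committed offset may only advance to the highest contiguous acked position, so an out-of-order ack never lets the commit skip past an outstanding message. A small stand-alone sketch of that tracker (class and method names are illustrative):

```java
import java.util.TreeSet;

// Sketch of the "option 3" bookkeeping: workers ack out of order, but
// the committed offset only advances to the highest contiguous acked
// position, so a crash never skips an unacked message.
public class ContiguousAckTracker {
    private final TreeSet<Long> acked = new TreeSet<>();
    private long committed; // next offset still awaiting an ack

    ContiguousAckTracker(long startOffset) {
        this.committed = startOffset;
    }

    // Record an ack and return the new safe commit point.
    synchronized long ack(long offset) {
        acked.add(offset);
        while (!acked.isEmpty() && acked.first() == committed) {
            acked.pollFirst();
            committed++;
        }
        return committed;
    }

    public static void main(String[] args) {
        ContiguousAckTracker t = new ContiguousAckTracker(0);
        System.out.println(t.ack(2)); // offset 0 still outstanding, commit stays at 0
        System.out.println(t.ack(0)); // 0 done, 1 still outstanding, commit moves to 1
        System.out.println(t.ack(1)); // 0..2 all acked, commit moves to 3
    }
}
```

The actual commit call (to ZK or Kafka) would then always use the returned value, batching however many acks arrived since the last commit rather than committing per message.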
Re: find topic partition count through simpleclient api
One seed broker should be enough, and the number of partitionsMetadata entries should be the same as the number of partitions. One note here: the metadata is propagated asynchronously to the brokers, and hence the metadata returned by any broker may occasionally be stale, so you need to periodically re-pull the metadata. Guozhang On Fri, Aug 1, 2014 at 8:46 AM, Weide Zhang weo...@gmail.com wrote: Hi, What's the way to find a topic's partition count dynamically using the SimpleConsumer API? If I use one seed broker within a cluster of 10 brokers, and add a list of topic names to the simple consumer request to find the topics' metadata, when it returns, is the size of partitionsMetadata per topicMetadata the same as the number of partitions for a given topic? Also, for retrieval, do I need more than 1 seed broker to get all the metadata info of a topic, or is only 1 seed broker enough? Thanks, Weide
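For reference, a sketch of the metadata lookup with the 0.8 javaapi (the broker host, port, and topic name are placeholders); per the staleness caveat above, you would rerun this periodically rather than caching the count forever:

```java
import java.util.Collections;

import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class PartitionCount {
    public static void main(String[] args) {
        // One seed broker is enough; any broker can serve metadata.
        SimpleConsumer consumer =
            new SimpleConsumer("broker1.example.com", 9092, 100000, 64 * 1024, "partition-counter");
        try {
            TopicMetadataRequest request =
                new TopicMetadataRequest(Collections.singletonList("my-topic"));
            TopicMetadataResponse response = consumer.send(request);
            for (TopicMetadata metadata : response.topicsMetadata()) {
                // partitionsMetadata has one entry per partition of the topic.
                System.out.println(metadata.topic() + " has "
                        + metadata.partitionsMetadata().size() + " partitions");
            }
        } finally {
            consumer.close();
        }
    }
}
```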
Re: Consume more than produce
Do you have producer retries (due to broker failures) in those minutes when you see a diff? Thanks, Jun On Fri, Aug 1, 2014 at 1:28 AM, Guy Doulberg guy.doulb...@perion.com wrote: Hey, after a year or so of having Kafka as the streaming layer in my production, I decided it was time to audit it and to test how many events I lose, if I lose events at all. I discovered something interesting which I can't explain: the producer produces fewer events than the consumer group consumes. It is not much more; it is about 0.1% more events. I use the Consumer API (not the SimpleConsumer API). I was thinking I might have had rebalancing going on in my system, but it doesn't look like that. Did anyone see such a behaviour? In order to audit, I calculated for each event the minute it arrived and assigned this value to the event; I used statsd to count all events from all my producer cluster and all my consumer group cluster. I must say that it is not happening for every minute. Thanks, Guy
Re: Request: Adding us to the Powered By list
Sure, can you give me the blurb you want? -Jay On Fri, Aug 1, 2014 at 6:58 AM, Vitaliy Verbenko vitaliy.verbe...@helprace.com wrote: Dear Kafka team, Would you mind adding us at https://cwiki.apache.org/confluence/display/KAFKA/Powered+By ? We're using it as part of the ticket sequencing system for our helpdesk software.
Updated Kafka Roadmap?
Howdy, I was wondering if it would be possible to update the release plan: https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan so that it is aligned with the feature roadmap: https://cwiki.apache.org/confluence/display/KAFKA/Index We have several active projects using and planning to use Kafka, and any current guidance on the upcoming releases as they relate to ZK dependence and producer and consumer API/client timing would be very helpful. For example, is 0.8.2 possible in August, or is September more likely? Also, any chance something like https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer might make it into 0.9? Thanks!
Re: Updated Kafka Roadmap?
I too could benefit from an updated roadmap. We're in a similar situation where some components in our stream processing stack could use an overhaul, but I'm waiting for the offset API to be fully realized before doing any meaningful planning. On Fri, Aug 1, 2014 at 11:52 AM, Jonathan Weeks jonathanbwe...@gmail.com wrote: Howdy, I was wondering if it would be possible to update the release plan: https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan so that it is aligned with the feature roadmap: https://cwiki.apache.org/confluence/display/KAFKA/Index We have several active projects using and planning to use Kafka, and any current guidance on the upcoming releases as they relate to ZK dependence and producer and consumer API/client timing would be very helpful. For example, is 0.8.2 possible in August, or is September more likely? Also, any chance something like https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer might make it into 0.9? Thanks!
Re: kafka consumer fail over
Hello Weide, That should be doable via the high-level consumer; you can take a look at this page: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example Guozhang On Fri, Aug 1, 2014 at 3:20 PM, Weide Zhang weo...@gmail.com wrote: Hi, I have a use case for a master/slave cluster where the logic inside the master needs to consume data from Kafka and publish some aggregated data back to Kafka. When the master dies, the slave needs to take the latest committed offset from the master and continue consuming data from Kafka and doing the publishing. My question is: what is the easiest Kafka consumer design for this scenario? I was thinking about using the SimpleConsumer and manually syncing consumer offsets between master and slave. That seems to solve the problem, but I was wondering if it can be achieved using the high-level consumer client? Thanks, Weide
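A sketch of the high-level consumer approach (0.8 API; the ZK address, group id, and topic name are placeholders): if master and slave join the same consumer group, the committed offsets live in ZooKeeper, so the survivor resumes from the last commit without any manual offset syncing between the two machines.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class FailoverConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1.example.com:2181");
        props.put("group.id", "aggregator");       // same group on master and slave
        props.put("auto.commit.enable", "false");  // commit only after publishing downstream

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("my-topic", 1));

        for (MessageAndMetadata<byte[], byte[]> msg : streams.get("my-topic").get(0)) {
            // Aggregate and re-publish msg to Kafka here, then checkpoint:
            connector.commitOffsets();  // offset goes to ZK; the survivor resumes from it
        }
    }
}
```

Committing only after the aggregated data is re-published gives at-least-once semantics across the failover, at the cost of possible duplicates on a crash between publish and commit.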