Reading messages offset in Apache Kafka

2014-08-01 Thread anand jain
I am very new to Kafka; we are using Kafka 0.8.1.

What I need to do is consume messages from a topic. For that, I will have
to write a consumer in Java which will consume a message from the topic and
then save that message to a database. After a message is saved, an
acknowledgement will be sent to the Java consumer. If the acknowledgement
is true, then the next message should be consumed from the topic. If the
acknowledgement is false (meaning the message read from the topic could not
be saved to the database due to some error), then that message should be
read again.

I think I need to use the Simple Consumer to have control over message
offsets, and I have gone through the Simple Consumer example given at
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

In this example, the offset is computed in the run method as 'readOffset'.
Do I need to work with that? For example, I could use LatestTime() instead
of EarliestTime(), and in the case of a false acknowledgement, reset the
offset to the previous one using offset - 1.

Is this how I should proceed? Or can the same be done using the high-level API?
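
To make the question concrete, here is a rough sketch of the loop I have in
mind, adapted from that wiki example ('consumer', 'topic', 'partition',
'clientName', and saveToDatabase() are placeholders for my setup). Instead
of offset - 1, it simply does not advance the offset and re-fetches from
the same position on a false acknowledgement:

    import java.nio.ByteBuffer;
    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.message.MessageAndOffset;

    // Sketch only: 'consumer' is a kafka.javaapi.consumer.SimpleConsumer
    // already pointed at the partition leader; saveToDatabase() stands in
    // for the DB write that returns the acknowledgement.
    long readOffset = startingOffset;  // e.g. from an OffsetRequest, as in the wiki example
    while (running) {
        FetchRequest req = new FetchRequestBuilder()
                .clientId(clientName)
                .addFetch(topic, partition, readOffset, 100000)
                .build();
        FetchResponse resp = consumer.fetch(req);
        for (MessageAndOffset mao : resp.messageSet(topic, partition)) {
            ByteBuffer payload = mao.message().payload();
            byte[] bytes = new byte[payload.limit()];
            payload.get(bytes);
            if (saveToDatabase(bytes)) {
                readOffset = mao.nextOffset();  // advance only after a successful save
            } else {
                break;  // ack was false: re-fetch from the same offset and retry
            }
        }
    }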


Delete message after consuming it

2014-08-01 Thread anand jain
I want to delete a message from a Kafka broker after consuming it (Java
consumer). How can I do that?


Re: Delete message after consuming it

2014-08-01 Thread Kashyap Paidimarri
Kafka is a log and not a queue. The client is remembering a position in the
log rather than working with individual messages.
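
Messages are removed by the broker's retention policy rather than by
consumers. A sketch of the relevant server.properties settings, with
illustrative values:

    # Log segments are deleted by age and/or size, independent of consumption.
    log.retention.hours=168
    log.retention.bytes=1073741824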


On Fri, Aug 1, 2014 at 4:02 PM, anand jain anandjain1...@gmail.com wrote:

 I want to delete a message from a Kafka broker after consuming it (Java
 consumer). How can I do that?






Data is being written to only 1 partition

2014-08-01 Thread François Langelier
HI all!

I think I already saw this question on the mailing list, but I'm not able
to find it again...

I'm using Kafka 0.8.1.1; I have 3 brokers, a default replication factor of
2, and a default partition count of 2.

My partitions are distributed evenly across the brokers.

My problem is that for all the topics I have, my data is only sent to
either partition 0 or partition 1 (and correctly replicated). All my
brokers are in sync and the data is on every broker (depending on where the
partitions are).

How can I make my producer / brokers write to the other partition?
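
A minimal sketch of sending with explicit keys (broker list and topic name
are placeholders): with a non-null key, the 0.8 default partitioner hashes
the key across all partitions, whereas null-keyed sends can stick to a
single partition between metadata refreshes:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    Properties props = new Properties();
    props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
    // With a non-null key, the default partitioner hashes the key over all
    // partitions instead of sticking to one.
    producer.send(new KeyedMessage<String, String>("my-topic", "some-key", "some-value"));
    producer.close();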

Thanks!



Request: Adding us to the Powered By list

2014-08-01 Thread Vitaliy Verbenko

Dear Kafka team,

Would you mind adding us at
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By ?
We're using it as part of our ticket sequencing system for our helpdesk 
software.

--
*Vitaliy Verbenko - Business Development at Helprace *
vitaliy.verbe...@helprace.com
Customer Service Software and Help Desk by Helprace 
http://helprace.com/help-desk-software


Re: Consume more than produce

2014-08-01 Thread Steve Morin
You have to remember statsd uses UDP, which is potentially lossy; that
might account for the discrepancy.
-Steve


On Fri, Aug 1, 2014 at 1:28 AM, Guy Doulberg guy.doulb...@perion.com
wrote:

 Hey,


 After a year or so of having Kafka as my streaming layer in production, I
 decided it was time to audit it and test how many events I lose, if I lose
 events at all.


 I discovered something interesting which I can't explain.


 The producer produces fewer events than the consumer group consumes.


 It is not much more; about 0.1% more events are consumed.


 I use the Consumer API (not the simple consumer API).


 I was thinking I might have had rebalancing going on in my system, but it
 doesn't look like that.


 Has anyone seen such behaviour?


 In order to audit, I calculated for each event the minute it arrived and
 assigned this value to the event. I used statsd to count all events from
 my producer cluster and my consumer group cluster.


 I must say that it does not happen every minute.


 Thanks, Guy





Re: Most common kafka client consumer implementations?

2014-08-01 Thread Jim
Thanks Guozhang,

I was looking for actual real-world workflows. I realize you can commit
after each message, but if you're using ZK for offsets, for instance, you'll
put too much write load on the nodes and crush your throughput. So I was
interested in batching strategies people have used that balance high/full
throughput against fully committed events.
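
For example, a minimal sketch of committing every N messages with the 0.8
high-level consumer ('stream', 'connector', and process() are placeholders
from the usual Consumer Group Example setup; the interval is illustrative).
On a crash you may reprocess up to N messages, which is the at-least-once
trade-off:

    import kafka.consumer.ConsumerIterator;
    import kafka.javaapi.consumer.ConsumerConnector;

    // Sketch: 'stream' and 'connector' come from the high-level consumer
    // setup; process() is the application's handler.
    final int COMMIT_INTERVAL = 1000;  // illustrative batch size
    int processed = 0;
    ConsumerIterator<byte[], byte[]> it = stream.iterator();
    while (it.hasNext()) {
        process(it.next().message());
        if (++processed % COMMIT_INTERVAL == 0) {
            connector.commitOffsets();  // one ZK write per 1000 messages, not per message
        }
    }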


On Thu, Jul 31, 2014 at 8:16 AM, Guozhang Wang wangg...@gmail.com wrote:

 Hi Jim,

 Whether to use high level or simple consumer depends on your use case. If
 you need to manually manage partition assignments among your consumers, or
 you need to commit your offsets elsewhere than ZK, or you do not want auto
 rebalancing of consumers upon failures etc, you will use simple consumers;
 otherwise you use high level consumer.

 From your description of pulling a batch of messages, it seems you are
 currently using the simple consumer. Supposing you use the high-level
 consumer, to achieve at-least-once you can basically do something like:

 // Sketch: 'it' = stream.iterator(), 'connector' = the ConsumerConnector
 while (it.hasNext()) {
     process(it.next().message());  // handle one message
     connector.commitOffsets();     // commit only after processing succeeds
 }

 which is effectively the same as option 2 with a simple consumer. Of
 course, doing so has the heavy overhead of one commit per message; you can
 also do option 1, at the cost of duplicates, which is tolerable for
 at-least-once.

 Guozhang


 On Wed, Jul 30, 2014 at 8:25 PM, Jim jimi...@gmail.com wrote:

  Curious on a couple questions...
 
  Are most people(are you?) using the simple consumer vs the high level
  consumer in production?
 
 
  What is the common processing paradigm for maintaining a full pipeline
 for
  kafka consumers for at-least-once messaging? E.g. you pull a batch of
 1000
  messages and:
 
  option 1.
  you wait for the slowest worker to finish working on that message, when
 you
  get back 1000 acks internally you commit your offset and pull another
 batch
 
  option 2.
  you feed your workers n msgs at a time in sequence and move your offset
 up
  as you work through your batch
 
  option 3.
  you maintain a full stream of 1000 messages ideally and as you get acks
  back from your workers you see if you can move your offset up in the
 stream
  to pull n more messages to fill up your pipeline so you're not blocked by
  the slowest consumer (probability wise)
 
 
  any good docs or articles on the subject would be great, thanks!
 



 --
 -- Guozhang



Re: find topic partition count through simpleclient api

2014-08-01 Thread Guozhang Wang
One seed broker should be enough, and the number of partitionsMetadata
entries should be the same as the number of partitions. One note here: the
metadata is propagated asynchronously to the brokers, and hence the
metadata returned by any broker may occasionally be stale, so you need to
pull the metadata periodically.
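
A minimal sketch of that lookup (seed host, port, and topic name are
placeholders):

    import java.util.Collections;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    SimpleConsumer seed =
            new SimpleConsumer("seed-broker", 9092, 100000, 64 * 1024, "partition-count");
    TopicMetadataRequest request =
            new TopicMetadataRequest(Collections.singletonList("my-topic"));
    TopicMetadataResponse response = seed.send(request);
    for (TopicMetadata tm : response.topicsMetadata()) {
        // partitionsMetadata has one entry per partition of the topic
        System.out.println(tm.topic() + ": " + tm.partitionsMetadata().size() + " partitions");
    }
    seed.close();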

Guozhang


On Fri, Aug 1, 2014 at 8:46 AM, Weide Zhang weo...@gmail.com wrote:

 Hi,

 What's the way to find a topic's partition count dynamically using the
 SimpleConsumer API?

 If I use one seed broker within a cluster of 10 brokers and add a list of
 topic names to the simple consumer request to find the topics' metadata,
 when it returns, is the size of partitionsMetadata per topicMetadata the
 same as the number of partitions for a given topic? Also, for retrieval,
 do I need more than 1 seed broker to get all the metadata for a topic, or
 is only 1 seed broker enough?

 Thanks,

 Weide




-- 
-- Guozhang


Re: Consume more than produce

2014-08-01 Thread Jun Rao
Do you have producer retries (due to broker failure) in those minutes when
you see a diff?
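
Retry behaviour is governed by the producer config; a sketch with the 0.8
defaults as illustrative values. A send that was actually written but whose
response was lost gets retried, and the duplicate shows up as extra
consumed events:

    # 0.8 producer settings (illustrative; these are the defaults)
    message.send.max.retries=3
    retry.backoff.ms=100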

Thanks,

Jun


On Fri, Aug 1, 2014 at 1:28 AM, Guy Doulberg guy.doulb...@perion.com
wrote:

 Hey,


 After a year or so of having Kafka as my streaming layer in production, I
 decided it was time to audit it and test how many events I lose, if I lose
 events at all.


 I discovered something interesting which I can't explain.


 The producer produces fewer events than the consumer group consumes.


 It is not much more; about 0.1% more events are consumed.


 I use the Consumer API (not the simple consumer API).


 I was thinking I might have had rebalancing going on in my system, but it
 doesn't look like that.


 Has anyone seen such behaviour?


 In order to audit, I calculated for each event the minute it arrived and
 assigned this value to the event. I used statsd to count all events from
 my producer cluster and my consumer group cluster.


 I must say that it does not happen every minute.


 Thanks, Guy





Re: Request: Adding us to the Powered By list

2014-08-01 Thread Jay Kreps
Sure, can you give me the blurb you want?

-Jay

On Fri, Aug 1, 2014 at 6:58 AM, Vitaliy Verbenko
vitaliy.verbe...@helprace.com wrote:
 Dear Kafka team,

 Would you mind adding us at
 https://cwiki.apache.org/confluence/display/KAFKA/Powered+By ?
 We're using it as part of our ticket sequencing system for our helpdesk
 software.
 --
 *Vitaliy Verbenko - Business Development at Helprace *
 vitaliy.verbe...@helprace.com
 Customer Service Software and Help Desk by Helprace
 http://helprace.com/help-desk-software


Updated Kafka Roadmap?

2014-08-01 Thread Jonathan Weeks
Howdy, 

I was wondering if it would be possible to update the release plan:

https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan

aligned with the feature roadmap:

https://cwiki.apache.org/confluence/display/KAFKA/Index

We have several active projects using, or planning to use, Kafka, and any 
current guidance on upcoming releases related to ZK dependence and producer 
and consumer API/client timing would be very helpful. For example, is 0.8.2 
possible in August, or is September more likely?

Also, any chance something like:

https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer

…might make it into 0.9?

Thanks!

Re: Updated Kafka Roadmap?

2014-08-01 Thread David Birdsong
I too could benefit from an updated roadmap.

We're in a similar situation where some components in our stream processing
stack could use an overhaul, but I'm waiting for the offset API to be fully
realized before doing any meaningful planning.


On Fri, Aug 1, 2014 at 11:52 AM, Jonathan Weeks jonathanbwe...@gmail.com
wrote:

 Howdy,

 I was wondering if it would be possible to update the release plan:

 https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan

 aligned with the feature roadmap:

 https://cwiki.apache.org/confluence/display/KAFKA/Index

 We have several active projects using, or planning to use, Kafka, and any
 current guidance on upcoming releases related to ZK dependence and producer
 and consumer API/client timing would be very helpful. For example, is 0.8.2
 possible in August, or is September more likely?

 Also, any chance something like:

 https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer

 …might make it into 0.9?

 Thanks!


Re: kafka consumer fail over

2014-08-01 Thread Guozhang Wang
Hello Weide,

That should be doable via the high-level consumer; you can take a look at
this page:

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
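
A minimal sketch of that setup, assuming both master and slave run a
consumer with the same group.id so the offsets committed to ZooKeeper carry
over on failover (topic name, ZK address, and publishAggregate() are
placeholders):

    import java.util.Collections;
    import java.util.List;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    Properties props = new Properties();
    props.put("zookeeper.connect", "zk1:2181");
    props.put("group.id", "aggregator");       // same group on master and slave
    props.put("auto.commit.enable", "false");  // commit only after the publish succeeds
    ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    List<KafkaStream<byte[], byte[]>> streams = connector.createMessageStreams(
            Collections.singletonMap("input-topic", 1)).get("input-topic");
    ConsumerIterator<byte[], byte[]> it = streams.get(0).iterator();
    while (it.hasNext()) {
        publishAggregate(it.next().message());  // placeholder: process and publish back to Kafka
        connector.commitOffsets();  // offset lands in ZK; the new leader resumes from here
    }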

Guozhang


On Fri, Aug 1, 2014 at 3:20 PM, Weide Zhang weo...@gmail.com wrote:

 Hi,

 I have a use case for a master/slave cluster where the logic inside the
 master needs to consume data from Kafka and publish some aggregated data
 back to Kafka. When the master dies, the slave needs to take the latest
 committed offset from the master and continue consuming the data from
 Kafka and doing the push.

 My question is: what is the easiest Kafka consumer design for this
 scenario? I was thinking about using the SimpleConsumer and doing manual
 consumer offset syncing between master and slave. That seems to solve the
 problem, but I was wondering if it can be achieved using the high-level
 consumer client?

 Thanks,

 Weide




-- 
-- Guozhang