Re: Message throughput per day - Kafka

2018-02-20 Thread Sharninder
Why are the per-second metrics for messages and bytes not useful for
capacity planning? I can't think of a situation where knowing the daily
number of messages would be more useful. If you really want that, you can
always extrapolate the per-second number and get an approximation.

--
Sharninder


On Tue, Feb 20, 2018 at 1:20 PM, siva prasad 
wrote:

> Thanks for responding.
>
> I am not looking for the throughput achieved by the broker.
>
> Metrics are needed to see how the platform is used for business, to know
> daily usage trends and variation after every release. It also helps with
> capacity planning.
>
> I am not the producer/consumer. My role is to host Kafka clusters and
> enable all the functionality for clients to make use of.
>
> Cheers,
> Siva
>
> On Tue, Feb 20, 2018 at 12:55 PM, Sharninder  wrote:
>
> > >
> > > >
> > > > I am wondering if there is a way to get the message throughput of
> > > > Kafka brokers.
> > > >
> > > > Ex:
> > > > 1) Number of messages sent per day to a Broker/Cluster
> > > > 2) Number of messages consumed per day by a Broker/Cluster
> > > >
> > >
> >
> > I don't think it makes sense to have total metrics per day for
> > brokers/producers. There are per-second metrics such as MessagesInPerSec
> > at brokers that you can use to gauge throughput. If you really want total
> > metrics, measure at your client/producer end yourself.
> >
> > --
> > Sharninder
> >
>
>
>
> --
> Cheers,
> Siva
>



-- 
--
Sharninder


Re: Message throughput per day - Kafka

2018-02-19 Thread Sharninder
>
> >
> > I am wondering if there is a way to get the message throughput of Kafka
> > brokers.
> >
> > Ex:
> > 1) Number of messages sent per day to a Broker/Cluster
> > 2) Number of messages consumed per day by a Broker/Cluster
> >
>

I don't think it makes sense to have total metrics per day for
brokers/producers. There are per-second metrics such as MessagesInPerSec at
brokers that you can use to gauge throughput. If you really want total
metrics, measure at your client/producer end yourself.

--
Sharninder


Re: Kafka monitoring

2017-02-09 Thread Sharninder
All consumers will "eventually" get the messages. What is it that you want
to achieve by monitoring that? For the brokers you can monitor lag, for the
producers you can have a counter that tracks messages sent and for
consumers have one that tracks messages consumed. Although, just tracking
lag will probably be sufficient.
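If it helps, the counter idea can be as simple as this (a rough, untested
sketch with the newer Java producer; the topic name and wiring are made up):

import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.*;

public class CountingProducer {
    private static final AtomicLong sent = new AtomicLong();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("posts", "hello"), (metadata, e) -> {
            if (e == null) {
                sent.incrementAndGet(); // count only acknowledged sends
            }
        });
        producer.close(); // close() flushes, so callbacks have fired by now
        System.out.println("messages sent: " + sent.get());
    }
}

The same AtomicLong pattern works on the consumer side for a
messages-consumed counter; export both to whatever metrics system you run.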


On Thu, Feb 9, 2017 at 2:19 PM, Nabajyoti Dash  wrote:

> Hi all,
> I have a requirement to monitor whether every message sent by the
> producer is received by each and every Kafka consumer or not.
>
> That is, if any message is not delivered then it should be taken care of
> properly.
>
> I googled but didn't find any satisfactory answers.
> Please suggest.
>
> Thanks,
> NJ Dash




-- 
--
Sharninder


Re: Kafka Multiple Consumer Group for Same Topic

2017-01-24 Thread Sharninder Khera
I don't have benchmarks, but multiple consumer groups are possible. For
Kafka, the performance should be similar or close to having multiple
consumers in a single group.
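A quick, untested sketch of what that looks like with the newer Java
consumer (the group and topic names come from your mail, everything else is
made up): each group gets its own copy of the stream.

import java.util.*;
import org.apache.kafka.clients.consumer.*;

public class TwoGroups {
    static KafkaConsumer<String, String> consumer(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> c = new KafkaConsumer<>(props);
        c.subscribe(Collections.singletonList("ad_events"));
        return c;
    }

    public static void main(String[] args) {
        // SYS1 and SYS2 are independent groups; each sees every message
        KafkaConsumer<String, String> sys1 = consumer("SYS1");
        KafkaConsumer<String, String> sys2 = consumer("SYS2");
        for (ConsumerRecord<String, String> r : sys1.poll(1000)) System.out.println("SYS1: " + r.value());
        for (ConsumerRecord<String, String> r : sys2.poll(1000)) System.out.println("SYS2: " + r.value());
        sys1.close();
        sys2.close();
    }
}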


_
From: Senthil Kumar 
Sent: Tuesday, January 24, 2017 10:38 PM
Subject: Kafka Multiple Consumer Group for Same Topic
To:  
Cc:  


Hi Team, sorry if the same question was asked already in this group!

Say we have a topic => ad_events. I want to read events from the ad_events
topic and send them to two different systems... This can be achieved by
creating two Consumer Groups.

Example:  Consumer Group SYS1 with 10 threads
          Consumer Group SYS2 with 10 threads

Would like to know whether having two different Consumer Groups will impact
the performance of Kafka reads? Also want to see *benchmarking results*
(numbers) for a single topic read with *one Consumer Group* vs. a single
topic with *two/three Consumer Groups*.


Cheers,
Senthil





Re: Kafka as a data ingest

2017-01-09 Thread Sharninder
If you want to know whether "kafka" itself can read Hadoop files, then no.
But you can write your own producer that reads from HDFS any which way and
pushes to Kafka. We use Kafka as the ingestion pipeline's main queue: read
from various sources and push everything to Kafka.
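A minimal, untested sketch of such a producer (it reads a local file, and
the path and topic are made up; a real HDFS reader would go through the
Hadoop FileSystem API instead):

import java.io.*;
import java.nio.file.*;
import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class FileToKafka {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = Files.newBufferedReader(Paths.get("/data/export.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // one file line becomes one Kafka message on the ingest topic
                producer.send(new ProducerRecord<>("ingest", line));
            }
        }
    }
}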


On Tue, Jan 10, 2017 at 6:26 AM, Cas Apanowicz  wrote:

> Hi,
>
> I have a general understanding of Kafka's main functionality as a
> streaming tool.
> However, I'm trying to figure out if I can use Kafka to read a Hadoop file.
> Can you please advise?
> Thanks
>
> Cas
>
>


-- 
--
Sharninder


Re: Kafka Queue

2016-12-07 Thread Sharninder
Have you set them to the same consumer group ID? That's what "identifies"
a consumer group.
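For reference, queue-like behaviour just needs every worker process to start
with the same group.id (an untested sketch with the newer Java consumer;
the names are made up):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Worker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-workers"); // identical in every worker process
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // with N partitions, up to N workers in this group each own some
        // partitions exclusively: queue semantics, not broadcast
        consumer.subscribe(Collections.singletonList("my-topic"));
    }
}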

On Thu, Dec 8, 2016 at 2:06 AM, Justin Smith  wrote:

> I read this paragraph under Kafka as a Messaging System.
>
>
>
> “The consumer group concept in Kafka generalizes these two concepts. As
> with a queue the consumer group allows you to divide up processing over a
> collection of processes (the members of the consumer group). As with
> publish-subscribe, Kafka allows you to broadcast messages to multiple
> consumer groups.”
>
>
>
> But I cannot find where you are able to divide up processing over a
> collection of processes. I have 2 consumers and no matter what I do they
> always both get the same message. Is there a java reference you can point
> me to where I can understand how to accomplish this?
>
>
>
> Thanks a lot,
>
> *Justin Smith*
>



-- 
--
Sharninder


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Sharninder
What do you mean by a streaming way? The logic to push to S3 will be in
your consumer, so it totally depends on how you want to read and store. I
think that's an easier way to do what you want, instead of trying to back
up Kafka and then read messages from there. I'm not even sure that's
possible.
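For what it's worth, that consumer could look roughly like this (an
untested sketch assuming the AWS SDK for Java v1.11+ is on the classpath;
the bucket, topic, and the 5 MB flush policy are all made up):

import java.util.*;
import org.apache.kafka.clients.consumer.*;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3Archiver {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "s3-archiver"); // its own group, so it sees everything
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("events"));
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        StringBuilder batch = new StringBuilder();
        long batchId = 0;
        while (true) {
            for (ConsumerRecord<String, String> r : consumer.poll(1000)) {
                batch.append(r.value()).append('\n');
            }
            if (batch.length() > 5_000_000) { // flush roughly every 5 MB
                s3.putObject("my-raw-events", "batch-" + (batchId++) + ".json", batch.toString());
                batch.setLength(0);
            }
        }
    }
}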

On Tue, Dec 6, 2016 at 5:11 PM, Aseem Bansal  wrote:

> I get that we can read them and store them in batches but is there some
> streaming way?
>
> On Tue, Dec 6, 2016 at 5:09 PM, Aseem Bansal  wrote:
>
> > Because we need to do exploratory data analysis and machine learning. We
> > need to backup the messages somewhere so that the data scientists can
> > query/load them.
> >
> > So we need something like a router that just opens up a new consumer
> group
> > which just keeps on storing them to S3.
> >
> > On Tue, Dec 6, 2016 at 5:05 PM, Sharninder Khera 
> > wrote:
> >
> >> Why not just have a parallel consumer read all messages from whichever
> >> topics you're interested in and store them wherever you want to? You
> don't
> >> need to "backup" Kafka messages.
> >>
> >> _
> >> From: Aseem Bansal 
> >> Sent: Tuesday, December 6, 2016 4:55 PM
> >> Subject: Storing Kafka Message JSON to deep storage like S3
> >> To:  
> >>
> >>
> >> Hi
> >>
> >> Has anyone stored Kafka JSON messages in deep storage like S3?
> >> We are looking to back up all of our raw Kafka JSON messages for
> >> exploration. S3, HDFS, and MongoDB come to mind initially.
> >>
> >> I know that they can be stored in Kafka itself, but that does not seem
> >> like a good option as we won't be able to query them, and the
> >> configuration of the machines running Kafka will have to be increased
> >> as we go. Something like S3 we won't have to manage.
> >>
> >>
> >>
> >>
> >>
> >
> >
>



-- 
--
Sharninder


Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread Sharninder Khera
Why not just have a parallel consumer read all messages from whichever
topics you're interested in and store them wherever you want to? You don't
need to "back up" Kafka messages.

_
From: Aseem Bansal 
Sent: Tuesday, December 6, 2016 4:55 PM
Subject: Storing Kafka Message JSON to deep storage like S3
To:  


Hi

Has anyone stored Kafka JSON messages in deep storage like S3?
We are looking to back up all of our raw Kafka JSON messages for
exploration. S3, HDFS, and MongoDB come to mind initially.

I know that they can be stored in Kafka itself, but that does not seem like
a good option as we won't be able to query them, and the configuration of
the machines running Kafka will have to be increased as we go. Something
like S3 we won't have to manage.





Re: Kafka consumers are not equally distributed

2016-11-22 Thread Sharninder
Could it be because of the partition key?
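For context, with the new producer's default partitioner the partition is
derived from a hash of the record key, roughly like the simplified stand-in
below (the real implementation hashes with murmur2, not Arrays.hashCode).
A skewed key distribution therefore means skewed partitions:

import java.util.Arrays;

public class KeyToPartition {
    // same key -> same partition; few distinct keys -> few busy partitions
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }
}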

On Wed, Nov 23, 2016 at 12:33 AM, Ghosh, Achintya (Contractor) <
achintya_gh...@comcast.com> wrote:

> Hi there,
>
> We are doing a load test in Kafka at 25 tps, and for the first 9 hours it
> went fine; almost 80K messages/hr were processed. After that we saw a lot
> of lag and we stopped the incoming load.
>
> Currently we see 15K messages/hr being processed. We have 40 consumer
> instances with concurrency 4 and 2 topics, both having 160 partitions, so
> each consumer gets one partition.
>
> We found that some of the partitions are sitting idle and some are
> overloaded, and it's really slowing down the consumers' message
> processing.
>
> Why is rebalancing not happening, and why are the existing messages not
> distributed equally among the instances? We tried to restart the app,
> still the same pace. Any idea what could be the reason?
>
> Thanks
> Achintya
>
>


-- 
--
Sharninder


Re: Need to add & remove consumers dynamically in diffrent group and consume consecutively.

2016-10-19 Thread Sharninder Khera
Do you have only one partition in the topic? The way Kafka works is that
all messages are first distributed into the topic's partitions, and then
the consumers are spread among those partitions and read them sequentially.
If you have only one partition, all your messages will be in it, but only
one consumer can read from a given partition. If you have multiple
partitions, you can have that many consumers reading from the topic in
parallel.
Let me know if that wasn't clear.

_
From: Kaushil Rambhia/ MUM/CORP/ ENGINEERING 
Sent: Wednesday, October 19, 2016 8:40 PM
Subject: Need to add & remove consumers dynamically in diffrent group and 
consume consecutively.
To:  


Hi guys,
I am using Apache Kafka with php-rdkafka. I want to know how I can use
multiple Kafka consumers in different groups to consume messages from the
same partition in parallel: say consumers c1, c2, c3 are consuming single
partition 0; if c1 is consuming from offset 0 then c2 should start from 1
and c3 from 2, and any new consumer that comes up should start from the
latest offset (i.e. the 4th) which is yet to be consumed by any consumer.

In short, all consumers should consume consecutive offsets from the Kafka
partition and should never consume the same message twice, no matter
whether a consumer from a different group is added or removed. It should
automatically select consecutive offsets.

It would be good if anyone can recommend a solution, or say whether this
can be achieved by using Kafka with some other tool like Spark or
something else.

-- 
Regards,
Kaushil Rambhia






Re: Kafka Producer performance - 400GB of transfer on single instance taking > 72 hours?

2016-08-25 Thread Sharninder
I think what Dana is suggesting is that since Python isn't doing a good job
of utilising all the available CPU power, you could run multiple Python
processes to process the load: divide the MongoDB collection into, say,
4 parts and process each part with its own Python process, each producing
to Kafka.

Or use a multi-threaded Java producer that is able to use the machine
optimally.
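Something like this is what I mean by a multi-threaded Java producer (a
rough, untested sketch; the thread count, slicing scheme, and topic name
are all made up):

import java.util.Properties;
import java.util.concurrent.*;
import org.apache.kafka.clients.producer.*;

public class ParallelLoader {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // KafkaProducer is thread-safe, so one instance can be shared
        final Producer<String, String> producer = new KafkaProducer<>(props);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int slice = 0; slice < 4; slice++) {
            final int s = slice;
            pool.submit(() -> {
                // each worker thread loads one slice of the source collection
                for (int i = s; i < 1_000_000; i += 4) {
                    producer.send(new ProducerRecord<>("bulk", "doc-" + i));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        producer.close();
    }
}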


On Thu, Aug 25, 2016 at 10:21 PM, Dominik Safaric 
wrote:

> Dear Dana,
>
> > I would recommend
> > other tools for bulk transfers.
>
>
> What tools/languages would you rather recommend then using Python?
>
> I could for sure accomplish the same by using the native Java Kafka
> Producer API, but should this really affect the performance under the
> assumption that the Kafka configuration stays as is?
>
> > On 25 Aug 2016, at 18:43, Dana Powers  wrote:
> >
> > python is generally restricted to a single CPU, and kafka-python will max
> > out a single CPU well before it maxes a network card. I would recommend
> > other tools for bulk transfers. Otherwise you may find that partitioning
> > your data set and running separate python processes for each will
> increase
> > the overall CPU available and therefore the throughput.
> >
> > One day I will spend time improving the CPU performance of kafka-python,
> > but probably not in the near term.
> >
> > -Dana
>
>


-- 
--
Sharninder


Re: Using Kafka without persisting message to disk

2016-07-14 Thread Sharninder Khera
I'd second Tom here. Create a RAM disk or just let Kafka write to disk. Use
compression and batch messages, and the OS filesystem cache will take care
of the rest. Kafka is pretty fast and you probably won't notice.
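The relevant producer settings are roughly these (sketch only; the values
are placeholders to tune, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class TunedProducer {
    public static Producer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("compression.type", "snappy"); // compress batches on the wire and on disk
        props.put("batch.size", "65536");        // bytes per partition batch
        props.put("linger.ms", "50");            // wait up to 50 ms to fill a batch
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }
}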



_
From: Tom Crayford 
Sent: Friday, July 15, 2016 12:12 AM
Subject: Re: Using Kafka without persisting message to disk
To: Users 


Hi Jack,

No, kafka doesn't support not writing to disk. If you're really 100% sure
of yourself you could use a ramdisk and mount Kafka on it, but that's not
supported. I'd recommend "just" writing to disk, it's plenty fast enough
for nearly all use cases.

Thanks

Tom Crayford
Heroku Kafka

On Thu, Jul 14, 2016 at 7:33 PM, Jack Huang  wrote:

> Hi all,
>
> Is there a way to make a topic to be stored in memory only and not writing
> to disk? If not, what's the best way to minimize writing to disk? For this
> application we only need the notion of partitions and a short retention
> time (1hr or so) from Kafka. We want to use Kafka because we want to keep
> the flexibility to add persistence back if we need to.
>
> Thanks,
> Jack
>





Re: Streams Compatibility

2016-07-13 Thread Sharninder
It requires 0.10.

On Thu, Jul 14, 2016 at 6:08 AM, Matt Anderson 
wrote:

> Is the new Kafka Streams API compatible with Kafka 0.9.x API and Broker or
> does it require v0.10.x?
>
> Thanks,
> Matt
>



-- 
--
Sharninder


Re: Large kafka deployment on virtual hardware

2016-05-24 Thread Sharninder
I'm sure you checked this but since these are virtual machines, is it
possible there is just contention for resources? Network clogged or some
other simpler explanation like that?

On Mon, May 23, 2016 at 9:42 PM, Jahn Roux  wrote:

> I have a large Kafka deployment on virtual hardware: 120 brokers on 32gb
> memory 8 core virtual machines. Gigabit network, RHEL 6.7. 4 Topics, 1200
> partitions each, replication factor of 2 and running Kafka 0.8.1.2
>
>
>
> We are running into issues where our cluster is not keeping up. We have 4
> sets of producers (30 producers per set) set to produce to the 4 topics
> (producers produce to multiple topics). The messages are about 150 byte on
> average and we are attempting to produce between 1 million and 2 million
> messages a second per producer set.
>
>
>
> We run into issues after about 1 million messages a second - just for that
> producer set, the producer buffers fill up and we are blocked from
> producing
> messages. This does not seem to impact the other producer sets - they run
> without issues until they too reach about 1m messages a second.
>
>
>
> Looking at the metrics available to us we do not see a bottleneck, we don't
> see disk I/O maxing out, CPU and network are nominal. We have tried
> increasing and decreasing the Kafka cluster size to no avail, we have gone
> from 100 partitions to 1200 partitions per topic. We have increased and
> decreased the number of producers and yet we run into the same issues. Our
> Kafka config is mostly out the box - 1 hour log roll/retention, increased
> the buffer sizes a bit but other than that it's out the box.
>
>
>
> I was wondering if someone has some recommendations for identifying the
> bottleneck and/or what configuration values we should be taking a look at?
> Is there known issues with Kafka on virtualized hardware or things to watch
> out for when deploying to VMs? Are there use cases where Kafka is being
> used
> in a similar way - +4 million messages a second of discrete 150 byte
> messages?
>
>
>
> Kind regards,
>
>
>
> Jahn Roux
>
>
>
>
>
>



-- 
--
Sharninder


Re: Question about 'key'

2016-03-30 Thread Sharninder
The documentation says that the only purpose of the "key" is to decide the
partition the data ends up in. The consumer doesn't decide that. I'll have
to look at the documentation, but I'm not entirely sure if the consumers
have access to this key; the producer does. You can override the default
partitioner class and write one that understands and interprets your
definition of the key to place data in a specific partition. By default, I
believe data is distributed using a round-robin partitioner.
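If you do go the custom-partitioner route with the newer Java producer
(0.9+), the shape is roughly this (an untested sketch; the
"origin:foo;target:boo" key convention is the one from your mail, and you
would register the class via the partitioner.class producer config):

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class OriginPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // route on our own interpretation of the key, e.g. "origin:foo;target:boo"
        String origin = key == null ? "" : key.toString().split(";")[0];
        return (origin.hashCode() & 0x7fffffff) % numPartitions;
    }

    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}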



On Thu, Mar 31, 2016 at 2:58 AM, Marcelo Oikawa  wrote:

> Hi, list.
>
> We're working on a project that uses Kafka and we notice that for every
> message we have a key (or null). I searched for more info about the key
> itself and the documentation says that it is only used to decide the
> partition where the message is placed.
>
> Is there a problem if we use keys with the application semantics
> (metadata)? For instance, we can use the key "origin:foo;target:boo" and
> the consumers may use the key info to make decisions. But, a lot of
> messages may use the same key and it may produce unbalanced partitions, is
> that right?
>
> Does anyone know more about the key and its role inside Kafka?
>
> []s
>



-- 
--
Sharninder


Re: Queue implementation

2016-03-28 Thread Sharninder
What kind of queue are you looking for? Kafka works as a nice FIFO queue
(per partition) by default anyway.



On Mon, Mar 28, 2016 at 5:19 PM, Vinod Kakad  wrote:

> Can anybody share any good example(code) for kafka as a queue
> implementation?
>
> Thanks & Regards,
> Vinod Kakad.
>



-- 
--
Sharninder


Re: Kafka as master data store

2016-02-15 Thread Sharninder Khera
This topic comes up often on this list. Kafka can be used as a datastore if
that's what your application wants, with the caveat that Kafka isn't
designed to keep data around forever. There is a default retention time
after which older data gets deleted. The high-level consumer essentially
reads data as a stream, and while you can do a sort of random access with
the low-level consumer, it's not ideal.



> On 15-Feb-2016, at 10:26 PM, Ted Swerve  wrote:
> 
> Hello,
> 
> Is it viable to use infinite-retention Kafka topics as a master data
> store?  I'm not talking massive volumes of data here, but still potentially
> extending into tens of terabytes.
> 
> Are there any drawbacks or pitfalls to such an approach?  It seems like a
> compelling design, but there seem to be mixed messages about its
> suitability for this kind of role.
> 
> Regards,
> Ted



Re: Kafka advice for small startup

2016-02-15 Thread Sharninder
Kafka is pretty nice and, as long as you have basic monitoring in place,
doesn't need too much attention, but keep in mind that it still depends on
ZooKeeper, and I've seen that being the bottleneck in the past. I also
think that, as the single engineering person in your startup, if you don't
need Kafka or any other component, you shouldn't get it. All said and done,
it still is one extra component to manage.

--
Sharninder


On Mon, Feb 15, 2016 at 6:07 AM, Todd Snyder  wrote:

> So long as you put some basic monitoring in place, it should run nicely
> with very little intervention and let you be confident everything is as it
> should be.
>
> Key things to watch:
>
> * disk space - a disk filling up really makes things difficult for you.
> Make sure your retention fits your footprint.
> * consumer lag
> * isr (in sync replica) state - really, you just want to know about out of
> sync replicas
> * simple rates (produce/consume) to make sure things are sane.
> * standard up/down monitoring (though isr status will typically catch dead
> brokers)
>
> If you dont already have jmx monitoring tools, I suggest you look at
> jmxtrans and graphite/grafana, or elk stack. Both will enable you to
> monitor all the jmx stats available, and visualize easily, so you can
> hopefully just fire and (mostly) forget.
>
> We run about 100 brokers, moving 60 billion+ messages a day, in a few
> different clusters, and aside from glancing at the dashboard occasionally,
> there is really no day-to-day maintenance required. Kafka just works.
>
> Probably our biggest operational overhead is when it comes time to roll a
> cluster to apply a change or do an upgrade. Waiting for everything to get
> back in sync and stable before starting another broker means it can take a
> couple days to complete an upgrade/change safely.
>
> Cheers
>
> Todd
>
> Sent from my BlackBerry 10 smartphone on the TELUS network.
>   Original Message
> From: Kyle Mathews
> Sent: Sunday, February 14, 2016 7:07 PM
> To: users@kafka.apache.org
> Reply To: users@kafka.apache.org
> Subject: Kafka advice for small startup
>
>
> Hi I'm the technical co-founder of a startup in San Francisco and I've been
> investigating using Kafka for some time.
>
> I'm thinking about biting the bullet and integrating it soon into our
> system. My main concern is how much operational overhead it'll add. I'm the
> only engineer right now and am already stretched thin. Since Kafka isn't a
> must have yet, it only makes sense to adopt it if it's pretty care free.
>
> What are people's experiences in similar situations to mine? Is Kafka
> something you can setup and forget like say redis or is something I'll be
> babysitting a decent bit and spending a lot of time on upgrades etc.?
>
> Also I'd love to take anyone in San Francisco with Kafka experience out to
> lunch or coffee.
>
> Thanks!
> Kyle
>



-- 
--
Sharninder


Re: Kafka 0.8.2 ConsumerGroup Example

2016-02-10 Thread Sharninder
How you do this in the high level consumer is slightly different. First,
set the auto offset commit property to false. Then you can keep reading
messages from kafka and when you're done filling up your buffer, commit
offsets manually. The following link gives an example:

http://ingest.tips/2014/10/12/kafka-high-level-consumer-frequently-missing-pieces/
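The pattern from that link, very roughly (an untested sketch against the
0.8.x high-level consumer API; the group, topic, and the batch size of 1000
are illustrative):

import java.util.*;
import kafka.consumer.*;
import kafka.javaapi.consumer.ConsumerConnector;

public class BatchReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "batch-reader");
        props.put("auto.commit.enable", "false"); // we commit ourselves
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, Integer> topicCount = Collections.singletonMap("my-topic", 1);
        KafkaStream<byte[], byte[]> stream =
            connector.createMessageStreams(topicCount).get("my-topic").get(0);
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        List<byte[]> batch = new ArrayList<>();
        while (it.hasNext()) {
            batch.add(it.next().message());
            if (batch.size() == 1000) {     // the requested batch size
                process(batch);             // your processing goes here
                connector.commitOffsets();  // commit only after processing
                batch.clear();
            }
        }
    }
    static void process(List<byte[]> batch) { /* ... */ }
}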

--
Sharninder


On Wed, Feb 10, 2016 at 9:11 PM, Joe San  wrote:

> I'm following the ConsumerGroup example,
>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>
> How can I specify the batch size of the messages that I want to consume? I
> see that if I use the SimpleConsumer, I can specify a size that I want to
> read. How can I do it here with the ConsumerGroup? I want to be able to
> consume let's say 1000 records, process them and so on...
>



-- 
--
Sharninder


Re: RAM usage of kafka

2015-11-10 Thread Sharninder
Kafka uses and relies on OS-level caching heavily. Likely you're just
measuring that. No need to worry unless you're actually seeing GC-related
problems.

--
Sharninder



On Tue, Nov 10, 2015 at 4:44 PM, Birendra Kumar Singh 
wrote:

> I am looking to understand the RAM usage of kafka cluster
>
> I have the following running
>
> VM -1 (3.75GB RAM)
>kafka-server 1
> VM-2 (3.75 GB RAM)
>   kafka-server 2
>
> I have one topic created.
> Prior to any messages in the topic, RAM used by the Kafka server process
> was around 8%.
> I pushed around 20 messages to the topic within a span of a couple of
> hours.
> It's been almost 20 hours since the messages were pushed. I see that right
> now the Kafka server process is using around 30% of memory.
>
> Why is so much memory being used? Can someone help me understand how to
> optimize it.
>
> Thanks
> Birendra
>



-- 
--
Sharninder


Re: Kafka - Rest api query

2015-10-20 Thread Sharninder
Sounds like an app design decision. What help can this list give you?


> On 20-Oct-2015, at 8:07 PM, Kudumula, Surender  
> wrote:
> 
> Dear sir/madam
> I have a query. We are working on a POC at the moment and we are using
> Kafka to produce and consume messages. I have one component which consumes
> the request from a topic, processes it, creates a file, and again produces
> the Java object as a byte array to another Kafka topic. Now I have two
> options:
> 1. Write a Java REST client and send the byte-array Java object to a
> REST API service in another component.
> 2. Or, I was wondering if it's possible to route the message from the
> Kafka topic to another component's REST API server.
>
> Please let me know; we are using Kafka 0.8 and the new producer. Any
> suggestions would be appreciated.
>  
> Regards
>  
> Surender Kudumula
>  


Re: which producer should be used

2015-09-28 Thread Sharninder
Sorry, you're correct. Looks like a silly mistake on my side. I'll check my
imports.

--
Sharninder


On Mon, Sep 28, 2015 at 6:24 PM, Gwen Shapira  wrote:

> KafkaProducer takes ProducerRecord as an argument to send().
>
> See the code:
>
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L298
>
> and a usage example here:
>
> https://github.com/gwenshap/kafka-examples/blob/master/SimpleCounter/src/main/java/com/shapira/examples/producer/simplecounter/DemoProducerNewJava.java
>
> If you can send Seq instead, it looks like you are still
> using an older Scala producer. Please check your import statements and
> which object you are instantiating.
>
> Gwen
>
> On Sun, Sep 27, 2015 at 10:58 PM, Sharninder  wrote:
>
> > Hi Gwen,
> >
> > We were using kafka.javaapi.producer.Producer so far and I'm trying to
> move
> > the code to use KafkaProducer now but it seems the send api for producer
> > takes Seq as argument instead of ProducerRecord, as the
> > documentation for 0.8.2 says:
> >
> >
> >
> http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
> >
> > Is there an equivalent Java API for 0.8.2 yet or is the older one the
> most
> > current?
> >
> > --
> > Sharninder
> >
> >
> > On Mon, Sep 28, 2015 at 9:15 AM, Gwen Shapira  wrote:
> >
> > > KafkaProducer is the most current and full-featured one, and it should
> be
> > > used.
> > >
> > > The other producers will be deprecated in a release or two, so I
> > recommend
> > > not to use them.
> > >
> > > On Sun, Sep 27, 2015 at 8:40 PM, Li Tao 
> wrote:
> > >
> > > > Hi there,
> > > > I noticed that there are several producers our there:
> > > >
> > > > **  **org.apache.kafka.clients.producer.KafkaProducer*
> > > > **  **kafka.javaapi.producer.Producer*
> > > >
> > > > What's the difference between them? Which one should be used?
> > > >
> > >
> >
> >
> >
> > --
> > --
> > Sharninder
> >
>



-- 
--
Sharninder


Re: which producer should be used

2015-09-27 Thread Sharninder
Hi Gwen,

We were using kafka.javaapi.producer.Producer so far and I'm trying to move
the code to use KafkaProducer now but it seems the send api for producer
takes Seq as argument instead of ProducerRecord, as the
documentation for 0.8.2 says:

http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html

Is there an equivalent Java API for 0.8.2 yet or is the older one the most
current?

--
Sharninder


On Mon, Sep 28, 2015 at 9:15 AM, Gwen Shapira  wrote:

> KafkaProducer is the most current and full-featured one, and it should be
> used.
>
> The other producers will be deprecated in a release or two, so I recommend
> not to use them.
>
> On Sun, Sep 27, 2015 at 8:40 PM, Li Tao  wrote:
>
> > Hi there,
> > I noticed that there are several producers our there:
> >
> > **  **org.apache.kafka.clients.producer.KafkaProducer*
> > **  **kafka.javaapi.producer.Producer*
> >
> > What's the difference between them? Which one should be used?
> >
>



-- 
--
Sharninder


Re: Mapping a consumer in a consumer group to a partition in a topic

2015-09-22 Thread Sharninder
As far as I know, with a consumer group implementation you cannot pin
consumers to partitions. That logic is taken care of by the high-level API
on its own.

> On 23-Sep-2015, at 6:38 AM, Spandan Harithas Karamchedu 
>  wrote:
> 
> Hi,
> 
> We created a topic with 3 partitions and a replication factor of 3. We
> are able to implement a consumer that gets the data from a specific
> partition in a topic, but we are stuck implementing a consumer within a
> specified consumer group that is mapped to a single partition of a topic
> and gets the data from that partition only. Can you let us know how we can
> configure a consumer to a partition within a consumer group?
> 
> 
> 
> We wanted to develop a kafka cluster similar to below one as specified in
> kafka docs. We are using kafka 0.8.1.
> 
> 
> 
> 
> Thanks
> 
> Spandan


Re: How to skip malformed messages in a custom Kafka decoder?

2015-08-19 Thread Sharninder
What do you mean by malformed messages? Consuming messages and deciding
what to do with them is the application's logic. Consume them; if they're
not relevant, pick up the next message.
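That said, if you really want the decoder itself to swallow garbage,
returning null and skipping nulls in the consuming loop works. A bare-bones,
untested sketch against the old consumer's kafka.serializer.Decoder (the
startsWith check is a crude stand-in for real validation, and a real
implementation usually also needs a constructor taking VerifiableProperties
so Kafka can instantiate it):

import kafka.serializer.Decoder;

public class LenientDecoder implements Decoder<String> {
    @Override
    public String fromBytes(byte[] bytes) {
        try {
            String s = new String(bytes, "UTF-8");
            if (!s.startsWith("{")) {
                return null;             // malformed: signal "skip me"
            }
            return s;
        } catch (Exception e) {
            return null;                 // undecodable bytes: skip as well
        }
    }
}

// and in the consuming loop:
//   String msg = it.next().message();
//   if (msg == null) continue;         // skip malformed messages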

On Wed, Aug 19, 2015 at 1:10 PM, Petr Novak  wrote:

> Hi all,
> ... by returning null?
>
> Many thanks,
> Petr
>



-- 
--
Sharninder


Re: Consumer limit for pub-sub mode

2015-08-08 Thread Sharninder
Well, you're right, but that's not what I meant. I meant consumers using
the same group id. With different IDs, obviously the problem (and the
solution) changes. But then I think the OP is just trying to twist his
problem to fit the Kafka way of things.

--
Sharninder


On Sat, Aug 8, 2015 at 10:36 PM, Jiangjie Qin 
wrote:

> That's not true, Sharninder. Assuming you are using the high-level
> consumer, if all the consumers have different group ids, each of them will
> get all the messages.
>
> Also, another way to think about this a little bit is that maybe you can
> produce the same data to many different topics each have one partition, so
> the consumers needs only to consume from one of the topic.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Aug 3, 2015 at 10:40 AM, Sharninder  wrote:
>
> > I don't know of any limits as such but I don't think your problem is
> > suitable for Kafka. The third point especially wouldn't work with Kafka.
> > Using Kafka, only one consumer will get a message out of the 30k
> >
> >
> >
> > > On 03-Aug-2015, at 10:39 am, Vaibhav Kirte 
> > wrote:
> > >
> > > Hi,
> > >
> > > I need to know how many consumers can subscribe to a single topic
> > > (with one partition).
> > >
> > > I have a requirement such that:
> > >    1. The producer will post to 1 topic having 1 partition.
> > >    2. *20,000-30,000 consumers* should be able to consume messages.
> > >    3. All of the consumers should receive all messages that are
> > > produced.
> > >
> > > Will this be possible using Kafka?
> > > Is there a limit on the number of consumers?
> > > How many machines will I need to satisfy the requirements?
> > >
> > > --
> > > Regards,
> > > Vaibhav Kirte
> >
>



-- 
--
Sharninder


Re: How to read in batch using HighLevel Consumer?

2015-08-04 Thread Sharninder
You can't. Kafka is essentially a queue, so you always read messages one by
one. What you can do is disable auto offset commit, read 100 messages,
process them, and then manually commit the offset.

--
Sharninder 

> On 04-Aug-2015, at 9:07 pm, shahab  wrote:
> 
> Hi,
> 
> While we the producer can put data as batch in kafka server,  I couldn't
> find any API (or any document) saying how we can fetch data as batch from
> Kafka ?
> Even when data is placed as batch in kafka server, still using High Level
> consumer I can only read one by one, and I can not specify. for example,
> read 100 items at once!
> 
> Is this correct observation? or I am missing something?
> 
> best,
> /Shahab


Re: Consumer limit for pub-sub mode

2015-08-03 Thread Sharninder
I don't know of any limits as such, but I don't think your problem is
suitable for Kafka. The third point especially wouldn't work with Kafka:
only one consumer out of the 30k would get a given message.



> On 03-Aug-2015, at 10:39 am, Vaibhav Kirte  wrote:
> 
> Hi,
> 
> I need to know how many consumers can subscribe to a single topic (with
> one partition).
> 
> I have a requirement such that:
>    1. The producer will post to 1 topic having 1 partition.
>    2. *20,000-30,000 consumers* should be able to consume messages.
>    3. All of the consumers should receive all messages that are produced.
> 
> Will this be possible using Kafka?
> Is there a limit on the number of consumers?
> How many machines will I need to satisfy the requirements?
> 
> -- 
> Regards,
> Vaibhav Kirte


Re: Noobie question about consumers and ordering of messages

2015-06-12 Thread Sharninder
The producer can use a key to determine which partition to put data in. For
example, we use a partitioning scheme based on the header of the message so
that all messages with a specific header go to a specific partition. The
consumer can also include the same logic to process those messages.
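Concretely, with the newer Java producer that looks like this (an untested
sketch; the topic, key, and values are made up). All records sharing a key
land in one partition, so their relative order is preserved for the
consumer:

import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class KeyedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // all "order-42" events go to one partition, preserving their order
            producer.send(new ProducerRecord<>("events", "order-42", "created"));
            producer.send(new ProducerRecord<>("events", "order-42", "paid"));
            producer.send(new ProducerRecord<>("events", "order-42", "shipped"));
        }
    }
}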

> On 13-Jun-2015, at 4:01 am, Dillian Murphey  wrote:
> 
> So I just realized that when creating multiple partitions for a topic, it
> is the responsibility of the producer and the consumer to  decide where to
> put data.
> 
> So if I am a consumer, how do I know what partition to read data from?
> What if the producer only put data in 1 partition out of 50.
> 
> And so there isn't any order, here, right? Only order per partition, but
> there is no concept of consuming messages in the order they were produced,
> correct?
> 
> Thanks. Just struggling on the practical aspects now that I've moved past
> some of the administration tasks.


Re: Multiple instances of HL Consumer

2015-06-05 Thread Sharninder Khera
You can have the same consumer group id and Kafka will balance partitions
across the two instances automatically. When one of them dies, the
partitions are rebalanced and assigned to the remaining live consumers.



_
From: Panda, Samaresh 
Sent: Friday, June 5, 2015 7:42 pm
Subject: Multiple instances of HL Consumer
To:  


I've a HL consumer receiving messages using four threads (four partitions). 
This is a stand-alone Java client. For fail-safe reasons, I want to run another 
instance of the exact same Java client in a different box.

Here are my questions:

> Can I keep the same consumer group name, or must it be different for the
> 2nd instance?
> If it's the same consumer group, will the 2nd client receive the same set
> of messages again?
> In general, what's the best practice for designing fail-safe clients?

Thanks
Sam

Re: Need some guidance in how to solve this problem.

2015-05-20 Thread Sharninder
> I have a backend service written in PHP. This service pushes messages to
> Apache Kafka (over the topic "posts") when posts are created, read and
> removed. I also have a backend service written in Java. This service
> consumes messages from Apache Kafka (for the "posts" topic) and push them
> out over WebSockets to whoever is listening. This works out great.
>
> The problem I have is as follows. Say, for some reason, the WebSocket
> client timeout and because of this he or she does not get some of the
> messages. The messages are still being consumed but the client isn't there
> to acknowledge them. Now, the client want "rewind" and get all the messages
> that they missed may have missed. This is where I'm stuck.
>
> Ideally, the client would provide a timestamp for when they last got a
> message and I'd then use that timestamp to get messages from that point and
> onwards. I'm not sure if this is even possible, it's just my thoughts.
>
>
Something like this was discussed a few weeks back on the list. But,
essentially what you need to do is turn off auto commit and only commit
when you're sure the message has been acknowledged by the next stage.

I think the link is this ->
http://ingest.tips/2014/10/12/kafka-high-level-consumer-frequently-missing-pieces/

Regards,
Sharninder


Re: Horizontally Scaling Kafka Consumers

2015-04-30 Thread Sharninder
You need to first decide the conditions that must be met for you to scale
to 50 consumers. These can be as simple as consumer lag: look at the
console offset-checker tool and see if any of those numbers make sense.
Your existing consumers could also produce some metrics based on which
another process decides when to spawn new consumers.
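For example (hedged, from memory; newer releases replace this tool with
kafka-consumer-groups.sh, and the group name here is made up):

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
    --zkconnect localhost:2181 --group my-group

The per-partition Lag column is the number a scaling decision would key off.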

--
Sharninder


On Wed, Apr 29, 2015 at 11:58 PM, Nimi Wariboko Jr 
wrote:

> Hi,
>
> I was wondering what options there are/what other people are doing for
> horizontally scaling kafka consumers? Basically if I have 100 partitions
> and 10 consumers, and want to temporarily scale up to 50 consumers, what
> can I do?
>
> So far I've thought of just simply tracking consumer membership somehow
> (either through zookeeper's ephemeral nodes or maybe using gossip) on the
> consumers to achieve consensus on who consumes what. Another option would
> be having a router, possibly using something like nsq (I understand that
> they are similar pieces of software, but what we are going for is a
> persistent distributed queue (sharding) which is why I'm looking into
> Kafka)?
>



-- 
--
Sharninder


Re: Plugable metadata store

2014-11-14 Thread Sharninder
I haven't been following closely, but getting rid of ZooKeeper is in the
pipeline. Look up the 0.9 plans; they're somewhere on the wiki.

Sent from my iPhone

> On 14-Nov-2014, at 5:18 pm, Stevo Slavić  wrote:
> 
> Hello Apache Kafka community,
> 
> Is it already possible to configure/use a different metadata store (topics,
> consumer groups, consumer to partition assignments, etc.) instead of
> ZooKeeper?
> If not, are there any plans to make it plugable in future?
> 
> Kind regards,
> Stevo Slavic


Re: One of two consumers is always Idle though I have 2 partitions

2014-11-13 Thread Sharninder
If you're not using your own partitioning logic, messages are partitioned
randomly. This is the current default behavior I believe.


On Fri, Nov 14, 2014 at 12:01 PM, Palur Sandeep 
wrote:

> Thank you Chia-chun,Joe and Jagat.
>
> I am not using any custom partitioner logic. Here is what I observed when I
> ran kafka on 4 nodes with the following structure:
>
> 1. Each node has a producer, consumer and a broker (that contains one
> partition of my topic) and one of the machine has the Zookeeper too.
> 2. Producer in each node sends 1 messages to my topic.
> 3. I observed that sometimes the consumers on all 4 nodes get some
> messages, sometimes only 2 nodes receive messages and the others don't,
> and sometimes only one node receives messages and 3 don't receive any.
>
> So according to my observation, producer is sending messages to random
> partition.
>
> Am I correct?
>
> Thank you
> Sandeep
>
>
>
>
>
>
> On Thu, Nov 13, 2014 at 9:34 PM, Joe Stein  wrote:
>
> > Yup, sounds like
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
> > ?
> >
> > This should go away with 0.8.2, with the default partitions now being 1 =8^)
> > with auto create topics.
> >
> > /***
> > Joe Stein
> > Founder, Principal Consultant
> > Big Data Open Source Security LLC
> > http://www.stealth.ly
> > Twitter: @allthingshadoop
> > /
> > On Nov 13, 2014 8:34 PM, "Chia-Chun Shih" 
> wrote:
> >
> > > Hi Palur,
> > >
> > > When producing messages, did you specify a key in your KeyedMessage? If
> > > not, producer will send all messages to ONE randomly selected partition
> > and
> > > stick to this partition for 10 minutes by default.
> > >
> > > regards,
> > > Chia-Chun
> > >
> > > 2014-11-14 7:19 GMT+08:00 Jagat Singh :
> > >
> > > > It would be worth reading once the consumer section from the
> > > documentation.
> > > >
> > > > https://kafka.apache.org/documentation.html
> > > >
> > > >
> > > >
> > > > On Fri, Nov 14, 2014 at 10:09 AM, Palur Sandeep <
> psand...@hawk.iit.edu
> > >
> > > > wrote:
> > > >
> > > > > Yes, they are on the same consumer group, but I have two
> partitions.
> > > > >
> > > > > On Thu, Nov 13, 2014 at 5:04 PM, Jagat Singh  >
> > > > wrote:
> > > > >
> > > > > > Are both of them in same Consumer Group?
> > > > > >
> > > > > > On Fri, Nov 14, 2014 at 9:12 AM, Palur Sandeep <
> > > psand...@hawk.iit.edu>
> > > > > > wrote:
> > > > > >
> > > > > > > Dear Developers,
> > > > > > >
> > > > > > > I am 2nd year masters student at IIT. I am using Kafka for one
> of
> > > my
> > > > > > > research projects.My question is the following:
> > > > > > >
> > > > > > > 1. I have a producer, consumer and a broker(that contains 1st
> > > > partition
> > > > > > of
> > > > > > > my topic)  on node1
> > > > > > > 2. I have a producer, consumer, zookeeper and a broker(that
> > > contains
> > > > > 2nd
> > > > > > > partition of my topic)  on node2
> > > > > > > 3. Here comes my problem: though I have two partitions only one
> > > > > consumer
> > > > > > > pulls messages and the other one is always idle.
> > > > > > >
> > > > > > > What is that I can do to keep both of my consumer busy?
> > > > > > >
> > > > > > > Thank you
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Sandeep Palur
> > > > > > > Data-Intensive Distributed Systems Laboratory, CS/IIT
> > > > > > > Department of Computer Science, Illinois Institute of
> Technology
> > > > (IIT)
> > > > > > > Phone : 312-647-9833
> > > > > > > Email : psand...@hawk.iit.edu 
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Sandeep Palur
> > > > > Data-Intensive Distributed Systems Laboratory, CS/IIT
> > > > > Department of Computer Science, Illinois Institute of Technology
> > (IIT)
> > > > > Phone : 312-647-9833
> > > > > Email : psand...@hawk.iit.edu 
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Regards,
> Sandeep Palur
> Data-Intensive Distributed Systems Laboratory, CS/IIT
> Department of Computer Science, Illinois Institute of Technology (IIT)
> Phone : 312-647-9833
> Email : psand...@hawk.iit.edu 
>



-- 
--
Sharninder


Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Sharninder
Thanks Gwen. This really helped.

Yes, Kafka is the best thing ever :)

Now how would this be done with the Simple consumer? I'm guessing I'll have
to maintain my own state in Zookeeper or something of that sort?


On Thu, Oct 9, 2014 at 12:01 AM, Gwen Shapira  wrote:

> Here's an example (from the ConsumerOffsetChecker tool) of 1 topic (t1)
> and 1 consumer group (flume); each of the 3 topic partitions is being
> read by a different machine running the flume consumer:
>
> Group  Topic  Pid  Offset    logSize    Lag       Owner
> flume  t1     0    50172068  100210042  50037974  flume_kafkacdh-1.ent.cloudera.com-1412722833783-3d6d80db-0
> flume  t1     1    49914701  49914701   0         flume_kafkacdh-2.ent.cloudera.com-1412722838536-a6a4915d-0
> flume  t1     2    54218841  82733380   28514539  flume_kafkacdh-3.ent.cloudera.com-1412722832793-b23eaa63-0
>
> If flume_kafkacdh-1 crashed, another consumer will pick up the partition:
>
> Group  Topic  Pid  Offset    logSize    Lag       Owner
> flume  t1     0    59669715  100210042  40540327  flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
> flume  t1     1    49914701  49914701   0         flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
> flume  t1     2    65796205  82733380   16937175  flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0
>
> Then I can start flume_kafkacdh-4 and see things rebalance again:
>
> Group  Topic  Pid  Offset    logSize    Lag       Owner
> flume  t1     0    60669715  100210042  39540327  flume_kafkacdh-2.ent.cloudera.com-1412792880818-b4aa6feb-0
> flume  t1     1    49914701  49914701   0         flume_kafkacdh-3.ent.cloudera.com-1412792871089-cabd4934-0
> flume  t1     2    66829740  82733380   15903640  flume_kafkacdh-4.ent.cloudera.com-1412793053882-9bfddff9-0
>
> Isn't Kafka the best thing ever? :)
>
> Gwen
>
> On Wed, Oct 8, 2014 at 11:23 AM, Gwen Shapira 
> wrote:
> > yep. exactly.
> >
> > On Wed, Oct 8, 2014 at 11:07 AM, Sharninder 
> wrote:
> >> Thanks Gwen.
> >>
> >> When you're saying that I can add consumers to the same group, does that
> >> also hold true if those consumers are running on different machines? Or
> in
> >> different JVMs?
> >>
> >> --
> >> Sharninder
> >>
> >>
> >> On Wed, Oct 8, 2014 at 11:35 PM, Gwen Shapira 
> wrote:
> >>
> >>> If you use the high level consumer implementation, and register all
> >>> consumers as part of the same group - they will load-balance
> >>> automatically.
> >>>
> >>> When you add a consumer to the group, if there are enough partitions
> >>> in the topic, some of the partitions will be assigned to the new
> >>> consumer.
> >>> When a consumer crashes, once its node in ZK times out, other
> >>> consumers will get its partitions.
> >>>
> >>> Gwen
> >>>
> >>> On Wed, Oct 8, 2014 at 10:39 AM, Sharninder 
> wrote:
> >>> > Hi,
> >>> >
> >>> > I'm not even sure if this is a valid use-case, but I really wanted
> to run
> >>> > it by you guys. How do I load balance my consumers? For example, if
> my
> >>> > consumer machine is under load, I'd like to spin up another VM with
> >>> another
> >>> > consumer process to keep reading messages off any topic. On similar
> >>> lines,
> >>> > how do you guys handle consumer failures? Suppose one consumer
> process
> >>> gets
> >>> > an exception and crashes, is it possible for me to somehow make sure
> that
> >>> > there is another process that is still reading the queue for me?
> >>> >
> >>> > --
> >>> > Sharninder
> >>>
>


Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Sharninder
Thanks Gwen.

When you're saying that I can add consumers to the same group, does that
also hold true if those consumers are running on different machines? Or in
different JVMs?

--
Sharninder


On Wed, Oct 8, 2014 at 11:35 PM, Gwen Shapira  wrote:

> If you use the high level consumer implementation, and register all
> consumers as part of the same group - they will load-balance
> automatically.
>
> When you add a consumer to the group, if there are enough partitions
> in the topic, some of the partitions will be assigned to the new
> consumer.
> When a consumer crashes, once its node in ZK times out, other
> consumers will get its partitions.
>
> Gwen
>
> On Wed, Oct 8, 2014 at 10:39 AM, Sharninder  wrote:
> > Hi,
> >
> > I'm not even sure if this is a valid use-case, but I really wanted to run
> > it by you guys. How do I load balance my consumers? For example, if my
> > consumer machine is under load, I'd like to spin up another VM with
> another
> > consumer process to keep reading messages off any topic. On similar
> lines,
> > how do you guys handle consumer failures? Suppose one consumer process
> gets
> > an exception and crashes, is it possible for me to somehow make sure that
> > there is another process that is still reading the queue for me?
> >
> > --
> > Sharninder
>


Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Sharninder
Hi,

I'm not even sure if this is a valid use-case, but I really wanted to run
it by you guys. How do I load balance my consumers? For example, if my
consumer machine is under load, I'd like to spin up another VM with another
consumer process to keep reading messages off any topic. On similar lines,
how do you guys handle consumer failures? Suppose one consumer process gets
an exception and crashes, is it possible for me to somehow make sure that
there is another process that is still reading the queue for me?

--
Sharninder


Re: Read a specific number of messages using kafka

2014-09-25 Thread Sharninder
Slightly off-topic, but is it also possible to replay a specific number of
messages? For example, using the simple consumer, can I go back/reset the
offset so that I always read the last 10 messages, assuming the size of
each individual message could be different? All I found in the simple
consumer example was that replaying takes a parameter in bytes, but maybe I
didn't look hard enough.

--
Sharninder


On Thu, Sep 25, 2014 at 10:15 PM, pankaj ojha 
wrote:

> Thank You. I will try this out.
>
> On Thu, Sep 25, 2014 at 10:01 PM, Gwen Shapira 
> wrote:
>
> > Using high level consumer and assuming you already created an iterator:
> >
> > while (msgCount < maxMessages && it.hasNext()) {
> >  bytes = it.next().message();
> >  eventList.add(bytes);
> >  msgCount++;  // count the message so the loop stops at maxMessages
> > }
> >
> > (See a complete example here:
> >
> >
> https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-kafka-source/src/main/java/org/apache/flume/source/kafka/KafkaSource.java
> > )
> >
> > Gwen
> >
> > On Thu, Sep 25, 2014 at 9:15 AM, pankaj ojha 
> > wrote:
> > > Hi,
> > >
> > > My requirement is to read a specific number of messages from kafka
> topic
> > > which contains data in json format and after reading number of
> messges, i
> > > need to write that in a file and then stop. How can I count number of
> > > messages read by my consumer code(either simpleconsumer or high level)
> ?
> > >
> > > Please help.
> > >
> > > --
> > > Thanks,
> > > Pankaj Ojha
> >
>
>
>
> --
> Thanks,
> Pankaj Ojha
>


Re: Copying messages from a single partition topic to a multi-partition topic

2014-09-19 Thread Sharninder
Agree. Why not write a service that reads from the existing topic and
writes to a new one? Point existing producers to the new topic and wait for
the service to finish reading. Since you have only one partition right now,
I'd assume you don't have any partitioning logic per se. Think about that,
since with multiple partitions you'd want your messages distributed evenly.
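The copy service itself is small. A hedged sketch (written against the
newer Java clients rather than the 0.8.1-era APIs in this thread; topic and
group names are made up):

import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;

public class TopicCopier {
    public static void main(String[] args) {
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "copier");
        c.put("auto.offset.reset", "earliest"); // start from the backlog's beginning
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);
             KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            consumer.subscribe(Collections.singletonList("old-topic"));
            while (true) { // runs until the process is killed
                for (ConsumerRecord<String, String> r : consumer.poll(1000)) {
                    // pick a key here if you need deterministic placement
                    producer.send(new ProducerRecord<>("new-topic", r.key(), r.value()));
                }
            }
        }
    }
}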


On Fri, Sep 19, 2014 at 9:32 PM, Jonathan Weeks 
wrote:

> I would look at writing a service that reads from your existing topic and
> writes to a new topic with (e.g. four) partitions.
>
> You will also need to pay attention to the partitioning policy (or
> implement your own), as the default hashing in the current kafka version
> default can lead to poor distribution.
>
> Best Regards,
>
> -Jonathan
>
>
> On Sep 19, 2014, at 8:57 AM, Dennis Haller  wrote:
>
> > Hi,
> >
> > We have an interesting problem to solve due to very large traffic volumes
> > on particular topics. In our initial system configuration we had only one
> > partition per topic, and in a couple of topics we have built up huge
> > backlogs of several million messages that our consumers are slowly
> > processing.
> >
> > However, now that we have this constant backlog, we wish to repartition
> > those topics into several partitions, and allow parallel consumers to run
> > to handle the high message volume.
> >
> > If we simply repartition the topic, say from 1 to 4 partitions, the
> > backlogged messages stay in partition 1, while partitions 2,3,4 only get
> > newly arrived messages. To eat away the backlog, we need to redistribute
> > the backlogged messages evenly among the 4 partitions.
> >
> > The tools I've seen do not allow me to rewrite or "replay" the existing
> > backlogged messages from one partition into the same or another topic
> with
> > several partitions.  - using kafka.tools.MirrorMaker does not allow me to
> > move the data within the same zookeeper network, and
> > - using kafka.tools.ReplayLogProducer does not write to multiple
> > partitions. It seems that it will write only from a single partition to a
> > single partition.
> >
> > Does anyone have any other way to solve this problem or a better way of
> > using the kafka tools?
> >
> > Thanks
> > Dennis
>
>


Re: Kafka data can be stored on a Dbase instead of file system?

2014-09-09 Thread Sharninder
For your first question, I don't know of a way to do that.

For the second question, consumers commit the offset after every/a few
reads. When the consumer restarts, it starts from the last committed offset.


On Tue, Sep 9, 2014 at 7:23 PM, siddharth ubale 
wrote:

> Hi Guys,
>
> I wanted to know if we can store data sent to consumers via brokers in
> any database, the way ActiveMQ persists data to KahaDB, or in any other DB
> like PostgreSQL, etc.
>
> Also, when the broker is sending a message to a consumer and the consumer
> fails, will the consumer pick up data from where it last left off when it
> is up again, or will it start reading from the beginning if I have set the
> consumer to read from the beginning?
>
> Thanks,
> Siddharth Ubale
>


Re: Use case

2014-09-05 Thread Sharninder
I'm not really sure about your exact use-case, but I don't think having a
topic per user is very efficient. Deleting topics in Kafka, at the moment,
isn't really straightforward. You should rethink your data pipeline a bit.

Also, just because kafka has the ability to store messages for a certain
time, don't think of it as a data store. Kafka is a streaming system, think
of it as a fast queue that gives you the ability to move your pointer back.

--
Sharninder



On Fri, Sep 5, 2014 at 4:27 PM, Aris Alexis 
wrote:

> Thanks for the reply. If I use it only for activity streams like Twitter:
>
> I would want a topic for each #tag, a topic for each user, and maybe one
> for each city. Would that be too many topics, or does it not matter since
> most of them will be deleted after a specified interval?
>
>
>
> Best Regards,
> Aris Giachnis
>
>
> On Fri, Sep 5, 2014 at 6:57 AM, Sharninder  wrote:
>
> > Since you want all chats and mail history persisted all the time, I
> > personally wouldn't recommend kafka for your requirement. Kafka is more
> > suitable as a streaming system where events expire after a certain time.
> > Look at something more general purpose like hbase for persisting data
> > indefinitely.
> >
> > So, for example all activity streams can go into kafka from where
> consumers
> > will pick up messages to parse and put them to hbase or other clients.
> >
> > --
> > Sharninder
> >
> >
> >
> >
> >
> > On Fri, Sep 5, 2014 at 12:05 AM, Aris Alexis 
> > wrote:
> >
> > > Hello,
> > >
> > > I am building a big web application that I want to be massively
> scalable
> > (I
> > > am using cassandra and titan as a general db).
> > >
> > > I want to implement the following:
> > >
> > > real time web chat that is persisted so that user a in the future can
> > > recall his chat with user b,c,d much like facebook.
> > > mail like messages in the web application (not sure about this as it is
> > > somewhat covered by the first one)
> > > user activity streams
> > > users subscribing to topics for example florida/musicevents
> > >
> > > Could i use kafka for this? can you recommend another technology maybe?
> > >
> >
>


Re: Use case

2014-09-04 Thread Sharninder
Since you want all chats and mail history persisted all the time, I
personally wouldn't recommend kafka for your requirement. Kafka is more
suitable as a streaming system where events expire after a certain time.
Look at something more general purpose like hbase for persisting data
indefinitely.

So, for example all activity streams can go into kafka from where consumers
will pick up messages to parse and put them to hbase or other clients.

--
Sharninder





On Fri, Sep 5, 2014 at 12:05 AM, Aris Alexis  wrote:

> Hello,
>
> I am building a big web application that I want to be massively scalable (I
> am using cassandra and titan as a general db).
>
> I want to implement the following:
>
> real time web chat that is persisted so that user a in the future can
> recall his chat with user b,c,d much like facebook.
> mail like messages in the web application (not sure about this as it is
> somewhat covered by the first one)
> user activity streams
> users subscribing to topics for example florida/musicevents
>
> Could i use kafka for this? can you recommend another technology maybe?
>


Re: Data inputs for Kafka.

2014-08-21 Thread Sharninder
You can write your producer in C# (
https://github.com/kafka-dev/kafka/blob/master/clients/csharp/README.md),
Java, Python, or a number of other languages.

Think of Kafka as a separate system that is waiting to receive data in a
particular format. The language used to send that data doesn't matter.

You can even abstract the producer with, for example, a REST service. That
way your applications only need to send a REST request to the producer and
it'll parse it and do the needful.
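As a sketch of that REST abstraction (untested; it uses the JDK's built-in
HTTP server and Java 9+'s readAllBytes, and the port, path, and topic are
made up):

import java.io.InputStream;
import java.net.InetSocketAddress;
import java.util.Properties;
import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.clients.producer.*;

public class RestIngest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<>(props);
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/publish", exchange -> {
            // any app (C#, Ruby, .NET, ...) can POST a message here
            try (InputStream in = exchange.getRequestBody()) {
                String body = new String(in.readAllBytes(), "UTF-8");
                producer.send(new ProducerRecord<>("ingest", body));
            }
            exchange.sendResponseHeaders(204, -1); // accepted, no response body
        });
        server.start();
    }
}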



On Thu, Aug 21, 2014 at 3:59 PM, siddharth ubale 
wrote:

> Hi,
>
> In continuation of this thread, can you let me know if I will have to
> write a Kafka producer in C# to get data from a C# application onto the
> Kafka bus, or will it be handled by the native Java producer API? And
> likewise for any system generating data using a technology other than
> Java?
>
> Thanks,
> Siddharth Ubale
>
>
> On Thu, Aug 21, 2014 at 11:48 AM, Sharninder  wrote:
>
> > There are Kafka producers/consumers in a lot of languages. There is an
> > Ecosystem page somewhere in the wiki. Take a look at that.
> >
> > Finally, I think you should experiment with both ActiveMQ and Kafka and
> > then make a decision. Don't find a solution first and then fit your
> > problem to it; it should be the other way around.
> >
> > --
> > Sharninder
> >
> >
> >
> > On Thu, Aug 21, 2014 at 11:40 AM, siddharth ubale <
> > siddharth.ub...@gmail.com
> > > wrote:
> >
> > > Hi,
> > > Thanks for the quick follow-up, Philip.
> > > Also, can you let me know whether the Kafka implementation can be as
> > > versatile as ActiveMQ with regard to connectivity? I mean to say, for
> > > a web app generating transactional data in a database from a Java,
> > > Ruby, or .NET web app, can Kafka talk to these as easily as ActiveMQ
> > > or any AMQP-protocol messaging system does?
> > > Finally, I would also like to know whether 0.9.2 incubating can be
> > > used in production.
> > >
> > > Thanks,
> > > Siddharth
> > >
> > >
> > > On Wed, Aug 20, 2014 at 9:50 PM, Philip O'Toole <
> > > philip.oto...@yahoo.com.invalid> wrote:
> > >
> > > > Kafka can ingest any kind of data, and connect to many types of
> > > > systems. Much work exists in this area already, for hooking a wide
> > > > variety of systems to Kafka. If your system isn't supported, then
> > > > you write a Kafka Producer to pull (or receive) messages from your
> > > > system, and write them to Kafka. Take a look at the Ecosystem page:
> > > >
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
> > > >
> > > >
> > > > Philip
> > > >
> > > >
> > > > -
> > > > http://www.philipotoole.com
> > > >
> > > >
> > > > On Wednesday, August 20, 2014 5:05 AM, siddharth ubale <
> > > > siddharth.ub...@gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > Hi guys,
> > > >
> > > > We are implementing a messaging system in order to perform
> > > > real-time analytics. However, we are confused between ActiveMQ and
> > > > Apache Kafka. The points of worry for us are:
> > > >
> > > > 1. What kind of data can be ingested by Apache Kafka? ActiveMQ uses
> > > > JMS, which has an API for Java as well as other open-source
> > > > technologies, so in the future we need not worry about interfaces
> > > > to or from ActiveMQ while performing analytics. Does Kafka have any
> > > > feature like this, where it can handle a wide range of input types
> > > > and also has connectors to various other systems?
> > > >
> > > > 2. Compared to the network-of-brokers concept in ActiveMQ, do we
> > > > have something like that in Kafka too?
> > > >
> > > > Thanks,
> > > > Siddharth Ubale
> > > >
> > >
> >
>


Re: Data inputs for Kafka.

2014-08-20 Thread Sharninder
There are Kafka producers/consumers in a lot of languages. There is an
Ecosystem page somewhere in the wiki. Take a look at that.

Finally, I think you should experiment with both ActiveMQ and Kafka and
then make a decision. Don't find a solution first and then fit your problem
to it; it should be the other way around.

--
Sharninder



On Thu, Aug 21, 2014 at 11:40 AM, siddharth ubale  wrote:

> Hi,
> Thanks for the quick follow-up, Philip.
> Also, can you let me know whether the Kafka implementation can be as
> versatile as ActiveMQ with regard to connectivity? I mean to say, for a
> web app generating transactional data in a database from a Java, Ruby, or
> .NET web app, can Kafka talk to these as easily as ActiveMQ or any
> AMQP-protocol messaging system does?
> Finally, I would also like to know whether 0.9.2 incubating can be used in
> production.
>
> Thanks,
> Siddharth
>
>
> On Wed, Aug 20, 2014 at 9:50 PM, Philip O'Toole <
> philip.oto...@yahoo.com.invalid> wrote:
>
> > Kafka can ingest any kind of data, and connect to many types of systems.
> > Much work exists in this area already, for hooking a wide variety of
> > systems to Kafka. If your system isn't supported, then you write a Kafka
> > Producer to pull (or receive) messages from your system, and write them
> > to Kafka. Take a look at the Ecosystem page:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
> >
> >
> > Philip
> >
> >
> > -
> > http://www.philipotoole.com
> >
> >
> > On Wednesday, August 20, 2014 5:05 AM, siddharth ubale <
> > siddharth.ub...@gmail.com> wrote:
> >
> >
> >
> > Hi guys,
> >
> > We are implementing a messaging system in order to perform real-time
> > analytics. However, we are confused between ActiveMQ and Apache Kafka.
> > The points of worry for us are:
> >
> > 1. What kind of data can be ingested by Apache Kafka? ActiveMQ uses
> > JMS, which has an API for Java as well as other open-source
> > technologies, so in the future we need not worry about interfaces to or
> > from ActiveMQ while performing analytics. Does Kafka have any feature
> > like this, where it can handle a wide range of input types and also has
> > connectors to various other systems?
> >
> > 2. Compared to the network-of-brokers concept in ActiveMQ, do we have
> > something like that in Kafka too?
> >
> > Thanks,
> > Siddharth Ubale
> >
>


Re: consumer read from specific partition

2014-08-18 Thread Sharninder
On Mon, Aug 18, 2014 at 7:27 PM, Josh J  wrote:

> > You can see an example of using the SimpleConsumer here:
> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>
> Any suggestions on where in the code to modify the high level producer to
> support reading from specific partitions?
>
>
High-level producer? I'm assuming you meant to write the high-level
consumer, in which case it isn't possible. The link above, which has an
example of reading messages off a specific partition, is for the
SimpleConsumer, which, ironically, is more complex than the high-level
consumer.

In short, if you have a use case where you want to read from a specific
partition, you will need to implement a SimpleConsumer.
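
For illustration, a stripped-down sketch of such a consumer (broker
address, topic, and partition number are assumptions; it fetches once from
a known leader, whereas real code must discover the leader and handle
errors, as the wiki example shows):

    import java.nio.ByteBuffer;

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.message.MessageAndOffset;

    public class PartitionReader {
        public static void main(String[] args) {
            String topic = "my-topic"; // assumed topic
            int partition = 3;         // the one partition this thread owns

            // Assumes broker1:9092 leads this partition; see the wiki
            // example for leader discovery and error handling.
            SimpleConsumer consumer =
                    new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "reader");

            long offset = 0L;
            FetchRequest req = new FetchRequestBuilder()
                    .clientId("reader")
                    .addFetch(topic, partition, offset, 100000)
                    .build();
            FetchResponse response = consumer.fetch(req);

            for (MessageAndOffset mao : response.messageSet(topic, partition)) {
                ByteBuffer payload = mao.message().payload();
                byte[] bytes = new byte[payload.limit()];
                payload.get(bytes);
                System.out.println(mao.offset() + ": " + new String(bytes));
                offset = mao.nextOffset(); // where the next fetch should start
            }
            consumer.close();
        }
    }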

--
Sharninder

> Josh
>
> On Thu, Aug 14, 2014 at 4:27 PM, Neha Narkhede 
> wrote:
>
> > You can see an example of using the SimpleConsumer here:
> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> >
> >
> >
> > On Thu, Aug 14, 2014 at 3:23 AM, Sharninder 
> wrote:
> >
> > > Implement the low-level "SimpleConsumer".
> > >
> > > --
> > > Sharninder
> > >
> > >
> > >
> > > On Thu, Aug 14, 2014 at 2:16 PM, Josh J  wrote:
> > >
> > > > Hi,
> > > >
> > > > Suppose I have N partitions. I would like to have X different
> consumer
> > > > threads ( X < N) read from a specified set of partitions. How can I
> > > achieve
> > > > this?
> > > >
> > > > Thanks,
> > > >
> > > > Josh
> > > >
> > >
> >
>


Re: consumer read from specific partition

2014-08-14 Thread Sharninder
Implement the low-level "SimpleConsumer".

--
Sharninder



On Thu, Aug 14, 2014 at 2:16 PM, Josh J  wrote:

> Hi,
>
> Suppose I have N partitions. I would like to have X different consumer
> threads ( X < N) read from a specified set of partitions. How can I achieve
> this?
>
> Thanks,
>
> Josh
>


Re: Consumer is never shutdown

2014-08-05 Thread Sharninder
Without looking at your complete code, I'm only going to make some guesses.

Unless you really want to shut down the consumer, why would you call
consumer.shutdown()? A consumer will keep waiting for new messages unless
told otherwise.

I'm guessing you used something like while (iterator.hasNext()) to get
messages from the queue. That loop will not return unless you explicitly
call shutdown, which means you will have to write a condition that calls
shutdown based on the message contents.
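
A minimal sketch of such a condition, reusing the connector and stream
from the wiki's Consumer Group Example that the original post followed
(the "SHUTDOWN" sentinel is an assumption; any predicate over the message
works):

    // `consumer` is the ConsumerConnector and `stream` one of its
    // KafkaStream<byte[], byte[]> instances from the wiki example.
    static void consumeUntilSentinel(ConsumerConnector consumer,
                                     KafkaStream<byte[], byte[]> stream) {
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            String message = new String(it.next().message());
            if ("SHUTDOWN".equals(message)) { // hypothetical sentinel value
                consumer.shutdown();          // stops the fetchers; the loop ends
                return;
            }
            // ... normal message handling here ...
        }
    }

Alternatively, setting consumer.timeout.ms in the consumer config makes
hasNext() throw a ConsumerTimeoutException when no message arrives within
that window, which gives you another way out of the loop.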

--
Sharninder



On Tue, Aug 5, 2014 at 10:51 AM, anand jain  wrote:

> Hi,
>
> I just started with Apache Kafka and wrote a high level consumer program
> following the example given here
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example.
>
> Though I was able to run the program and consume the messages, I have one
> doubt regarding *consumer.shutdown()*. It has never been called. I used
> the below piece of code to verify:
>
>     if (consumer != null) {
>         System.out.println("shutting down consumer");
>         consumer.shutdown();
>     }
>
> Has someone encountered this before? Also, even though the consumer
> didn't shut down, I didn't notice any bottleneck. Is it really needed?
>
> Regards
> Anand
>


Re: How to use kafka as flume source.

2014-08-03 Thread Sharninder
https://github.com/baniuyao/flume-ng-kafka-source



On Sun, Aug 3, 2014 at 6:15 PM, rafeeq s  wrote:

> Hi,
>
> We are planning to use Kafka as a *Flume source*. Please advise me on how
> to use Kafka as a source in Flume.
>
> Please share a good example, if there is one, of a *Flume - Kafka source -
> HDFS sink* pipeline.
>
> Regards,
>
> Rafeeq S
> *("What you do is what matters, not what you think or say or plan." )*
>


Re: Partitions per Machine for a topic

2014-07-26 Thread Sharninder
You can add partitions at runtime but not delete them (AFAIK; someone
correct me if I'm wrong). The reason it's usually discouraged is that when
partitions are added to a topic, the message-to-partition assignment
changes, and if your application depends on that partitioning logic, it
will break.
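
A sketch of the alter command on 0.8.1+ (topic name and ZooKeeper address
are assumptions):

    # Grow my-topic to 10 partitions; the count can never be reduced
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
        --topic my-topic --partitions 10

The breakage risk comes from key-based assignment: the default partitioner
is effectively hash(key) % numPartitions, so after the change the same key
can land on a different partition than before.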

--
Sharninder



On Sat, Jul 26, 2014 at 2:29 AM, Kashyap Mhaisekar 
wrote:

> Great. Thanks, all. But there is no easy way to change partitions once
> the topic is created, right? I run Kafka 0.8.2, and it's a difficult job
> to reassign partitions.
>
>
> On Fri, Jul 25, 2014 at 3:10 PM, Clark Haskins <
> chask...@linkedin.com.invalid> wrote:
>
> > You can have more partitions than machines in the cluster; you cannot,
> > however, have a replication factor that is greater than the number of
> > machines in the cluster.
> >
> > You could easily have a topic with 100 partitions on a 3 node cluster.
> >
> > -Clark
> >
> >
> > Clark Elliott Haskins III
> > LinkedIn DDS Site Reliability Engineer
> > Kafka, Zookeeper, Samza SRE
> > Mobile: 505.385.1484
> > BlueJeans: https://www.bluejeans.com/chaskins
> >
> >
> > chask...@linkedin.com
> > https://www.linkedin.com/in/clarkhaskins
> > There is no place like 127.0.0.1
> >
> >
> >
> >
> > On 7/23/14, 2:14 PM, "Kashyap Mhaisekar"  wrote:
> >
> > >Hi,
> > >Is the maximum number of partitions for a topic dependent on the number
> > >of machines in a Kafka cluster?
> > >For example, if I have 3 machines in a cluster, can I have 5 partitions,
> > >with the caveat that one machine can host multiple partitions for a
> > >given topic?
> > >
> > >Regards,
> > >Kashyap
> >
> >
>