Error for partition [__consumer_offsets,15] to broker

2017-12-05 Thread Abhit Kalsotra
Hello all,

I have been running Kafka (0.10.2.0) on Windows for the past year ...

But of late there has been a recurring broker issue that I have observed 4-5
times in the last 4 months.

Kafka setup config:


3 ZK instances running on 3 different Windows servers; 7 Kafka broker nodes
running on a single Windows machine, each with a different disk for its log
directory.

My Kafka has 2 topics with 50 partitions each and a replication factor of 3.

My partition selection logic: each message has a unique ID, and the partition
is chosen as (unique ID % 50); the Kafka producer API is then called to route
the message to that specific topic partition.
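The routing described above can be sketched as follows (the function name and the assertions are illustrative, not the poster's actual code):

```python
NUM_PARTITIONS = 50  # each topic has 50 partitions, as described above

def partition_for(unique_id: int) -> int:
    """Map a message's unique ID onto one of the topic's partitions."""
    return unique_id % NUM_PARTITIONS

# Messages with the same unique ID always land on the same partition,
# so per-ID ordering is preserved within that partition.
assert partition_for(123) == 23
assert partition_for(173) == 23  # 123 and 173 share partition 23
```

With a real producer, this value would be passed as the explicit partition argument when sending each record.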

But of late a unique case has been cropping up on the Kafka broker nodes:
[2017-12-02 02:47:40,024] ERROR [ReplicaFetcherThread-0-4], Error for
partition [__consumer_offsets,15] to broker
4:org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.
(kafka.server.ReplicaFetcherThread)

The entire server.log is filled with these entries, and it is very large.
Please help me understand under what circumstances this can occur and what
measures I need to take.

Courtesy
Abhi

-- 

If you can't succeed, call it version 1.0


Re: Running Kafka 1.0 binaries with inter.broker.protocol.version = 0.10

2017-12-05 Thread Manikumar
Hi,

1. inter.broker.protocol.version should be higher than or equal to
log.message.format.version.
So with inter.broker.protocol.version set to 0.10, we cannot use the latest
message format, and the broker won't start.

2. Since the other brokers in the cluster don't understand the latest
protocol, we cannot directly set inter.broker.protocol.version = 1.0 and
restart the broker. In the first restart we update the binaries, and in the
second restart we change the protocol.

We should follow the steps given in the docs.
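The two-restart sequence can be sketched in server.properties terms (version strings are illustrative for a 0.10.2 → 1.0 upgrade; follow the official upgrade notes for the exact values for your versions):

```properties
# Restart 1: upgrade the broker binaries to 1.0, but keep the old
# protocol and message format pinned so older brokers still interoperate.
inter.broker.protocol.version=0.10.2
log.message.format.version=0.10.2

# Restart 2: once every broker runs the 1.0 binaries, bump the protocol.
# inter.broker.protocol.version=1.0
# log.message.format.version can be raised later, after clients are upgraded.
```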

On Wed, Dec 6, 2017 at 11:21 AM, Debraj Manna 
wrote:

> Hi
>
> Anyone any thoughts?
>
>
>
> On Tue, Dec 5, 2017 at 8:38 PM, Debraj Manna 
> wrote:
>
> > Hi
> >
> > Regarding  the Kafka Rolling Upgrade steps as mentioned in the doc
> > 
> >
> > Can you let me know how Kafka is supposed to behave if the binaries are
> > upgraded to the latest 1.0 but inter.broker.protocol.version still points
> > to 0.10 on all the brokers? What features will I be missing in Kafka 1.0,
> > and what problems should I expect to see?
> >
> > Also, can you let me know how Kafka is supposed to behave in a rolling
> > upgrade (from 0.10 to 1.0) if I follow the steps below:
> >
> >
> >    1. Set inter.broker.protocol.version = 1.0 on a broker, update the
> >    binary, and restart it.
> >    2. Then go to the other brokers one by one and repeat the above steps
> >
> >
>


Re: Running Kafka 1.0 binaries with inter.broker.protocol.version = 0.10

2017-12-05 Thread Debraj Manna
Hi

Anyone any thoughts?



On Tue, Dec 5, 2017 at 8:38 PM, Debraj Manna 
wrote:

> Hi
>
> Regarding  the Kafka Rolling Upgrade steps as mentioned in the doc
> 
>
> Can you let me know how Kafka is supposed to behave if the binaries are
> upgraded to the latest 1.0 but inter.broker.protocol.version still points
> to 0.10 on all the brokers? What features will I be missing in Kafka 1.0,
> and what problems should I expect to see?
>
> Also, can you let me know how Kafka is supposed to behave in a rolling
> upgrade (from 0.10 to 1.0) if I follow the steps below:
>
>
>    1. Set inter.broker.protocol.version = 1.0 on a broker, update the
>    binary, and restart it.
>    2. Then go to the other brokers one by one and repeat the above steps
>
>


Re: FW: secure kafka cluster setup

2017-12-05 Thread Manikumar
We should pass the necessary SSL configs to the kafka-console-consumer.sh
script using the --command-config command-line option:

security.protocol=SSL
ssl.truststore.location=/var/private/ssl/client.truststore.jks
ssl.truststore.password=test1234

http://kafka.apache.org/documentation.html#security_configclients
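For example (broker address, topic, and file name are illustrative), save the three properties above to a file and pass it to the console consumer:

```shell
# client-ssl.properties contains the SSL properties quoted above.
bin/kafka-console-consumer.sh \
  --bootstrap-server broker1.example.com:9093 \
  --topic test-topic \
  --from-beginning \
  --command-config client-ssl.properties
```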

On Wed, Dec 6, 2017 at 1:34 AM, Ashok Kumar 
wrote:

>
>
>
>
> *From:* Ashok Kumar
> *Sent:* Tuesday, December 5, 2017 3:02 PM
> *To:* 'users@kafka.apache.org' 
> *Subject:* secure kafka cluster setup
>
>
>
> Hi,
>
> I am running into an issue with the SSL configuration of Kafka. I have
> attached a Word file describing my issue.
>
>
>
> thanks
>


Re: Comparing Pulsar and Kafka: unified queuing and streaming

2017-12-05 Thread Khurrum Nasim
Jason,

Comments inline.

On Tue, Dec 5, 2017 at 10:59 AM, Jason Gustafson  wrote:

> > I believe a lot of users are using the kafka high level consumers, it is
> > effectively an **unordered** messaging/streaming pattern. People using
> high
> > level consumers don't actually need any ordering guarantees. In this
> sense,
> > a *shared* subscription in Apache Pulsar seems to be better than current
> > Kafka's consumer group model, as it allows the consumption rate not
> limited
> > by the number of partitions, can actually grow beyond the number of
> > partitions. We do see a lot of operational pain points on production
> coming
> > from consumer lags, which I think it is very commonly seen during
> partition
> > rebalancing in a consumer group. Selective acking seems to provide a
> finer
> > granularity on acknowledgment, which can be actually good for avoiding
> > consumer lags and avoid reprocessing messages during partition rebalance.
>
>
> Yeah, I'm not sure about this. I'd be interested to understand the design
> of this feature a little better. In practice, when ordering is unimportant,
> adding partitions seems not too big of a deal.


I think it depends. You can probably address the problem by adding more
partitions if the topic is used exclusively by one consumer group.

However, there are still pain points:

- In a shared organization, a topic might be used by multiple teams, and it is
sometimes hard to increase the partition count for a topic, especially if some
team wants to consume messages in order.
- Even if you can easily increase the number of partitions, that doesn't
address the consumer-lag issue: without selective acking, some *acknowledged*
or *processed* messages will be redelivered after partitions are reassigned to
other consumers.



> Also, I'm aware of active
> efforts to make rebalancing less of a pain point for our users ;)
>

Can you point me to the KIPs for these efforts? I would love to keep an eye on
them.


>
> The last question, from users perspective, since both kafka and pulsar are
> > distributed pub/sub messaging systems and both of them at the ASF, is
> there
> > any possibility for these two projects to collaborate, e.g. kafka adopts
> > pulsar's messaging model, pulsar can use kafka streams and kafka
> connect. I
> > believe a lot of people in the mailing list might have same or similar
> > question. From end-user perspective, if such collaboration can happen,
> that
> > is going to be great for users and also the ASF. I would like to hear any
> > thoughts from kafka committers and pmc members.
>
>
> I see this a little differently. Although there is some overlap between the
> projects, they have quite different underlying philosophies (as Marina
> alluded to) and I hope this will take them on different trajectories over
> time. That would ultimately benefit users more than having two competing
> projects solving all the same use cases. We don't need to try to cram
> Pulsar features into Kafka if it's not a good fit and vice versa. At the
> same time, where capabilities do overlap, we can try to learn from their
> experience and they can learn from ours. The example of message retention
> seemed like one of these instances since there are legitimate use cases and
> Pulsar's approach has some benefits.
>

Sure, makes sense to me.

BTW, have you taken a look at Pulsar's Kafka API? I am wondering what you
think of it.

- KN


>
>
> -Jason
>
>
>
> On Tue, Dec 5, 2017 at 9:57 AM, Khurrum Nasim 
> wrote:
>
> > Hi Marina,
> >
> >
> > On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova 
> > wrote:
> >
> > > Hi,
> > > I don't think it would be such a great idea to start modifying the very
> > > foundation of Kafka's design to accommodate more and more extra use
> > cases.
> > > Kafka became so widely adopted and popular because its creators made a
> > > brilliant decision to make it "dumb broker - smart consumer" type of
> the
> > > system, where there is no to minimal dependencies between Kafka brokers
> > and
> > > Consumers. This is what make Kafka blazingly fast and truly scalable -
> > able
> > > to handle thousands of Consumers with no impact on performance.
> > >
> >
> > I am not sure I agree with this. I think from end-user perspective, what
> > users expect is a ultra simple streaming/messaging system: applications
> > sends messages, messaging systems store and dispatch them, consumers
> > consume the messages and tell the systems that they already consumed the
> > messages. IMO whether a centralized management or decentralize management
> > doesn't really matter here if kafka is able to do things without
> impacting
> > performance.
> >
> > sometimes people assume that smarter brokers (like traditional messaging
> > brokers) can not offer high throughput and scalability, because they do
> > "too many things". but I took a look at Pulsar documentation and their
> > presentation. There are a few very impressive metrics.

kafka-streams punctuate with WALL_CLOCK_TIME triggered immediately

2017-12-05 Thread frederic arno
Hello all,

I am using kafka and kafka-streams 1.0.0

While working on a custom Processor in which I schedule a punctuation using
WALL_CLOCK_TIME, I noticed that whatever punctuation interval I set, a call
to my Punctuator is always triggered immediately. Is that a bug?

Having a quick look at kafka-streams' code, I could find that all
PunctuationSchedule's timestamps are matched against the current time
in order to decide whether or not to trigger the punctuator
(org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate).
However, I've only seen code that initializes PunctuationSchedule's
timestamp to 0, which I guess is what is causing an immediate
punctuation. At least when using WALL_CLOCK_TIME, shouldn't the
PunctuationSchedule's timestamp be initialized to current time +
interval?
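To illustrate the described behavior, here is a toy model (plain Python, not the actual Streams internals) of why a schedule whose timestamp is initialized to 0 fires on the very first check, while one initialized to current time + interval waits a full interval:

```python
import time

class PunctuationSchedule:
    """Toy model of a punctuation schedule; names mirror the report above."""
    def __init__(self, interval_ms, first_timestamp_ms):
        self.interval_ms = interval_ms
        self.next_fire_ms = first_timestamp_ms

    def may_punctuate(self, now_ms):
        """Fire if the schedule's timestamp has been reached, then advance it."""
        if now_ms >= self.next_fire_ms:
            self.next_fire_ms = now_ms + self.interval_ms
            return True
        return False

now_ms = int(time.time() * 1000)

# Timestamp initialized to 0 (the reported behavior): fires immediately.
eager = PunctuationSchedule(60_000, first_timestamp_ms=0)
assert eager.may_punctuate(now_ms) is True

# Timestamp initialized to now + interval (the suggested fix): waits one interval.
deferred = PunctuationSchedule(60_000, first_timestamp_ms=now_ms + 60_000)
assert deferred.may_punctuate(now_ms) is False
assert deferred.may_punctuate(now_ms + 60_000) is True
```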

I am also hitting an OutOfMemoryError when running integration tests:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.PriorityQueue.grow(PriorityQueue.java:300)
at java.util.PriorityQueue.offer(PriorityQueue.java:339)
at java.util.PriorityQueue.add(PriorityQueue.java:321)
at 
org.apache.kafka.streams.processor.internals.PunctuationQueue.mayPunctuate(PunctuationQueue.java:55)
at 
org.apache.kafka.streams.processor.internals.StreamTask.maybePunctuateSystemTime(StreamTask.java:619)
at 
org.apache.kafka.streams.processor.internals.AssignedTasks.punctuate(AssignedTasks.java:430)
at 
org.apache.kafka.streams.processor.internals.TaskManager.punctuate(TaskManager.java:324)
at 
org.apache.kafka.streams.processor.internals.StreamThread.punctuate(StreamThread.java:969)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)

I am only using the WALL_CLOCK_TIME punctuation type, from a single processor
(4 instances are running, as I have 4 partitions on the processed topic). The
punctuation interval is set to 1 minute, and I cancel the schedule at each
punctuation and re-schedule a new one (to deal with variable intervals).
Although I run my tests in a JVM with little heap space, I am wondering if
there could be a memory leak around there, as I have not seen where canceled
PunctuationSchedules are removed from the PunctuationQueue...

Thank you, Fred


Re: Kafka streams - regarding WordCountInteractiveQueriesExample

2017-12-05 Thread Bill Bejeck
Giridhar,

I can confirm, we'll get a patch for this soon.

Thanks for reporting.

-Bill

On Wed, Nov 29, 2017 at 8:21 AM, Bill Bejeck  wrote:

> Giridhar,
>
> Thanks for reporting this, I'll take a look.
>
> On Wed, Nov 29, 2017 at 5:37 AM, Giridhar Addepalli <
> giridhar1...@gmail.com> wrote:
>
>> Hi,
>>
>> I am a newbie to Kafka Streams.
>>
>> I tried the example below:
>>
>> https://github.com/confluentinc/kafka-streams-examples/blob/
>> 4.0.x/src/main/java/io/confluent/examples/streams/
>> interactivequeries/WordCountInteractiveQueriesExample.java
>>
>> http://localhost:7070/state/instances
>>
>> [
>>   {
>>     "host": "localhost",
>>     "port": 7070,
>>     "storeNames": ["windowed-word-count", "word-count"]
>>   },
>>   {
>>     "host": "localhost",
>>     "port": 7071,
>>     "storeNames": ["windowed-word-count", "word-count"]
>>   }
>> ]
>>
>>
>>
>> I was able to query the count of a given word:
>>
>> http://localhost:7070/state/keyvalue/word-count/hello
>>
>> {
>> "key": "hello",
>> "value": 444
>> }
>>
>>
>>
>> http://localhost:7071/state/keyvalue/word-count/world
>>
>> {
>> "key": "world",
>> "value": 906
>> }
>>
>>
>> But I am not able to replicate the following (highlighted) part of the
>> advertised behavior in the example:
>>
>> * 5) Use your browser to hit the REST endpoint of the app instance you
>> started in step 3 to query
>> * the state managed by this application.  Note: If you are running
>> multiple app instances, you can
>> * query them arbitrarily -- if an app instance cannot satisfy a query
>> itself, it will fetch the
>> * results from the other instances.
>>
>>
>> For example, the following gave a 404 error:
>> http://localhost:7070/state/keyvalue/word-count/world
>>
>> HTTP ERROR: 404
>>
>> Problem accessing /state/keyvalue/word-count/world. Reason:
>>
>> Not Found
>>
>> Please let me know where my expectations are going wrong.
>> Please note that I have tried both the 3.3.0-post and 4.0.0-post branches.
>>
>> Thanks,
>> Giridhar.
>>
>
>


Re: Kafka streams on Kubernetes

2017-12-05 Thread Artur Mrozowski
I created it using kubectl create -f with a YAML file where I provided my
Docker image info.
It looks like there is some general configuration I am missing on the k8s
cluster. I tried ssh, telnet, and ping to the broker and nothing works. I am
able to connect to the web, for example using curl.
So it is only IPs on my own network that I am not able to reach.

/Artur

On Tue, Dec 5, 2017 at 6:32 PM, Thomas Stringer 
wrote:

> How did you create Kafka in the k8s cluster? Can you share your config?
>
> On Tue, Dec 5, 2017, 7:49 AM Artur Mrozowski  wrote:
>
> > Hi, does anyone have experience running a Kafka Streams application on
> > Kubernetes?
> >
> > Do I have to define any inbound/outbound ports?
> > The application fails because it cannot connect to the brokers.
> >
> > Best Regards
> > Artur
> >
>


Re: Kafka Streams app error while rebalancing

2017-12-05 Thread Matthias J. Sax
Hard to say.

However, deleting the state directories will not have any negative impact, as
you don't use stores. So why do you want to avoid doing this?

Another workaround is to start four applications with 1 thread each -- this
would isolate the instances further and avoid the lock issue (you would need
to run them on different hosts, or on the same host but with a different
state directory for each instance).
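Concretely, the four single-thread instances could be configured along these lines (the application id and paths are illustrative):

```properties
# Shared by all four instances of the application:
application.id=my-streams-app
num.stream.threads=1

# Unique per instance when instances share a host, so their
# state-directory locks cannot collide:
state.dir=/var/lib/kafka-streams/instance-1
```

On separate hosts the default state.dir needs no change, since the lock conflict only arises within one filesystem.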


In general, I would recommend upgrading your applications -- you don't need
to upgrade the brokers for this. The 1.0 version has many additional bug
fixes. This should resolve the issues you are facing.

Otherwise, it's hard to say without logs.

Hope this helps.


-Matthias


On 12/5/17 9:35 AM, Srikanth wrote:
> Hello,
> 
> We noticed that a Kafka Streams app is stuck in the rebalance state with the
> error below.
> Two instances of the app were running fine until a rebalance was
> triggered (possibly due to a network issue).
> Both app instances are still running (no app restart).
> The app itself doesn't create/use a state store. NUM_STREAM_THREADS_CONFIG=2
> 
> I did see several tickets with similar errors that are marked as fixed. I'm
> using version 0.10.2.1.
> 
> 17/12/04 18:34:57 WARN StreamThread: Could not create task 0_2. Will retry.
> org.apache.kafka.streams.errors.LockException: task [0_2] Failed to lock
> the state directory for task 0_2
>   at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:100)
>   at
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73)
>   at
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108)
>   at
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
>   at
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
>   at
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
>   at
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
>   at
> org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
>   at
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
>   at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
>   at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
>   at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>   at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
>   at
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
>   at
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
>   at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:592)
>   at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
> 17/12/04 18:34:58 WARN StreamThread: Could not create task 0_8. Will retry.
> org.apache.kafka.streams.errors.LockException: task [0_8] Failed to lock
> the state directory for task 0_8
> 
> 
> kafka-consumer-groups --new-consumer --bootstrap-server <...> --describe
> --group GeoTest
> Note: This will only show information about consumers that use the Java
> consumer API (non-ZooKeeper-based consumers).
> 
> Warning: Consumer group 'GeoTest' is rebalancing.
> 
> I keep seeing the above lock exception continuously, and the app is not
> making any progress. Any idea why it is stuck?
> I have read a few suggestions that required me to manually delete the state
> directory. I'd like to avoid that.
> 
> Thanks,
> Srikanth
> 





Re: Comparing Pulsar and Kafka: unified queuing and streaming

2017-12-05 Thread Jason Gustafson
> I believe a lot of users are using the kafka high level consumers, it is
> effectively an **unordered** messaging/streaming pattern. People using high
> level consumers don't actually need any ordering guarantees. In this sense,
> a *shared* subscription in Apache Pulsar seems to be better than current
> Kafka's consumer group model, as it allows the consumption rate not limited
> by the number of partitions, can actually grow beyond the number of
> partitions. We do see a lot of operational pain points on production coming
> from consumer lags, which I think it is very commonly seen during partition
> rebalancing in a consumer group. Selective acking seems to provide a finer
> granularity on acknowledgment, which can be actually good for avoiding
> consumer lags and avoid reprocessing messages during partition rebalance.


Yeah, I'm not sure about this. I'd be interested to understand the design
of this feature a little better. In practice, when ordering is unimportant,
adding partitions seems not too big of a deal. Also, I'm aware of active
efforts to make rebalancing less of a pain point for our users ;)

The last question, from users perspective, since both kafka and pulsar are
> distributed pub/sub messaging systems and both of them at the ASF, is there
> any possibility for these two projects to collaborate, e.g. kafka adopts
> pulsar's messaging model, pulsar can use kafka streams and kafka connect. I
> believe a lot of people in the mailing list might have same or similar
> question. From end-user perspective, if such collaboration can happen, that
> is going to be great for users and also the ASF. I would like to hear any
> thoughts from kafka committers and pmc members.


I see this a little differently. Although there is some overlap between the
projects, they have quite different underlying philosophies (as Marina
alluded to) and I hope this will take them on different trajectories over
time. That would ultimately benefit users more than having two competing
projects solving all the same use cases. We don't need to try to cram
Pulsar features into Kafka if it's not a good fit and vice versa. At the
same time, where capabilities do overlap, we can try to learn from their
experience and they can learn from ours. The example of message retention
seemed like one of these instances since there are legitimate use cases and
Pulsar's approach has some benefits.


-Jason



On Tue, Dec 5, 2017 at 9:57 AM, Khurrum Nasim 
wrote:

> Hi Marina,
>
>
> On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova 
> wrote:
>
> > Hi,
> > I don't think it would be such a great idea to start modifying the very
> > foundation of Kafka's design to accommodate more and more extra use
> cases.
> > Kafka became so widely adopted and popular because its creators made a
> > brilliant decision to make it "dumb broker - smart consumer" type of the
> > system, where there is no to minimal dependencies between Kafka brokers
> and
> > Consumers. This is what make Kafka blazingly fast and truly scalable -
> able
> > to handle thousands of Consumers with no impact on performance.
> >
>
> I am not sure I agree with this. I think from end-user perspective, what
> users expect is a ultra simple streaming/messaging system: applications
> sends messages, messaging systems store and dispatch them, consumers
> consume the messages and tell the systems that they already consumed the
> messages. IMO whether a centralized management or decentralize management
> doesn't really matter here if kafka is able to do things without impacting
> performance.
>
> sometimes people assume that smarter brokers (like traditional messaging
> brokers) can not offer high throughput and scalability, because they do
> "too many things". but I took a look at Pulsar documentation and their
> presentation. There are a few very impressive metrics:
>
> https://image.slidesharecdn.com/apachepulsar-171113225233/95/bdam-multitenant-and-georeplication-messaging-with-apache-pulsar-by-matteo-merli-sijie-guo-from-streamlio-2-638.jpg?cb=1510613990
>
> - 1.8 million messages/second per topic partition
> - 99pct producing latency less than 5ms with stronger durability
> - support millions of topics
> - it also supports at-least-once and effectively-once producing
>
> Those metrics sound appealing to me if pulsar supports both streaming and
> queuing. I am wondering if anyone in the community tries to do a
> performance testing or benchmark between Pulsar and Kafka. I would love to
> see such results that can help people understand both systems, pros and
> cons.
>
>
> - KN
>
>
>
> >
> > One unfortunate consequence of becoming so popular - is that more and
> more
> > people are trying to fit Kafka into their architectures not because it
> > really fits, but because everybody else

Re: Comparing Pulsar and Kafka: unified queuing and streaming

2017-12-05 Thread Khurrum Nasim
Hi Marina,


On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova 
wrote:

> Hi,
> I don't think it would be such a great idea to start modifying the very
> foundation of Kafka's design to accommodate more and more extra use cases.
> Kafka became so widely adopted and popular because its creators made a
> brilliant decision to make it "dumb broker - smart consumer" type of the
> system, where there is no to minimal dependencies between Kafka brokers and
> Consumers. This is what make Kafka blazingly fast and truly scalable - able
> to handle thousands of Consumers with no impact on performance.
>

I am not sure I agree with this. I think from the end-user perspective, what
users expect is an ultra-simple streaming/messaging system: applications send
messages, messaging systems store and dispatch them, and consumers consume
the messages and tell the system that they have consumed them. IMO, whether
management is centralized or decentralized doesn't really matter here if
Kafka is able to do things without impacting performance.

Sometimes people assume that smarter brokers (like traditional messaging
brokers) cannot offer high throughput and scalability because they do
"too many things". But I took a look at the Pulsar documentation and their
presentation. There are a few very impressive metrics:

https://image.slidesharecdn.com/apachepulsar-171113225233/95/bdam-multitenant-and-georeplication-messaging-with-apache-pulsar-by-matteo-merli-sijie-guo-from-streamlio-2-638.jpg?cb=1510613990

- 1.8 million messages/second per topic partition
- 99th-percentile produce latency under 5 ms, with stronger durability
- supports millions of topics
- supports both at-least-once and effectively-once producing

Those metrics sound appealing to me, given that Pulsar supports both
streaming and queuing. I am wondering if anyone in the community has tried
to do performance testing or a benchmark between Pulsar and Kafka. I would
love to see results that help people understand both systems' pros and cons.


- KN



>
> One unfortunate consequence of becoming so popular - is that more and more
> people are trying to fit Kafka into their architectures not because it
> really fits, but because everybody else is doing so :) And this causes many
> requests to support richer and richer functionality to be added to
> Kafka - like transactional messages, more complex acks, centralized
> consumer management, etc.
>
> If you really need those feature - there are other systems that are
> designed for that.
>
> I truly worry that if all those changes are added to Core Kafka - it will
> become just another "do it all" enterprise-level monster that will be able
> to do it all but at a price of mediocre performance and ten-fold increased
> complexity (and, thus, management and possibility of bugs). Sure, there has
> to be innovation and new features added - but maybe those that require
> major changes to the Kafka's core principles should go into separate
> frameworks, plug-ing (like Connectors) or something in that line, rather
> that packing it all into the Core Kafka.
>
> Just my 2 cents :)
>
> Marina
>
> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>
> >  Original Message 
> > Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
> > Local Time: December 4, 2017 2:56 PM
> > UTC Time: December 4, 2017 7:56 PM
> > From: ja...@confluent.io
> > To: d...@kafka.apache.org
> > Kafka Users 
> >
> > Hi Khurrum,
> >
> > Thanks for sharing the article. I think one interesting aspect of Pulsar
> > that stands out to me is its notion of a subscription and how it impacts
> > message retention. In Kafka, consumers are more loosely coupled and
> > retention is enforced independently of consumption. There are some
> > scenarios I can imagine where the tighter coupling might be beneficial.
> For
> > example, in Kafka Streams, we often use intermediate topics to store the
> > data in one stage of the topology's computation. These topics are
> > exclusively owned by the application and once the messages have been
> > successfully received by the next stage, we do not need to retain them
> > further. But since consumption is independent of retention, we either
> have
> > to choose a large retention time and deal with some temporary storage
> waste
> > or we use a low retention time and possibly lose some messages during an
> > outage.
> >
> > We have solved this problem to some extent in Kafka by introducing an API
> > to delete the records in a partition up to a certain offset, but this
> > effectively puts the burden of this use case on clients. It would be
> > interesting to consider whether we could do something like Pulsar in the
> > Kafka broker. For example, we have a consumer group coordinator which is
> > able to tr

Kafka Streams app error while rebalancing

2017-12-05 Thread Srikanth
Hello,

We noticed that a Kafka Streams app is stuck in the rebalance state with the
error below.
Two instances of the app were running fine until a rebalance was
triggered (possibly due to a network issue).
Both app instances are still running (no app restart).
The app itself doesn't create/use a state store. NUM_STREAM_THREADS_CONFIG=2

I did see several tickets with similar errors that are marked as fixed. I'm
using version 0.10.2.1.

17/12/04 18:34:57 WARN StreamThread: Could not create task 0_2. Will retry.
org.apache.kafka.streams.errors.LockException: task [0_2] Failed to lock
the state directory for task 0_2
  at
org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:100)
  at
org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73)
  at
org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108)
  at
org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
  at
org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
  at
org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
  at
org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
  at
org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
  at
org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
  at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
  at
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
  at
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
  at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
  at
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
  at
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
  at
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:592)
  at
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
17/12/04 18:34:58 WARN StreamThread: Could not create task 0_8. Will retry.
org.apache.kafka.streams.errors.LockException: task [0_8] Failed to lock
the state directory for task 0_8


kafka-consumer-groups --new-consumer --bootstrap-server <...> --describe
--group GeoTest
Note: This will only show information about consumers that use the Java
consumer API (non-ZooKeeper-based consumers).

Warning: Consumer group 'GeoTest' is rebalancing.

I keep seeing the above lock exception continuously, and the app is not
making any progress. Any idea why it is stuck?
I have read a few suggestions that required me to manually delete the state
directory. I'd like to avoid that.

Thanks,
Srikanth


Re: Kafka streams on Kubernetes

2017-12-05 Thread Thomas Stringer
How did you create Kafka in the k8s cluster? Can you share your config?

On Tue, Dec 5, 2017, 7:49 AM Artur Mrozowski  wrote:

> Hi, does anyone have experience with running a Kafka Streams application on
> Kubernetes?
>
> Do I have to define any in/outbound ports?
> The application fails because it cannot connect to the brokers.
>
> Best Regards
> Artur
>

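A common cause of "cannot connect to the brokers" inside Kubernetes is the advertised listener: the client bootstraps, then reconnects to whatever address each broker advertises, so that address must be resolvable and reachable from the client pods. A sketch of the relevant broker settings (the headless-service DNS name below is a hypothetical example, not taken from Artur's setup):

```properties
# server.properties — bind on all interfaces inside the pod
listeners=PLAINTEXT://0.0.0.0:9092
# advertise an address that client pods can actually resolve and reach,
# e.g. the per-pod DNS name of a StatefulSet behind a headless Service
advertised.listeners=PLAINTEXT://kafka-0.kafka-headless.default.svc.cluster.local:9092
```

If the broker advertises a pod IP or an internal hostname the client pods cannot resolve, the initial bootstrap may succeed while every subsequent connection fails.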

Running Kafka 1.0 binaries with inter.broker.protocol.version = 0.10

2017-12-05 Thread Debraj Manna
Hi

Regarding the Kafka rolling upgrade steps as mentioned in the doc:


Can you let me know how Kafka is supposed to behave if the binaries are
upgraded to the latest 1.0 but inter.broker.protocol.version still points
to 0.10 on all the brokers? What features will I be missing in Kafka 1.0,
and what problems should I expect to face?

Also, can you let me know how Kafka is supposed to behave in a rolling
upgrade (from 0.10 to 1.0) if I follow the steps below:


   1. Add inter.broker.protocol.version = 1.0 on a broker, update the
      binary, and restart it.
   2. Then go to the other brokers one by one and repeat the above steps.

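For reference, the per-broker settings the upgrade procedure toggles look roughly like this in server.properties (a sketch; the version values assume an 0.10.2.x starting point):

```properties
# During the first rolling restart (binaries upgraded to 1.0), keep the old
# protocol so upgraded brokers still speak the version the rest understand
inter.broker.protocol.version=0.10.2
log.message.format.version=0.10.2

# Only after ALL brokers run the 1.0 binaries, bump the protocol and do a
# second rolling restart:
# inter.broker.protocol.version=1.0
# And once clients are upgraded, the message format can follow:
# log.message.format.version=1.0
```

As Manikumar notes in his reply, inter.broker.protocol.version must be higher than or equal to log.message.format.version, and it cannot be bumped ahead of the binaries in a single restart.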

Re: Comparing Pulsar and Kafka: unified queuing and streaming

2017-12-05 Thread Marina Popova
Hi,
I don't think it would be such a great idea to start modifying the very 
foundation of Kafka's design to accommodate more and more extra use cases.
Kafka became so widely adopted and popular because its creators made a 
brilliant decision to make it a "dumb broker - smart consumer" type of 
system, where there are no or minimal dependencies between Kafka brokers and 
consumers. This is what makes Kafka blazingly fast and truly scalable - able to 
handle thousands of consumers with no impact on performance.

One unfortunate consequence of becoming so popular is that more and more 
people are trying to fit Kafka into their architectures not because it really 
fits, but because everybody else is doing so :) And this causes many requests 
for richer and richer functionality to be added to Kafka - like 
transactional messages, more complex acks, centralized consumer management, etc.

If you really need those features - there are other systems that are designed 
for that.

I truly worry that if all those changes are added to core Kafka, it will 
become just another "do it all" enterprise-level monster that will be able to 
do it all, but at the price of mediocre performance and ten-fold increased 
complexity (and, thus, management overhead and potential for bugs). Sure, there has to 
be innovation and new features added - but maybe those that require major 
changes to Kafka's core principles should go into separate frameworks, 
plug-ins (like Connectors) or something along those lines, rather than packing it 
all into core Kafka.

Just my 2 cents :)

Marina

Sent with [ProtonMail](https://protonmail.com) Secure Email.

>  Original Message 
> Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
> Local Time: December 4, 2017 2:56 PM
> UTC Time: December 4, 2017 7:56 PM
> From: ja...@confluent.io
> To: d...@kafka.apache.org
> Kafka Users 
>
> Hi Khurrum,
>
> Thanks for sharing the article. I think one interesting aspect of Pulsar
> that stands out to me is its notion of a subscription and how it impacts
> message retention. In Kafka, consumers are more loosely coupled and
> retention is enforced independently of consumption. There are some
> scenarios I can imagine where the tighter coupling might be beneficial. For
> example, in Kafka Streams, we often use intermediate topics to store the
> data in one stage of the topology's computation. These topics are
> exclusively owned by the application and once the messages have been
> successfully received by the next stage, we do not need to retain them
> further. But since consumption is independent of retention, we either have
> to choose a large retention time and deal with some temporary storage waste
> or we use a low retention time and possibly lose some messages during an
> outage.
>
> We have solved this problem to some extent in Kafka by introducing an API
> to delete the records in a partition up to a certain offset, but this
> effectively puts the burden of this use case on clients. It would be
> interesting to consider whether we could do something like Pulsar in the
> Kafka broker. For example, we have a consumer group coordinator which is
> able to track the progress of the group through its committed offsets. It
> might be possible to extend it to automatically delete records in a topic
> after offsets are committed if the topic is known to be exclusively owned
> by the consumer group. We already have the DeleteRecords API that we need, so
> maybe this is "just" a matter of some additional topic metadata. I'd be
> interested to hear whether this kind of use case is common among our users.
>
> -Jason
>
> On Sun, Dec 3, 2017 at 10:29 PM, Khurrum Nasim khurrumnas...@gmail.com
> wrote:
>
>> Dear Kafka Community,
>> I happened to read this blog post comparing the messaging model between
>> Apache Pulsar and Apache Kafka. It sounds interesting. Apache Pulsar claims
>> to unify streaming (kafka) and queuing (rabbitmq) in one unified API.
>> Pulsar also seems to support the Kafka API. Has anyone taken a look at Pulsar?
>> How does the community think about this? Pulsar is also an Apache project.
>> Is there any collaboration that can happen between these two projects?
>> https://streaml.io/blog/pulsar-streaming-queuing/
>> BTW, I am a Kafka user, loving Kafka a lot. Just try to see what other
>> people think about this.
>>
>> - KN
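The broker-side idea Jason sketches (deleting records once an exclusively-owning consumer group has committed past them) boils down to truncating each partition below the group's minimum committed offset. A minimal stdlib-only sketch of that bookkeeping (not Kafka's actual implementation; in a real broker the deletion would go through the DeleteRecords machinery):

```java
import java.util.Arrays;
import java.util.Collection;

public class CommittedOffsetTruncation {

    /**
     * Given the committed offsets of every member of the (exclusively
     * owning) consumer group for one partition, records strictly below
     * the minimum committed offset have been consumed by everyone and
     * could safely be deleted.
     */
    static long safeDeleteBefore(Collection<Long> committedOffsets) {
        return committedOffsets.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(0L); // no commits yet -> nothing is safe to delete
    }

    public static void main(String[] args) {
        // Three members of the group have committed 120, 95 and 110:
        // only records below offset 95 are consumed by all of them.
        System.out.println(safeDeleteBefore(Arrays.asList(120L, 95L, 110L)));
    }
}
```

The key design point is the same one Jason raises: this only works when the topic is known to be exclusively owned by one group, since any other consumer would lose data below the truncation point.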

Kafka streams on Kubernetes

2017-12-05 Thread Artur Mrozowski
Hi, does anyone have experience with running a Kafka Streams application on
Kubernetes?

Do I have to define any in/outbound ports?
The application fails because it cannot connect to the brokers.

Best Regards
Artur


Need help in distributing large files using kafka

2017-12-05 Thread Santosh Kumar J P
Hi,

We are using Kafka to distribute content to multiple clients, where each
individual client will have a unique group-id. We have a requirement of
distributing 25MB files to each client. Following are some of the
issues I feel I may face:

1. If too many clients ( 1 devices) try to connect to Kafka for
fetching multiple 25MB files, there could be connectivity issues with the
Kafka server.
2. Network bandwidth issues may occur as too many clients try to fetch data
from Kafka.

Please let me know your suggestion.

Thanks,
Santosh
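Kafka is not primarily designed as a large-file distribution system, but if it is used this way, several size limits need raising, since the defaults are around 1MB. A sketch of the settings involved (names are standard Kafka configs; the 32MB values are illustrative, chosen to leave headroom above 25MB):

```properties
# broker (server.properties): largest message the broker will accept
message.max.bytes=33554432
# followers must be able to replicate messages of that size
replica.fetch.max.bytes=33554432

# producer: largest request the producer will send
# max.request.size=33554432

# consumer: per-partition fetch ceiling must cover the largest message
# max.partition.fetch.bytes=33554432
```

For the connection-count and bandwidth concerns, the usual levers are more partitions spread across more brokers and throttling via client quotas, rather than a single per-broker setting.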


Can we restrict number of connection/consumer to kafka topic partition

2017-12-05 Thread Santosh Kumar J P
Hi,

Can we restrict the number of consumer groups that subscribe to a Kafka
topic partition?

Thanks,
Santosh