Re: Question on Kafka Streams

2019-04-01 Thread Guozhang Wang
Hello Mark,

That's a very general question, and the answer depends on your hardware,
your computational logic etc. Could you elaborate a bit more on your use
case?


Guozhang

On Mon, Apr 1, 2019 at 11:16 AM Mark Fursht 
wrote:

> Hello,
>
> I would like to know how many open windows Kafka Streams can hold?
>
>
>
> Sincerely, Mark
>


-- 
-- Guozhang


Re: leader none, with only one replica and no ISR

2019-04-01 Thread Harper Henn
Hi Adrien,

What happens when you try running the preferred replica leader election
script? Does this restore leadership for those partitions you listed?
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-1.PreferredReplicaLeaderElectionTool
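
For reference, invoking it looks something like this (using the ZooKeeper
address from your describe command below; adjust to your environment):

./kafka-preferred-replica-election.sh --zookeeper ctl1.edeal.online:2181

One caveat: with an empty ISR the preferred replica cannot be elected (short
of an unclean election), so treat this mainly as a diagnostic first step.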

Harper

On Sat, Mar 30, 2019 at 12:19 AM Adrien Ruffie  wrote:

> Hi all,
>
> I would like to know if nobody is able to answer the several questions
> that I asked a few months ago, or if I was just banished from the mailing
> list ...
>
> thank a lot & best regards,
>
> Adrien
>
> -- Forwarded message -
> From: Adrien Ruffie
> Date: Thu, Mar 28, 2019 at 12:18
> Subject: leader none, with only one replica and no ISR
> To: 
>
>
> Hello all,
>
> since yesterday several of my topics have the following description:
>
>
> ./kafka-topics.sh --zookeeper ctl1.edeal.online:2181 --describe | grep -P
> "none"
> Topic: edeal_cell_dev Partition: 0 Leader: none Replicas: 5 Isr:
> Topic: edeal_number_dev Partition: 0 Leader: none Replicas: 8 Isr:
> Topic: edeal_timeup_dev Partition: 0 Leader: none Replicas: 8 Isr:
>
>
>
> Without a leader, only one replica, and no ISR ... I tried to delete them
> with --delete from kafka-topics.sh,
> but nothing changed.
> After that I tried to do this:
>
> https://medium.com/@contactsunny/manually-delete-apache-kafka-topics-424c7e016ff3
>
> but to no avail ... the /brokers/topics/edeal_cell{number/timeup}_dev znodes
> remain, but without partitions ...
>
> I've run out of ideas ...
> could someone please help me?
>
> thank a lot.
>
> Adrien
>


Question on Kafka Streams

2019-04-01 Thread Mark Fursht
Hello,
I would like to know how many open windows Kafka Streams can hold?

Sincerely, Mark


Strange KafkaConsumer IllegalStateException

2019-04-01 Thread Mark Anderson
Hi list,

I've a question regarding a stack trace I see with the 2.2.0 consumer

java.lang.IllegalStateException: No entry found for connection 0
  at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:339)
  at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:143)
  at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:926)
  at org.apache.kafka.clients.NetworkClient.access$700(NetworkClient.java:67)
  at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1090)
  at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:976)
  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:533)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:265)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
  at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1256)
  at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1200)
  at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1176)

From looking at the code in NetworkClient.initiateConnect, I'm struggling to
see how the nodeConnectionId could have been removed from the connection
states between the connecting call at line 917 and the exception being thrown
at line 926. Can anyone shed any light?

Thanks,
Mark


Re: [External] Re: Need help to find references to antipatterns/pitfalls/incorrect ways to use Kafka

2019-04-01 Thread Tauzell, Dave
If somebody insists on using Kafka as a database, you might be able to do the
following:

1. Create a "compacted topic". The key for the topic should be the point of
sale id (see the sketch below).
2. Create a webservice which takes a point of sale id and can read or update 
the topic
3. Have the point of sale apps periodically call the webservice


This won't work if the price list is truly a stream of things rather than "just 
get the latest list".
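
For step 1, a minimal sketch in Java using the AdminClient (topic name,
partition count, replication factor and addresses here are invented for
illustration):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreatePriceListTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Keyed by point of sale id; compaction keeps only the latest
            // price list per key, which is what makes the topic usable as
            // a "latest value" store.
            NewTopic topic = new NewTopic("pos-price-lists", 12, (short) 3)
                    .configs(Collections.singletonMap(
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}

The webservice in step 2 would then produce records with the point of sale id
as the key and serve reads by returning the latest value seen for that key.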

-Dave

On 3/31/19, 7:01 PM, "Peter Bukowinski"  wrote:

I don’t want to be a downer, but because kafka is relatively new, the 
reference material you seek probably doesn’t exist. Kafka is flexible and can 
be made to work in many different scenarios — not all of them ideal.

It sounds like you’ve already reached a conclusion that kafka is the wrong 
solution for your requirements. Please share with us the evidence that you used 
to reach this conclusion. It would be helpful if you described the technical 
problems you encountered in your experiments so that others can give their 
opinion on whether they can be resolved or whether they are deal-breakers.

--
Peter

> On Mar 31, 2019, at 4:24 PM,   
wrote:
>
> Hello!
>
>
>
> I ask for your help in connection with my recent task:
>
> - Price lists are delivered to 20,000 points of sale with a frequency of 
<10
> price lists per day.
>
> - The order in which the price lists follow is important. It is also
> important that the price lists are delivered to the point of sale online.
>
> - At each point of sale, an agent application is deployed, which processes
> the received price lists.
>
>
>
> This task is not particularly difficult. Help in solving the task is not
> required.
>
>
>
> The difficulty is that Kafka in our company is a new "silver bullet", and
> the project manager requires me to implement the following technical
> decision:
>
> deploy 20,000 Kafka consumer instances (one instance for each point of 
sale)
> for one topic partitioned into 20,000 partitions - one partition per
> consumer.
>
> Technical problems obtained in experiments with this technical decision do
> not convince him.
>
>
>
> Please give me references to books/documents/blog posts which clearly
> show that Kafka is not intended to be used this way (references to other
> anti-patterns/pitfalls will also be useful).
>
> My own attempts to find such references were unsuccessful.
>
>
>
> Thank you!
>
>
>





Re: Need help to find references to antipatterns/pitfalls/incorrect ways to use Kafka

2019-04-01 Thread Alexander Kuterin
Thanks, Vincent!

Yes, I totally agree about coupling/decoupling. The need for the producer
to send messages to a specific partition, and for the consumer to consume from
a specific partition, creates high coupling between them. So in the future,
changing the topic partitioning scheme will be impossible without
reconfiguring/restarting the consumers.

пн, 1 апр. 2019 г., 13:48 Vincent Maurin :

> Maybe you can find some literature about messaging patterns?
>
> Usually, a single Kafka topic is used for the PubSub pattern, i.e. decoupling
> producers and consumers.
> In your case, it seems that the situation is quite coupled, i.e. you need to
> generate and send 20k price lists to 20k specific consumers (so it looks
> more like what is described as PAIR for 0mq:
>
> https://learning-0mq-with-pyzmq.readthedocs.io/en/latest/pyzmq/patterns/pair.html
> )
>
> I could still see Kafka as an option, but I am not sure about using
> partitioning for the split. I would rather go for one topic per point of
> sale, with the POS id in the name, and a single partition each.
> You guarantee the order, you know where to produce and where to consume from,
> and you don't have any complicated consumption logic to implement. You just
> probably need to take care of the topic management programmatically.
>
> Best
>
>
> On Mon, Apr 1, 2019 at 11:26 AM Dimitry Lvovsky 
> wrote:
>
> > Going off of what Hans mentioned, I don't see any reason for 20,000
> > partitions...you don't need one partition per POS. You can have all of your
> > POS terminals listening to one partition and each POS agent having a unique
> > group id. The POS agent only processes the messages that are relevant to it,
> > and simply ignores the rest. If you have bandwidth or processing power
> > concerns, you could also consider subdividing into topics (not partitions)
> > per region.
> >
> > Dimitry
> >
> > On Mon, Apr 1, 2019 at 8:16 AM Hans Jespersen  wrote:
> >
> > > Yes but you have more than 1 POS terminal per location so you still
> don't
> > > need 20,000 partitions. Just one per location. How many locations do
> you
> > > have?
> > >
> > > It doesn’t matter anyway since you can build a Kafka cluster with up to
> > > 200,000 partitions if you use the latest versions of Kafka.
> > >
> > >
> >
> https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions
> > >
> > > “As a rule of thumb, we recommend each broker to have up to 4,000
> > > partitions and each cluster to have up to 200,000 partitions”
> > >
> > > -hans
> > >
> > > > On Apr 1, 2019, at 2:02 AM, Alexander Kuterin 
> > > wrote:
> > > >
> > > > Thanks, Hans!
> > > > We use location specific SKU pricing and send specific price lists to
> > the
> > > > specific POS terminal.
> > > >
> > > > пн, 1 апр. 2019 г., 3:01 Hans Jespersen :
> > > >
> > > >> Doesn’t every one of the 20,000 POS terminals want to get the same
> > price
> > > >> list messages? If so then there is no need for 20,000 partitions.
> > > >>
> > > >> -hans
> > > >>
> > > >>> On Mar 31, 2019, at 7:24 PM,  <
> > akute...@gmail.com>
> > > >> wrote:
> > > >>>
> > > >>> Hello!
> > > >>>
> > > >>>
> > > >>>
> > > >>> I ask for your help in connection with my recent task:
> > > >>>
> > > >>> - Price lists are delivered to 20,000 points of sale with a
> frequency
> > > of
> > > >> <10
> > > >>> price lists per day.
> > > >>>
> > > >>> - The order in which the price lists follow is important. It is
> also
> > > >>> important that the price lists are delivered to the point of sale
> > > online.
> > > >>>
> > > >>> - At each point of sale, an agent application is deployed, which
> > > >> processes
> > > >>> the received price lists.
> > > >>>
> > > >>>
> > > >>>
> > > >>> This task is not particularly difficult. Help in solving the task
> is
> > > not
> > > >>> required.
> > > >>>
> > > >>>
> > > >>>
> > > >>> The difficulty is that Kafka in our company is a new "silver
> bullet",
> > > and
> > > >>> the project manager requires me to implement the following
> technical
> > > >>> decision:
> > > >>>
> > > >>> deploy 20,000 Kafka consumer instances (one instance for each point
> > of
> > > >> sale)
> > > >>> for one topic partitioned into 20,000 partitions - one partition
> per
> > > >>> consumer.
> > > >>>
> > > >>> Technical problems obtained in experiments with this technical
> > decision
> > > >> do
> > > >>> not convince him.
> > > >>>
> > > >>>
> > > >>>
> > > >>> Please give me references to books/documents/blog posts which clearly
> > > >>> show that Kafka is not intended to be used this way (references to other
> > > >>> anti-patterns/pitfalls will also be useful).
> > > >>>
> > > >>> My own attempts to find such references were unsuccessful.
> > > >>>
> > > >>>
> > > >>>
> > > >>> Thank you!
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > >
> >
>


Re: Need help to find references to antipatterns/pitfalls/incorrect ways to use Kafka

2019-04-01 Thread Alexander Kuterin
Thanks, Dimitry.

It seems your solution is the most appropriate for me, because the
reasons why consumers (POS terminals) will be added/removed are different
from the reasons why partitions will be added/removed.

I think that the division of the topic into partitions and the division of POS
terminals into logical groups (each POS terminal in a group consumes one
partition) must be independent of each other. Creating a specific partition for
a specific POS terminal is a pitfall.


пн, 1 апр. 2019 г., 12:26 Dimitry Lvovsky :

> Going off of what Hans mentioned, I don't see any reason for 20,000
> partitions...you don't need one partition per POS. You can have all of your
> POS terminals listening to one partition and each POS agent having a unique
> group id. The POS agent only processes the messages that are relevant to it,
> and simply ignores the rest. If you have bandwidth or processing power
> concerns, you could also consider subdividing into topics (not partitions)
> per region.
>
> Dimitry
>
> On Mon, Apr 1, 2019 at 8:16 AM Hans Jespersen  wrote:
>
> > Yes but you have more than 1 POS terminal per location so you still don't
> > need 20,000 partitions. Just one per location. How many locations do you
> > have?
> >
> > It doesn’t matter anyway since you can build a Kafka cluster with up to
> > 200,000 partitions if you use the latest versions of Kafka.
> >
> >
> https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions
> >
> > “As a rule of thumb, we recommend each broker to have up to 4,000
> > partitions and each cluster to have up to 200,000 partitions”
> >
> > -hans
> >
> > > On Apr 1, 2019, at 2:02 AM, Alexander Kuterin 
> > wrote:
> > >
> > > Thanks, Hans!
> > > We use location specific SKU pricing and send specific price lists to
> the
> > > specific POS terminal.
> > >
> > > пн, 1 апр. 2019 г., 3:01 Hans Jespersen :
> > >
> > >> Doesn’t every one of the 20,000 POS terminals want to get the same
> price
> > >> list messages? If so then there is no need for 20,000 partitions.
> > >>
> > >> -hans
> > >>
> > >>> On Mar 31, 2019, at 7:24 PM,  <
> akute...@gmail.com>
> > >> wrote:
> > >>>
> > >>> Hello!
> > >>>
> > >>>
> > >>>
> > >>> I ask for your help in connection with my recent task:
> > >>>
> > >>> - Price lists are delivered to 20,000 points of sale with a frequency
> > of
> > >> <10
> > >>> price lists per day.
> > >>>
> > >>> - The order in which the price lists follow is important. It is also
> > >>> important that the price lists are delivered to the point of sale
> > online.
> > >>>
> > >>> - At each point of sale, an agent application is deployed, which
> > >> processes
> > >>> the received price lists.
> > >>>
> > >>>
> > >>>
> > >>> This task is not particularly difficult. Help in solving the task is
> > not
> > >>> required.
> > >>>
> > >>>
> > >>>
> > >>> The difficulty is that Kafka in our company is a new "silver bullet",
> > and
> > >>> the project manager requires me to implement the following technical
> > >>> decision:
> > >>>
> > >>> deploy 20,000 Kafka consumer instances (one instance for each point
> of
> > >> sale)
> > >>> for one topic partitioned into 20,000 partitions - one partition per
> > >>> consumer.
> > >>>
> > >>> Technical problems obtained in experiments with this technical
> decision
> > >> do
> > >>> not convince him.
> > >>>
> > >>>
> > >>>
> > >>> Please give me references to books/documents/blog posts which clearly
> > >>> show that Kafka is not intended to be used this way (references to other
> > >>> anti-patterns/pitfalls will also be useful).
> > >>>
> > >>> My own attempts to find such references were unsuccessful.
> > >>>
> > >>>
> > >>>
> > >>> Thank you!
> > >>>
> > >>>
> > >>>
> > >>
> >
>


Re: Need help to find references to antipatterns/pitfalls/incorrect ways to use Kafka

2019-04-01 Thread Vincent Maurin
Maybe you can find some literature about messaging patterns?

Usually, a single Kafka topic is used for the PubSub pattern, i.e. decoupling
producers and consumers.
In your case, it seems that the situation is quite coupled, i.e. you need to
generate and send 20k price lists to 20k specific consumers (so it looks
more like what is described as PAIR for 0mq:
https://learning-0mq-with-pyzmq.readthedocs.io/en/latest/pyzmq/patterns/pair.html
)

I could still see Kafka as an option, but I am not sure about using
partitioning for the split. I would rather go for one topic per point of
sale, with the POS id in the name, and a single partition each.
You guarantee the order, you know where to produce and where to consume from,
and you don't have any complicated consumption logic to implement. You just
probably need to take care of the topic management programmatically, along
the lines of the sketch below.
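
A rough sketch of that topic management with the Java AdminClient (names,
counts and the replication factor are invented for illustration):

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreatePosTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            List<NewTopic> topics = new ArrayList<>();
            // One single-partition topic per point of sale: ordering is
            // guaranteed per POS, and each agent knows exactly where to read.
            for (int posId = 1; posId <= 20000; posId++) {
                topics.add(new NewTopic("pricelist-pos-" + posId, 1, (short) 3));
            }
            admin.createTopics(topics).all().get();
        }
    }
}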

Best


On Mon, Apr 1, 2019 at 11:26 AM Dimitry Lvovsky  wrote:

> Going off of what Hans mentioned, I don't see any reason for 20,000
> partitions...you don't need one partition per POS. You can have all of your
> POS terminals listening to one partition and each POS agent having a unique
> group id. The POS agent only processes the messages that are relevant to it,
> and simply ignores the rest. If you have bandwidth or processing power
> concerns, you could also consider subdividing into topics (not partitions)
> per region.
>
> Dimitry
>
> On Mon, Apr 1, 2019 at 8:16 AM Hans Jespersen  wrote:
>
> > Yes but you have more than 1 POS terminal per location so you still don't
> > need 20,000 partitions. Just one per location. How many locations do you
> > have?
> >
> > It doesn’t matter anyway since you can build a Kafka cluster with up to
> > 200,000 partitions if you use the latest versions of Kafka.
> >
> >
> https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions
> >
> > “As a rule of thumb, we recommend each broker to have up to 4,000
> > partitions and each cluster to have up to 200,000 partitions”
> >
> > -hans
> >
> > > On Apr 1, 2019, at 2:02 AM, Alexander Kuterin 
> > wrote:
> > >
> > > Thanks, Hans!
> > > We use location specific SKU pricing and send specific price lists to
> the
> > > specific POS terminal.
> > >
> > > пн, 1 апр. 2019 г., 3:01 Hans Jespersen :
> > >
> > >> Doesn’t every one of the 20,000 POS terminals want to get the same
> price
> > >> list messages? If so then there is no need for 20,000 partitions.
> > >>
> > >> -hans
> > >>
> > >>> On Mar 31, 2019, at 7:24 PM,  <
> akute...@gmail.com>
> > >> wrote:
> > >>>
> > >>> Hello!
> > >>>
> > >>>
> > >>>
> > >>> I ask for your help in connection with my recent task:
> > >>>
> > >>> - Price lists are delivered to 20,000 points of sale with a frequency
> > of
> > >> <10
> > >>> price lists per day.
> > >>>
> > >>> - The order in which the price lists follow is important. It is also
> > >>> important that the price lists are delivered to the point of sale
> > online.
> > >>>
> > >>> - At each point of sale, an agent application is deployed, which
> > >> processes
> > >>> the received price lists.
> > >>>
> > >>>
> > >>>
> > >>> This task is not particularly difficult. Help in solving the task is
> > not
> > >>> required.
> > >>>
> > >>>
> > >>>
> > >>> The difficulty is that Kafka in our company is a new "silver bullet",
> > and
> > >>> the project manager requires me to implement the following technical
> > >>> decision:
> > >>>
> > >>> deploy 20,000 Kafka consumer instances (one instance for each point
> of
> > >> sale)
> > >>> for one topic partitioned into 20,000 partitions - one partition per
> > >>> consumer.
> > >>>
> > >>> Technical problems obtained in experiments with this technical
> decision
> > >> do
> > >>> not convince him.
> > >>>
> > >>>
> > >>>
> > >>> Please give me references to books/documents/blog posts which clearly
> > >>> show that Kafka is not intended to be used this way (references to other
> > >>> anti-patterns/pitfalls will also be useful).
> > >>>
> > >>> My own attempts to find such references were unsuccessful.
> > >>>
> > >>>
> > >>>
> > >>> Thank you!
> > >>>
> > >>>
> > >>>
> > >>
> >
>


Re: Need help to find references to antipatterns/pitfalls/incorrect ways to use Kafka

2019-04-01 Thread Dimitry Lvovsky
Going off of what Hans mentioned, I don't see any reason for 20,000
partitions...you don't need one partition per POS. You can have all of your
POS terminals listening to one partition and each POS agent having a unique
group id. The POS agent only processes the messages that are relevant to it,
and simply ignores the rest. If you have bandwidth or processing power
concerns, you could also consider subdividing into topics (not partitions) per
region. A rough sketch of the agent side is below.
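
A minimal sketch of such an agent (group id, topic name and the key scheme are
invented for illustration; it assumes the producer keys each price list with
the target POS id):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PosAgent {
    public static void main(String[] args) {
        String posId = args[0]; // this terminal's id
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        // A unique group id per agent means every agent sees every message.
        props.put("group.id", "pos-agent-" + posId);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("price-lists"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    // Process only price lists addressed to this POS;
                    // ignore the rest.
                    if (posId.equals(rec.key())) {
                        applyPriceList(rec.value());
                    }
                }
            }
        }
    }

    static void applyPriceList(String priceList) {
        // hypothetical handler for a received price list
    }
}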

Dimitry

On Mon, Apr 1, 2019 at 8:16 AM Hans Jespersen  wrote:

> Yes but you have more than 1 POS terminal per location so you still don't
> need 20,000 partitions. Just one per location. How many locations do you
> have?
>
> It doesn’t matter anyway since you can build a Kafka cluster with up to
> 200,000 partitions if you use the latest versions of Kafka.
>
> https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions
>
> “As a rule of thumb, we recommend each broker to have up to 4,000
> partitions and each cluster to have up to 200,000 partitions”
>
> -hans
>
> > On Apr 1, 2019, at 2:02 AM, Alexander Kuterin 
> wrote:
> >
> > Thanks, Hans!
> > We use location specific SKU pricing and send specific price lists to the
> > specific POS terminal.
> >
> > пн, 1 апр. 2019 г., 3:01 Hans Jespersen :
> >
> >> Doesn’t every one of the 20,000 POS terminals want to get the same price
> >> list messages? If so then there is no need for 20,000 partitions.
> >>
> >> -hans
> >>
> >>> On Mar 31, 2019, at 7:24 PM,  
> >> wrote:
> >>>
> >>> Hello!
> >>>
> >>>
> >>>
> >>> I ask for your help in connection with my recent task:
> >>>
> >>> - Price lists are delivered to 20,000 points of sale with a frequency
> of
> >> <10
> >>> price lists per day.
> >>>
> >>> - The order in which the price lists follow is important. It is also
> >>> important that the price lists are delivered to the point of sale
> online.
> >>>
> >>> - At each point of sale, an agent application is deployed, which
> >> processes
> >>> the received price lists.
> >>>
> >>>
> >>>
> >>> This task is not particularly difficult. Help in solving the task is
> not
> >>> required.
> >>>
> >>>
> >>>
> >>> The difficulty is that Kafka in our company is a new "silver bullet",
> and
> >>> the project manager requires me to implement the following technical
> >>> decision:
> >>>
> >>> deploy 20,000 Kafka consumer instances (one instance for each point of
> >> sale)
> >>> for one topic partitioned into 20,000 partitions - one partition per
> >>> consumer.
> >>>
> >>> Technical problems obtained in experiments with this technical decision
> >> do
> >>> not convince him.
> >>>
> >>>
> >>>
> >>> Please give me references to books/documents/blog posts which clearly
> >>> show that Kafka is not intended to be used this way (references to other
> >>> anti-patterns/pitfalls will also be useful).
> >>>
> >>> My own attempts to find such references were unsuccessful.
> >>>
> >>>
> >>>
> >>> Thank you!
> >>>
> >>>
> >>>
> >>
>


Re: Offsets of deleted consumer groups do not get deleted correctly

2019-04-01 Thread Patrik Kleindl
Hi Claudia,
Just a sidenote: there is a combined policy, "compact,delete", which
deletes messages older than retention.ms and compacts the newer ones, if I
remember correctly. A way to set it is sketched below.
It still doesn't seem to be properly covered in the docs:
https://kafka.apache.org/documentation/#topicconfigs
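
If you want to set it programmatically, something like this should work (a
sketch; the topic name is a placeholder, and note that alterConfigs replaces
the full set of overrides for the resource, so include any others you want to
keep):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetCompactDelete {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // Combined policy: segments older than retention.ms are deleted,
            // the rest is compacted.
            Config cfg = new Config(Collections.singleton(
                    new ConfigEntry("cleanup.policy", "compact,delete")));
            admin.alterConfigs(Collections.singletonMap(topic, cfg)).all().get();
        }
    }
}
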
best regards
Patrik

On Mon, 1 Apr 2019 at 10:02, Claudia Wegmann  wrote:

> Hi,
>
> thanks for your reply.
>
> Although the groups were deleted months ago, there are still valid values
> there. So I guess deleting the group did not produce the tombstone record
> correctly. Your explanation made it clearer for me. I know I should keep the
> cleanup policy compact in general. I decided to switch the policy to delete
> just long enough for these old records to get deleted. I guess I could have
> produced the tombstone records myself to be on the safe side.
>
> Thanks,
> Claudia
>
> -Original Message-
> From: Vincent Maurin
> Sent: Friday, March 29, 2019 15:24
> To: users@kafka.apache.org
> Subject: Re: Offsets of deleted consumer groups do not get deleted
> correctly
>
> Hi,
>
> You should keep the compact policy for the topic __consumer_offsets. This
> topic stores, for each group/topic/partition, the offset consumed. As only
> the latest message for a group/topic/partition is relevant, the compact
> policy will keep only this message. When you delete a group, it actually
> produces a tombstone to this topic (i.e. a record with a NULL body). Then
> when log compaction runs, it will definitively remove the tombstone.
> But for the tombstones to actually be deleted, keep in mind:
> * compaction runs only on rolled-out segments
> * deletion of a tombstone only occurs once the delete.retention.ms delay
> has expired
>
> Best regards
>
> On Fri, Mar 29, 2019 at 2:16 PM Claudia Wegmann 
> wrote:
>
> > Hey there,
> >
> > I've got the problem that the "__consumer_offsets" topic grows pretty
> > big over time. After some digging, I found offsets for consumer groups
> > that were deleted a long time ago still being present in the topic.
> > Many of them are offsets for console consumers, that have been deleted
> > with "kafka-consumer-groups.sh --delete --group ...".
> >
> > As far as I understand log cleaning, those offsets should have been
> > deleted a long time ago, because these consumers are no longer active.
> > When I query "kafka-consumer-groups.sh --bootstrap-server ...  --list"
> > I don't see those consumers either.
> >
> > Is there a bug in "kafka-consumer-groups.sh --delete --group ..." that
> > let's kafka hang on to those consumer groups?
> >
> > How can I get the log cleaner to delete these old offsets? Is there
> > another way than setting "cleanup.policy" to "delete"?
> >
> > Thanks for your help!
> >
> > Best,
> > Claudia
> >
>


plaintext connection attempts to SSL secured broker

2019-04-01 Thread jorg . heymans
Hi,

We have our brokers secured with these standard properties

listeners=SSL://a.b.c:9030
ssl.truststore.location=...
ssl.truststore.password=...
ssl.keystore.location=...
ssl.keystore.password=...
ssl.key.password=...
ssl.client.auth=required
ssl.enabled.protocols=TLSv1.2
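
Clients normally connect with matching SSL settings along these lines (a
sketch; store locations and passwords elided as above):

security.protocol=SSL
ssl.truststore.location=...
ssl.truststore.password=...
ssl.keystore.location=...
ssl.keystore.password=...
ssl.key.password=...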

It's a bit surprising to see that when a (Java) client attempts to connect
without having SSL configured, i.e. making a PLAINTEXT connection instead, it
does not get a TLS exception indicating that SSL is required. Somehow I would
have expected a hard transport-level exception making it clear that non-SSL
connections are not allowed; instead the client sees this (when debug logging
is enabled):

[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : 
21234bee31165527
[main] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
clientId=consumer-1, groupId=my-test-group] Kafka consumer initialized
[main] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
clientId=consumer-1, groupId=my-test-group] Subscribed to topic(s): events
[main] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - 
[Consumer clientId=consumer-1, groupId=my-test-group] Sending FindCoordinator 
request to broker a.b.c:9030 (id: -1 rack: null)
[main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer 
clientId=consumer-1, groupId=my-test-group] Initiating connection to node 
a.b.c:9030 (id: -1 rack: null) using address /a.b.c
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name 
node--1.bytes-sent
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name 
node--1.bytes-received
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name 
node--1.latency
[main] DEBUG org.apache.kafka.common.network.Selector - [Consumer 
clientId=consumer-1, groupId=my-test-group] Created socket with SO_RCVBUF = 
65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node -1
[main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer 
clientId=consumer-1, groupId=my-test-group] Completed connection to node -1. 
Fetching API versions.
[main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer 
clientId=consumer-1, groupId=my-test-group] Initiating API versions fetch from 
node -1.
[main] DEBUG org.apache.kafka.common.network.Selector - [Consumer 
clientId=consumer-1, groupId=my-test-group] Connection with /a.b.c disconnected
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:119)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:381)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:342)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:609)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:541)
at org.apache.kafka.common.network.Selector.poll(Selector.java:467)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:265)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:231)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:316)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1214)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1179)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1164)
at eu.europa.ec.han.TestConsumer.main(TestConsumer.java:22)
[main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer 
clientId=consumer-1, groupId=my-test-group] Node -1 disconnected.
[main] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient 
- [Consumer clientId=consumer-1, groupId=my-test-group] Cancelled request with 
header RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=2, 
clientId=consumer-1, correlationId=0) due to node -1 being disconnected
[main] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - 
[Consumer clientId=consumer-1, groupId=my-test-group] Coordinator discovery 
failed, refreshing metadata
[main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer 
clientId=consumer-1, groupId=my-test-group] Give up sending metadata request 
since no node is available
[main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer 
clientId=consumer-1, groupId=my-test-group] Give up sending metadata request 
since no node is available
[main] DEBUG org.apache.kafka.clients.NetworkClient - 

Re: Kafka Streams upgrade.from config while upgrading to 2.1.0

2019-04-01 Thread Anirudh Vyas
Thanks a lot, Guozhang! That clarifies it to a great extent. I'll try to
figure out who the leader of the group is.
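
For reference, the first-bounce configuration looked roughly like this (a
minimal sketch; only upgrade.from matters here, the rest mirrors our usual
setup):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class FirstBounceConfig {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "application_id");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
                "bootstrap-server-01:6667,bootstrap-server-02:6667");
        // First rolling bounce: make the 2.1.0 instances speak the 1.1
        // rebalance metadata so mixed-version members understand each other.
        props.put(StreamsConfig.UPGRADE_FROM_CONFIG, "1.1");
        // Second rolling bounce: remove the upgrade.from entry again.
        return props;
    }
}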

On Wed, Mar 27, 2019 at 9:59 PM Guozhang Wang  wrote:

> Hello Anirudh,
>
> The config `upgrade.from` is recommended for safe and smooth upgrade. In
> your case it is possible that when rolling-bouncing the instances, the first
> upgraded instance happened to be the leader of the group, and hence even
> without the config it could recognize the other instances; but if you are
> unlucky and the leader is bounced later, then it would not be able to
> recognize the other instances, causing it to crash. So in other words,
> setting this config would be a safe choice, but it does not mean you are
> doomed to fail the upgrade if you do not do it this way.
>
>
> Guozhang
>
> On Tue, Mar 26, 2019 at 10:38 PM Anirudh Vyas 
> wrote:
>
> > Hey,
> > We run 3 instances.
> >
> > Anirudh
> >
> > On Tue, Mar 26, 2019 at 9:28 PM Matthias J. Sax 
> > wrote:
> >
> > > Not sure. How many instances do you run? If it's only one, you don't
> > > need the config.
> > >
> > > -Matthias
> > >
> > > On 3/26/19 5:17 AM, Anirudh Vyas wrote:
> > > > Hi,
> > > > I am in the process of upgrading my Kafka streams services from 1.1
> to
> > > > 2.1.0. I am following the upgrade guide:
> > > > https://kafka.apache.org/20/documentation/streams/upgrade-guide .
> > > >
> > > > My service is running on kafka version 2.0 and using kafka streams
> > > 1.1.1. I
> > > > updated my kafka-streams to 2.1.0 but DID NOT pass the config value
> > > > `upgrade.from` (it is null), as can be verified from the logs
> > > > ```
> > > > [INFO ] 2019-03-26 18:05:49,550 [main]
> > > > org.apache.kafka.streams.StreamsConfig:logAll: StreamsConfig values:
> > > >  application.id = application_id
> > > >  application.server =
> > > >  bootstrap.servers = [bootstrap-server-01:6667,
> > > > bootstrap-server-02:6667]
> > > >  buffered.records.per.partition = 1000
> > > >  cache.max.bytes.buffering = 10485760
> > > >  client.id =
> > > >  commit.interval.ms = 15000
> > > >  connections.max.idle.ms = 540000
> > > >  default.deserialization.exception.handler = class
> > > > org.apache.kafka.streams.errors.LogAndFailExceptionHandler
> > > >  default.key.serde = class
> > > > org.apache.kafka.common.serialization.Serdes$ByteArraySerde
> > > >  default.production.exception.handler = class
> > > > org.apache.kafka.streams.errors.DefaultProductionExceptionHandler
> > > >  default.timestamp.extractor = class
> > > > ziggurat.timestamp_transformer.IngestionTimeExtractor
> > > >  default.value.serde = class
> > > > org.apache.kafka.common.serialization.Serdes$ByteArraySerde
> > > >  max.task.idle.ms = 0
> > > >  metadata.max.age.ms = 300000
> > > >  metric.reporters = []
> > > >  metrics.num.samples = 2
> > > >  metrics.recording.level = INFO
> > > >  metrics.sample.window.ms = 30000
> > > >  num.standby.replicas = 0
> > > >  num.stream.threads = 3
> > > >  partition.grouper = class
> > > > org.apache.kafka.streams.processor.DefaultPartitionGrouper
> > > >  poll.ms = 100
> > > >  processing.guarantee = at_least_once
> > > >  receive.buffer.bytes = 32768
> > > >  reconnect.backoff.max.ms = 1000
> > > >  reconnect.backoff.ms = 50
> > > >  replication.factor = 1
> > > >  request.timeout.ms = 40000
> > > >  retries = 0
> > > >  retry.backoff.ms = 100
> > > >  rocksdb.config.setter = null
> > > >  security.protocol = PLAINTEXT
> > > >  send.buffer.bytes = 131072
> > > >  state.cleanup.delay.ms = 600000
> > > >  state.dir = /tmp/kafka-streams
> > > >  topology.optimization = none
> > > >  upgrade.from = null
> > > >  windowstore.changelog.additional.retention.ms = 86400000
> > > > ```
> > > >
> > > > The application is consuming messages as expected and is not failing.
> > > >
> > > > I even went through the steps mentioned in the upgrade guide and did
> 2
> > > > rolling bounces with the correct config in place. Then I rolled back
> my
> > > > application to Kafka streams 1.1.1 and it was running as expected.
> > > >
> > > > I went through KIP-268
> > > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade
> > > >
> > > > and
> > > > it mentions `Kafka Streams need to be configured with
> > > > upgrade.from="<old version>" for startup`, but the service runs fine
> > > > even if I don't configure it.
> > > > It also mentions `user prepares a second round of rebalance; this time,
> > > > the configuration parameter upgrade.from must be removed for new startup`
> > > > but the application runs as expected even if I don't remove the config
> > > > parameter.
> > > >
> > > > So, my question 

AW: Offsets of deleted consumer groups do not get deleted correctly

2019-04-01 Thread Claudia Wegmann
Hi,

thanks for your reply.

Although the groups were deleted months ago, there are still valid values there.
So I guess deleting the group did not produce the tombstone record correctly.
Your explanation made it clearer for me. I know I should keep the cleanup policy
compact in general. I decided to switch the policy to delete just long enough
for these old records to get deleted. I guess I could have produced the
tombstone records myself to be on the safe side.

Thanks,
Claudia

-Original Message-
From: Vincent Maurin
Sent: Friday, March 29, 2019 15:24
To: users@kafka.apache.org
Subject: Re: Offsets of deleted consumer groups do not get deleted correctly

Hi,

You should keep the compact policy for the topic __consumer_offsets. This topic
stores, for each group/topic/partition, the offset consumed. As only the latest
message for a group/topic/partition is relevant, the compact policy will keep
only this message. When you delete a group, it actually produces a
tombstone to this topic (i.e. a record with a NULL body). Then when log
compaction runs, it will definitively remove the tombstone.
But for the tombstones to actually be deleted, keep in mind:
* compaction runs only on rolled-out segments
* deletion of a tombstone only occurs once the delete.retention.ms delay has expired

Best regards

On Fri, Mar 29, 2019 at 2:16 PM Claudia Wegmann  wrote:

> Hey there,
>
> I've got the problem that the "__consumer_offsets" topic grows pretty 
> big over time. After some digging, I found offsets for consumer groups 
> that were deleted a long time ago still being present in the topic. 
> Many of them are offsets for console consumers, that have been deleted 
> with "kafka-consumer-groups.sh --delete --group ...".
>
> As far as I understand log cleaning, those offsets should have been 
> deleted a long time ago, because these consumers are no longer active. 
> When I query "kafka-consumer-groups.sh --bootstrap-server ...  --list" 
> I don't see those consumers either.
>
> Is there a bug in "kafka-consumer-groups.sh --delete --group ..." that 
> lets Kafka hang on to those consumer groups?
>
> How can I get the log cleaner to delete these old offsets? Is there 
> another way than setting "cleanup.policy" to "delete"?
>
> Thanks for your help!
>
> Best,
> Claudia
>


Re: Need help to find references to antipatterns/pitfalls/incorrect ways to use Kafka

2019-04-01 Thread Hans Jespersen
Yes, but you have more than 1 POS terminal per location, so you still don't need
20,000 partitions. Just one per location. How many locations do you have?

It doesn’t matter anyway since you can build a Kafka cluster with up to 200,000
partitions if you use the latest versions of Kafka.

https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions

“As a rule of thumb, we recommend each broker to have up to 4,000 partitions 
and each cluster to have up to 200,000 partitions”

-hans 

> On Apr 1, 2019, at 2:02 AM, Alexander Kuterin  wrote:
> 
> Thanks, Hans!
> We use location specific SKU pricing and send specific price lists to the
> specific POS terminal.
> 
> пн, 1 апр. 2019 г., 3:01 Hans Jespersen :
> 
>> Doesn’t every one of the 20,000 POS terminals want to get the same price
>> list messages? If so then there is no need for 20,000 partitions.
>> 
>> -hans
>> 
>>> On Mar 31, 2019, at 7:24 PM,  
>> wrote:
>>> 
>>> Hello!
>>> 
>>> 
>>> 
>>> I ask for your help in connection with my recent task:
>>> 
>>> - Price lists are delivered to 20,000 points of sale with a frequency of
>> <10
>>> price lists per day.
>>> 
>>> - The order in which the price lists follow is important. It is also
>>> important that the price lists are delivered to the point of sale online.
>>> 
>>> - At each point of sale, an agent application is deployed, which
>> processes
>>> the received price lists.
>>> 
>>> 
>>> 
>>> This task is not particularly difficult. Help in solving the task is not
>>> required.
>>> 
>>> 
>>> 
>>> The difficulty is that Kafka in our company is a new "silver bullet", and
>>> the project manager requires me to implement the following technical
>>> decision:
>>> 
>>> deploy 20,000 Kafka consumer instances (one instance for each point of
>> sale)
>>> for one topic partitioned into 20,000 partitions - one partition per
>>> consumer.
>>> 
>>> Technical problems obtained in experiments with this technical decision
>> do
>>> not convince him.
>>> 
>>> 
>>> 
>>> Please give me references to books/documents/blog posts which clearly
>>> show that Kafka is not intended to be used this way (references to other
>>> anti-patterns/pitfalls will also be useful).
>>> 
>>> My own attempts to find such references were unsuccessful.
>>> 
>>> 
>>> 
>>> Thank you!
>>> 
>>> 
>>> 
>> 


Re: Need help to find references to antipatterns/pitfalls/incorrect ways to use Kafka

2019-04-01 Thread Alexander Kuterin
Thanks, Hans!
We use location specific SKU pricing and send specific price lists to the
specific POS terminal.

пн, 1 апр. 2019 г., 3:01 Hans Jespersen :

> Doesn’t every one of the 20,000 POS terminals want to get the same price
> list messages? If so then there is no need for 20,000 partitions.
>
> -hans
>
> > On Mar 31, 2019, at 7:24 PM,  
> wrote:
> >
> > Hello!
> >
> >
> >
> > I ask for your help in connection with my recent task:
> >
> > - Price lists are delivered to 20,000 points of sale with a frequency of
> <10
> > price lists per day.
> >
> > - The order in which the price lists follow is important. It is also
> > important that the price lists are delivered to the point of sale online.
> >
> > - At each point of sale, an agent application is deployed, which
> processes
> > the received price lists.
> >
> >
> >
> > This task is not particularly difficult. Help in solving the task is not
> > required.
> >
> >
> >
> > The difficulty is that Kafka in our company is a new "silver bullet", and
> > the project manager requires me to implement the following technical
> > decision:
> >
> > deploy 20,000 Kafka consumer instances (one instance for each point of
> sale)
> > for one topic partitioned into 20,000 partitions - one partition per
> > consumer.
> >
> > Technical problems obtained in experiments with this technical decision
> do
> > not convince him.
> >
> >
> >
> > Please give me references to books/documents/blog posts which clearly
> > show that Kafka is not intended to be used this way (references to other
> > anti-patterns/pitfalls will also be useful).
> >
> > My own attempts to find such references were unsuccessful.
> >
> >
> >
> > Thank you!
> >
> >
> >
>