Re: Kafka compatibility with ZK

2020-09-03 Thread Manoj.Agrawal2
We also upgraded Kafka 2.2.1 to Kafka 2.5.0 and kept the same ZooKeeper; no issues 
reported.
Later we also upgraded ZooKeeper to 3.5.8. All good.
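
For reference, the staged protocol flip (as in Marina's plan quoted below)
looked roughly like this on our side -- a sketch only; substitute the
versions from your own upgrade path:

    # server.properties, phase 1: roll out the 2.5.0 binaries while still
    # speaking the old inter-broker protocol
    inter.broker.protocol.version=2.2

    # phase 2: once all brokers (and ZooKeeper) are upgraded, bump the
    # protocol and do one more rolling restart
    inter.broker.protocol.version=2.5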

On 9/3/20, 8:42 PM, "Andrey Klochkov"  wrote:

Hello all,
FWIW we upgraded to Kafka 2.4.1 and kept ZK at 3.4.6, no issues noticed.

On Sun, Aug 2, 2020 at 10:04 AM Marina Popova
 wrote:

>
> Actually, I'm very interested in your experience as well; I'm about to
> start the same (similar) upgrade - from Kafka 0.11/ZK 3.4.13 to Kafka
> 2.4/ZK 3.5.6.
>
> I have Kafka and ZK as separate clusters.
>
> My plan is:
> 1. rolling upgrade the Kafka cluster to 2.4 - using the
> inter.broker.protocol.version set to 0.11 at first
> 2. rolling upgrade ZK cluster to 3.5.6
> 3. set inter.broker.protocol.version=2.4.0 and rolling restart the Kafka
> cluster again
>
> Does anybody see a problem with this approach?
>
>
> thanks,
> Marina
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Thursday, July 23, 2020 4:01 PM, Andrey Klochkov 
> wrote:
>
> > Hello,
> > We're upgrading our Kafka from 1.1.0 to 2.4.1 and I'm wondering if ZK
> > needs to be upgraded too (we're currently on 3.4.6). The upgrade guide
> > says that "Kafka has switched to the XXX version of ZK" but never says
> > whether switching to a newer ZK is mandatory or not. What are the
> > guidelines on keeping Kafka and ZK compatible?
> >
> >
> > --
> >
> > Andrey Klochkov
>
>
>

--
Andrey Klochkov




Re: Kafka compatibility with ZK

2020-09-03 Thread Andrey Klochkov
Hello all,
FWIW we upgraded to Kafka 2.4.1 and kept ZK at 3.4.6, no issues noticed.

On Sun, Aug 2, 2020 at 10:04 AM Marina Popova
 wrote:

>
> Actually, I'm very interested in your experience as well; I'm about to
> start the same (similar) upgrade - from Kafka 0.11/ZK 3.4.13 to Kafka
> 2.4/ZK 3.5.6.
>
> I have Kafka and ZK as separate clusters.
>
> My plan is:
> 1. rolling upgrade the Kafka cluster to 2.4 - using the
> inter.broker.protocol.version set to 0.11 at first
> 2. rolling upgrade ZK cluster to 3.5.6
> 3. set inter.broker.protocol.version=2.4.0 and rolling restart the Kafka
> cluster again
>
> Does anybody see a problem with this approach?
>
>
> thanks,
> Marina
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Thursday, July 23, 2020 4:01 PM, Andrey Klochkov 
> wrote:
>
> > Hello,
> > We're upgrading our Kafka from 1.1.0 to 2.4.1 and I'm wondering if ZK
> > needs to be upgraded too (we're currently on 3.4.6). The upgrade guide
> > says that "Kafka has switched to the XXX version of ZK" but never says
> > whether switching to a newer ZK is mandatory or not. What are the
> > guidelines on keeping Kafka and ZK compatible?
> >
> >
> > --
> >
> > Andrey Klochkov
>
>
>

-- 
Andrey Klochkov


Re: UPGRADING ZOOKEEPER FROM 3.4.13 TO 3.5.7

2020-09-03 Thread Manoj.Agrawal2
The issue has been fixed by copying an empty snapshot file to the data dir.
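
For anyone hitting the same error: the alternative from ZOOKEEPER-3056
(Enrico's link below) is the trust-empty flag. A sketch, assuming a
ZooKeeper version that ships the flag (3.5.6+, I believe; verify the
property name against your version's admin guide):

    # zoo.cfg -- let a server boot with txn logs but no snapshot
    snapshot.trust.empty=true
    # equivalently, as a JVM flag: -Dzookeeper.snapshot.trust.empty=true

It should be removed again once the upgrade is done.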

Thanks.

On 9/2/20, 10:51 PM, "Enrico Olivelli"  wrote:


The official way to fix it is here

https://issues.apache.org/jira/browse/ZOOKEEPER-3056

Basically we have a flag to allow the boot even in that case.
I suggest you upgrade to the latest 3.5.8 rather than 3.5.7.


Enrico

On Thu, Sep 3, 2020, 03:51 Rijo Roy  wrote:

> Hi Manoj,
> I just faced this yesterday and resolved it.
> I hope you are getting this error on one of the follower nodes. If yes,
> please create a backup folder in your ZooKeeper data directory and move
> version_2 (which holds the ZooKeeper data) into the newly created backup
> folder.
> Starting the ZooKeeper process will then sync and recreate the version_2
> folder in its data directory.
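> Roughly, a sketch (the path depends on your dataDir setting):
>
>     cd /var/lib/zookeeper    # your dataDir
>     mkdir backup
>     mv version_2 backup/
>     # restart this follower; it re-syncs version_2 from the leader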
> Regards, Rijo Roy
> Sent from Yahoo Mail on Android
>
> On Thu, 3 Sep 2020 at 2:57 am, manoj.agraw...@cognizant.com<
> manoj.agraw...@cognizant.com> wrote:
>
> Hi all,
> I'm planning to upgrade Kafka 2.2.1 to Kafka 2.5.0, and I'm getting the
> error below while upgrading the ZooKeeper version. Any idea?
>
>
> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
>   at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>
> 2020-09-02 21:19:23,877 - ERROR [main:QuorumPeerMain@101] - Unexpected exception, exiting abnormally
> java.lang.RuntimeException: Unable to run quorum server
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:938)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> Caused by: java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
>   at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
>   ... 4 more

Re: Streams constantly rebalancing

2020-09-03 Thread Steve Jones
There are zero events on the stream in the case covered by the log files; I've
had the same behaviour when running with 6 events. It's constantly
reporting rebalancing, but from a CPU perspective nothing appears to
actually be doing anything.

I've run it for several hours just to see if something happened (checking
every 30 seconds) and there are no log events from Kafka during those
times. The log file (filter.log) shows the logs for roughly an hour and a
quarter of running. My confusion comes from the fact that, looking via JMX
and at the CPU logs, it appears that nothing much is actually being done on
the Kafka side, yet there aren't any exceptions.
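
For reference, I'm watching the client state with a listener along these
lines -- a rough sketch (topology and props stand in for my actual setup):

    // must be registered before start(); logs each transition as it happens
    KafkaStreams streams = new KafkaStreams(topology, props);
    streams.setStateListener((newState, oldState) ->
        System.out.println("streams state: " + oldState + " -> " + newState));
    streams.start();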

Help and hints appreciated.

Steve

On Wed, 2 Sep 2020 at 12:27, Sophie Blee-Goldman 
wrote:

> Hey Steve,
>
> I guess the first question I have is, is the consumer group actually
> rebalancing
> or does KafkaStreams just appear to be "stuck" in the REBALANCING state? If
> this application has been run before and there is a large amount of data in
> the
> changelog topics, it might take a while to rebuild the local state stores
> from the
> changelog. During this time it would appear to be in REBALANCING, but it's
> actually
> just restoring.
>
> Just one possibility of many. How long did you let it run without seeing it
> stabilize?
> It's difficult to diagnose from just the small snippet of logs above, so I
> would
> try and take a look at the full picture over more than just a minute. There
> *should*
> be something slightly more helpful logged if it really is stuck in a cycle
> of endless
> rebalances (eg timing out on the session interval)
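>
> One way to tell restoration apart from a real rebalance loop is a restore
> listener. A rough sketch (register it on the KafkaStreams instance before
> start()):
>
>     import org.apache.kafka.common.TopicPartition;
>     import org.apache.kafka.streams.processor.StateRestoreListener;
>
>     streams.setGlobalStateRestoreListener(new StateRestoreListener() {
>         public void onRestoreStart(TopicPartition tp, String store,
>                                    long startOffset, long endOffset) {
>             System.out.println("restore start: " + store + " " + tp
>                     + " [" + startOffset + ".." + endOffset + "]");
>         }
>         public void onBatchRestored(TopicPartition tp, String store,
>                                     long batchEndOffset, long numRestored) {
>             // called once per restored batch; handy for progress logging
>         }
>         public void onRestoreEnd(TopicPartition tp, String store, long total) {
>             System.out.println("restore done: " + store + " (" + total + " records)");
>         }
>     });
>
> If you see restore messages but no repeated assignments, it's restoring,
> not rebalancing.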
>
> On Wed, Sep 2, 2020 at 11:52 AM Steve Jones 
> wrote:
>
> > Just an addition to this in case someone can help: I'm seeing zero thread
> > activity in the rebalancing, but I'm also not seeing anything being logged
> > by Kafka to indicate it's doing anything on rebalancing (logging level set
> > to ALL). Producer and consumer code works fine, but streams just appears
> > to hang.
> >
> > On Mon, 31 Aug 2020 at 12:49, Steve Jones 
> wrote:
> >
> >> When running the application on the Mac it works fine; when running
> >> exactly the same app and config on the Raspberry Pi it constantly says
> >> it is "Rebalancing" the streams:
> >>
> >> 2020-08-31 12:47:11 INFO org.apache.kafka.common.utils.AppInfoParser$AppInfo  Kafka version: 2.6.0
> >>
> >> 2020-08-31 12:47:11 INFO org.apache.kafka.common.utils.AppInfoParser$AppInfo  Kafka commitId: 62abe01bee039651
> >>
> >> 2020-08-31 12:47:11 INFO org.apache.kafka.common.utils.AppInfoParser$AppInfo  Kafka startTimeMs: 1598903231499
> >>
> >> 2020-08-31 12:47:11 WARNING org.apache.kafka.streams.StreamsConfig checkIfUnexpectedUserSpecifiedConsumerConfig Unexpected user-specified consumer config: enable.auto.commit found. User setting (true) will be ignored and the Streams default setting (false) will be used
> >>
> >> 2020-08-31 12:47:11 INFO org.apache.kafka.streams.KafkaStreams setState stream-client [pi-test-84721b40-dfa1-4848-b3de-5c7561048403] State transition from CREATED to REBALANCING
> >>
> >> 2020-08-31 12:47:11 INFO com.businesssoa.home.events.filter.dispatcher.kafka.SensorPipe start Pipe STARTED
> >>
> >> 2020-08-31 12:47:11 INFO org.apache.kafka.streams.processor.internals.StreamThread run stream-thread [pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1] Starting
> >>
> >> 2020-08-31 12:47:11 INFO org.apache.kafka.streams.processor.internals.StreamThread setState stream-thread [pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1] State transition from CREATED to STARTING
> >>
> >> 2020-08-31 12:47:11 INFO org.apache.kafka.clients.consumer.KafkaConsumer subscribe [Consumer clientId=pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1-consumer, groupId=pi-test] Subscribed to topic(s): filtered-topic, reading-topic
> >>
> >> 2020-08-31 12:47:11 INFO com.businesssoa.home.events.filter.App waitTillStarted Status is REBALANCING , 1
> >>
> >> 2020-08-31 12:47:12 INFO org.apache.kafka.clients.Metadata update [Producer clientId=pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1-producer] Cluster ID: 1AOC2O8zSqq7nuQKACXFxQ
> >>
> >> 2020-08-31 12:47:12 INFO org.apache.kafka.clients.Metadata update [Consumer clientId=pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1-consumer, groupId=pi-test] Cluster ID: 1AOC2O8zSqq7nuQKACXFxQ
> >>
> >> 2020-08-31 12:47:41 INFO com.businesssoa.home.events.filter.App waitTillStarted Status is REBALANCING , 2
> >>
> >> 2020-08-31 12:48:11 INFO com.businesssoa.home.events.filter.App waitTillStarted Status is REBALANCING , 3
> >>
> >> Any guidance on how I can debug the streams as to why it's constantly
> >> rebalancing?
> >>
> >> Steve
> >>
> >>
>


Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-09-03 Thread Matthias J. Sax
Well, it depends on your program.

The reason for the current task creation strategy is joins: if you have
two input topics that you want to join, the join happens on a
per-partition basis, i.e., topicA-p0 is joined to topicB-p0 etc., and thus
both partitions must be assigned to the same task (so that co-partitioned
data is processed together).

Note that the following program would create independent tasks, as it
consists of two independent sub-topologies:

builder.stream("topicA").filter((k, v) -> true).to("outputA");
builder.stream("topicB").filter((k, v) -> true).to("outputB");

However, the next program would be one sub-topology, and thus we apply
the "join" rule (as we don't really know whether you actually execute a
join when we create tasks):

KStream<String, String> s1 = builder.stream("topicA");
builder.stream("topicB").merge(s1).filter((k, v) -> true).to("output");


Having said that, I agree that it would be a nice improvement to be more
clever about it. However, it is not easy to do. There is actually a related
ticket: https://issues.apache.org/jira/browse/KAFKA-9282
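
You can also check how your program is split into sub-topologies via
Topology#describe():

    Topology topology = builder.build();
    // prints each sub-topology with its source topics and processors
    System.out.println(topology.describe());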


Hope this helps.
  -Matthias

On 9/2/20 11:09 PM, Pushkar Deole wrote:
> Hi,
> 
> I came across articles explaining how parallelism is handled in
> Kafka Streams. This is what I collected:
> When the streams application is reading from multiple topics, the topic
> with the maximum number of partitions is used for instantiating stream
> tasks, so one task is instantiated per partition.
> Now, if the stream tasks are reading from multiple topics, then the
> partitions of the multiple topics are shared among those stream tasks.
> 
> For example, if Topic A and B have 5 partitions each, then 5 tasks are
> instantiated and assigned to 5 stream threads, where each task is assigned
> 1 partition from Topic A and 1 from Topic B.
> 
> The question here is: if I wanted 1 task to be created for each partition
> of each input topic, would that be possible? E.g. I would want to have 5
> tasks for topic A and 5 for B, and then would want 10 threads to handle
> those. How can this be achieved?
> 





Re: Consumer Group rebalancing events

2020-09-03 Thread Devaki, Srinivas
Hi Tess D'erberwill, Таисия Дорошенков,

Thanks a lot for the reply on this. One major issue I found with our
existing metrics is that they are not granular enough, i.e. if the
rebalancing finishes in under 10 seconds, I might lose that event in the
metrics when my scrape interval is 10 seconds.

Any suggestions on how to log the exact events rather than relying on
metrics? (The major concern is losing the events.)
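
One option I'm considering is logging the events from the client side with
a ConsumerRebalanceListener, so every revoke/assign is captured regardless
of the scrape interval. A rough sketch (consumer and topics stand in for
our usual setup):

    import java.util.Collection;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.common.TopicPartition;

    // logs every rebalance this consumer participates in, with timestamps
    consumer.subscribe(topics, new ConsumerRebalanceListener() {
        public void onPartitionsRevoked(Collection<TopicPartition> parts) {
            System.out.println(System.currentTimeMillis() + " revoked " + parts);
        }
        public void onPartitionsAssigned(Collection<TopicPartition> parts) {
            System.out.println(System.currentTimeMillis() + " assigned " + parts);
        }
    });

This only sees the groups our own services are in, but it catches even
sub-10-second rebalances.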

Thanks

On Tue, Aug 25, 2020 at 7:59 PM Tess D'erberwill 
wrote:

> Hi, Srinivas Devaki!
>
> There are Kafka JMX metrics - NumGroupsCompletingRebalance and
> NumGroupsPreparingRebalance. I think you can monitor them with some
> application and compute statistics.
> As for us - we added a Jolokia agent to our Kafka docker image, and Jolokia
> exposes all JMX metrics via HTTP. We run an additional service which gets
> the necessary metrics from this endpoint and analyses them.
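>
> For example, a sketch of reading one of those counters over Jolokia's HTTP
> endpoint (assuming the default agent port 8778; check the exact MBean name
> on your broker version):
>
>     curl http://broker:8778/jolokia/read/kafka.coordinator.group:type=GroupMetadataManager,name=NumGroupsPreparingRebalance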
>
> On 2020/08/23 16:39:35, "Devaki, Srinivas" 
> wrote:
> > Hi Folks,
> >
> > I'm trying to find the consumer group rebalancing events, so as to plot
> > how much time consumer groups usually take rebalancing during application
> > deployments.
> >
> > I've tried checking the logs of `server.log`, `controller.log`, and
> > `state-change.log`, but couldn't find anything about rebalancing.
> >
> >
> > Thanks & Regards
> > Srinivas Devaki
> > sre @ zomato
> >
>


Re: Advice for Kafka project in Africa...

2020-09-03 Thread Gaurav Bajaj
Hello Cedric,

My 2 cents :

We use Kafka a lot, but mostly for messaging and event streaming purposes.
Using Kafka as a database is not a use case I would pick Kafka for. Of
course you can use it to store some intermediate state, but having it as
the "system of record" would be a stretched use case for Kafka.

Best Regards,
Gaurav

On Wed, Sep 2, 2020, 9:31 PM cedric sende lubuele 
wrote:

> Let me introduce myself: my name is Cedric and I am a network engineer
> passionate about new technologies, and as part of my new activity I am
> interested in Big Data. Currently I live in Africa (Congo) and, as everyone
> knows, Africa is very late in terms of IT infrastructure (the Cloud is
> difficult; we work a lot on-premise).
>
> While doing some research, I came across Kai Waehner's article (can Kafka
> replace a database?<
> https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/>)
> and I would like to get an idea of the possibilities of Kafka.
>
> Let me explain: I am working on a project for integrating several
> databases (MySQL, MongoDB, Neo4j, ...) and, with my team, I have to develop
> an alert system which must detect anomalies based on different criteria
> linked to a person across the various departments of the company.
> Would Kafka be a good solution for centralizing all the data and
> creating several analysis scripts that detect an anomaly and send back an
> alert message, for example about a suspect wanted by the police?
>
> Thank you in advance
>
>
>
> Sende Cedric / Network IT
> sende.ced...@hotmail.com / 082/8446954
>
> UPSAIL GROUP
> http://upsail.co/
>
>