Kafka streams parallelism - why not separate stream task per partition per input topic

2020-09-02 Thread Pushkar Deole
Hi,

I came across articles explaining how parallelism is handled in Kafka
Streams. This is what I gathered:
When a Streams application reads from multiple topics, the topic with the
maximum number of partitions determines the number of stream tasks, so one
task is instantiated per partition.
Now, if the application reads from multiple topics, the partitions of those
topics are shared among the stream tasks.

For example, if topics A and B have 5 partitions each, then 5 tasks are
instantiated and assigned to 5 stream threads, where each task is assigned
one partition from topic A and one from topic B.

The question here is: if I wanted one task to be created for each partition
of each input topic, would that be possible? E.g. I would want 5 tasks for
topic A and 5 for topic B, and then 10 threads to handle them. How can this
be achieved? (See the sketch below.)
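
For illustration, a minimal sketch of one way to get a task per partition
per topic (topic names and the application id are hypothetical, not from
this thread). Kafka Streams creates one task per sub-topology per
partition, so two pipelines that never connect form two sub-topologies and
yield 10 tasks over two 5-partition topics; num.stream.threads then gives
each task its own thread:

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class TaskPerTopicPartition {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Two unconnected pipelines -> two sub-topologies. Streams creates
        // one task per sub-topology per partition: 0_0..0_4 and 1_0..1_4.
        builder.stream("topic-A").to("out-A");
        builder.stream("topic-B").to("out-B");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "parallelism-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Ten threads so each of the ten tasks can run on its own thread.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 10);

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}

If the two topics are joined or merged into one sub-topology, their
partitions are grouped into shared tasks again, which is the 5-task
behaviour described above.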


Re: UPGRADING ZOOKEEPER FROM 3.4.13 TO 3.5.7

2020-09-02 Thread Enrico Olivelli
The official way to fix it is described here:
https://issues.apache.org/jira/browse/ZOOKEEPER-3056

Basically we added a flag to allow the boot even in that case.
I suggest you upgrade to the latest 3.5.8 rather than 3.5.7.
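
For reference, a sketch of that flag in zoo.cfg (it can also be set as the
Java system property zookeeper.snapshot.trust.empty); remove it again once
the node has restarted cleanly and written a snapshot:

# zoo.cfg -- allow booting when a transaction log exists but no snapshot
# does (the ZOOKEEPER-3056 situation)
snapshot.trust.empty=true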


Enrico

On Thu, 3 Sep 2020 at 03:51, Rijo Roy wrote:

> Hi Manoj,
> I just faced this yesterday and resolved it.
> I assume you are getting this error on one of the follower nodes; if yes,
> please create a backup folder in your ZooKeeper data directory and move
> version-2, which holds the ZooKeeper data, into the newly created backup
> folder.
> Starting the ZooKeeper process will then recreate the version-2 folder in
> the data directory and sync it from the leader.
> Regards,
> Rijo Roy
>
>   On Thu, 3 Sep 2020 at 2:57 am, manoj.agraw...@cognizant.com wrote:
> HI ALL,
> I'm planning to upgrade Kafka 2.2.1 to Kafka 2.5.0, and I'm getting the
> error below while upgrading the ZooKeeper version. Any idea?
>
> java.io.IOException: No snapshot found, but there are log entries.
> Something is broken!
>
> [full stack trace and disclaimer snipped; see the original message below]
>


Re: UPGRADING ZOOKEEPER FROM 3.4.13 TO 3.5.7

2020-09-02 Thread Rijo Roy
Hi Manoj,
I just faced this yesterday and resolved it.
I assume you are getting this error on one of the follower nodes; if yes,
please create a backup folder in your ZooKeeper data directory and move
version-2, which holds the ZooKeeper data, into the newly created backup
folder.
Starting the ZooKeeper process will then recreate the version-2 folder in
the data directory and sync it from the leader.
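
In shell terms, a sketch of that procedure (the /var/lib/zookeeper path is
an assumption; note the snapshot directory is normally spelled version-2):

# On the affected follower only, with the ZooKeeper process stopped:
mkdir /var/lib/zookeeper/backup
mv /var/lib/zookeeper/version-2 /var/lib/zookeeper/backup/
# Restart ZooKeeper; the follower recreates version-2 and re-syncs a
# fresh snapshot from the leader.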
Regards,
Rijo Roy
  On Thu, 3 Sep 2020 at 2:57 am, manoj.agraw...@cognizant.com wrote:
HI ALL,
I'm planning to upgrade Kafka 2.2.1 to Kafka 2.5.0, and I'm getting the
error below while upgrading the ZooKeeper version. Any idea?

java.io.IOException: No snapshot found, but there are log entries.
Something is broken!

[full stack trace and disclaimer snipped; see the original message below]


Re: [External] key and subkey sequential processing across competing consumer groups?

2020-09-02 Thread Andre Mermegas
Kinda:
A:X:FOO
B:X:FOO
B:Y:BAR
B:Z:BAZ

A:X:FOO and B:X:FOO can be processed in parallel, but B:Z:BAZ must be
processed after B:Y:BAR, which in turn must be processed after B:X:FOO,
where A and B go to some partition for sequencing, but then "X", "Y" and
"Z" go to a different sub-topic and partition for processing.

Sequencing must be maintained across all of them:
Seq -> A:X:FOO
Seq -> B:X:FOO, B:Y:BAR, B:Z:BAZ



On Wed, Sep 2, 2020 at 5:31 PM Tauzell, Dave wrote:

> So if the stream is:
>
> A:1:FOO
> A:3:BAR
> A:3:BAZ
>
> Then A:3* must be processed after A:1, but A:3:BAR and A:3:BAZ can be
> processed in any order?
>
> I don’t think there is a way to do that with topics.
> -Dave
>
> From: Andre Mermegas
> Subject: [External] key and subkey sequential processing across competing
> consumer groups?
>
> [quoted original message snipped; see the original thread below]
>


-- 

Thanks,
-Andre


Re: [External] key and subkey sequential processing across competing consumer groups?

2020-09-02 Thread Tauzell, Dave
So if the stream is:

A:1:FOO
A:3:BAR
A:3:BAZ

Then A:3* must be processed after A:1, but A:3:BAR and A:3:BAZ can be
processed in any order?

I don’t think there is a way to do that with topics.
-Dave


From: Andre Mermegas
Subject: [External] key and subkey sequential processing across competing
consumer groups?

[quoted original message snipped; see the original thread below]


UPGRADING ZOOKEEPER FROM 3.4.13 TO 3.5.7

2020-09-02 Thread Manoj.Agrawal2
HI ALL,
I'm planning to upgrade Kafka 2.2.1 to Kafka 2.5.0, and I'm getting the
error below while upgrading the ZooKeeper version. Any idea?

java.io.IOException: No snapshot found, but there are log entries. Something is broken!
   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
   at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)

2020-09-02 21:19:23,877 - ERROR [main:QuorumPeerMain@101] - Unexpected exception, exiting abnormally

java.lang.RuntimeException: Unable to run quorum server
   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:938)
   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:887)
   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)

Caused by: java.io.IOException: No snapshot found, but there are log entries. Something is broken!
   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
   at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:901)
   ... 4 more



key and subkey sequential processing across competing consumer groups?

2020-09-02 Thread Andre Mermegas
Hi All,

New to Kafka; I would love some feedback on how to think about a solution
for this kind of flow.

Sequencing must be maintained for events of type X:Y:PAYLOAD,

where X is a grouping mechanism and all events with the same X must be
processed sequentially, e.g. an account number (OK, I use the record key
here to keep these in sequence),

but Y is a secondary grouping and distribution variable that fans out to
competing consumers for each type of Y.

E.g. where Y=A it goes to one topic of competing consumers, where Y=B it
goes to a different topic of competing consumers, etc. Each competing
consumer group is configured to handle only Ys of its type. All Xs must be
consumed sequentially across the distributed consumer groups, even those
fanned out with a subkey of Y.

How do I keep processing sequential (FIFO) across all of this? I know I
can use a key to sequence by X within a topic partition, but it breaks
down from there for me, as I also need to sequence by Y. Is there a good
built-in way to handle this type of problem? Am I thinking about it wrong?
(A sketch of the keyed-ordering part is at the end of this message.)

I hope this is somewhat clear? Thanks for any input!

looks something like this maybe:

[image: scheme1.png]
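
For the part that is known to work, keying by X, a minimal producer sketch
(the topic name, serializers, and payload strings are illustrative
assumptions): records with the same key hash to the same partition, so the
B events below are consumed in FIFO order. The open question is then only
the fan-out by Y, which this alone does not solve.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyByXSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Key = X (e.g. account number): same key -> same partition -> FIFO per X.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "B", "X:FOO"));
            producer.send(new ProducerRecord<>("events", "B", "Y:BAR"));
            producer.send(new ProducerRecord<>("events", "B", "Z:BAZ"));
        }
    }
}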

-- 
Thanks,
-Andre


Re: Streams constantly rebalancing

2020-09-02 Thread Sophie Blee-Goldman
Hey Steve,

I guess the first question I have is: is the consumer group actually
rebalancing, or does KafkaStreams just appear to be "stuck" in the
REBALANCING state? If this application has been run before and there is a
large amount of data in the changelog topics, it might take a while to
rebuild the local state stores from the changelog. During this time it
would appear to be in REBALANCING, but it's actually just restoring.

Just one possibility of many. How long did you let it run without seeing
it stabilize? It's difficult to diagnose from just the small snippet of
logs above, so I would try to take a look at the full picture over more
than just a minute. There *should* be something slightly more helpful
logged if it really is stuck in a cycle of endless rebalances (e.g. timing
out on the session interval).
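
One way to tell restoration apart from a real rebalance loop is to attach
a restore listener before calling start(); a minimal sketch (the helper
class and log format are illustrative, not from the thread):

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.processor.StateRestoreListener;

public final class RestoreLogging {
    // Call before streams.start(); if these lines appear, "REBALANCING"
    // is really changelog restoration, not endless rebalancing.
    public static void attach(KafkaStreams streams) {
        streams.setGlobalStateRestoreListener(new StateRestoreListener() {
            @Override
            public void onRestoreStart(TopicPartition tp, String store,
                                       long startOffset, long endOffset) {
                System.out.printf("restore start: %s %s %d..%d%n",
                        store, tp, startOffset, endOffset);
            }
            @Override
            public void onBatchRestored(TopicPartition tp, String store,
                                        long batchEndOffset, long numRestored) {
                System.out.printf("restored %d records of %s %s%n",
                        numRestored, store, tp);
            }
            @Override
            public void onRestoreEnd(TopicPartition tp, String store,
                                     long totalRestored) {
                System.out.printf("restore done: %s %s (%d records)%n",
                        store, tp, totalRestored);
            }
        });
    }
}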

On Wed, Sep 2, 2020 at 11:52 AM Steve Jones wrote:

> Just an addition to this in case someone can help: I'm seeing zero thread
> activity in the rebalancing, but I'm also not seeing anything being logged
> by Kafka to indicate it's doing anything on rebalancing (logging level set
> to ALL). Producer and consumer code works fine, but Streams just appears
> to hang.
>
> On Mon, 31 Aug 2020 at 12:49, Steve Jones wrote:
>
>> [original message and startup logs snipped; see the thread below]


Re: Streams constantly rebalancing

2020-09-02 Thread Steve Jones
Just an addition to this in case someone can help: I'm seeing zero thread
activity in the rebalancing, but I'm also not seeing anything being logged
by Kafka to indicate it's doing anything on rebalancing (logging level set
to ALL). Producer and consumer code works fine, but Streams just appears
to hang.

On Mon, 31 Aug 2020 at 12:49, Steve Jones wrote:

> When running an application on the Mac it works fine; when running exactly
> the same app and config on the Raspberry Pi, it constantly says it is
> "Rebalancing" the streams:
>
> 2020-08-31 12:47:11 INFO
> org.apache.kafka.common.utils.AppInfoParser$AppInfo  Kafka version:
> 2.6.0
>
> 2020-08-31 12:47:11 INFO
> org.apache.kafka.common.utils.AppInfoParser$AppInfo  Kafka commitId:
> 62abe01bee039651
>
> 2020-08-31 12:47:11 INFO
> org.apache.kafka.common.utils.AppInfoParser$AppInfo  Kafka
> startTimeMs: 1598903231499
>
> 2020-08-31 12:47:11 WARNING org.apache.kafka.streams.StreamsConfig
> checkIfUnexpectedUserSpecifiedConsumerConfig Unexpected user-specified
> consumer config: enable.auto.commit found. User setting (true) will be
> ignored and the Streams default setting (false) will be used
>
> 2020-08-31 12:47:11 INFO org.apache.kafka.streams.KafkaStreams setState
> stream-client [pi-test-84721b40-dfa1-4848-b3de-5c7561048403] State
> transition from CREATED to REBALANCING
>
> 2020-08-31 12:47:11 INFO
> com.businesssoa.home.events.filter.dispatcher.kafka.SensorPipe start Pipe
> STARTED
>
> 2020-08-31 12:47:11 INFO
> org.apache.kafka.streams.processor.internals.StreamThread run stream-thread
> [pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1] Starting
>
> 2020-08-31 12:47:11 INFO
> org.apache.kafka.streams.processor.internals.StreamThread setState
> stream-thread [pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1]
> State transition from CREATED to STARTING
>
> 2020-08-31 12:47:11 INFO org.apache.kafka.clients.consumer.KafkaConsumer
> subscribe [Consumer
> clientId=pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1-consumer,
> groupId=pi-test] Subscribed to topic(s): filtered-topic, reading-topic
>
> 2020-08-31 12:47:11 INFO com.businesssoa.home.events.filter.App
> waitTillStarted Status is REBALANCING , 1
>
> 2020-08-31 12:47:12 INFO org.apache.kafka.clients.Metadata update
> [Producer
> clientId=pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1-producer]
> Cluster ID: 1AOC2O8zSqq7nuQKACXFxQ
>
> 2020-08-31 12:47:12 INFO org.apache.kafka.clients.Metadata update
> [Consumer
> clientId=pi-test-84721b40-dfa1-4848-b3de-5c7561048403-StreamThread-1-consumer,
> groupId=pi-test] Cluster ID: 1AOC2O8zSqq7nuQKACXFxQ
>
> 2020-08-31 12:47:41 INFO com.businesssoa.home.events.filter.App
> waitTillStarted Status is REBALANCING , 2
>
> 2020-08-31 12:48:11 INFO com.businesssoa.home.events.filter.App
> waitTillStarted Status is REBALANCING , 3
>
> Any guidance on how I can debug the streams as to why it's constantly
> rebalancing?
>
> Steve
>
>


Advice for Kafka project in Africa...

2020-09-02 Thread cedric sende lubuele
Let me introduce myself: my name is Cedric and I am a network engineer
passionate about new technologies, and as part of my new activity I am
interested in Big Data. Currently I live in Africa (Congo), and as everyone
knows, Africa is very late in terms of IT infrastructure (the cloud is
difficult; we work a lot on premise).

While doing some research, I came across Kai Waehner's article (can Apache
Kafka replace a database?) and I would like to get an idea of the
possibilities of Kafka.

Let me explain: I am working on a project to integrate several databases
(MySQL, MongoDB, Neo4j, ...) and, with my team, I have to develop an alert
system which must detect anomalies based on different criteria linked to a
person in the various departments of the company.
Would Kafka be a good solution to centralize all the data and run several
analysis scripts that detect an anomaly and send back an alert message,
for example for a suspect wanted by the police?

Thank you in advance



Sende Cedric / Network IT
sende.ced...@hotmail.com / 082/8446954

UPSAIL GROUP
http://upsail.co/



A question related to quota in Kafka

2020-09-02 Thread Tallanel Siriel
Hello everyone.

I am sending you this mail because I have a question related to quotas in
Kafka.

I read the following articles:

   - https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
   - https://kafka.apache.org/documentation/#design_quotas


My Kafka cluster runs version 0.10.1, bundled with the HDP 2.6.2
distribution.
I have Kerberos enabled for Kafka.

I set a network bandwidth quota on a specific user to limit its production
throughput:
/usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper ${ZK}
--alter --add-config 'producer_byte_rate=102400' --entity-type users
--entity-name <user>
with ${ZK} the ZooKeeper quorum.

With this command, if I understand well, <user> should be limited to 100
KB/s when producing to any Kafka topic.
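
As a sanity check, the same tool should be able to read the override back
(a sketch; the exact output wording may differ by version):

/usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper ${ZK} \
  --describe --entity-type users --entity-name <user>
# expected, roughly: producer_byte_rate=102400 for user-principal '<user>'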

Then I used the utility kafka-producer-perf-test.sh to perform some tests
to check whether <user> is really limited to this bandwidth.
So first I took a Kerberos ticket for <user>.
And then, here are my tests:
/usr/hdp/current/kafka-broker/bin/kafka-producer-perf-test.sh --topic
<topic> --broker-list ${BL} --messages 1000 --security-protocol
PLAINTEXTSASL
/usr/hdp/current/kafka-broker/bin/kafka-producer-perf-test.sh --topic
<topic> --broker-list ${BL} --messages 10000 --security-protocol
PLAINTEXTSASL
/usr/hdp/current/kafka-broker/bin/kafka-producer-perf-test.sh --topic
<topic> --broker-list ${BL} --messages 100000 --security-protocol
PLAINTEXTSASL
with:
- ${BL} the Kafka bootstrap servers with the right ports
- <topic> a topic in my Kafka cluster

Here are the results (all runs: compression 0, message.size 100, batch.size 200):

messages   start.time               end.time                 data.sent.MB  MB.sec   nMsg.sec   KB.sec
1000       2020-09-02 10:47:59:455  2020-09-02 10:48:01:035  0.10          0.0604   632.9114   61.8496
10000      2020-09-02 10:49:20:093  2020-09-02 10:49:22:438  0.95          0.4067   4264.392   416.4608
100000     2020-09-02 10:49:49:649  2020-09-02 10:51:10:424  9.54          0.1181   1238.007   120.9344

What I don't understand are the tests with 10 000 and 100 000 messages.
If I understand well, the total bandwidth used was 416 KB/s and 120 KB/s
respectively, which is beyond the limit I set for this user.

I wanted to know how this is possible. Why is the quota not respected?

Thanks in advance for your help o/

Best regards.

Tallanel


MirrorMaker 2 - is starting mirroring from latest possible?

2020-09-02 Thread Iftach Ben-Yosef
Hello,

Whenever we add a new topic to the mirroring whitelist, MM2 starts to
mirror the entire content of the source topic. If the topic is large, this
can create a long lag until the whole topic is mirrored, and it can also
cause smaller delays on other mirrored topics (I'm assuming this is
because the cluster is using most of its resources to handle the new
topic). Once the new topic's lag is resolved, everything seems fine.

Is it possible to tell MM2 to start mirroring new topics in the whitelist
from the latest messages, and ignore older messages in the topic?
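
One thing that may achieve this, assuming MM2's per-cluster client
override prefix ({alias}.consumer.*) reaches the source consumer; the
alias names here are hypothetical and this sketch is untested:

# mm2.properties (sketch)
clusters = source, target
source->target.enabled = true
source->target.topics = .*
# Hypothetical override: start newly matched topics from the latest
# offset instead of the beginning.
source.consumer.auto.offset.reset = latest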



Thanks,
Iftach
