Re: Latency requirements for Kafka connect workers

2021-01-15 Thread Malcolm McFarland
Hey Ashish,

If I understand it correctly, the workers actually coordinate with each
other through Kafka itself, via connect groups (I believe using the
configured status.storage.topic and the Kafka consumer group.id). See here:
https://docs.confluent.io/5.5.0/connect/userguide.html#connect-userguide-distributed-config
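For reference, the coordination-related settings in a distributed worker's
properties file look roughly like this (the topic names here are
illustrative defaults, not required values):

```properties
# Workers sharing the same group.id form one Connect cluster
group.id=connect-cluster
bootstrap.servers=broker1:9092
# Internal topics the workers use to share configs, offsets, and status
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
```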

Adding/removing connectors _is_ done via REST though:
https://docs.confluent.io/5.5.0/connect/references/restapi.html#connect-userguide-rest
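For example, adding a connector is a POST to /connectors on any worker
(port 8083 by default), with a JSON body along these lines; the connector
name and the file/topic values below are made up for illustration:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "connect-test"
  }
}
```

e.g. `curl -X POST -H "Content-Type: application/json" --data @source.json http://localhost:8083/connectors`.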

Cheers,
Malcolm McFarland
Cavulus


This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
unauthorized or improper disclosure, copying, distribution, or use of the
contents of this message is prohibited. The information contained in this
message is intended only for the personal and confidential use of the
recipient(s) named above. If you have received this message in error,
please notify the sender immediately and delete the original message.



On Fri, Jan 15, 2021 at 12:03 AM ashish sood  wrote:

> Hi Team,
>
> As per my understanding, the distributed Kafka Connect workers communicate
> with each other over the REST interface. Are there any specific latency
> requirements, i.e. a maximum latency of 'x' ms, to allow effective
> functioning of the Kafka Connect distributed worker cluster?
>
> Regards
> Ashish
>


Latency requirements for Kafka connect workers

2021-01-15 Thread ashish sood
Hi Team,

As per my understanding, the distributed Kafka Connect workers communicate
with each other over the REST interface. Are there any specific latency
requirements, i.e. a maximum latency of 'x' ms, to allow effective
functioning of the Kafka Connect distributed worker cluster?

Regards
Ashish


Re: Requirements

2019-09-13 Thread Hans Jespersen
Gwen Shapira published a great whitepaper with reference architectures for
all Kafka and Confluent components in big and small environments, and for
bare metal, VMs, and all 3 major public clouds.

https://www.confluent.io/resources/apache-kafka-confluent-enterprise-reference-architecture/


On Fri, Sep 13, 2019 at 8:26 AM Peter Menapace <
peter.menap...@vanderlande.com> wrote:

> Hi all,
>
> I have a small question. In my company we would like to use Apache
> Kafka with KSQL.
>
> My question is: what are the hardware requirements for running
> Kafka and KSQL in small and large environments?
>
>
>
> Best regards,
>
>
>
> Peter
>
>
>
> Kind regards,
>
>
>
> *Peter Menapace*
>
> Senior IT Architect - ICT Projects WP-DE
>
> T +4923197942200 | M +4915112253549
>
>
>
> *Vanderlande Industries B.V.*
>
> Vanderlandelaan 2, 5466 RB  Veghel |The Netherlands
>
> T +31 413 49 49 49 | www.vanderlande.com
>
>
>
> --
> Vanderlande Industries GmbH
> Sitz der Gesellschaft:
> Dortmund, Deutschland
>
>
> Amtsgericht Dortmund: HRB 8539
> Geschäftsführer: Rene Veldink
> www.vanderlande.com
>
>
> This e-mail may contain confidential and/or privilegedinformation.
> If you are not the intended recipient (or have received this e-mail in
> error) please notify the sender immediately and delete this e-mail.
> Any unauthorized copying, disclosure or distribution of the material in
> this e-mail is strictly forbidden.
>
> Think green: do you really need to print this e-mail?
>
>


Requirements

2019-09-13 Thread Peter Menapace
Hi all,
I have a small question. In my company we would like to use Apache Kafka
with KSQL.
My question is: what are the hardware requirements for running Kafka
and KSQL in small and large environments?

Best regards,

Peter

Kind regards,

Peter Menapace
Senior IT Architect - ICT Projects WP-DE
T +4923197942200 | M +4915112253549


Vanderlande Industries B.V.
Vanderlandelaan 2, 5466 RB  Veghel |The Netherlands
T +31 413 49 49 49 | www.vanderlande.com






RE: Kafka hardware requirements

2019-05-06 Thread Jean-Marc Hyppolite
Thank you Kurt for your insight.



Jean-Marc.







From: Kurt Rudolph 
Sent: Monday, May 6, 2019 9:22:24 AM
To: users@kafka.apache.org
Subject: Re: Kafka hardware requirements

I am currently running a few clusters on virtual machines connected to a
vSAN.  I have not had any major issues.  My clusters are small: 3 virtual
machines (4 CPUs and 8 GB memory each).  I am routinely processing 20K
msg/sec with surges of over 35K msg/sec.  Based on my performance testing,
I don't think my cluster would have issues until I start reaching about 80K
msg/sec.  Your message throughput will be highly dependent on your message
size and whether you are compressing messages.  You can test out the
performance of your cluster using kafka-producer-perf-test and
kafka-consumer-perf-test.  This should give you a good idea of what your
configuration can do.

The only issue I've had running on VMs or a vSAN is when VMware itself has
issues.  Our admin had to reshuffle the vSAN one day, and that caused high
IOWait on my cluster and caused a number of partitions to become out
of sync with the leader.

On Sun, May 5, 2019 at 2:50 PM Jean-Marc Hyppolite <
jean.marc.hyppol...@outlook.com> wrote:

> Hello,
>
> I would like to know the impact of running Kafka in production on virtual
> machines connected to a SAN (Storage Area Network). I mean the impact on
> Kafka performance: what would be the maximum number of messages per second,
> and of producers and consumers, that Kafka can deal with? If we have no
> choice but to run Kafka on that kind of "hardware", what would be the
> deployment guidelines, along with those maximum limits?
>
> Thank you.
>
> Jean-Marc.
>


Re: Kafka hardware requirements

2019-05-06 Thread Kurt Rudolph
I am currently running a few clusters on virtual machines connected to a
vSAN.  I have not had any major issues.  My clusters are small: 3 virtual
machines (4 CPUs and 8 GB memory each).  I am routinely processing 20K
msg/sec with surges of over 35K msg/sec.  Based on my performance testing,
I don't think my cluster would have issues until I start reaching about 80K
msg/sec.  Your message throughput will be highly dependent on your message
size and whether you are compressing messages.  You can test out the
performance of your cluster using kafka-producer-perf-test and
kafka-consumer-perf-test.  This should give you a good idea of what your
configuration can do.

The only issue I've had running on VMs or a vSAN is when VMware itself has
issues.  Our admin had to reshuffle the vSAN one day, and that caused high
IOWait on my cluster and caused a number of partitions to become out
of sync with the leader.
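As a back-of-the-envelope check on numbers like these, raw throughput in
bytes is just message rate times average message size. The 1 KiB average
below is an assumption for illustration, not a figure from this thread:

```java
public class ThroughputEstimate {

    // Rough bytes/sec for a given message rate and average (uncompressed) size
    static long bytesPerSecond(long messagesPerSecond, long avgMessageBytes) {
        return messagesPerSecond * avgMessageBytes;
    }

    public static void main(String[] args) {
        // 20K msg/sec at an assumed 1 KiB average is roughly 20 MB/sec
        System.out.println(bytesPerSecond(20_000, 1_024));
    }
}
```

Compression and replication change the effective number considerably, which
is why measuring with the perf-test tools beats estimating.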

On Sun, May 5, 2019 at 2:50 PM Jean-Marc Hyppolite <
jean.marc.hyppol...@outlook.com> wrote:

> Hello,
>
> I would like to know the impact of running Kafka in production on virtual
> machines connected to a SAN (Storage Area Network). I mean the impact on
> Kafka performance: what would be the maximum number of messages per second,
> and of producers and consumers, that Kafka can deal with? If we have no
> choice but to run Kafka on that kind of "hardware", what would be the
> deployment guidelines, along with those maximum limits?
>
> Thank you.
>
> Jean-Marc.
>


Kafka hardware requirements

2019-05-05 Thread Jean-Marc Hyppolite
Hello,

I would like to know the impact of running Kafka in production on virtual
machines connected to a SAN (Storage Area Network). I mean the impact on
Kafka performance: what would be the maximum number of messages per second,
and of producers and consumers, that Kafka can deal with? If we have no
choice but to run Kafka on that kind of "hardware", what would be the
deployment guidelines, along with those maximum limits?

Thank you.

Jean-Marc.


Consumer ID requirements and usage clarification

2018-12-06 Thread David Luu
Can someone clarify the usage and requirements of consumer ID when
used/specified?

I came across this topic and brought that up in a comment in the thread,
someone else suggested it being worthy of another post

https://stackoverflow.com/questions/34550873/difference-between-groupid-and-consumerid-in-kafka-consumer/34553058?noredirect=1#comment9338_34553058

To summarize without going to that post/link:

Reading the documentation for consumer.id, the description states "Generated
automatically if not set." I assume this means that if it is set manually,
consumer.id should be unique for each consumer? I'm curious what happens
if you reuse the consumer ID across consumers (the way one shares the
consumer group ID): do problems arise, or does it work fine but just get
messy for tracking/debugging active consumers in the group?
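For context, consumer.id was (if I remember correctly) a setting of the old
Scala consumer; in the newer Java consumer the user-settable per-client
identifier is client.id, alongside the shared group.id. A rough sketch
(values illustrative):

```properties
# Shared by every consumer in the group; drives partition assignment
group.id=my-consumer-group
# Per-client logical name used in logs and metrics; duplicates are allowed
# but make it messier to tell consumers apart
client.id=order-processor-1
bootstrap.servers=broker1:9092
```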


Re: Discussion on requirements for Data Encryption functionality in Kafka (KIP-317)

2018-10-08 Thread Sönke Liebau
Hi Mike,

that sounds good! I've not yet received any other feedback, but worst case
scenario is that just the two of us discuss this over a cup of coffee :)

I'll talk to the Summit organizers again about some sort of venue and get
back to you once I know more. Maybe we can get a few more people to join
the discussion once some more details are known.

Other than that, I am also very happy to gather feedback on the mailing
list from people who won't be able to make it to the Summit. So anybody who
can come up with some thoughts or requirements around encryption
functionality for Kafka, please don't hesitate to chime in!

Best regards,
Sönke

On Wed, Oct 3, 2018 at 3:03 AM mikegray...@gmail.com 
wrote:

> Hi Sönke,
>
> I would be very interested in participating in this conversation.  Very
> interested in how TDE might work in Kafka!  I’m coming with several
> colleagues and will see if they’re interested in participating as well.
>
> Thanks,
> Mike Grayson
>
> On 2018/10/02 11:19:36, Sönke Liebau 
> wrote:
> > Hi all,
> >
> > I have created KIP-317 [1] a while ago, which outlines an implementation
> > proposal to add transparent data encryption functionality to Kafka. The
> KIP
> > in its current form is somewhat rigid in its implementation, I will
> rework
> > this to become extensible over the next few days to allow for additional
> > implementations.
> >
> > I have discussed the current method of providing keys with a colleague
> and
> > while we agree that this is a valid use case for some people, there are
> > certainly a lot of other valid use cases out there as well.
> > To ensure that the initial implementation provides the necessary
> > flexibility I'd like some feedback from the community on what
> requirements
> > they would have around data encryption and key management.
> >
> > The following questions should serve as a starting point for the
> > discussion, please feel free to address anything that comes to mind
> which I
> > have not mentioned here:
> >
> > - Should encryption be configurable on the client, or configured on the
> > broker and pushed down to the client?
> > - Where should keys be stored?
> > - How much flexibility around keys is necessary - is there for example a
> > use case that would decide on a per message basis which key to use?
> > (imagine a topic containing top secret, secret and public data with three
> > different keys)
> > - Do we need functionality to prohibit publishing unencrypted messages to
> > topics, based on that topic's setup?
> >
> > Of course the mailing list is the first place that discussions like these
> > should take place, but sometimes I find a face to face discussion can be
> > quite useful as well, especially when discussing non-trivial topics (like
> > encryption). I have reached out to the organizers of the upcoming Kafka
> > Summit in SF and there might be a chance for us to get a room with a
> > whiteboard at some point (probably during lunch, when the room is
> otherwise
> > unused). Would people be interested in meeting up for 20 minutes to
> discuss
> > this in person? I'd be happy to provide a summary on the mailing list
> > afterwards of course.
> >
> > Look forward to hearing from all of you!
> >
> > Best regards,
> > Sönke
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+transparent+data+encryption+functionality
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: Discussion on requirements for Data Encryption functionality in Kafka (KIP-317)

2018-10-02 Thread mikegray831
Hi Sönke,

I would be very interested in participating in this conversation.  Very 
interested in how TDE might work in Kafka!  I’m coming with several colleagues 
and will see if they’re interested in participating as well.

Thanks,
Mike Grayson

On 2018/10/02 11:19:36, Sönke Liebau  
wrote: 
> Hi all,
> 
> I have created KIP-317 [1] a while ago, which outlines an implementation
> proposal to add transparent data encryption functionality to Kafka. The KIP
> in its current form is somewhat rigid in its implementation, I will rework
> this to become extensible over the next few days to allow for additional
> implementations.
> 
> I have discussed the current method of providing keys with a colleague and
> while we agree that this is a valid use case for some people, there are
> certainly a lot of other valid use cases out there as well.
> To ensure that the initial implementation provides the necessary
> flexibility I'd like some feedback from the community on what requirements
> they would have around data encryption and key management.
> 
> The following questions should serve as a starting point for the
> discussion, please feel free to address anything that comes to mind which I
> have not mentioned here:
> 
> - Should encryption be configurable on the client, or configured on the
> broker and pushed down to the client?
> - Where should keys be stored?
> - How much flexibility around keys is necessary - is there for example a
> use case that would decide on a per message basis which key to use?
> (imagine a topic containing top secret, secret and public data with three
> different keys)
> - Do we need functionality to prohibit publishing unencrypted messages to
> topics, based on that topic's setup?
> 
> Of course the mailing list is the first place that discussions like these
> should take place, but sometimes I find a face to face discussion can be
> quite useful as well, especially when discussing non-trivial topics (like
> encryption). I have reached out to the organizers of the upcoming Kafka
> Summit in SF and there might be a chance for us to get a room with a
> whiteboard at some point (probably during lunch, when the room is otherwise
> unused). Would people be interested in meeting up for 20 minutes to discuss
> this in person? I'd be happy to provide a summary on the mailing list
> afterwards of course.
> 
> Look forward to hearing from all of you!
> 
> Best regards,
> Sönke
> 
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+transparent+data+encryption+functionality
> 


Discussion on requirements for Data Encryption functionality in Kafka (KIP-317)

2018-10-02 Thread Sönke Liebau
Hi all,

I created KIP-317 [1] a while ago, which outlines an implementation
proposal to add transparent data encryption functionality to Kafka. The KIP
in its current form is somewhat rigid in its implementation; I will rework
it over the next few days to become extensible, to allow for additional
implementations.

I have discussed the current method of providing keys with a colleague and
while we agree that this is a valid use case for some people, there are
certainly a lot of other valid use cases out there as well.
To ensure that the initial implementation provides the necessary
flexibility I'd like some feedback from the community on what requirements
they would have around data encryption and key management.

The following questions should serve as a starting point for the
discussion, please feel free to address anything that comes to mind which I
have not mentioned here:

- Should encryption be configurable on the client, or configured on the
broker and pushed down to the client?
- Where should keys be stored?
- How much flexibility around keys is necessary - is there for example a
use case that would decide on a per message basis which key to use?
(imagine a topic containing top secret, secret and public data with three
different keys)
- Do we need functionality to prohibit publishing unencrypted messages to
topics, based on that topic's setup?
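To make the per-message key question concrete: a client-side sketch might
pick a key id from each message's classification before encrypting. All
names here are hypothetical illustrations, not part of the KIP (key lookup
and the actual cryptography are omitted):

```java
import java.util.Map;

// Illustrative only: maps a message's classification to the id of the
// encryption key that would be used for it.
public class MessageKeySelector {

    private static final Map<String, String> KEY_ID_BY_CLASSIFICATION = Map.of(
            "public", "key-public",
            "secret", "key-secret",
            "top-secret", "key-top-secret");

    public static String keyIdFor(String classification) {
        String keyId = KEY_ID_BY_CLASSIFICATION.get(classification);
        if (keyId == null) {
            throw new IllegalArgumentException(
                    "No key for classification: " + classification);
        }
        return keyId;
    }

    public static void main(String[] args) {
        System.out.println(keyIdFor("secret"));
    }
}
```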

Of course the mailing list is the first place that discussions like these
should take place, but sometimes I find a face to face discussion can be
quite useful as well, especially when discussing non-trivial topics (like
encryption). I have reached out to the organizers of the upcoming Kafka
Summit in SF and there might be a chance for us to get a room with a
whiteboard at some point (probably during lunch, when the room is otherwise
unused). Would people be interested in meeting up for 20 minutes to discuss
this in person? I'd be happy to provide a summary on the mailing list
afterwards of course.

Look forward to hearing from all of you!

Best regards,
Sönke

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+transparent+data+encryption+functionality


Re: Windowed aggregations memory requirements

2017-05-03 Thread Eno Thereska
This is a timely question, and we've updated the documentation on capacity
planning and sizing for Kafka Streams jobs:
http://docs.confluent.io/current/streams/sizing.html. Any feedback is
welcome. It has scenarios with windowed stores too.

Thanks
Eno
> On 3 May 2017, at 18:51, Garrett Barton  wrote:
> 
> That depends on whether you're using event, processing, or ingestion time.
> 
> My understanding is that if you play a record through that is T-6, the only
> way that 
> TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5))
> would actually process that record in your window is if you're using
> processing time.  Otherwise the record is skipped and no data is
> generated/calculated for that operation.  So depending on what you're doing,
> you would not increase memory usage any more than when consuming from
> real-time.
> 
> On Wed, May 3, 2017 at 3:37 AM, João Peixoto 
> wrote:
> 
>> The base question I'm trying to answer is "how much memory does my instance
>> need".
>> 
>> Considering a use case where I want to keep a rolling average on a tumbling
>> window of 1 minute size allowing for late arrivals up to 5 minutes (lower
>> bound) we would have something like this:
>> 
>> TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(
>> TimeUnit.MINUTES.toMillis(5))
>> 
>> The aggregate key size is 8 bytes, the average value is 8 bytes and for
>> de-duplication purposes we need to keep track of which messages we saw
>> already, so a list of keys averaging 10 entries.
>> 
>> If I understand correctly this means that each window will be on average 96
>> bytes in size.
>> 
>> A single topic generates 100 messages/minute, which aggregate into 10
>> independent windows.
>> 
>> On any given point in time the windowed aggregates require 960 bytes of
>> memory at least.
>> 
>> Here's the confusing part. Let's say I found an issue with my averaging
>> operation and I want to reprocess the last 10 hours' worth of messages.
>> 
>> 1. Windows will be regenerated, since most likely they were cleaned up
>> already
>> 2. The retention policy now has different semantics? If I had a late
>> arrival of 6 minutes, all of the sudden the reprocessing will account for
>> it right? Since the window is now active due to recreation (Assuming my app
>> is capable of processing all messages under 5 minutes)
>> 3. I'll be keeping 10 windows * (60 * 10) minutes for the first 5 minutes,
>> so my memory requirement is now 576,000 bytes? This is orders of magnitude
>> bigger.
>> 
>> I hope this gets my doubts across clearly, feel free to ask more details.
>> And thanks in advance
>> 



Re: Windowed aggregations memory requirements

2017-05-03 Thread Garrett Barton
That depends on whether you're using event, processing, or ingestion time.

My understanding is that if you play a record through that is T-6, the only
way that 
TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5))
would actually process that record in your window is if you're using
processing time.  Otherwise the record is skipped and no data is
generated/calculated for that operation.  So depending on what you're doing,
you would not increase memory usage any more than when consuming from
real-time.
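Which notion of time applies is controlled by the timestamp extractor. For
example, processing-time semantics can be selected roughly like this; in
recent Streams versions the property is default.timestamp.extractor (older
releases used timestamp.extractor), so treat the exact key as
version-dependent:

```java
import java.util.Properties;

public class StreamsTimeConfig {

    // Wall-clock extractor => windows use processing time, so replayed old
    // records land in "current" windows instead of being dropped as late
    static Properties processingTimeProps() {
        Properties props = new Properties();
        props.put("default.timestamp.extractor",
                "org.apache.kafka.streams.processor.WallclockTimestampExtractor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(
                processingTimeProps().getProperty("default.timestamp.extractor"));
    }
}
```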

On Wed, May 3, 2017 at 3:37 AM, João Peixoto 
wrote:

> The base question I'm trying to answer is "how much memory does my instance
> need".
>
> Considering a use case where I want to keep a rolling average on a tumbling
> window of 1 minute size allowing for late arrivals up to 5 minutes (lower
> bound) we would have something like this:
>
> TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(
> TimeUnit.MINUTES.toMillis(5))
>
> The aggregate key size is 8 bytes, the average value is 8 bytes and for
> de-duplication purposes we need to keep track of which messages we saw
> already, so a list of keys averaging 10 entries.
>
> If I understand correctly this means that each window will be on average 96
> bytes in size.
>
> A single topic generates 100 messages/minute, which aggregate into 10
> independent windows.
>
> On any given point in time the windowed aggregates require 960 bytes of
> memory at least.
>
> Here's the confusing part. Let's say I found an issue with my averaging
> operation and I want to reprocess the last 10 hours' worth of messages.
>
> 1. Windows will be regenerated, since most likely they were cleaned up
> already
> 2. The retention policy now has different semantics? If I had a late
> arrival of 6 minutes, all of the sudden the reprocessing will account for
> it right? Since the window is now active due to recreation (Assuming my app
> is capable of processing all messages under 5 minutes)
> 3. I'll be keeping 10 windows * (60 * 10) minutes for the first 5 minutes,
> so my memory requirement is now 576,000 bytes? This is orders of magnitude
> bigger.
>
> I hope this gets my doubts across clearly, feel free to ask more details.
> And thanks in advance
>


Windowed aggregations memory requirements

2017-05-03 Thread João Peixoto
The base question I'm trying to answer is "how much memory does my instance
need".

Considering a use case where I want to keep a rolling average on a tumbling
window of 1 minute size allowing for late arrivals up to 5 minutes (lower
bound) we would have something like this:

TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5))

The aggregate key size is 8 bytes, the average value is 8 bytes and for
de-duplication purposes we need to keep track of which messages we saw
already, so a list of keys averaging 10 entries.

If I understand correctly this means that each window will be on average 96
bytes in size.

A single topic generates 100 messages/minute, which aggregate into 10
independent windows.

On any given point in time the windowed aggregates require 960 bytes of
memory at least.

Here's the confusing part. Let's say I found an issue with my averaging
operation and I want to reprocess the last 10 hours' worth of messages.

1. Windows will be regenerated, since most likely they were cleaned up
already
2. The retention policy now has different semantics? If I had a late
arrival of 6 minutes, all of the sudden the reprocessing will account for
it right? Since the window is now active due to recreation (Assuming my app
is capable of processing all messages under 5 minutes)
3. I'll be keeping 10 windows * (60 * 10) minutes for the first 5 minutes,
so my memory requirement is now 576,000 bytes? This is orders of magnitude
bigger.

I hope this gets my doubts across clearly, feel free to ask more details.
And thanks in advance
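The arithmetic in the message above can be checked directly; the sizes and
counts below are the estimates stated there (8-byte key, 8-byte average, 10
deduplication keys, 10 windows per minute), not measured values:

```java
public class WindowMemoryEstimate {

    // Per-window footprint: aggregate key + average value + dedup key list
    static long perWindowBytes(long keyBytes, long valueBytes, long dedupKeys) {
        return keyBytes + valueBytes + dedupKeys * 8; // 8 bytes per dedup key
    }

    public static void main(String[] args) {
        long perWindow = perWindowBytes(8, 8, 10);      // 96 bytes
        long steadyState = perWindow * 10;              // 10 active windows -> 960 bytes
        long reprocessing = perWindow * 10 * (60 * 10); // 10 windows/min over 10 hours
        System.out.println(perWindow + " " + steadyState + " " + reprocessing);
    }
}
```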