Re: Support for Uni-directional data-diode?

2020-12-22 Thread Danny - Terafence
Thank you Jan,

The aim is to secure the sending-side infrastructure and assets: deny any known
and unknown attacks from the "outside" while keeping real-time data flowing
outbound.
Data integrity may be maintained in various ways if the forwarded protocol has 
such options.

I wonder if Kafka can run over UDP, for starters.

Does anyone know?

On Dec 22, 2020 23:25, jan  wrote:
Dunno if it helps (if in doubt, probably not) but a search for the
term gets some useful articles (inc.
) and a company

who may be worth contacting (I'm not affiliated in any way).

The first question I'd ask myself is, would a burn-to-dvd solution
work? Failing that, basic stuff like email?
In any case, what if the data's corrupted, how can the servers detect
and re-request? What are you protecting against exactly? Stuff like
that.

jan

On 22/12/2020, Danny - Terafence  wrote:
> Hello,
>
> Merry Christmas,
>
> My name is Danny Michaeli, I am Terafence’s Technical Services Manager.
>
> One of our customers is using Kafka to collect ICS SIEM data and forward it
> to AI servers.
>
> They have asked us to propose a uni-directional solution to avoid being
> exposed to attacks from the AI server site.
>
> Can you please advise whether, and how, this can be done?
>
> B. Regards,
>
> Danny Michaeli
> Technical Services Manager
> Tel.: +972-73-3791191
> Cell: +972-52-882-3108
>
>


Producer batch size

2020-12-22 Thread Dhirendra Singh
Hi,
I have a question related to batch.size producer configuration.
What happens when batch.size has been reached and the producer application
thread sends more data?
Does the thread block until space becomes available in the buffer
containing the batch?

Thanks,
Dhirendra.


Re: Support for Uni-directional data-diode?

2020-12-22 Thread jan
Dunno if it helps (if in doubt, probably not) but a search for the
term gets some useful articles (inc.
) and a company

who may be worth contacting (I'm not affiliated in any way).

The first question I'd ask myself is, would a burn-to-dvd solution
work? Failing that, basic stuff like email?
In any case, what if the data's corrupted, how can the servers detect
and re-request? What are you protecting against exactly? Stuff like
that.

jan

On 22/12/2020, Danny - Terafence  wrote:
> Hello,
>
> Merry Christmas,
>
> My name is Danny Michaeli, I am Terafence’s Technical Services Manager.
>
> One of our customers is using Kafka to collect ICS SIEM data and forward it
> to AI servers.
>
> They have asked us to propose a uni-directional solution to avoid being
> exposed to attacks from the AI server site.
>
> Can you please advise whether, and how, this can be done?
>
> B. Regards,
>
> Danny Michaeli
> Technical Services Manager
> Tel.: +972-73-3791191
> Cell: +972-52-882-3108
>
>


Support for Uni-directional data-diode?

2020-12-22 Thread Danny - Terafence
Hello,

Merry Christmas,

My name is Danny Michaeli, I am Terafence’s Technical Services Manager.

One of our customers is using Kafka to collect ICS SIEM data and forward it
to AI servers.

They have asked us to propose a uni-directional solution to avoid being
exposed to attacks from the AI server site.

Can you please advise whether, and how, this can be done?

B. Regards,

Danny Michaeli
Technical Services Manager
Tel.: +972-73-3791191
Cell: +972-52-882-3108



Re: Kafka Scaling Ideas

2020-12-22 Thread Haruki Okada
Hm, it's an optimization for "first layer", so if the bottleneck is in
"second layer" (i.e. DB write) as you mentioned, it shouldn't make much
difference I think.

On Tue, Dec 22, 2020 at 4:02 PM Yana K wrote:

> I thought about it but then we don't have much time - will it optimize
> performance?
>
> On Mon, Dec 21, 2020 at 4:16 PM Haruki Okada  wrote:
>
> > About "first layer" right?
> > Then it's better to make sure you don't call get() on the result of
> > Producer#send() for each message, because that defeats producer
> > batching.
> > The Kafka producer batches messages by default and is very efficient, so if
> > you produce asynchronously, it rarely becomes a bottleneck in general.
> > > Also are there any producer optimizations
> >
> > By the way, if "first layer" just filters then produces messages without
> > interacting with any other external DB, using KafkaStreams should be much
> > easier.
> >
> > On Tue, Dec 22, 2020 at 3:27 AM Yana K wrote:
> >
> > > Thanks!
> > >
> > > Also are there any producer optimizations anyone can think of in this
> > > scenario?
> > >
> > >
> > >
> > > > On Mon, Dec 21, 2020 at 8:58 AM Joris Peeters <joris.mg.peet...@gmail.com>
> > > > wrote:
> > > >
> > > > > I'd probably just do it by experiment for your concrete data.
> > > > >
> > > > > Maybe generate a few million synthetic data rows, and for-each-batch
> > > > > insert them into a dev DB, with an outer grid search over various
> > > > > candidate batch sizes. You're looking to optimise for flat-out rows/s,
> > > > > so whichever batch size wins (given a fixed number of total rows) is
> > > > > near-optimal.
> > > > > You can repeat the exercise with N simultaneous threads to inspect how
> > > > > batch sizes and multiple partitions P would interact (which might well
> > > > > be sublinear in P in case of e.g. transactions etc).
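The experiment described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: an in-memory SQLite database
stands in for the dev DB, and the table name "events", the row shape and the
helper names (make_rows, time_batched_insert, grid_search) are hypothetical.
Absolute numbers from SQLite say nothing about another database; only the
shape of the experiment carries over.

```python
import sqlite3
import time


def make_rows(n):
    """Generate n synthetic (id, payload) rows."""
    return [(i, f"payload-{i}") for i in range(n)]


def time_batched_insert(rows, batch_size):
    """Insert all rows in batches of batch_size and return rows per second."""
    conn = sqlite3.connect(":memory:")  # assumption: stand-in for the dev DB
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    start = time.perf_counter()
    for i in range(0, len(rows), batch_size):
        with conn:  # one transaction (one commit) per batch
            conn.executemany(
                "INSERT INTO events VALUES (?, ?)", rows[i:i + batch_size]
            )
    elapsed = max(time.perf_counter() - start, 1e-9)
    inserted = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    assert inserted == len(rows)
    return len(rows) / elapsed


def grid_search(total_rows, candidate_batch_sizes):
    """Measure rows/s for each candidate batch size over the same fixed rows."""
    rows = make_rows(total_rows)
    return {b: time_batched_insert(rows, b) for b in candidate_batch_sizes}


if __name__ == "__main__":
    results = grid_search(50_000, [1, 10, 100, 1_000, 10_000])
    for size, rate in results.items():
        print(f"batch_size={size:>6}: {rate:,.0f} rows/s")
    print("best batch size:", max(results, key=results.get))
```

To look at the interaction with several partitions, the same measurement
could be repeated from N threads against the real database.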
> > > >
> > > > On Mon, Dec 21, 2020 at 4:48 PM Yana K  wrote:
> > > >
> > > > > Thanks Haruki and Joris.
> > > > >
> > > > > Haruki:
> > > > > > Thanks for the detailed calculations. Really appreciate it. What
> > > > > > tool/lib is used to load test kafka?
> > > > > > So we've one consumer group and running 7 instances of the
> > > > > > application - that should be good enough - correct?
> > > > > >
> > > > > > Joris:
> > > > > > Great point.
> > > > > > DB insert is a bottleneck (and hence moved it to its own layer) - and
> > > > > > we are batching but wondering what is the best way to calculate the
> > > > > > batch size.
> > > > > >
> > > > > > Thanks,
> > > > > > Yana
> > > > > >
> > > > > > On Mon, Dec 21, 2020 at 1:39 AM Joris Peeters <joris.mg.peet...@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > > > Do you know why your consumers are so slow? 12E6 msg/hour is about
> > > > > > > 3333 msg/s, which is not very high from a Kafka point-of-view. As
> > > > > > > you're doing database inserts, I suspect that is where the
> > > > > > > bottleneck lies.
> > > > > > >
> > > > > > > If, for example, you're doing a single-row insert in a SQL DB for
> > > > > > > every message then this would incur a lot of overhead. Yes, you can
> > > > > > > somewhat alleviate that by parallelising - i.e. increasing the
> > > > > > > partition count - but it is also worth looking at batch inserts, if
> > > > > > > you aren't yet. Say, each consumer waits for 1000 messages or 5
> > > > > > > seconds to have passed (whichever comes first) and then does a
> > > > > > > single bulk insert of the msgs it has received, followed by a manual
> > > > > > > commit.
> > > > > > >
> > > > > > > [A] you might already be doing this and [B] your DB of choice might
> > > > > > > not support bulk inserts (although most do), but otherwise I'd
> > > > > > > expect this to work a lot better than increasing the partition
> > > > > > > count.
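The "1000 messages or 5 seconds, whichever comes first" rule above can be
sketched without any Kafka dependency. This is a minimal illustration, not a
consumer implementation: the class name BatchBuffer and the bulk_insert and
commit callbacks are hypothetical, and in a real consumer they would wrap the
database's bulk write and the client's manual offset commit.

```python
import time


class BatchBuffer:
    """Collects messages and flushes on size or age, whichever comes first."""

    def __init__(self, bulk_insert, commit, max_messages=1000,
                 max_wait_s=5.0, clock=time.monotonic):
        self.bulk_insert = bulk_insert    # hypothetical: writes a list of messages
        self.commit = commit              # hypothetical: commits consumed offsets
        self.max_messages = max_messages
        self.max_wait_s = max_wait_s
        self.clock = clock                # injectable so the timer can be tested
        self.pending = []
        self.first_arrival = None

    def add(self, message):
        """Buffer one message; flush immediately if a limit is reached."""
        if not self.pending:
            self.first_arrival = self.clock()
        self.pending.append(message)
        self.maybe_flush()

    def maybe_flush(self):
        """Call after every poll, even an empty one, so the timer can fire."""
        if not self.pending:
            return False
        full = len(self.pending) >= self.max_messages
        stale = self.clock() - self.first_arrival >= self.max_wait_s
        if not (full or stale):
            return False
        self.bulk_insert(self.pending)  # one bulk write for the whole batch
        self.commit()                   # reached only if the write did not raise
        self.pending = []
        self.first_arrival = None
        return True
```

Committing only after the bulk write returns means a crash between the two
steps replays the batch rather than losing it, so the insert should tolerate
duplicates.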
> > > > > >
> > > > > > > On Mon, Dec 21, 2020 at 8:10 AM Haruki Okada <ocadar...@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > About load test:
> > > > > > > > I think it'd be better to monitor per-message process latency and
> > > > > > > > estimate the required partition count based on it, because it
> > > > > > > > determines the max throughput per single partition.
> > > > > > > > - Say you have to process 12 million messages/hour = about 3333
> > > > > > > >   messages/sec.
> > > > > > > > - If you have 7 partitions (thus 7 parallel consumers at maximum),
> > > > > > > >   a single consumer should process 3333 / 7 = 476 messages/sec
> > > > > > > > - It means the process latency per single message should be lower
> > > > > > > >   than 2.1 milliseconds (1000 / 476)
> > > > > > > >   => If you have 14 partitions, it becomes 4.2 milliseconds
> > > > > > > >
> > > > > > > > So the required partition count can be calculated from the
> > > > > > > > per-message process latency. (I think Spring-Kafka can be easily
> > > > > > > > integrated with prometheus so you can use it to measure that)
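The arithmetic above can be written as a small calculation. This is a sketch
that assumes each partition is handled by one consumer processing messages
strictly one at a time; the function names latency_budget_ms and
required_partitions are hypothetical, and the 5 ms figure in the example is
an illustrative input, not a measurement from this thread.

```python
import math


def latency_budget_ms(messages_per_hour, partitions):
    """Maximum per-message processing time (ms) that still keeps up."""
    per_second = messages_per_hour / 3600
    per_consumer = per_second / partitions
    return 1000 / per_consumer


def required_partitions(messages_per_hour, latency_ms):
    """Smallest partition count that sustains the rate at a given latency."""
    per_second = messages_per_hour / 3600
    return math.ceil(per_second * latency_ms / 1000)


if __name__ == "__main__":
    rate = 12_000_000  # messages per hour, as in the example above
    print(f"{latency_budget_ms(rate, 7):.1f} ms per message with 7 partitions")
    print(f"{latency_budget_ms(rate, 14):.1f} ms per message with 14 partitions")
    # Illustrative input: a measured latency of 5 ms per message.
    print(required_partitions(rate, 5.0), "partitions needed at 5 ms per message")
```

A measured latency that sits exactly on the budget leaves no headroom, so in
practice the partition count would be rounded up further.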
> > > > > > >
> > > > > > > About increasing instance count:
> > > > > > 

Re: --override option for bin/connect-distributed.sh

2020-12-22 Thread Aki Yoshida
Hi Tom,
thanks for your suggestion. I'll follow the KIP process.
regards, aki

On Tue, Dec 22, 2020 at 10:18 AM Tom Bentley wrote:
>
> Hi Aki,
>
> Since this is a change to a public API of the project it would need to be
> done through the KIP process [1]. Since writing the KIP in this case isn't
> much work, I suggest you write it up as a proposal and start a KIP
> discussion thread on the dev@ mailing list, then interested people can
> comment there.
>
> Kind regards,
>
> Tom
>
> [1]:
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>
> On Mon, Dec 21, 2020 at 8:57 PM Aki Yoshida  wrote:
>
> > Hi Kafka team,
> > I think the --override option of Kafka is very practical in starting
> > Kafka for various situations without changing the properties file. I
> > missed this feature in Kafka-Connect and I wanted to have it, so I
> > created a patch in this commit in my forked repo.
> >
> > https://github.com/elakito/kafka/commit/1e54536598d1ce328d0aee10edb728270cc04af1
> >
> > Could someone tell me if this is a good idea or a bad idea? If bad, is
> > there some alternative way to customise the properties? If good, can I
> > create a PR?
> > I would appreciate your suggestions.
> > Thanks.
> > regards, aki
> >
> >


Re: --override option for bin/connect-distributed.sh

2020-12-22 Thread Tom Bentley
Hi Aki,

Since this is a change to a public API of the project it would need to be
done through the KIP process [1]. Since writing the KIP in this case isn't
much work, I suggest you write it up as a proposal and start a KIP
discussion thread on the dev@ mailing list, then interested people can
comment there.

Kind regards,

Tom

[1]:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

On Mon, Dec 21, 2020 at 8:57 PM Aki Yoshida  wrote:

> Hi Kafka team,
> I think the --override option of Kafka is very practical in starting
> Kafka for various situations without changing the properties file. I
> missed this feature in Kafka-Connect and I wanted to have it, so I
> created a patch in this commit in my forked repo.
>
> https://github.com/elakito/kafka/commit/1e54536598d1ce328d0aee10edb728270cc04af1
>
> Could someone tell me if this is a good idea or a bad idea? If bad, is
> there some alternative way to customise the properties? If good, can I
> create a PR?
> I would appreciate your suggestions.
> Thanks.
> regards, aki
>
>