Hi Christofer,

Thanks for the detailed responses. I would like to ask a couple more
questions (which may be borderline naive or stupid :D ).

First thing I would like to know - and please ignore my lack of knowledge on
PLCs - from what I understand they are small devices used to execute program
instructions. I believe they would also have very small memory footprints?
Also, when you say the Siemens one can handle 20 connections, would those be
from different devices connecting to it? The reasons I ask these questions
are these ->

a) The way the Kafka Connect framework is executed is by installing the
whole framework, with all the relevant jars, on the classpath. So, if you
take the JDBC connector for K-Connect, it would need the MySQL driver jar
(for example) plus the other jars needed to support the framework. If we
chose to use Avro, say, we would need even more jars to support that. Would
we be able to install all of that on such a device?

b) Also, if multiple devices do connect to it, won't we have events
arriving out of order from them? Does the ordering matter amongst the events
being pushed?
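
Just to make sure I'm describing the ordering question properly, here is a
rough sketch of what I had in mind, i.e. keying each record by the PLC
connection URL so that Kafka at least preserves per-device order. The topic
name "plc-data", the buildRecord() helper and the long value are placeholders
I made up, not anything from the current branch:

import java.util.Collections;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;

public class PlcRecordSketch {

    // Hypothetical helper: build one SourceRecord for one value read from one PLC.
    // Keying the record by the connection URL means the default partitioner sends
    // all events from the same device to the same topic partition, so order is
    // preserved per device; events from different devices may still interleave.
    static SourceRecord buildRecord(String connectionUrl, long value, long timestamp) {
        Map<String, ?> sourcePartition = Collections.singletonMap("url", connectionUrl);
        Map<String, ?> sourceOffset = Collections.singletonMap("ts", timestamp);
        return new SourceRecord(
                sourcePartition, sourceOffset,
                "plc-data",                          // placeholder topic name
                Schema.STRING_SCHEMA, connectionUrl, // record key
                Schema.INT64_SCHEMA, value);         // record value
    }
}

If per-device ordering is enough, then interleaving across devices may not
actually be a problem.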

Regarding the infinite loop question: the reason the JDBC connector uses one
is that it creates tasks for a given table and fires queries to find deltas.
So, if the polling frequency is 2 seconds and it last ran at 12:00:00, it
would run again at 12:00:02 to figure out what changed in that time frame.
Given the way PlcReader's read() runs, would it keep returning newer data?
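
To make my mental model concrete, this is roughly how I picture a plc4x
source task behaving without any while(true) of its own - Connect calls
poll() in a loop, and each poll() would be a sample of the current value
rather than a delta query. The readCurrentValue() method, the 2-second
interval, the topic name and the s7 URL are all placeholders, not the real
PLC4X calls:

import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class PollingSketchTask extends SourceTask {

    private static final long POLL_INTERVAL_MS = 2000; // placeholder 2 s, as in the JDBC example

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // Connect calls poll() repeatedly on the task's own thread, so the task
        // itself does not need a while(true); it only has to block "politely".
        Thread.sleep(POLL_INTERVAL_MS);

        // Unlike the JDBC connector, which asks "what changed since the last run?",
        // a PLC read just returns the current value of the requested address,
        // so every poll() yields a fresh sample rather than a delta.
        long value = readCurrentValue();

        SourceRecord record = new SourceRecord(
                Collections.singletonMap("url", "s7://10.10.64.20"),        // placeholder connection URL
                Collections.singletonMap("ts", System.currentTimeMillis()), // source offset
                "plc-data", Schema.INT64_SCHEMA, value);
        return Collections.singletonList(record);
    }

    // Stand-in for the blocking read().get() shown in the sample code Chris linked.
    private long readCurrentValue() {
        return 42L;
    }

    @Override public void start(Map<String, String> props) { }
    @Override public void stop() { }
    @Override public String version() { return "sketch"; }
}

Is that picture roughly right, or does read() behave differently between
calls?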

We can skip over the rest of the parts, but looking at points a and b above,
would it make sense to have something like the Kafka Connect framework for
pushing data to Kafka? Also, from the GitHub link, the drivers are to be
supported in 3 languages as well. How would that play out?
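
On the push idea you raised: one pattern I have seen in source connectors is
to let the asynchronous callback (for example one registered on the
CompletableFuture returned by read(), or a future subscription API) drop
records into a queue, and have poll() simply drain that queue, so there is no
fixed polling interval on the Kafka side. A rough sketch of that idea (class
and method names are made up):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.kafka.connect.source.SourceRecord;

public class PushBufferSketch {

    // Records completed asynchronously (for example from a callback registered on
    // the CompletableFuture that read() returns) are parked in this queue.
    private final BlockingQueue<SourceRecord> buffer = new LinkedBlockingQueue<>();

    // Called from the async callback thread whenever new data arrives.
    public void onPlcEvent(SourceRecord record) {
        buffer.add(record);
    }

    // poll() just drains whatever has arrived, waiting until at least one
    // record is available, so no fixed polling interval is needed.
    public List<SourceRecord> poll() throws InterruptedException {
        List<SourceRecord> out = new ArrayList<>();
        out.add(buffer.take());
        buffer.drainTo(out);
        return out;
    }
}

Whether that fits how the PLC side actually delivers data is exactly the
part I'm unsure about.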

Again - apologies if the questions seem stupid.

Thanks!
Sagar.

On Wed, Aug 22, 2018 at 10:39 PM Christofer Dutz <christofer.d...@c-ware.de>
wrote:

> Hi Sagar,
>
> great that you managed to have a look ... I'll try to answer your
> questions.
> (I like to answer them postfix, as whenever emails are answered in-line
> they are extremely hard to read and follow on mobile email clients.)
>
> First of all, I created the original plugin via the archetype for
> kafka-connect plugins. The next thing I did was to have a look at the code
> of the JDBC Kafka Connect plugin (as you might have guessed), as I thought
> it would have a similar structure to ours. Unfortunately I think the JDBC
> plugin is far more complex than the plc4x connector will have to be. I
> sort of picked some of the things I liked from the archetype and some I
> liked from the JDBC one ... if there was a third, even cooler option, I
> will definitely have missed it. So if you think there is anything worth
> changing ... change anything you like.
>
> 1)
> The code of the JDBC plugin showed such a while(true) loop, however I
> think this was because the JDBC query could return a lot of rows and
> thereby a lot of Kafka events. In our case we have one request and get one
> response. The code in my example directly calls "get()" on the request and
> is thereby blocking. I don't know if this is good, but from reading the
> JDBC example, it should be blocking too ...
> So the PlcReader's read() method returns a CompletableFuture ... this
> could be completed asynchronously and the callback could fire the Kafka
> events, but I didn't know if this was OK with Kafka. If it is possible,
> please have a look at this example code:
> https://github.com/apache/incubator-plc4x/blob/master/plc4j/protocols/s7/src/test/java/org/apache/plc4x/java/s7/S7PlcReaderSample.java
> It demonstrates the different usage types with comments.
>
> While we're at it ... is there also an option for a Kafka connector that
> is able to push data? So that if an incoming event arrives, it is
> automatically pushed without a fixed polling interval?
>
> 2)
> I have absolutely no idea, as I am not quite familiar with the concepts
> inside Kafka. What I do know is that the partition key should probably be
> based upon the connection URL. The problem is that with Kafka I could have
> 1000 nodes connecting to one PLC. While Kafka wouldn't have problems with
> that, the PLCs have very limited resources. As far as I could decode the
> responses of my Siemens S7 1200, it can handle up to 20 connections
> (usually a control system is already consuming 2-3 of them) ... I think it
> would be ideal if on one Kafka node (or partition) there would be one
> PlcConnection ... this connection should then be shared among all requests
> to a PLC with a shared connection URL (I hope I'm not writing nonsense). So
> if a workerTask is responsible for managing all requests to one partition,
> then I'd say it should be 1 ... otherwise the number could be bigger.
>
> If it makes things easier, I'm absolutely fine with using those
> ConnectorUtils
>
> Regarding the connector offsets ... are you referring to that counter
> Kafka uses to let the clients know the sequence of events, which they use
> to sort of say: "Hi, I have number 237367 of topic 'ABC', please
> continue"? Is that what you are referring to? If it is, well ... I have
> to admit ... I don't know ... OK ... and if it isn't, then probably also ;-)
> How do other plugins do this?
>
> 3)
> Well I guess both options would be cool ... JSON is definitely simpler,
> but for high-volume transports the binary counterparts are definitely
> worth considering. Currently PLC4X tries to deliver what you request, but
> that's actually something we're discussing refactoring at the moment. For
> now - as shown in the example code I referenced a few lines above - if you
> do a TypedRequest and, for example, ask for an Integer, you will receive
> an array (probably of size 1) of Integers.
>
> 4)
> Well I agree ... at least I can't claim I make a secret of where I stole
> things from ;-)
>
> If I can be of any assistance ... just ask.
>
> Thanks for taking the time.
>
> Chris
>
>
>
> On 22.08.18 at 17:55, "Sagar" <sagarmeansoc...@gmail.com> wrote:
>
>     Hi All,
>
>     I was going through the K-Connect stubs created by Chris in the kafka
>     feature branch.
>
>     Some of my findings are here (let me know if they are valid or not):
>
>     1)
>
> https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L98
>
>     Should this block of code be within an infinite loop like while(true)?
>     I am not exactly sure of the semantics of the PlcReader, hence the
>     question.
>
>     2) Another question is, what are the maxTasks that we envision here?
>
> https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/Plc4xSourceConnector.java#L46
>
>     Also, as part of the documentation, there's a utility called
>     ConnectorUtils which typically should be used to create the configs
>     (not a hard and fast rule though):
>
>
> https://docs.confluent.io/current/connect/javadocs/index.html?org/apache/kafka/connect/util/ConnectorUtils.html
>
>     If we go that route, then we also need to specify how the offsets
>     would be stored in the offsets topic (by using the task name). So, if
>     we can figure out how the connectors would be set up, that would be
>     helpful.
>
>     3) While building the SourceRecord ->
>
>
> https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L109
>
>     , we would also need some DataConverter layer to map the values to the
>     Connect types. Also, which message formats would be supported? JSON, or
>     binary formats like Avro/Protobuf, or something else? Those things
>     might also need to be factored in.
>
>     4) Lastly, we need to remove the JdbcSourceTask from the catch block
>     here :) ->
>
>
> https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L67
>
>     Thanks!
>     Sagar.
>
>
>
