Hi Christofer,

Just thought I'd remind you about this one. There's no rush from my side, but I see you're caught up with a lot of things, so I'm sending a reminder :)
Thanks!
Sagar.

On Thu, Aug 23, 2018 at 11:22 PM Sagar <sagarmeansoc...@gmail.com> wrote:
> Hi Christofer,
>
> Thanks for the detailed responses. I would like to ask a couple more
> questions (which may be borderline naive or stupid :D).
>
> First thing I would like to know - ignore my lack of knowledge of PLCs -
> but from what I understand they are small devices used to execute program
> instructions. I believe they would have very small memory footprints as
> well? Also, when you say the Siemens one can handle 20 connections, would
> those be from different devices connecting to it? The reason I ask these
> questions is this:
>
> a) The kafka-connect framework is executed by installing the whole
> framework with all the relevant jars on the classpath. So the JDBC
> connector for K-Connect would need the mysql driver jar (for example) and
> other jars needed to support the framework. If we choose to use Avro, we
> would need more jars to support that. Would we be able to install all of
> that?
>
> b) Also, if multiple devices do connect to it, won't we have events
> arriving out of order from them? Does the ordering matter amongst events
> that are being pushed?
>
> Regarding the infinite-loop question: the reason the JDBC connector uses
> one is that it creates tasks for a given table and fires queries to find
> deltas. So if the polling frequency is 2 seconds and it last ran at
> 12.00.00, it would run again at 12.00.02 to figure out what changed in
> that time frame. Given the way PlcReader's read() runs, would it keep
> returning newer data?
>
> We can skip over the rest of the parts, but looking at parts a and b
> above, would it make sense to have something like a kafka-connect
> framework for pushing data to Kafka? Also, from the github link, the
> drivers are to be supported in 3 languages as well. How would that play
> out?
>
> Again - apologies if the questions seem stupid.
>
> Thanks!
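The windowed delta polling described above can be sketched in a few lines of plain Java. Everything here is illustrative: `DeltaPoller` and `fetchChangesSince` are hypothetical names standing in for the JDBC connector's delta query, not real PLC4X or Kafka Connect API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of JDBC-connector-style delta polling: each poll()
 * call fetches only what changed since the previous poll, so a
 * 2-second poll interval yields consecutive 2-second windows.
 * All names here are illustrative, not the real connector's API.
 */
public class DeltaPoller {

    private long lastPollMillis;

    public DeltaPoller(long startMillis) {
        this.lastPollMillis = startMillis;
    }

    /** One poll cycle: fetch everything in (lastPoll, now] and advance the marker. */
    public List<String> poll(long nowMillis) {
        List<String> delta = fetchChangesSince(lastPollMillis, nowMillis);
        lastPollMillis = nowMillis; // the next cycle only sees newer changes
        return delta;
    }

    /** Hypothetical stand-in for the "SELECT ... WHERE ts > ?" delta query. */
    protected List<String> fetchChangesSince(long fromExclusive, long toInclusive) {
        List<String> result = new ArrayList<>();
        result.add("changes in (" + fromExclusive + ", " + toInclusive + "]");
        return result;
    }
}
```

The point of the sketch is the advancing marker: consecutive windows never overlap, which is why the connector's loop keeps returning newer data rather than re-reading old rows.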
> Sagar.
>
> On Wed, Aug 22, 2018 at 10:39 PM Christofer Dutz <christofer.d...@c-ware.de> wrote:
>
>> Hi Sagar,
>>
>> Great that you managed to have a look ... I'll try to answer your
>> questions. (I like to answer them postfix, as emails that are answered
>> in-line are extremely hard to read and follow on mobile email clients.)
>>
>> First of all, I created the original plugin via the archetype for
>> kafka-connect plugins. The next thing I did was have a look at the code
>> of the JDBC Kafka Connect plugin (as you might have guessed), as I
>> thought it would have a similar structure to ours. Unfortunately, I
>> think the JDBC plugin is far more complex than the plc4x connector will
>> have to be. I picked some of the things I liked from the archetype and
>> some I liked from the jdbc plugin ... if there was a third, even cooler
>> option, I will definitely have missed it. So if you think there is
>> something worth changing ... change anything you like.
>>
>> 1)
>> The code of the jdbc plugin does show such a while(true) loop, but I
>> think that is because a jdbc query can return a lot of rows and thereby
>> a lot of Kafka events. In our case we have one request and get one
>> response. The code in my example directly calls "get()" on the request
>> and is thereby blocking. I don't know if this is good, but from reading
>> the jdbc example, that should be blocking too ...
>> So the PlcReader's read() method returns a CompletableFuture ... this
>> could be completed asynchronously and the callback could fire the Kafka
>> events, but I don't know if that is ok with Kafka. If it is possible,
>> please have a look at this example code:
>> https://github.com/apache/incubator-plc4x/blob/master/plc4j/protocols/s7/src/test/java/org/apache/plc4x/java/s7/S7PlcReaderSample.java
>> It demonstrates, with comments, the different usage types.
>>
>> While at it ...
>> is there also an option for a Kafka connector that is able to push
>> data? So that if an incoming event arrives, it is automatically pushed,
>> without a fixed polling interval?
>>
>> 2)
>> I have absolutely no idea, as I am not that familiar with the concepts
>> inside Kafka. What I do know is that the partition key should probably
>> be based upon the connection url. The problem is that with Kafka I
>> could have 1000 nodes connecting to one PLC. While Kafka wouldn't have
>> problems with that, the PLCs have very limited resources. As far as I
>> decoded the responses of my Siemens S7 1200, it can handle up to 20
>> connections (a control system usually already consumes 2-3 of them) ...
>> I think it would be ideal if on one Kafka node (or partition) there
>> were one PlcConnection ... this connection should then be shared among
>> all requests to a PLC with the same connection url (I hope I'm not
>> writing nonsense). So if a workerTask is responsible for managing all
>> requests to one partition, then I'd say it should be 1 ... otherwise
>> the number could be bigger.
>>
>> If it makes things easier, I'm absolutely fine with using those
>> ConnectorUtils.
>>
>> Regarding the connector offsets ... are you referring to the counter
>> Kafka uses to let the clients know the sequence of events, which they
>> use to say: "Hi, I have number 237367 of topic 'ABC', please continue"
>> ... is that what you are referring to? If it is, well ... I have to
>> admit ... I don't know ... ok ... if it isn't, then probably also ;-)
>> How do other plugins do this?
>>
>> 3)
>> Well, I guess both options would be cool ... JSON is definitely
>> simpler, but for high-volume transports the binary counterparts are
>> definitely worth considering. Currently PLC4X tries to deliver what you
>> request, but that's actually something we're currently discussing
>> refactoring.
>> But for the moment - as shown in the example code I referenced a few
>> lines above - you do a TypedRequest, and if you for example ask for an
>> Integer, you will receive an array (probably of size 1) of Integers.
>>
>> 4)
>> Well, I agree ... or at least I can't say that I make a secret of where
>> I stole things from ;-)
>>
>> If I can be of any assistance ... just ask.
>>
>> Thanks for taking the time.
>>
>> Chris
>>
>>
>> On 22.08.18, 17:55, "Sagar" <sagarmeansoc...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I was going through the K-Connect stubs created by Chris in the kafka
>> feature branch.
>>
>> Some of my findings are below (let me know if they are valid or not):
>>
>> 1)
>> https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L98
>>
>> Should this block of code be within an infinite loop like while(true)?
>> I am not exactly sure of the semantics of the PlcReader, hence the
>> question.
>>
>> 2) Another question: what are the maxTasks that we envision here?
>>
>> https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/Plc4xSourceConnector.java#L46
>>
>> Also, as part of the documentation, there's a utility called
>> ConnectorUtils which typically should be used to create the configs
>> (not a hard and fast rule though):
>>
>> https://docs.confluent.io/current/connect/javadocs/index.html?org/apache/kafka/connect/util/ConnectorUtils.html
>>
>> If we go that route, then we also need to specify how the offsets would
>> be stored in the offsets topic (by using the task name). So if it can
>> be figured out how the connectors would be set up, that would be
>> helpful.
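As a rough illustration of the maxTasks/ConnectorUtils question above, here is a plain-Java sketch of splitting PLC connection URLs into per-task groups. This is an illustrative reimplementation in the spirit of Kafka Connect's `ConnectorUtils.groupPartitions`, not the real class, and the round-robin assignment strategy is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of how a source connector could split its PLC
 * connection URLs across at most maxTasks task configurations, in the
 * spirit of Kafka Connect's ConnectorUtils.groupPartitions(). This is
 * NOT the real Kafka class, just a stand-in to show the idea.
 */
public class TaskGrouping {

    /** Split elements into min(maxTasks, elements.size()) near-equal groups. */
    public static List<List<String>> groupPartitions(List<String> elements, int maxTasks) {
        int numGroups = Math.min(elements.size(), maxTasks);
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numGroups; i++) {
            groups.add(new ArrayList<>());
        }
        // Round-robin assignment keeps group sizes within one of each other.
        for (int i = 0; i < elements.size(); i++) {
            groups.get(i % numGroups).add(elements.get(i));
        }
        return groups;
    }
}
```

Each resulting group would become one task configuration, which also lines up with Chris's "one PlcConnection per task, shared by connection url" idea: a task only ever opens connections for the URLs in its own group.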
>>
>> 3) While building the SourceRecord:
>>
>> https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L109
>>
>> we would also need some DataConverter layer to have the values mapped
>> to the Connect types. Also, which message formats would be supported?
>> JSON, or binary protocols like Avro/Protobuf, or something else? Those
>> things might also need to be factored in.
>>
>> 4) Lastly, the JdbcSourceTask needs to be removed from the catch block
>> here :) ->
>>
>> https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L67
>>
>> Thanks!
>> Sagar.
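On the offsets question raised in the thread: a Kafka Connect source task identifies what it reads with a sourcePartition map and records its progress with a sourceOffset map, both of which Connect persists in the offsets topic and hands back on restart. Below is a minimal plain-Java sketch of that bookkeeping; the key names (`url`, `query`, `timestamp`) are hypothetical choices for a PLC4X connector, not an agreed schema.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the partition/offset maps a PLC4X source task might hand
 * to Kafka Connect when building a SourceRecord. Kafka Connect
 * persists these maps in the offsets topic and returns them on
 * restart so the task can resume. The key names are illustrative
 * assumptions, not part of PLC4X or Kafka Connect.
 */
public class OffsetSketch {

    /** Identifies the "thing" being read: one address on one PLC connection. */
    public static Map<String, String> sourcePartition(String connectionUrl, String address) {
        Map<String, String> partition = new HashMap<>();
        partition.put("url", connectionUrl);   // e.g. a PLC4X connection string
        partition.put("query", address);       // which field/address was read
        return Collections.unmodifiableMap(partition);
    }

    /** Tracks how far reading has progressed; for a poll-based reader a timestamp works. */
    public static Map<String, Long> sourceOffset(long readTimestampMillis) {
        return Collections.singletonMap("timestamp", readTimestampMillis);
    }
}
```

These two maps would be the first two constructor arguments of each SourceRecord the task emits; the DataConverter layer discussed in point 3 would then supply the record's schema and value.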