Hi Sagar,

great that we seem to be on the same page now ;-)
Regarding the "Kafka connecting" ... what I meant is that the Kafka-Connect-PLC4X instance connects ... I was assuming the driver to be running on a Kafka node, but that's just due to my limited knowledge of everything ;-)

Well, the code for actively reading stuff from a PLC should already be in my example implementation. It should work out of the box this way ... As I have seen several mock drivers implemented in PLC4X, I am currently thinking of implementing one that you could just import and use ... however, I'm currently working hard on completely refactoring the API, so I would postpone that until after these changes are in. But rest assured ... I would handle the refactoring so you could just assume that it works.

Alternatively, I could have an account for our IoT VPN created for you. Then you could log in to our VPN and talk to some real PLCs ... I think I wanted to create an account for Julian, but the person responsible for creating them was on holiday ... I will re-check this.

Chris

On 29.08.18, 16:11, "Sagar" <sagarmeansoc...@gmail.com> wrote:

Hi Chris,

That's perfectly fine :)

So, the way I understand this now is: we will have a bunch of worker nodes (in Kafka Connect terminology, a worker is a JVM process which runs a set of connectors/tasks to poll a source and push data to Kafka). So, analogous to a JDBC connection, we will have a connection URL which will let us connect to these PLC devices, poll them (poll in the sense you meant it above), and then push data to Kafka. If this looks fine, can you give me some documentation to refer to, and also tell me how I can start testing this?

And just one thing I wanted to clarify: when you say Kafka nodes connecting to devices, that's something which doesn't happen. Kafka doesn't connect to any device. I think you just mean it in a more abstract way, right?

@Julian, thanks, I was going through the link you sent. So, you're saying that via scraping we can push events to Kafka?
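The worker/poll model described above could be sketched roughly as follows. This is a minimal, self-contained illustration in plain Java with no Kafka or PLC4X dependencies; `MockPlcReader`, `pollOnce`, the field name, and the connection URL are all made-up names, not actual connector code.

```java
import java.util.*;
import java.util.concurrent.*;

public class PollSketch {
    // Hypothetical stand-in for a PLC4X reader: read() returns a future,
    // mirroring the asynchronous PlcReader API discussed in this thread.
    static class MockPlcReader {
        private final String connectionUrl;
        MockPlcReader(String connectionUrl) { this.connectionUrl = connectionUrl; }
        CompletableFuture<List<Integer>> read(String field) {
            // A real driver would talk to the PLC here; we fake one value.
            return CompletableFuture.supplyAsync(() -> Collections.singletonList(42));
        }
    }

    // One poll cycle: block on the future (as the example connector does
    // with get()) and turn each value into a record-shaped map.
    static List<Map<String, Object>> pollOnce(MockPlcReader reader, String field)
            throws ExecutionException, InterruptedException {
        List<Integer> values = reader.read(field).get(); // blocking read
        List<Map<String, Object>> records = new ArrayList<>();
        for (Integer v : values) {
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("field", field);
            record.put("value", v);
            record.put("timestamp", System.currentTimeMillis());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        // Connection URL format is illustrative only.
        MockPlcReader reader = new MockPlcReader("s7://192.168.0.1/0/0");
        System.out.println(pollOnce(reader, "motor-current"));
    }
}
```

In a real connector the map-shaped records would instead become Kafka Connect `SourceRecord` instances.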
Is that already happening, and should we look to move this functionality out?

Thanks!
Sagar.

On Wed, Aug 29, 2018 at 11:05 AM Julian Feinauer <j.feina...@pragmaticminds.de> wrote:

> Hey Sagar, hey Chris,
>
> I want to join your discussion for part b3, as this is something we usually require.
> We use a module we call the plc-scraper for that task (the term scraping in that context is borrowed from Prometheus, where it is used extensively [1]).
> Generally speaking, the scraper takes a config containing addresses and scrape rates, runs as a daemon, and pushes the scrape results downstream (usually Kafka, but we also use other "queues").
>
> As we are currently rewriting this scraper, I have already considered donating it to PLC4X in the form of an example or perhaps even as a standalone module.
>
> So I agree with Chris that this should not be part of the PlcDriver level but rather on another layer "on top", and I would be more interested in the specification of a "line protocol" which describes how messages are serialized for Kafka (or other sinks).
> Can we come up with a common "schema" which fits many use cases?
>
> Our messages contain the following information:
> - timestamp
> - source
> - values
> - additional tags
>
> Best
> Julian
>
> [1] https://prometheus.io/docs/prometheus/latest/getting_started/
>
> On 28.08.18, 22:53, "Christofer Dutz" <christofer.d...@c-ware.de> wrote:
>
> Hi Sagar,
>
> sorry for not responding ... your mail must have escaped my notice ... sorry for that.
>
> a) PLC4X works exactly the same way ... it consists of plc4x-core, which only contains the DriverManager, and plc4x-api, which contains the API classes. So with these two jars you can build a full PLC4X application. In order to connect to a PLC you need to add the jar containing the required driver to the classpath.
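Julian's four-field message schema above (timestamp, source, values, additional tags) could be serialized, for example, as JSON. The sketch below is stdlib-only Java; JSON as the wire format and the `toJson` helper are assumptions for illustration, since the thread deliberately leaves serialization open.

```java
import java.util.*;

public class LineProtocolSketch {
    // Build one scrape message with the four fields from Julian's list:
    // timestamp, source, values, additional tags. Hand-rolled JSON is
    // used here only to keep the sketch dependency-free.
    static String toJson(long timestamp, String source,
                         Map<String, Object> values, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"timestamp\":").append(timestamp);
        sb.append(",\"source\":\"").append(source).append('"');
        StringJoiner vj = new StringJoiner(",");
        values.forEach((k, v) -> vj.add("\"" + k + "\":" + v)); // numeric values
        sb.append(",\"values\":{").append(vj).append('}');
        StringJoiner tj = new StringJoiner(",");
        tags.forEach((k, v) -> tj.add("\"" + k + "\":\"" + v + "\""));
        sb.append(",\"tags\":{").append(tj).append('}');
        sb.append('}');
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("temperature", 21.5);
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("line", "assembly-1"); // illustrative tag
        System.out.println(toJson(System.currentTimeMillis(),
                "s7://192.168.0.1/0/0", values, tags));
    }
}
```

For Avro or Protobuf the same four fields would simply become a record/message definition instead.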
>
> b) Well, if using something other than the connection URL as partition key, it is possible that multiple Kafka Connect nodes would connect to the same PLC. In that case it could be a problem to control the order. I guess using the timestamp when receiving a response (probably generated by the KC PLC4X driver) could be a valid approach.
>
> b2) Regarding the infinite loop ... I think we won't need such a mechanism. If we think of a set of fields from a PLC, we can think of a PLC as a one-row database table. Producing diffs should be a lot simpler that way.
>
> b3) Regarding push events ... PLC4X has a subscription mode in addition to polling. So it would be possible to also define PLC4X data sources that actively push events to Kafka ... I have to admit that this would be the mode I would prefer most. But as not all protocols and PLCs support this mode, I think it would be safest to use polling and to add push support after that.
>
> Regarding different languages: currently we are concentrating mainly on the Java implementation, as the biggest challenge is to understand and implement the protocols. Porting them to other languages (especially C and C++) shouldn't be as hard as implementing the first version. But that's currently a base left uncovered, as we don't have the resources to implement all of them at once.
>
> And there are no stupid questions :-)
>
> Hope I could answer all of yours. If not, just ask and I'll probably not miss that one ;-)
>
> Chris
>
> On 23.08.18, 19:52, "Sagar" <sagarmeansoc...@gmail.com> wrote:
>
> > Hi Christofer,
> >
> > Thanks for the detailed responses. I would like to ask a couple more questions (which may be borderline naive or stupid :D ).
> >
> > First thing that I would like to know - ignore my lack of knowledge of PLCs - but from what I understand, PLCs are small devices used to execute program instructions.
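Chris's point (b) above, keying records by connection URL and ordering by a receive-side timestamp, could take the usual Kafka Connect `sourcePartition`/`sourceOffset` map shape. This is a stdlib-only sketch; the map keys `"url"` and `"timestamp"` are our own naming, not taken from the actual connector.

```java
import java.util.*;

public class PartitionSketch {
    // One partition per connection URL, so one task (and therefore one
    // shared PlcConnection) owns each PLC.
    static Map<String, String> sourcePartition(String connectionUrl) {
        return Collections.singletonMap("url", connectionUrl);
    }

    // Order within a partition by the timestamp generated when the
    // response is received on the Kafka Connect side, as Chris suggests.
    static Map<String, Long> sourceOffset(long responseTimestampMillis) {
        return Collections.singletonMap("timestamp", responseTimestampMillis);
    }

    public static void main(String[] args) {
        System.out.println(sourcePartition("s7://192.168.0.1/0/0")); // illustrative URL
        System.out.println(sourceOffset(System.currentTimeMillis()));
    }
}
```

In a real `SourceTask` these two maps would be passed into each `SourceRecord`.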
> > These would have very small memory footprints as well, I believe? Also, when you say the Siemens one can handle 20 connections, would those be from different devices connecting to it? The reason I ask these questions is this ->
> >
> > a) The way the Kafka Connect framework is executed is by installing the whole framework with all the relevant jars needed on the classpath. So, if you talk about the JDBC connector for K-Connect, it would need the MySQL driver jar (for example) and other jars needed to support the framework. If we, say, choose to use Avro, then we would need more jars to support that. Would we be able to install all of that?
> >
> > b) Also, if multiple devices do connect to it, then won't we have events arriving out of order from them? Does the ordering matter amongst events that are being pushed?
> >
> > Regarding the infinite loop question, the reason the JDBC connector uses that is that it creates tasks for a given table and fires queries to find deltas. So, if the polling frequency is 2 seconds and it last ran at 12.00.00, then it would run at 12.00.02 to figure out what changed in that time frame. So, the way PlcReader's read() runs, would it keep returning newer data?
> >
> > We can skip over the rest of the parts, but looking at parts a and b above, would it make sense to have something like a kafka-connect framework for pushing data to Kafka? Also, from the GitHub link, the drivers are to be supported in 3 languages as well. How would that play out?
> >
> > Again - apologies if the questions seem stupid.
> >
> > Thanks!
> > Sagar.
> >
> > On Wed, Aug 22, 2018 at 10:39 PM Christofer Dutz <christofer.d...@c-ware.de> wrote:
> >
> > > Hi Sagar,
> > >
> > > great that you managed to have a look ... I'll try to answer your questions.
> > > (I like to answer them postfix, as emails that are answered in-line are extremely hard to read and follow on mobile email clients.)
> > >
> > > First of all, I created the original plugin via the archetype for kafka-connect plugins. The next thing I did was to have a look at the code of the JDBC Kafka Connect plugin (as you might have guessed), as I thought that it would have a similar structure to ours. Unfortunately, I think the JDBC plugin is far more complex than the plc4x connector will have to be. I sort of picked some of the things I liked from the archetype and some I liked from the JDBC plugin ... if there was a third, even cooler option ... I will definitely have missed it. So if you think there is a thing worth changing ... you can change anything you like.
> > >
> > > 1)
> > > The code of the JDBC plugin showed such a while(true) loop; however, I think this was because the JDBC query could return a lot of rows and thereby many Kafka events. In our case we have one request and get one response. The code in my example directly calls "get()" on the request and is thereby blocking. I don't know if this is good, but from reading the JDBC example, this should be blocking too ...
> > > So the PlcReader's read() method returns a CompletableFuture ... this could be completed asynchronously and the callback could fire the Kafka events, but I don't know if this is OK with Kafka. If it is possible, please have a look at this example code:
> > > https://github.com/apache/incubator-plc4x/blob/master/plc4j/protocols/s7/src/test/java/org/apache/plc4x/java/s7/S7PlcReaderSample.java
> > > It demonstrates with comments the different usage types.
> > >
> > > While at it ... is there also an option for a Kafka connector that is able to push data?
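The two usage styles Chris contrasts in 1) above, blocking `get()` versus an asynchronous callback, can be sketched with a plain `CompletableFuture` (stdlib only; the fake `read()` stands in for `PlcReader.read()` and its return shape is our assumption):

```java
import java.util.*;
import java.util.concurrent.*;

public class ReadStylesSketch {
    // Fake read() that completes off-thread, standing in for a PLC4X
    // read request returning a CompletableFuture of values.
    static CompletableFuture<List<Integer>> read() {
        return CompletableFuture.supplyAsync(() -> Arrays.asList(7));
    }

    public static void main(String[] args) throws Exception {
        // Style 1: blocking, as in the current connector example -
        // call get() and hand the values to Kafka synchronously.
        List<Integer> blocking = read().get();
        System.out.println("blocking: " + blocking);

        // Style 2: asynchronous - register a callback that would fire
        // the Kafka events when the response arrives. Whether this fits
        // the SourceTask.poll() contract is exactly the open question
        // in the thread.
        CountDownLatch done = new CountDownLatch(1);
        read().whenComplete((values, error) -> {
            if (error == null) System.out.println("async: " + values);
            done.countDown();
        });
        done.await();
    }
}
```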
So if an incoming event arrives, is this automatically pushed without a fixed polling interval?
> > >
> > > 2)
> > > I have absolutely no idea, as I am not quite familiar with the concepts inside Kafka. What I do know is that the partition key should probably be based upon the connection URL. The problem is that with Kafka I could have 1000 nodes connecting to one PLC. While Kafka wouldn't have problems with that, the PLCs have very limited resources. As far as I decoded the responses of my Siemens S7 1200, it can handle up to 20 connections (usually a control system already consumes 2-3 of them) ... I think it would be ideal if on one Kafka node (or partition) there were one PlcConnection ... this connection should then be shared among all requests to a PLC with a shared connection URL (I hope I'm not writing nonsense). So if a workerTask is responsible for managing all requests to one partition, then I'd say it should be 1 ... otherwise the number could be bigger.
> > >
> > > If it makes things easier, I'm absolutely fine with using those ConnectorUtils.
> > >
> > > Regarding the connector offsets ... are you referring to that counter Kafka uses to let the clients know the sequence of events, which they use to sort of say: "Hi, I have number 237367 of topic 'ABC', please continue"? If that is what you are referring to, well ... I have to admit ... I don't know ... and if it isn't, then probably also ;-)
> > > How do other plugins do this?
> > >
> > > 3)
> > > Well, I guess both options would be cool ... JSON is definitely simpler, but for high-volume transports the binary counterparts are definitely worth consideration. Currently PLC4X tries to deliver what you request, but that's actually something we're currently discussing refactoring.
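The task-assignment idea in 2), one connection (and thus one task) per connection URL, is what Kafka's `ConnectorUtils.groupPartitions(elements, numGroups)` helper is for. To keep this sketch dependency-free, the grouping below is a stdlib-only re-implementation of that contiguous near-equal-size split; the URLs are made up.

```java
import java.util.*;

public class TaskGroupingSketch {
    // Split N elements into at most numGroups contiguous groups of
    // near-equal size, mirroring the behavior of Kafka Connect's
    // ConnectorUtils.groupPartitions. Each group of connection URLs
    // would become one task config, so each PLC keeps a single shared
    // PlcConnection per task.
    static <T> List<List<T>> group(List<T> elements, int numGroups) {
        List<List<T>> groups = new ArrayList<>();
        int n = elements.size();
        int index = 0;
        for (int i = 0; i < numGroups && index < n; i++) {
            int remainingGroups = numGroups - i;
            int size = (n - index + remainingGroups - 1) / remainingGroups; // ceiling division
            groups.add(new ArrayList<>(elements.subList(index, index + size)));
            index += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("s7://plc-a", "s7://plc-b", "s7://plc-c");
        // With maxTasks = 2, three PLCs end up as two task groups.
        System.out.println(group(urls, 2));
    }
}
```

A `Connector.taskConfigs(maxTasks)` implementation would then emit one config map per group.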
But for the moment - as shown in the example code I referenced a few lines above - you do a typed request and, for example, ask for an Integer; then you will receive an array (probably of size 1) of Integers.
> > >
> > > 4)
> > > Well, I agree ... at least I can't say that I make a secret of where I stole things from ;-)
> > >
> > > If I can be of any assistance ... just ask.
> > >
> > > Thanks for taking the time.
> > >
> > > Chris
> > >
> > > On 22.08.18, 17:55, "Sagar" <sagarmeansoc...@gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > I was going through the K-Connect stubs created by Chris in the kafka feature branch.
> > >
> > > Some of the things I found are here (let me know if they are valid or not):
> > >
> > > 1)
> > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L98
> > >
> > > Should this block of code be within an infinite loop like while(true)? I am not exactly sure of the semantics of the PlcReader, hence this question.
> > >
> > > 2) Another question is, what are the maxTasks that we envision here?
> > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/Plc4xSourceConnector.java#L46
> > >
> > > Also, as part of the documentation, there's a utility called ConnectorUtils which typically should be used to create the configs (not a hard and fast rule, though):
> > > https://docs.confluent.io/current/connect/javadocs/index.html?org/apache/kafka/connect/util/ConnectorUtils.html
> > >
> > > If we go that route, then we also need to specify how the offsets would be stored in the offsets topic (by using the task name).
> > > So, if it can be figured out how the connectors would be set up, then that'll be helpful.
> > >
> > > 3) While building the SourceRecord ->
> > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L109
> > > we would also need some DataConverter layer to have the values mapped to the Connect types. Also, which message types would be supported? JSON, or binary protocols like Avro/Protobuf, or some other protocols? Those things might also need to be factored in.
> > >
> > > 4) Lastly, we need to remove the JdbcSourceTask from the catch block here :) ->
> > > https://github.com/apache/incubator-plc4x/blob/feature/apache-kafka/integrations/apache-kafka/src/main/java/org/apache/plc4x/kafka/source/Plc4xSourceTask.java#L67
> > >
> > > Thanks!
> > > Sagar.
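The DataConverter layer Sagar raises in 3) boils down to mapping the Java types a PLC4X read returns onto Connect schema types. The mapping table below is purely illustrative (stdlib-only, schema names written as strings rather than real `org.apache.kafka.connect.data.Schema` objects), not taken from the actual connector.

```java
public class DataConverterSketch {
    // Pick a Connect-style schema name for each Java type a PLC4X read
    // can plausibly return. The fallback to BYTES for unknown types is
    // our own design choice for this sketch.
    static String connectSchemaFor(Object value) {
        if (value instanceof Integer) return "INT32";
        if (value instanceof Long)    return "INT64";
        if (value instanceof Float)   return "FLOAT32";
        if (value instanceof Double)  return "FLOAT64";
        if (value instanceof Boolean) return "BOOLEAN";
        if (value instanceof String)  return "STRING";
        return "BYTES"; // fall back to raw bytes for anything else
    }

    public static void main(String[] args) {
        System.out.println(connectSchemaFor(42));      // INT32
        System.out.println(connectSchemaFor(3.14));    // FLOAT64
        System.out.println(connectSchemaFor("hello")); // STRING
    }
}
```

A real converter would additionally build `Struct` values against those schemas before constructing each `SourceRecord`.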