Re: Nifi integration record oriented processor for reading

Otto Fowler Fri, 30 Apr 2021 05:50:20 -0700
Sounds like a good plan!

> On Apr 30, 2021, at 05:26, Iñigo Angulo <[email protected]> wrote:
> 
> Hi Otto, Chris,
> 
> We have been reviewing the comments on the pull request and have started to 
> think about the approach of extracting values directly from the response. 
> Next week we will work on this and update the code with the suggestions you 
> made. We will keep you informed.
> 
> thank you,
> 
> iñigo
> 
> ----------------------------------------- 
> Iñigo Angulo 
> 
> ZYLK.net :: consultoría.openSource 
> telf.: 747412337 
> Ribera de Axpe, 11 
> Edificio A, modulo 201-203 
> 48950 Erandio (Bizkaia) 
> -----------------------------------------
> 
> ----- Original message -----
> From: "Christofer Dutz" <[email protected]>
> To: "dev" <[email protected]>
> Sent: Friday, 23 April 2021 18:10:35
> Subject: AW: AW: Nifi integration record oriented processor for reading
> 
> Hi Inigo,
> 
> Especially if you have a look at the KNX protocol: it doesn't define the 
> usual IEC datatypes we tried to use for all normal PLC drivers.
> Here we have hundreds of datatypes that don't match any other protocol. I 
> think the PlcValue approach would be the simplest.
> The one thing you have to keep in mind is that you should check whether a 
> PlcValue is a list (array type) or a structure (which roughly corresponds to 
> complex types with a more sophisticated structure).
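
A rough sketch of that list/structure check, for illustration only. `Val` and `Flattener` are hypothetical stand-ins written for this note; they are not the PLC4X PlcValue interface, and the path naming (`name[0]`, `name.field`) is an assumption. The idea is simply to recurse into lists and structures until a simple value is reached:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a PLC value: exactly one of the three fields is set.
final class Val {
    final Object simple;
    final List<Val> list;
    final Map<String, Val> struct;

    private Val(Object simple, List<Val> list, Map<String, Val> struct) {
        this.simple = simple;
        this.list = list;
        this.struct = struct;
    }

    static Val of(Object o) { return new Val(o, null, null); }
    static Val ofList(Val... items) { return new Val(null, Arrays.asList(items), null); }
    static Val ofStruct(Map<String, Val> fields) { return new Val(null, null, fields); }

    boolean isList() { return list != null; }
    boolean isStruct() { return struct != null; }
}

final class Flattener {
    // Flatten a possibly nested value into path -> simple value entries.
    static Map<String, Object> flatten(String name, Val value) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk(name, value, out);
        return out;
    }

    private static void walk(String path, Val v, Map<String, Object> out) {
        if (v.isList()) {
            // Array type: one entry per element, indexed in the path.
            for (int i = 0; i < v.list.size(); i++) {
                walk(path + "[" + i + "]", v.list.get(i), out);
            }
        } else if (v.isStruct()) {
            // Structure: one entry per named child.
            for (Map.Entry<String, Val> e : v.struct.entrySet()) {
                walk(path + "." + e.getKey(), e.getValue(), out);
            }
        } else {
            // Simple value: store the wrapped object as-is.
            out.put(path, v.simple);
        }
    }
}
```

A record writer could then treat every flattened entry as one field, although nested record types may be a better fit for real structures.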
> 
> Chris
> 
> 
> -----Original Message-----
> From: Iñigo Angulo <[email protected]> 
> Sent: Friday, 23 April 2021 15:34
> To: dev <[email protected]>
> Subject: Re: AW: Nifi integration record oriented processor for reading
> 
> Hi Otto, Chris,
> 
> Yes, I think the approach you propose will be best. For now, we are 
> generating the schema ourselves. We have a record writer that is in charge 
> of reading the PLC values. The schema is defined before the values are read. 
> We build this schema by taking the protocol from the 'connectionString' (S7, 
> Modbus) and the specified variable type from the 'PLC resource address 
> String' containing the list of variables to read. From this we deduce the 
> expected Avro datatype when reading, for instance, a word in S7 or a coil in 
> Modbus. 
> 
> However, as you mentioned, the other approach will be much clearer and more 
> useful: ideally, getting the actual datatype from the PlcValue when the 
> response arrives. With this in mind, we tried to keep the 'mapping' 
> described above separated from the rest of the code, so that hopefully it 
> can be easily replaced.
> 
> We have opened the pull request. We hope you can take a look at the code and 
> let us know what you think. We will fill in the ICLA document too.
> 
> thank you
> iñigo
> 
> 
> ----- Original message -----
> From: "Christofer Dutz" <[email protected]>
> To: "dev" <[email protected]>
> Sent: Thursday, 22 April 2021 17:12:49
> Subject: AW: Nifi integration record oriented processor for reading
> 
> Hi all,
> 
> Well, you get PlcValues from the response that wrap the different datatypes. 
> So generally you shouldn't need to care about the concrete type.
> 
> However, you can call getObject(), which returns the core value the PlcValue 
> holds. So if it's the PlcValue for a Short, getObject() will return a short 
> value.
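
To illustrate how the object returned by getObject() could drive the output schema, here is a minimal sketch that maps the Java class of a value to an Avro primitive type name. The class `AvroTypeNames`, its method name, and the choice to widen Short to Avro "int" are assumptions made for this note; they are not taken from the pull request or from PLC4X:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: picks an Avro primitive type name from a value's Java class.
final class AvroTypeNames {
    private static final Map<Class<?>, String> PRIMITIVES = new HashMap<>();

    static {
        PRIMITIVES.put(Boolean.class, "boolean");
        PRIMITIVES.put(Short.class, "int");     // Avro has no 16-bit type, so widen
        PRIMITIVES.put(Integer.class, "int");
        PRIMITIVES.put(Long.class, "long");
        PRIMITIVES.put(Float.class, "float");
        PRIMITIVES.put(Double.class, "double");
        PRIMITIVES.put(String.class, "string");
    }

    // Returns the Avro type name for a simple value, or fails loudly if unmapped.
    static String forValue(Object value) {
        if (value == null) {
            return "null";
        }
        String name = PRIMITIVES.get(value.getClass());
        if (name == null) {
            throw new IllegalArgumentException(
                "No Avro mapping for " + value.getClass().getName());
        }
        return name;
    }
}
```

With something along these lines the mapping depends only on the value that came back, not on the protocol named in the connection string.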
> 
> Does that help?
> 
> Chris
> 
> 
> -----Original Message-----
> From: Otto Fowler <[email protected]>
> Sent: Thursday, 22 April 2021 15:21
> To: [email protected]
> Subject: Re: Nifi integration record oriented processor for reading
> 
> So, you are generating the schema yourself, such that if downstream 
> processors inherit the schema they will just get what you generate? And you 
> are trying to do that from the connection string? If so, a different way I 
> could imagine doing it would be to get the ’types’ of the data from the 
> responses themselves. This would be more generic. The flow I could imagine 
> (in OnTrigger):
> 
> DO READ
> IF NOT HAS SCHEMA
>       GENERATE SCHEMA FROM RESPONSE AND CACHE IN ATOMIC WRITE WITH SCHEMA
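
The flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the response is modeled as a plain map of field name to Java value, the "schema" is a map of field name to class name, and `SchemaCache` is a hypothetical name. A real processor would build a Nifi record schema instead:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical cache: generates the schema from the first response, then reuses it.
final class SchemaCache {
    // Null until the first response has been seen.
    private final AtomicReference<Map<String, String>> schema = new AtomicReference<>();

    // GENERATE SCHEMA FROM RESPONSE: derive a type name from each value's class.
    static Map<String, String> generateSchema(Map<String, Object> response) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : response.entrySet()) {
            Object v = e.getValue();
            fields.put(e.getKey(), v == null ? "null" : v.getClass().getSimpleName());
        }
        return fields;
    }

    // IF NOT HAS SCHEMA: generate and cache atomically; otherwise return the cached one.
    Map<String, String> schemaFor(Map<String, Object> response) {
        Map<String, String> current = schema.get();
        if (current != null) {
            return current;
        }
        // If two threads race, only the first compareAndSet wins; both read the winner.
        schema.compareAndSet(null, generateSchema(response));
        return schema.get();
    }
}
```

The compare-and-set keeps concurrent OnTrigger calls from replacing a schema that has already been handed to a writer.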
> 
> Maybe Chris can speak to how to get the types from the responses.
> 
> 
>> On Apr 22, 2021, at 05:48, Iñigo Angulo <[email protected]> wrote:
>> 
>> Hi Chris, Otto,
>> 
>> Regarding the Record Processor concept, I will try to give an overview. In 
>> Nifi, information packages are called Flowfiles, and these are the actual 
>> units of information that are exchanged between Processors all along the 
>> dataflow. Flowfiles have two sections where we can manage data: Attributes 
>> and Content. In the "traditional" Nifi approach, you work with both 
>> sections, extracting information from the Content to the Attributes and 
>> vice versa to perform operations. This approach has a limitation when you 
>> are processing batch data (lines from a CSV file, for instance), because 
>> you need to split the lines into separate Flowfiles. Thus, a 1000-line CSV 
>> file leads to 1000 Flowfiles to process, each of them containing a single 
>> record.
>> 
>> In later versions of the product, they introduced the Record-oriented 
>> approach. This approach allows you to manage multiple records in a single 
>> Flowfile's Content, as long as these records all have the same schema. This 
>> means that the operations defined by the Processors are applied to the 
>> whole content at once. Continuing the previous example, a 1000-line CSV 
>> file could produce a single Flowfile with a content of 1000 records. 
>> 
>> To do this, Nifi uses Avro to serialize the Flowfile's Content. Then, the 
>> Record-oriented Processors use Writers and Readers to present this 
>> information in the desired format (such as Avro, Json, CSV, etc). Basically, 
>> with the Record-oriented approach, Nifi introduced multiple new Processors 
>> and also included Record versions of many of the "old" ones. With this 
>> Record approach, Nifi performance improves notably, especially when working 
>> with large amounts of structured information.
>> 
>> The work we did was to create a Record-oriented Processor, based on the 
>> existing Plc4xSourceProcessor, to read values from the devices. We have 
>> also included a README in the plc4x/plc4j/integrations/apache-nifi module 
>> explaining the Processor configuration and giving an example. In addition, 
>> we added a Nifi template with a dataflow for testing these processors, in 
>> case it is useful.
>> 
>> Otto, regarding the idea behind this new Processor, that is right. We added 
>> the writer capability to the existing Plc4xSourceProcessor, so that it 
>> formats the output according to the desired configuration, in a record 
>> manner. In the current implementation, we did this "protocol adaptation" 
>> based on the syntax of the particular properties in the Processor's 
>> configuration. For example, from the connection string 's7://IP:PORT' we 
>> extract the S7 identifier and map the variable datatypes to the actual Avro 
>> datatypes to build the record output schema. However, we don't have much 
>> experience with the PLC4X libraries, and there are surely better ways of 
>> doing this.
>> Also, about the Base Processor: we were thinking that maybe the best 
>> approach could be to have this Base Processor and then implement readers 
>> for particular protocols as Controller Services. But here, too, it would be 
>> very helpful to have your opinion.
>> 
>> Lastly, regarding the pull request, do you have any documentation on 
>> how to do this? I mean, maybe you have defined some naming 
>> conventions or an expected structure to facilitate later work. At 
>> present, we have a fork of the project where we have been working on 
>> these Nifi changes. We updated the content of our fork (fetch/merge 
>> upstream) about 2 weeks ago and committed our changes to the 'develop' 
>> branch. Should we create a new branch with our commits? How do you 
>> prefer to receive the code? (We are not git experts, so we ask just in 
>> case we might cause some problems...)
>> 
>> thank you in advance
>> 
>> iñigo
>> 
>> 
>> ----- Original message -----
>> From: "Christofer Dutz" <[email protected]>
>> To: "dev" <[email protected]>
>> Sent: Wednesday, 21 April 2021 20:01:15
>> Subject: AW: Nifi integration record oriented processor for reading
>> 
>> The more I think of it,
>> 
>> Perhaps we should also think about providing some information on the 
>> supported configuration options.
>> Wouldn't it be cool if the driver could say: "I generally have these options 
>> and they have these datatypes and mean this"?
>> Additionally, the transports could also say: "I generally have these options 
>> and they have these datatypes and mean this".
>> 
>> I would bet our StreamPipes friends would love something like that, right?
>> 
>> Chris
>> 
>> 
>> -----Original Message-----
>> From: Otto Fowler <[email protected]>
>> Sent: Wednesday, 21 April 2021 17:46
>> To: [email protected]
>> Subject: Re: Nifi integration record oriented processor for reading
>> 
>> Hi Inigo,
>> 
>> I’m a committer on Apache Nifi as well as PLC4X, and I would be happy to 
>> review your processor.
>> If I understand what you are saying correctly, you have a single processor 
>> which supports record writing output?
>> 
>> plc4x -> records
>> 
>> And, for configuration purposes, you have created per-protocol support for 
>> configuration and validation in that processor?
>> 
>> If there is per-protocol configuration / validation etc, it may be better to 
>> have a base processor, and derived processors per protocol to handle those 
>> differences.
>> 
>> I look forward to seeing the code.
>> 
>> 
>>> On Apr 21, 2021, at 04:05, Iñigo Angulo <[email protected]> wrote:
>>> 
>>> Hi all,
>>> 
>>> I am writing because we have been working on the Apache Nifi integration 
>>> part of the project. We have created a Record-oriented processor for 
>>> reading PLC data. It is based on the existing SourceProcessor, but works 
>>> with records, using a Nifi Writer (such as Avro, Json, and so on) to write 
>>> data to the flowfile's content. 
>>> 
>>> We updated the code on our fork from the current PLC4X git repo about 2 
>>> weeks ago, and tested it from Nifi by reading values with S7 from an 
>>> S7-1200 CPU. Also, one of our customers has recently started to use it for 
>>> validation. 
>>> 
>>> Currently, it works with S7 and Modbus over TCP. This is because we had to 
>>> write some classes to map the connectionString and variableList properties 
>>> (syntax) of the processor to the actual protocol, in order to then build 
>>> the Avro schema for the output flowfile, taking into account variable 
>>> datatypes, etc. We only did this for S7 and Modbus. I am sure that there 
>>> is a better way to do this, so at this point maybe you could take a look 
>>> to find the best solution and avoid the need for this mapping. 
>>> 
>>> If you find this useful, we could open a pull request against the main 
>>> PLC4X repo. Let us know what you think. 
>>> 
>>> best regards,
>>> iñigo